Depth-First

Recent content on Depth-First
Chemical Sciences · English
Published

Parsers are crucial for many data processing tasks. Contrary to what appearances might imply, writing a parser from scratch is not difficult given the right starting point. This article presents a flexible system for writing custom parsers for a wide range of languages. It assumes some experience with Rust, but no experience with language theory. More experienced readers might want to skip directly to the Lyn crate.
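The excerpt does not show the article's system or the Lyn crate's API, so what follows is only a generic illustration of how approachable a hand-written parser can be: a minimal recursive-descent parser and evaluator for integer arithmetic with `+`, `*`, and parentheses. The `Scanner` type and the `expr`, `term`, `factor`, and `parse` functions are hypothetical names chosen for this sketch, not part of Lyn. The sketch assumes input without whitespace.

```rust
/// A minimal character scanner: a cursor over the input's chars.
/// Hypothetical helper for illustration; not the Lyn crate's API.
struct Scanner {
    chars: Vec<char>,
    pos: usize,
}

impl Scanner {
    fn new(input: &str) -> Self {
        Scanner { chars: input.chars().collect(), pos: 0 }
    }

    /// Returns the next char without consuming it.
    fn peek(&self) -> Option<char> {
        self.chars.get(self.pos).copied()
    }

    /// Consumes the next char only if it equals `c`.
    fn take(&mut self, c: char) -> bool {
        if self.peek() == Some(c) {
            self.pos += 1;
            true
        } else {
            false
        }
    }

    fn is_done(&self) -> bool {
        self.pos == self.chars.len()
    }
}

// One function per grammar rule:
//   expr   := term ('+' term)*
//   term   := factor ('*' factor)*
//   factor := number | '(' expr ')'
fn expr(s: &mut Scanner) -> Result<i64, String> {
    let mut value = term(s)?;
    while s.take('+') {
        value += term(s)?;
    }
    Ok(value)
}

fn term(s: &mut Scanner) -> Result<i64, String> {
    let mut value = factor(s)?;
    while s.take('*') {
        value *= factor(s)?;
    }
    Ok(value)
}

fn factor(s: &mut Scanner) -> Result<i64, String> {
    if s.take('(') {
        let value = expr(s)?;
        if !s.take(')') {
            return Err(format!("expected ')' at position {}", s.pos));
        }
        return Ok(value);
    }
    // Consume a run of ASCII digits.
    let start = s.pos;
    while let Some(c) = s.peek() {
        if c.is_ascii_digit() {
            s.pos += 1;
        } else {
            break;
        }
    }
    if start == s.pos {
        return Err(format!("expected digit or '(' at position {}", s.pos));
    }
    let digits: String = s.chars[start..s.pos].iter().collect();
    digits.parse::<i64>().map_err(|e| e.to_string())
}

/// Parses and evaluates a complete expression, rejecting trailing input.
fn parse(input: &str) -> Result<i64, String> {
    let mut s = Scanner::new(input);
    let value = expr(&mut s)?;
    if !s.is_done() {
        return Err(format!("unexpected character at position {}", s.pos));
    }
    Ok(value)
}
```

Operator precedence falls out of the rule nesting: because `term` sits below `expr`, `parse("2+3*4")` yields `Ok(14)`, while `parse("(2+3)*4")` yields `Ok(20)`.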

The one guarantee in science is that the facts will eventually change. When they do, scientific information tools need to adapt. Information exchange formats occupy an especially delicate position in this regard. Ideally, changes at this foundational level of tooling will always be well-communicated and backward-compatible. But this isn’t always true in practice.

The molfile format is an integral part of chemistry today. First publicly described in 1991, molfiles are now found in most places that chemistry and computers intersect. Molfiles tend to be encoded with the “V2000” specification, but this isn’t the only option. In 1995 an update called “V3000” was introduced. Since then V3000 has been revised and extended.
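A molfile declares which specification it follows in its counts line, the fourth line after the three-line header block, which ends with the version token `V2000` or `V3000`. The following minimal sketch reads that token. The `MolfileVersion` enum and `molfile_version` function are hypothetical names, and the sketch treats a counts line with no version token as an error rather than guessing.

```rust
/// The two molfile specifications discussed above.
#[derive(Debug, Clone, Copy, PartialEq)]
enum MolfileVersion {
    V2000,
    V3000,
}

/// Reads the version token at the end of the counts line (line 4).
fn molfile_version(molfile: &str) -> Result<MolfileVersion, String> {
    // Lines 1-3 are the header block (name, program/timestamp, comment);
    // any of them may be blank, which `lines()` still yields as "".
    let counts = molfile
        .lines()
        .nth(3)
        .ok_or("molfile has fewer than four lines")?;
    match counts.trim_end() {
        line if line.ends_with("V3000") => Ok(MolfileVersion::V3000),
        line if line.ends_with("V2000") => Ok(MolfileVersion::V2000),
        _ => Err(format!("no version token in counts line: {:?}", counts)),
    }
}
```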

In JavaScript, all type errors are reported at run time. The main benefit is a syntax that’s easy to learn and use, but it comes at a cost. As a project grows and matures, the convenience of dynamic typing can give way to frustration with code quality and maintainability.

An oft-cited element of JavaScript’s success is dynamic typing. The language’s minimal type system means that type errors are reported at runtime. This flexibility can be liberating during early-stage work that lacks a clear design. But dynamic typing, like any technology, involves tradeoffs. A project that grows in size, complexity, or developer count can expose its true costs.

Many kinds of molecular encoding are used in cheminformatics, but all are based on the concept of a molecular graph. A molecular graph is a graph whose nodes and edges are augmented to capture information relevant to molecular structure. Generally speaking, atoms map to nodes and bonds map to edges. Comparing two graphs for equivalence turns out to be far more challenging than other kinds of comparison, such as checking whether two strings or two numbers are equal.
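To make the difficulty concrete, here is a minimal brute-force sketch, not the article's method: two graphs are equivalent (isomorphic) if some one-to-one mapping of atoms preserves every atom label and every bond order, including the absence of bonds. The `MolGraph` type and function names are hypothetical, hydrogens are left implicit, and the backtracking search can explore up to n! mappings in the worst case, which is why practical toolkits rely on more sophisticated approaches.

```rust
/// A toy molecular graph: element labels on nodes, bond orders on edges.
/// `bonds[i][j] == 0` means atoms i and j are not bonded. (Hypothetical type.)
struct MolGraph {
    atoms: Vec<&'static str>,
    bonds: Vec<Vec<u8>>,
}

impl MolGraph {
    fn new(atoms: Vec<&'static str>, edges: &[(usize, usize, u8)]) -> Self {
        let n = atoms.len();
        let mut bonds = vec![vec![0u8; n]; n];
        for &(i, j, order) in edges {
            bonds[i][j] = order;
            bonds[j][i] = order;
        }
        MolGraph { atoms, bonds }
    }
}

/// True if some atom-to-atom mapping preserves all labels and bond orders.
fn is_isomorphic(a: &MolGraph, b: &MolGraph) -> bool {
    if a.atoms.len() != b.atoms.len() {
        return false;
    }
    let mut mapping = Vec::with_capacity(a.atoms.len());
    let mut used = vec![false; b.atoms.len()];
    extend_mapping(a, b, &mut mapping, &mut used)
}

/// Backtracking step: map atom `mapping.len()` of `a` to an unused atom of `b`.
fn extend_mapping(a: &MolGraph, b: &MolGraph, mapping: &mut Vec<usize>, used: &mut [bool]) -> bool {
    let i = mapping.len();
    if i == a.atoms.len() {
        return true; // every atom mapped consistently
    }
    for j in 0..b.atoms.len() {
        if used[j] || a.atoms[i] != b.atoms[j] {
            continue;
        }
        // The candidate pair must agree with every previously mapped atom,
        // both where bonds exist and where they do not.
        let consistent = (0..i).all(|k| a.bonds[i][k] == b.bonds[j][mapping[k]]);
        if consistent {
            mapping.push(j);
            used[j] = true;
            if extend_mapping(a, b, mapping, used) {
                return true;
            }
            mapping.pop();
            used[j] = false;
        }
    }
    false
}
```

For example, ethanol written with atoms in the order C, C, O is found equivalent to the same molecule written O, C, C, but not to dimethyl ether (C, O, C), even though all three share the same atom labels.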

Since its first public description in 1988, SMILES has become one of chemistry’s most widely used information exchange formats. All major toolkits support it. A lot of user-facing software reads and writes it. Many public databases include it. More recently it’s become a popular input/output format for machine learning. But despite this widespread adoption, SMILES is remarkably under-specified.

The previous article in this series introduced a simple method for writing PostgreSQL extensions in Rust with the pgx crate. The example in that article, although a complete extension, exposed only a function. A more interesting extension would define a custom data type that works correctly with indexes. This article describes such an extension.

Compiling and Running the Extension

The extension, cas_number, is available on GitHub.
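The core of a custom data type like the one described above is parsing, validation, and a total ordering that an index can use. The following sketch shows only that plain-Rust core for CAS Registry Numbers; it is not the cas_number extension's code, and the pgx integration (registering the Postgres type, operators, and index support) is omitted. `CasNumber` and its methods are hypothetical. It relies on the published CAS format: a first part of 2 to 7 digits, a 2-digit middle part, and a check digit equal to the sum of the preceding digits, each weighted 1, 2, 3, ... counting from the right, taken mod 10.

```rust
use std::fmt;

/// A validated CAS Registry Number, stored as one integer (all digits, no
/// hyphens). Comparing integers gives cheap equality and ordering, which is
/// what an index needs. Hypothetical type for illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
struct CasNumber {
    value: u64,
}

impl CasNumber {
    /// Parses "N-NN-N" (first part 2 to 7 digits) and verifies the check digit.
    fn parse(s: &str) -> Result<Self, String> {
        let parts: Vec<&str> = s.split('-').collect();
        if parts.len() != 3 {
            return Err(format!("expected three hyphen-separated parts: {s:?}"));
        }
        let (first, second, check) = (parts[0], parts[1], parts[2]);
        let all_digits = |p: &str| p.bytes().all(|b| b.is_ascii_digit());
        if !(2..=7).contains(&first.len())
            || second.len() != 2
            || check.len() != 1
            || !all_digits(first)
            || !all_digits(second)
            || !all_digits(check)
        {
            return Err(format!("malformed CAS number: {s:?}"));
        }

        // Weight the digits before the check digit 1, 2, 3, ... from the
        // right, sum the products, and take the result mod 10.
        let sum: u32 = first
            .chars()
            .chain(second.chars())
            .rev()
            .enumerate()
            .map(|(i, c)| (i as u32 + 1) * c.to_digit(10).unwrap())
            .sum();
        let actual = check.chars().next().and_then(|c| c.to_digit(10));
        if actual != Some(sum % 10) {
            return Err(format!("check digit mismatch in {s:?}: expected {}", sum % 10));
        }

        let value = format!("{first}{second}{check}")
            .parse::<u64>()
            .map_err(|e| e.to_string())?;
        Ok(CasNumber { value })
    }
}

impl fmt::Display for CasNumber {
    /// Re-inserts the hyphens: the last digit is the check digit and the two
    /// digits before it form the middle part.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}-{:02}-{}", self.value / 1000, (self.value / 10) % 100, self.value % 10)
    }
}
```

For water, 7732-18-5, the weighted sum is 8·1 + 1·2 + 2·3 + 3·4 + 7·5 + 7·6 = 105, and 105 mod 10 = 5, matching the check digit. Because the middle part and check digit always occupy the last three decimal places, ordering by the stored integer is the same as ordering by (first part, middle part, check digit).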

PostgreSQL (aka “Postgres”) is a widely used relational database system. One of the many features that make it so popular is extensibility. Postgres ships with several extensions, and others are available from third parties. Underpinning this large body of extended functionality is a system for building and deploying extensions.

The RDKit Postgres extension (“the extension”) enables fast chemical substructure queries in plain SQL. Convenience is the main selling point of this utility, which allows low-level data processing to stay within the database layer of an application. While evaluating RDKit for use in a revamped commercial product, I uncovered an easily detected, show-stopping performance issue that, to my knowledge, has never been documented before.

Chemical structure databases have the odd distinction of being both ubiquitous and often non-trivial to implement. The defining characteristic of these systems is that records can be fetched with exact-structure or substructure queries. Structure-searchable databases pop up in all kinds of contexts, from individual research projects and one-off data processing tasks to drug discovery efforts, analytical labs, and small chemical businesses.
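Substructure queries are expensive because each candidate requires a graph match, so such systems commonly run a cheap screen first that rules out records which cannot possibly match. The following sketch uses element counts as a deliberately simple stand-in for the structural fingerprints real systems typically use; it is an assumption for illustration, not the article's design. The `Formula` alias and the `formula`, `may_contain`, and `screen` functions are hypothetical.

```rust
use std::collections::HashMap;

/// Element counts for one structure, e.g. {"C": 2, "O": 1} for ethanol's heavy atoms.
type Formula = HashMap<String, u32>;

/// Builds a `Formula` from (element, count) pairs.
fn formula(pairs: &[(&str, u32)]) -> Formula {
    pairs.iter().map(|&(el, n)| (el.to_string(), n)).collect()
}

/// Necessary (not sufficient) condition for a substructure match: the record
/// must contain at least as many atoms of every element as the query.
fn may_contain(record: &Formula, query: &Formula) -> bool {
    query.iter().all(|(el, &n)| record.get(el).copied().unwrap_or(0) >= n)
}

/// Returns the ids of records that pass the screen, in their original order.
/// Survivors still need a full atom-by-atom substructure match; records that
/// fail can be skipped without one.
fn screen<'a>(records: &'a [(String, Formula)], query: &Formula) -> Vec<&'a str> {
    records
        .iter()
        .filter(|(_, f)| may_contain(f, query))
        .map(|(id, _)| id.as_str())
        .collect()
}
```

The design choice is to trade precision for speed: the screen never rejects a true match, but it may let through records that a full match later discards.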