Chemical Sciences

Depth-First

Chemical Sciences
Published

Molecular identifiers, also known as “chemical names,” underpin modern chemistry. A recent paper introduced TUCAN, a new molecular identifier. As noted in my overview, TUCAN could one day play a role in molecular identification similar to that of canonical SMILES and IUPAC nomenclature. An important step along the way is canonicalization: the selection of one representation from the many possible for a given molecule.
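The idea of canonicalization can be illustrated with a deliberately naive sketch: enumerate every ordering of a molecule's atoms, serialize the labeled graph under each ordering, and keep the lexicographically smallest string. Because the set of strings produced does not depend on how the input atoms happen to be numbered, the minimum is the same for every input numbering of the same molecule. This is not TUCAN's algorithm; the `Molecule` type and the serialization format below are hypothetical, and the cost grows factorially with atom count.

```rust
/// A hypothetical minimal molecule: element symbols plus undirected bonds
/// given as pairs of atom indices.
pub struct Molecule {
    pub atoms: Vec<&'static str>,
    pub bonds: Vec<(usize, usize)>,
}

/// Serializes the molecule under one atom ordering. `order[i]` is the
/// original index of the atom placed at position `i`.
fn serialize(mol: &Molecule, order: &[usize]) -> String {
    // position[original index] = new index
    let mut position = vec![0; order.len()];
    for (new, &old) in order.iter().enumerate() {
        position[old] = new;
    }
    let atoms: Vec<&str> = order.iter().map(|&i| mol.atoms[i]).collect();
    // Renumber each bond, store it with the smaller index first, then sort.
    let mut edges: Vec<(usize, usize)> = mol
        .bonds
        .iter()
        .map(|&(a, b)| {
            let (x, y) = (position[a], position[b]);
            if x < y { (x, y) } else { (y, x) }
        })
        .collect();
    edges.sort();
    let edge_text: Vec<String> = edges.iter().map(|(x, y)| format!("{}-{}", x, y)).collect();
    format!("{}|{}", atoms.join(","), edge_text.join(","))
}

/// Recursively enumerates all atom orderings, keeping the smallest string.
fn permute(current: &mut Vec<usize>, used: &mut Vec<bool>, mol: &Molecule, best: &mut Option<String>) {
    if current.len() == mol.atoms.len() {
        let s = serialize(mol, current);
        if best.as_ref().map_or(true, |b| s < *b) {
            *best = Some(s);
        }
        return;
    }
    for i in 0..mol.atoms.len() {
        if !used[i] {
            used[i] = true;
            current.push(i);
            permute(current, used, mol, best);
            current.pop();
            used[i] = false;
        }
    }
}

/// Returns the canonical (lexicographically minimal) serialization.
pub fn canonical_string(mol: &Molecule) -> String {
    let mut best = None;
    let mut current = Vec::new();
    let mut used = vec![false; mol.atoms.len()];
    permute(&mut current, &mut used, mol, &mut best);
    best.unwrap_or_default()
}
```

Ethanol entered as C-C-O or as O-C-C yields the same string, while dimethyl ether (C-O-C) yields a different one. Practical canonicalizers avoid full enumeration by partitioning atoms with graph invariants first and only breaking the remaining ties.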

Chemical Sciences
Published

Note: This article requires revisions on one or more important points. In particular, the algorithm does not in its current form enumerate every candidate indexing as claimed. See the revised article for details. Molecular identifiers (or “chemical names”) are everywhere in chemistry. A recent post discussed a new kind of molecular identifier called TUCAN.

Chemical Sciences
Published

It’s hard to overstate the importance of molecular identifiers to chemistry. Also called “chemical names,” molecular identifiers enable individuals, laboratories, organizations, and countries to efficiently exchange information about molecules. Given this foundational role, you might expect to find a well-organized and lavishly funded effort to develop and improve molecular identifiers. This is, of course, not the case.

Chemical Sciences
Published

There’s a rather large category of important chemistry software that doesn’t get a lot of attention in journal articles. It inhabits the twilight zone between chemist and programmer. I don’t mean “chemist” in the sense of cheminformatician or computational chemist. I mean “chemist” in the sense of a trained experimentalist who gathers and uses experimental data.

Chemical Sciences
Published

Python’s many advantages come at a cost: execution speed on the “traditional” runtime lags that of other languages by a considerable margin. Python’s solution is to expose the runtime to more efficient extensions written in C and C++. As noted previously here, Python extensions can also be written in pure Rust through PyO3. But some projects call for greater control.

Chemical Sciences
Published

Broad access to chemical information remains a largely unrealized goal. Back in 2006, the second post on this blog noted the recent introduction of PubChem and ZINC as game-changing developments. But despite progress in the creation of open structure collections, repositories linking molecular structure with properties are far less developed. An important frontier in this area is open reaction data.

Chemical Sciences
Published

Stereoisomerism plays a crucial role in the science and technology of chemistry, but that role is a relatively new development. Analytical and synthetic techniques have not yet advanced to the stage that allows configuration to be assigned with the same ease as other aspects of molecular structure. Depending on the context, it’s still not unusual for configuration to remain partially or completely unknown indefinitely.

Chemical Sciences
Published

Graphs are central to many areas of programming, so it’s not surprising to find many general-purpose graph libraries. But these ready-made solutions sometimes lack the focus needed to solve a specific problem well. Having run into this limitation several times, I recently proposed a solution in the form of a minimal graph API with a Rust implementation.
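To make the idea of a minimal graph API concrete, here is a small sketch of what such an interface might look like in Rust: a trait with a few required methods, some derived defaults, and one adjacency-list implementation. The trait name, method names, and `AdjacencyList` type are hypothetical illustrations, not the API proposed in the article.

```rust
/// A hypothetical minimal graph interface. Nodes are identified by
/// indices in 0..order().
pub trait Graph {
    /// Number of nodes.
    fn order(&self) -> usize;
    /// Number of edges.
    fn size(&self) -> usize;
    /// Neighbors of node `id`, or `None` if `id` is out of range.
    fn neighbors(&self, id: usize) -> Option<&[usize]>;

    /// Degree of node `id`, derived from `neighbors`.
    fn degree(&self, id: usize) -> Option<usize> {
        self.neighbors(id).map(|n| n.len())
    }

    /// True if an edge joins `sid` and `tid`.
    fn has_edge(&self, sid: usize, tid: usize) -> bool {
        self.neighbors(sid).map_or(false, |n| n.contains(&tid))
    }
}

/// An undirected graph stored as adjacency lists.
pub struct AdjacencyList {
    adjacency: Vec<Vec<usize>>,
    edge_count: usize,
}

impl AdjacencyList {
    /// Creates a graph with `order` nodes and no edges.
    pub fn new(order: usize) -> Self {
        AdjacencyList { adjacency: vec![Vec::new(); order], edge_count: 0 }
    }

    /// Adds an undirected edge. Returns false if either node is missing,
    /// the two nodes are identical, or the edge already exists.
    pub fn add_edge(&mut self, sid: usize, tid: usize) -> bool {
        let n = self.adjacency.len();
        if sid >= n || tid >= n || sid == tid || self.has_edge(sid, tid) {
            return false;
        }
        self.adjacency[sid].push(tid);
        self.adjacency[tid].push(sid);
        self.edge_count += 1;
        true
    }
}

impl Graph for AdjacencyList {
    fn order(&self) -> usize {
        self.adjacency.len()
    }

    fn size(&self) -> usize {
        self.edge_count
    }

    fn neighbors(&self, id: usize) -> Option<&[usize]> {
        self.adjacency.get(id).map(|v| v.as_slice())
    }
}
```

Keeping the required surface to three methods means that alternative storage schemes, such as a molecule type that already holds its own bond lists, can implement the trait without copying data, and generic algorithms written against `Graph` work with all of them.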

Chemical Sciences
Published

A fingerprint is a molecular representation that omits certain kinds of structural information with the goal of increasing computational speed. The success of this approach is evidenced by numerous modern applications ranging from structure search to property prediction. A good fingerprint trades away just enough structural information to achieve the desired computational goal, so flexibility matters.
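The trade-off can be seen in a toy fingerprint. The sketch below assumes a simplified, hypothetical feature scheme: each bond is reduced to an element-pair label such as "C-O", hashed with FNV-1a into a 256-bit vector, and all other connectivity is discarded. Two common operations follow: a subset test (under this scheme, a substructure's bits must all appear in any molecule containing it, so a failed test rules out a match cheaply) and Tanimoto similarity. It is not any published fingerprint.

```rust
/// Width of the hypothetical fingerprint in bits.
const BITS: usize = 256;

/// A fixed-width bit-vector fingerprint.
#[derive(Clone, PartialEq, Debug)]
pub struct Fingerprint([u64; BITS / 64]);

/// FNV-1a hash, used to map feature strings onto bit positions.
fn fnv1a(text: &str) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325;
    for byte in text.bytes() {
        hash ^= byte as u64;
        hash = hash.wrapping_mul(0x100000001b3);
    }
    hash
}

impl Fingerprint {
    /// Builds a fingerprint from bonds given as (element, element) pairs.
    /// Each bond becomes an order-independent label such as "C-O";
    /// everything else about the structure is deliberately dropped.
    pub fn from_bonds(bonds: &[(&str, &str)]) -> Self {
        let mut words = [0u64; BITS / 64];
        for &(a, b) in bonds {
            let feature = if a <= b { format!("{}-{}", a, b) } else { format!("{}-{}", b, a) };
            let bit = (fnv1a(&feature) % BITS as u64) as usize;
            words[bit / 64] |= 1u64 << (bit % 64);
        }
        Fingerprint(words)
    }

    /// True if every bit set in `self` is also set in `other`.
    pub fn is_subset_of(&self, other: &Fingerprint) -> bool {
        self.0.iter().zip(other.0.iter()).all(|(a, b)| (a & b) == *a)
    }

    /// Tanimoto (Jaccard) similarity over set bits.
    pub fn tanimoto(&self, other: &Fingerprint) -> f64 {
        let mut both = 0;
        let mut either = 0;
        for (a, b) in self.0.iter().zip(other.0.iter()) {
            both += (a & b).count_ones();
            either += (a | b).count_ones();
        }
        if either == 0 { 1.0 } else { both as f64 / either as f64 }
    }
}
```

A passing subset test is necessary but not sufficient for a substructure match: hash collisions and the discarded connectivity both produce false positives, which a full graph comparison must then filter out. Richer features (paths, circular environments) shift that balance toward fewer false positives at higher cost.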

Chemical Sciences
Published

A previous article offered some reasons to adopt the V3000 molfile format. Although there are several, the one that gets the most attention is “enhanced stereochemistry” support. It should come as no surprise that the cost of this enhancement is increased complexity. Fortunately, V3000’s stereochemistry model extends the one used in V2000. Unfortunately, the V2000 stereochemistry model is not exactly simple.
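One part of the V2000 stereochemistry model is the bond-stereo field of the bond block. Each bond line holds fixed-width, three-character fields: first atom, second atom, bond type, and bond stereo. For single bonds, stereo codes 0, 1, 4, and 6 mean not stereo, Up, Either, and Down, with the narrow end of the wedge at the first atom; for double bonds, 0 means cis/trans is taken from the coordinates and 3 means cis or trans is unspecified. The sketch below parses only those fields; the `Bond` and `BondStereo` types are hypothetical, and atom-block parity and later fields are ignored.

```rust
/// Interpretation of the V2000 bond-stereo field.
#[derive(Debug, PartialEq)]
pub enum BondStereo {
    NotStereo,
    Up,              // single bond, wedge; narrow end at the first atom
    Down,            // single bond, hash; narrow end at the first atom
    Either,          // single bond, configuration unspecified
    FromCoordinates, // double bond: cis/trans taken from coordinates
    EitherDouble,    // double bond: cis or trans unspecified
}

#[derive(Debug, PartialEq)]
pub struct Bond {
    pub source: usize,
    pub target: usize,
    pub order: u8,
    pub stereo: BondStereo,
}

/// Reads a fixed-width three-character integer field starting at `start`.
fn field(line: &str, start: usize) -> Result<u32, String> {
    line.get(start..start + 3)
        .ok_or_else(|| format!("line too short for field at column {}", start))?
        .trim()
        .parse::<u32>()
        .map_err(|e| e.to_string())
}

/// Parses the first four fields of a V2000 bond line.
pub fn parse_bond_line(line: &str) -> Result<Bond, String> {
    let source = field(line, 0)? as usize;
    let target = field(line, 3)? as usize;
    let order = field(line, 6)? as u8;
    let code = field(line, 9)?;
    // The meaning of a stereo code depends on the bond type.
    let stereo = match (order, code) {
        (2, 0) => BondStereo::FromCoordinates,
        (_, 0) => BondStereo::NotStereo,
        (1, 1) => BondStereo::Up,
        (1, 4) => BondStereo::Either,
        (1, 6) => BondStereo::Down,
        (2, 3) => BondStereo::EitherDouble,
        _ => return Err(format!("unsupported stereo code {} for bond order {}", code, order)),
    };
    Ok(Bond { source, target, order, stereo })
}
```

Even this fragment shows why the model is not simple: the same numeric code means different things depending on bond type, and a wedge describes the configuration at only one of its two atoms.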

Chemical Sciences
Published

Parsers are crucial for many data processing tasks. Contrary to what appearances might imply, writing a parser from scratch is not difficult given the right starting point. This article presents a flexible system for writing custom parsers for a wide range of languages. It assumes some experience with Rust, but no experience with language theory. More experienced readers might want to skip directly to the Lyn crate.
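As a small illustration of writing a parser from scratch, here is a hand-written recursive-descent parser in plain Rust for a toy, SMILES-like grammar: a chain of organic-subset atoms with parenthesized branches. The `Scanner` type, its methods, and the `AtomGraph` output are hypothetical and are not the Lyn crate's API; aromatic atoms, bond symbols, and ring closures are not supported.

```rust
/// A hypothetical character scanner: a cursor over the input.
pub struct Scanner {
    chars: Vec<char>,
    cursor: usize,
}

impl Scanner {
    pub fn new(text: &str) -> Self {
        Scanner { chars: text.chars().collect(), cursor: 0 }
    }
    fn peek(&self) -> Option<char> {
        self.chars.get(self.cursor).copied()
    }
    fn is_done(&self) -> bool {
        self.cursor == self.chars.len()
    }
    /// Consumes `expected` if it appears next; otherwise leaves the cursor alone.
    fn take_str(&mut self, expected: &str) -> bool {
        let end = self.cursor + expected.chars().count();
        if end <= self.chars.len() && self.chars[self.cursor..end].iter().copied().eq(expected.chars()) {
            self.cursor = end;
            true
        } else {
            false
        }
    }
}

/// Two-letter symbols come first so that "Cl" is not read as "C" + "l".
const SYMBOLS: [&str; 10] = ["Cl", "Br", "B", "C", "N", "O", "P", "S", "F", "I"];

#[derive(Debug, Default, PartialEq)]
pub struct AtomGraph {
    pub atoms: Vec<String>,
    pub bonds: Vec<(usize, usize)>,
}

fn atom(scanner: &mut Scanner) -> Option<String> {
    for symbol in SYMBOLS.iter() {
        if scanner.take_str(symbol) {
            return Some(symbol.to_string());
        }
    }
    None
}

/// chain := atom ( branch | atom )* ; branch := '(' chain ')'
/// `previous` is the atom the chain attaches to, if any.
fn chain(scanner: &mut Scanner, graph: &mut AtomGraph, previous: Option<usize>) -> Result<(), String> {
    let mut last = previous;
    let mut parsed_atom = false;
    loop {
        if let Some(symbol) = atom(scanner) {
            let id = graph.atoms.len();
            graph.atoms.push(symbol);
            if let Some(prev) = last {
                graph.bonds.push((prev, id));
            }
            last = Some(id);
            parsed_atom = true;
        } else if scanner.take_str("(") {
            // A branch hangs off the most recent atom, which stays current afterward.
            let root = last.ok_or("branch without a preceding atom")?;
            chain(scanner, graph, Some(root))?;
            if !scanner.take_str(")") {
                return Err(format!("expected ')' at position {}", scanner.cursor));
            }
        } else {
            break;
        }
    }
    if parsed_atom { Ok(()) } else { Err(format!("expected atom at position {}", scanner.cursor)) }
}

/// Parses the whole input, rejecting trailing characters.
pub fn parse(text: &str) -> Result<AtomGraph, String> {
    let mut scanner = Scanner::new(text);
    let mut graph = AtomGraph::default();
    chain(&mut scanner, &mut graph, None)?;
    if !scanner.is_done() {
        return Err(format!("unexpected character {:?} at position {}", scanner.peek(), scanner.cursor));
    }
    Ok(graph)
}
```

For example, `parse("CC(Cl)O")` yields four atoms with bonds (0, 1), (1, 2), and (1, 3). Each grammar rule maps onto one function, which is what keeps this style of parser easy to extend.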