Rogue Scholar

Chemical Sciences · English

A Beginner's Guide to Parsing in Rust

Published December 16, 2021

Parsers are crucial for many data processing tasks. Contrary to what appearances might imply, writing a parser from scratch is not difficult given the right starting point. This article presents a flexible system for writing custom parsers for a wide range of languages. It assumes some experience with Rust, but no experience with language theory. More experienced readers might want to skip directly to the Lyn crate.

Chemical Sciences · English

MDL Valence-Mageddon

https://doi.org/10.59350/j8cym-s2b52

Published December 1, 2021

Author Richard L. Apodaca

The one guarantee in science is that the facts will eventually change. When they do, scientific information tools need to adapt. Information exchange formats occupy an especially delicate position in this regard. Ideally, changes at this foundational level of tooling will always be well-communicated and backward-compatible. But this isn’t always true in practice.

Chemical Sciences · English

Ten Reasons to Adopt the V3000 Molfile Format

https://doi.org/10.59350/bayqd-v0n37

Published November 17, 2021

Author Richard L. Apodaca

The molfile format is an integral part of chemistry today. First publicly described in 1991, molfiles are now found in most places that chemistry and computers intersect. Molfiles tend to be encoded with the “V2000” specification, but this isn’t the only option. In 1995 an update called “V3000” was introduced. Since then V3000 has been revised and extended.

Chemical Sciences · English

Typed JavaScript

https://doi.org/10.59350/tb6v9-h7q96

Published November 3, 2021

Author Richard L. Apodaca

In JavaScript, all type errors are reported at run time. The main benefit is a syntax that’s easy to learn and use, but it comes at a cost. As a project grows and matures, the convenience of dynamic typing can give way to frustration with code quality and maintainability.

Chemical Sciences · English

Types without TypeScript

https://doi.org/10.59350/ayc13-bd691

Published October 21, 2021

Author Richard L. Apodaca

An oft-cited element of JavaScript’s success is dynamic typing. The language’s minimal type system means that type errors are reported at runtime. This flexibility can be liberating during early work lacking a clear design. But like any technology, there are tradeoffs. A project that grows in size, complexity, or developer count can expose the true costs of dynamic typing.

Chemical Sciences · English

Molecular Graph Canonicalization

https://doi.org/10.59350/eddb9-fan42

Published October 6, 2021

Author Richard L. Apodaca

Many kinds of molecular encoding are used in cheminformatics, but all are based on the concept of a molecular graph. A molecular graph is a graph whose nodes and edges are augmented to capture information relevant to molecular structure. Generally speaking, atoms map to nodes and bonds map to edges. Comparing two graphs for equivalence turns out to be challenging compared to other kinds of comparisons.

Chemical Sciences · English

Beyond SMILES

https://doi.org/10.59350/vzt4z-4kk91

Published September 22, 2021

Author Richard L. Apodaca

Since its first public description in 1988, SMILES has become one of chemistry’s most widely used information exchange formats. All major toolkits support it. A lot of user-facing software reads and writes it. Many public databases include it. More recently it’s become a popular input/output format for machine learning. But despite this widespread adoption, SMILES is remarkably under-specified.

Chemical Sciences · English

A Rust PostgreSQL Extension for CAS Numbers

https://doi.org/10.59350/62m13-4wc34

Published September 7, 2021

Author Richard L. Apodaca

The previous article in this series introduced a simple method for writing PostgreSQL extensions in Rust with the pgx crate. The example in that article, although a complete extension, only exposed a function. A more interesting extension would define a custom data type that worked correctly with indexes. This article describes such an extension. The extension, cas_number, is available on GitHub.

Chemical Sciences · English

Postgres Extensions in Rust

https://doi.org/10.59350/pgrxe-ye566

Published August 25, 2021

Author Richard L. Apodaca

PostgreSQL (aka “Postgres”) is a widely-used relational database system. One of the many features making it so popular is extensibility. Postgres ships with several extensions, and others are available from third parties. Underpinning this large body of extended functionality is a system for building and deploying extensions.

Chemical Sciences · English

The RDKit/Postgres Ordered Substructure Search Problem

https://doi.org/10.59350/2sg8a-s3y15

Published August 11, 2021

Author Richard L. Apodaca

The RDKit Postgres extension (“the extension”) enables fast chemical substructure queries in plain SQL. Convenience is the main selling point of this utility, which allows low-level data processing to stay within the database layer of an application. While evaluating RDKit for use in a revamped commercial product, I uncovered an easily-detected, show-stopping performance issue that to my knowledge has never been documented before.

Chemical Sciences · English

Running the RDKit Postgres Cartridge with Docker

https://doi.org/10.59350/h84w7-9vz85

Published July 28, 2021

Author Richard L. Apodaca

Chemical structure databases have the odd distinction of being both ubiquitous and often non-trivial to implement. The defining characteristic of these systems is that records can be fetched based on exact-structure or substructure queries. Structure-searchable databases pop up in all kinds of contexts, ranging from individual research projects and one-off data processing tasks to drug discovery efforts, analytical labs, and small chemical businesses.

Depth-First
