Jeremy Monat, PhD
Scientific software developer
Chemical Sciences
Published

While exploring cheminformatics toolkits other than the RDKit, I wanted to try the EPAM Indigo Toolkit. The Indigo Toolkit is free and open source under the Apache License 2.0, so it can be used in proprietary software. I was unable to find simple examples of drawing molecules in a Python Jupyter Notebook, so here’s how to do that. This post also demonstrates how to save molecular images to a file.

Chemical Sciences
Published

As the YouTubers would say, “A lot of you have been asking me about how to write cheminformatics blog posts.” Well, not a lot, but at least a couple! I realized that there’s a pattern to how I write cheminformatics blog posts (16 so far), so I’m sharing that here. My blog posts are intended to be tutorials that explore a topic, usually with existing tools. I figure out how to accomplish a cheminformatics objective, then share that.

Chemical Sciences
Published

Molecules have a color if their electronic energy levels are close enough to absorb visible rather than ultraviolet light. For organic molecules, that’s often because of an extensive chain of conjugated bonds. Can we use cheminformatics to find evidence that increasing conjugated bond chain length increases the absorption wavelength, shifting it from the ultraviolet into the visible and making the molecule colored?

Chemical Sciences
Published

Tautomers are chemical structures that readily interconvert under given conditions. For example, an amino acid has a neutral form and a zwitterionic form with separated positive and negative charges. Cheminformatics packages have algorithms to enumerate tautomers based on rules. Which algorithms produce the most tautomers? And how well does InChI capture all tautomers of a given structure in a single representation?

Chemical Sciences
Published

This blog post presents a more computationally efficient way to determine the abundance of the molecular isotopes of a molecule. In part 1, we created a molecule for each possible placement of each isotope in a molecule. While that worked, it was computationally expensive because it required creating each permutation. In this blog post, we’ll create each combination only once and calculate its abundance using the binomial distribution.
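The binomial idea can be sketched in a few lines. This is only an illustration, not the post's code: it assumes a single element with exactly two isotopes, and the function name `isotopologue_abundances` and the approximate chlorine-37 abundance of 0.242 are assumptions made for the example.

```python
from math import comb


def isotopologue_abundances(n_atoms, p_heavy):
    """Return the abundance of each combination containing k heavy isotopes,
    for k = 0..n_atoms, using the binomial distribution."""
    return [
        comb(n_atoms, k) * p_heavy**k * (1 - p_heavy) ** (n_atoms - k)
        for k in range(n_atoms + 1)
    ]


# Illustrative input: two chlorine atoms, assuming roughly 24.2% chlorine-37
abundances = isotopologue_abundances(2, 0.242)
for k, abundance in enumerate(abundances):
    print(f"{k} heavy atom(s): {abundance:.4f}")
```

Each combination is computed once, with `comb(n_atoms, k)` counting the placements that part 1 would have enumerated one by one.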

Chemical Sciences
Published

I contributed MolsMatrixToGridImage to the RDKit 2023.09.1 release because I found myself writing similar code over and over to draw row-and-column grids of molecules. For projects where each row represented something, such as a molecule and the fragments off a common core, my mental model corresponded to a two-dimensional (nested) data structure, whereas the pre-existing function MolsToGridImage supported only linear (flat) data structures.
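The difference between the nested and flat models can be sketched in plain Python. This is not the RDKit implementation: the helper name `flatten_matrix` is hypothetical, and strings stand in for molecules. It pads each row to the length of the longest row so that a flat-list grid function, told how many items go in each row, would keep the rows aligned.

```python
def flatten_matrix(matrix, pad=None):
    """Flatten a nested (row-by-row) list into one flat list.

    Short rows are padded with `pad` so every row has the same width.
    Returns the flat list and the common row width.
    """
    width = max((len(row) for row in matrix), default=0)
    flat = []
    for row in matrix:
        flat.extend(row)
        flat.extend([pad] * (width - len(row)))
    return flat, width


# Placeholder strings stand in for molecules: one row per molecule and its fragments
matrix = [["mol1", "frag1a", "frag1b"], ["mol2", "frag2a"]]
flat, per_row = flatten_matrix(matrix)
```

Writing this bookkeeping by hand for every project is the repetition the nested-input function is meant to remove.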

Chemical Sciences
Published

Here’s how to display formatted molecular formulas in tables and graphs. In addition to formatted molecular formulas, these techniques should work for any Markdown or LaTeX. In the last blog post, we generated molecular formulas from SMILES strings or RDKit molecules. Once we have those molecular formulas, formatted as Markdown or LaTeX, we might want to display them in tables or graphs.
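As a rough sketch of what "formatted" can mean here, the following turns a plain formula string into LaTeX or into HTML-style subscript tags, which many Markdown renderers accept. The function names are hypothetical, and the regular expression handles only atom counts, not charges or isotope labels.

```python
import re


def formula_to_latex(formula):
    """Wrap each run of digits in a LaTeX subscript."""
    return re.sub(r"(\d+)", r"$_{\1}$", formula)


def formula_to_markdown(formula):
    """Wrap each run of digits in <sub> tags for Markdown renderers that allow inline HTML."""
    return re.sub(r"(\d+)", r"<sub>\1</sub>", formula)


latex_formula = formula_to_latex("C2H6O")
markdown_formula = formula_to_markdown("C2H6O")
print(latex_formula)
print(markdown_formula)
```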

Chemical Sciences
Published

In cheminformatics, the typical way of representing a molecule is with a SMILES string such as CCO for ethanol. A SMILES string can be converted into a molecular graph, which can be used to determine molecular structure and related properties. However, there are still cases where the molecular formula, such as $\mathrm{C_2H_6O}$, is useful.

Chemical Sciences
Published

In a previous post, I revisited Wiener’s paper predicting alkanes’ boiling points using modern cheminformatics tools. This follow-up post refits the data with modern mathematical tools to check how well the literature parameters, and the current parameters optimized here, fit the data. Wiener and Egloff’s works are impressive for using cheminformatics parameters that model physical data with simple relationships.

Chemical Sciences
Published

Harry Wiener was “a pioneer in cheminformatics and chemical graph theory”. In his 1947 Journal of the American Chemical Society article “Structural Determination of Paraffin Boiling Points”, he introduced the path number $\omega$ “as the sum of the distances between any two carbon atoms in the molecule, in terms of carbon-carbon bonds”, which is now known as the Wiener index.
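Wiener's definition translates directly into a shortest-path sum over the carbon skeleton. The sketch below is a plain-Python illustration rather than the post's code: the function name `wiener_index` and the hand-written adjacency dictionaries (carbon atoms only, numbered from 0) are assumptions made for the example.

```python
from collections import deque


def wiener_index(adjacency):
    """Sum the bond-count distances between every pair of carbon atoms.

    `adjacency` maps each atom index to the list of its bonded neighbors.
    """
    total = 0
    for source in adjacency:
        # Breadth-first search gives shortest path lengths from `source`
        dist = {source: 0}
        queue = deque([source])
        while queue:
            atom = queue.popleft()
            for neighbor in adjacency[atom]:
                if neighbor not in dist:
                    dist[neighbor] = dist[atom] + 1
                    queue.append(neighbor)
        total += sum(dist.values())
    # Each pair was counted twice, once from each end
    return total // 2


n_butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
isobutane = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1]}
print(wiener_index(n_butane), wiener_index(isobutane))
```

For these two isomers the sums are 10 for n-butane and 9 for isobutane, so the more branched skeleton has the smaller path number.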