CAS Registry Numbers simplify the thorny problem of referring to chemical substances. These short numerical sequences are arguably the most widely used form of molecular identifier, appearing on reagent bottles, in publications, in patents and patent applications, and on material safety data sheets (MSDS). During my time as a synthetic organic chemist, I would sometimes need to find the structure of a molecule represented by a CAS number.
The only solution to this problem I’ve found is to set the CSS overflow property to “scroll”: InChI=1/C50H70O14/c1-25(24-51)14-28-17-37(52)50(8)41(54-28)19-33-34(61-50)18-32-29(55-33)10-9-12-46(4)42(58-32)23-49(7)40(62-46)21-39-47(5,64-49)13-11-30-44(60-39)26(2)15-31-36(56-30)22-48(6)38(57-31)20-35-45(63-48)27(3)16-43(53)59-35/h9-10,16,24,26,28-42,44-45,52H,1,11-15,17-23H2,2-8H3/b10-9-
One of my favorite features of Ruby is the Interactive Ruby (irb) shell. For those who haven’t used it, irb lets you interactively create Ruby programs. Are you not exactly sure how to use that new library? Do you want to be able to “play” with an object to see how it works? Then irb is the perfect tool. One of the new features contained in Open Babel 2.1 is a Ruby interface.
Why do scientists publish? There is a myth, common among non-scientists, that scientists publish their work mainly out of an altruistic desire to make the world a better place. Sharing their work helps science move faster and that’s a Good Thing. This is, of course, an important factor for most scientists. At least it’s hard to imagine any scientist seriously wanting their work to be used to make the world worse off.
The InChI team has announced a proposal for a standardized InChI hashing mechanism. This would create a free, fixed-length, alphanumeric molecular identifier. This is an excellent proposal. One of the biggest problems in working with InChIs (and other line notations such as SMILES) is that even medium-sized molecules produce very long identifiers. Another problem is the use of characters that must be escaped in URLs.
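To make the two problems concrete, here is a minimal Python sketch. It is not the hashing scheme in the InChI team's proposal: the function name `hash_identifier`, the choice of SHA-256 with Base32 encoding, and the 25-character length are all assumptions made purely for illustration. It only shows how a hash can turn an InChI of any length into a short, fixed-length identifier that needs no URL escaping.

```python
import base64
import hashlib
from urllib.parse import quote


def hash_identifier(inchi: str, length: int = 25) -> str:
    """Return a fixed-length, alphanumeric digest of an InChI string.

    Hypothetical helper for illustration only; this is NOT the
    algorithm described in the InChI hashing proposal.
    """
    digest = hashlib.sha256(inchi.encode("utf-8")).digest()
    # Base32 uses only A-Z and 2-7, so the output is URL-safe as-is.
    encoded = base64.b32encode(digest).decode("ascii").rstrip("=")
    return encoded[:length]


ethanol = "InChI=1/C2H6O/c1-2-3/h3H,2H2,1H3"
benzene = "InChI=1/C6H6/c1-2-4-6-5-3-1/h1-6H"

# The raw InChI contains '=', '/' and ',' which must be percent-escaped
# before it can be embedded in a URL path or query string.
escaped_inchi = quote(ethanol, safe="")

# The hashed form has the same length for every molecule and survives
# URL quoting unchanged.
ethanol_id = hash_identifier(ethanol)
benzene_id = hash_identifier(benzene)
```

The tradeoff is that a hash is one-way: unlike the InChI itself, the identifier cannot be converted back into a structure, so it is only useful for lookup and comparison.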
One of Depth-First’s more popular articles is a summary of free databases titled Thirty-Two Free Chemistry Databases. Clearly there is a need to link the producers of free chemical databases (developers) with the potential users of these services (chemists). Chemistry is slowly emerging from a decades-long period of over-reliance on a single supplier of information.
An older article on the InChI canonicalization algorithm has been restored and updated. The revised article contains a direct link to the InChI Technical Manual PDF file, which I uploaded to SourceForge for convenience.
The previous article in this series discussed the requirements of Firefly, a new 2D chemical structure editor for the Web. Another article discussed Firefly’s design constraints and the importance of embracing them. Why so much focus on a structure editor? Simply put, the structure editor is the key link between chemistry and cheminformatics. Without the structure editor, there would be no audience for cheminformatics software.
Greg Beaver has written an interesting set of ten golden rules for running open source projects. Many of these rules apply to running (or working on) anything. My favorite rule is “3. Despite the evidence, it’s probably your fault.” Greg develops the popular Open Source project PEAR, a packaging tool for PHP that looks similar in concept to RubyGems.
I ran across John Bradshaw’s excellent presentation Strings and Things. Part historical overview, part explanation of the SMILES/SMARTS line notation systems, Bradshaw’s slides are chock full of interesting tidbits. My favorite: slide 29 - “Line notations are dead.” It’s a wonderful illustration of why predicting the future of technology is so tricky.