
An anonymous reader reported that the American Medical Association published the structure of dapagliflozin. Here are the details.
Mike released Operator 0.8, which picks up RDF (RDFa and eRDF) from HTML pages, and adds actions to it. I blogged earlier about the beta and wrote a script for it for chemical RDFa. At this moment, Chemical blogspace and RDF for Molecular Space (see this blog) are using chemical RDFa to semantically mark up molecular information. The new Operator release (download) has one notable API change: it now uses “RDF” as key for semantic information;
Via SciFoo Planet (from Partial immortalization) I learned about TouchGraph Google (Peter brought it into Chemical blogspace). It’s cool, though not open source. Here’s the touch graph for my blog. As you can see, there are plenty of blogspot bloggers around me, among which, in purple, Useful Chemistry. Funny thing is, each time I repeat the Google search, the output is different. Oh, and make sure to drag one of the halos around;
Peter wondered if data should be stored centralized or decentralized, when Deepak blogged about Freebase and Metaweb. Now, I haven’t really looked into these two projects, but the question of centralized versus decentralized is interesting. It’s MySQL versus the world wide web; it’s the PubChem compound ID versus the InChI;
Rich blogged about how to Never Draw the Same Molecule Twice: Viewing Image Metadata, in which he shows his molecular editor outputting images of molecular structures where the connection table of the structure is embedded in the image. His molecular editor can read the image again, and will automatically pick up the embedded connection table. Noel showed that this can be done not only in Java, but in Python too.
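To make the idea of embedding a connection table in an image concrete, here is a minimal sketch in Python using only the standard library. It is not Rich's or Noel's code: it assumes the image is a PNG and that the text is stored in a standard tEXt chunk, and the keyword `molfile` and the function names are made up for illustration.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def _chunk(chunk_type: bytes, data: bytes) -> bytes:
    """Build one PNG chunk: length, type, data, CRC over type + data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def write_png_with_text(keyword: str, text: str) -> bytes:
    """Return a 1x1 white grayscale PNG carrying `text` in a tEXt chunk.

    A real editor would write the rendered structure as pixel data;
    the single pixel here only keeps the sketch short.
    """
    # width, height, bit depth 8, colour type 0 (grayscale), defaults
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
    scanline = b"\x00\xff"  # filter type 0, then one white pixel
    text_data = keyword.encode("latin-1") + b"\x00" + text.encode("latin-1")
    return (
        PNG_SIGNATURE
        + _chunk(b"IHDR", ihdr)
        + _chunk(b"tEXt", text_data)  # hypothetical home of the connection table
        + _chunk(b"IDAT", zlib.compress(scanline))
        + _chunk(b"IEND", b"")
    )


def read_png_text(png: bytes) -> dict:
    """Walk the chunks of a PNG and collect all tEXt keyword/value pairs."""
    if png[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    found = {}
    pos = 8
    while pos < len(png):
        length, chunk_type = struct.unpack(">I4s", png[pos:pos + 8])
        data = png[pos + 8:pos + 8 + length]
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC
        if chunk_type == b"tEXt":
            key, _, value = data.partition(b"\x00")
            found[key.decode("latin-1")] = value.decode("latin-1")
        elif chunk_type == b"IEND":
            break
    return found


if __name__ == "__main__":
    # Placeholder text, not a valid molfile.
    image = write_png_with_text("molfile", "connection table goes here")
    print(read_png_text(image))
```

The point of the design is that the image stays an ordinary PNG that any viewer can display, while a chemistry-aware tool can recover the structure from the metadata instead of redrawing it.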
I reported last week about the Molecules in Wikipedia and the plethora of templates used.
I do not care about physical and chemical properties in Wikipedia, as I can easily extract them from other sources. The main value of Wikipedia for molecules is, I think, that it describes the history of a molecule.
Well, no wonder: Excel is meant to be used to process money flows. Anyway, greyarea pointed me to this nice blog item from March 2006. It discusses a 2004 article in BMC Bioinformatics, “Mistaken Identifiers: Gene name errors can be introduced inadvertently when using Excel in bioinformatics”, by Barry Zeeberg et al. (DOI:10.1186/1471-2105-5-80). Hence, the importance of semantics and proper markup languages.
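The kind of error the article describes is a gene symbol such as SEPT2 or DEC1 being silently turned into a date, or an identifier such as 2310009E13 being read as a floating-point number. A rough sketch of a pre-import check could look like the following; the regular expressions are my own heuristic and not the rules Excel actually applies, and the function name is made up.

```python
import re
from typing import Optional

# Heuristic only: a month name or abbreviation followed by one or two digits.
DATE_LIKE = re.compile(
    r"^(JAN|FEB|MAR|MARCH|APR|MAY|JUN|JUL|AUG|SEPT?|OCT|NOV|DEC)-?(\d{1,2})$",
    re.IGNORECASE,
)
# Heuristic only: digits, the letter E, digits (reads as scientific notation).
FLOAT_LIKE = re.compile(r"^\d+E\d+$", re.IGNORECASE)


def excel_risk(identifier: str) -> Optional[str]:
    """Return 'date' or 'float' if the identifier looks at risk, else None."""
    if DATE_LIKE.match(identifier):
        return "date"
    if FLOAT_LIKE.match(identifier):
        return "float"
    return None


if __name__ == "__main__":
    for symbol in ["SEPT2", "DEC1", "2310009E13", "TP53"]:
        print(symbol, excel_risk(symbol))
```

Flagged identifiers would then need to be imported as explicit text, which is exactly the sort of typing information that a proper markup language carries and a bare spreadsheet cell does not.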
RDF might be the solution we are looking for to get a grip on the huge amount of information we are facing. Microformats and RDFa are just solutions along the way, and Gleaning Resource Descriptions from Dialects of Languages (GRDDL) might be an important tool to get the web RDF-ied. One important aspect of RDF is that any resource has a unique URI.
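As a small illustration of what a unique URI per resource buys you, the sketch below models an RDF statement as a subject, predicate, and object, and writes it out as an N-Triples line. All URIs under example.org are invented for the example, as are the class and function names; a real application would more likely use an RDF library and an established vocabulary.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement; subject and predicate are always URIs."""
    subject: str
    predicate: str
    obj: str
    obj_is_literal: bool = False


def to_ntriples(triple: Triple) -> str:
    """Serialize a triple as a single N-Triples line."""
    if triple.obj_is_literal:
        # Escape backslashes and double quotes inside the literal.
        escaped = triple.obj.replace("\\", "\\\\").replace('"', '\\"')
        obj = f'"{escaped}"'
    else:
        obj = f"<{triple.obj}>"
    return f"<{triple.subject}> <{triple.predicate}> {obj} ."


if __name__ == "__main__":
    # Hypothetical URIs; the InChI is the one for methane.
    methane = "http://example.org/molecule/methane"
    statements = [
        Triple(methane, "http://example.org/terms/inchi", "InChI=1S/CH4/h1H4", True),
        Triple(methane, "http://example.org/terms/describedBy",
               "http://example.org/blog/post/42"),
    ]
    for statement in statements:
        print(to_ntriples(statement))
```

Because the molecule has one URI, statements made on different sites about the same resource can be merged without any central database deciding what the identifier should be.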
I had some time to work some more on the QSAR functionality in Bioclipse. There is still much to do, but it is getting there. The screenshot of the calculation of a QSAR descriptor data matrix shows that multi-resource selection is now working, and that the calculation is now a Job.
Igor wrote a message to the CCL mailing list about OSRA. The email does not give any information on the fail rate, but the demo they provide via the web interface does show some minor glitches (the bromine is not recognized). The source reuses OpenBabel and uses the GPL license. The value is equal to that of text mining tools like OSCAR3, and together they sound like the Jordan and Pippen of mining chemical literature.