
iPhylo

Rants, raves (and occasionally considered opinions) on phyloinformatics, taxonomy, and biodiversity informatics. For more ranty and less considered opinions, see my Twitter feed. ISSN 2051-8188. Written content on this site is licensed under a Creative Commons Attribution 4.0 International license.
Tags: LSID, Publication

My short note on the LSID Tester tool has been published in the Open Access journal Source Code for Biology and Medicine. The article has just come out, so the DOI (doi:10.1186/1751-0473-3-2) isn't live yet; the direct link is http://www.scfbm.org/content/3/1/2/. Source code for the tester is available from Google Code.

Tags: Error, TBMap

In the absence of a proper bug reporting system, I'm going to use this post to collect errors in the TBMap project, which maps taxonomic names in TreeBASE onto names in other databases.

TaxonID | TaxonName | Notes
T57654 | Lycorideae | Erroneously matched by agrep to the spider family Lycosidae; this is a plant tribe.
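Why did agrep pair these two names? agrep counts "errors" as insertions, deletions and substitutions, which is the Levenshtein edit distance (Wu and Manber's agrep computes it with a bit-parallel algorithm rather than the dynamic programming below). A minimal sketch, assuming a hypothetical threshold of two errors; the post doesn't say what setting TBMap actually used:

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum insertions, deletions and substitutions turning a into b."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, free if characters match
            ))
        previous = current
    return previous[-1]


MAX_ERRORS = 2  # hypothetical threshold, not TBMap's actual setting

if __name__ == "__main__":
    d = levenshtein("Lycorideae", "Lycosidae")
    print(d, d <= MAX_ERRORS)
```

Lycorideae becomes Lycosidae by substituting r with s and deleting one e, so the distance is 2 and any matcher tolerating two errors will accept the pair, even though one is a plant tribe and the other a spider family.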

Tags: Crossref, DOI, OpenURL, PaperID

CrossRef have released a tool for bloggers to look up DOIs and insert them into blog posts. So far the tool is only available for WordPress blogs. The idea is that bloggers can use DOIs to uniquely identify the papers they are discussing, while also giving readers an easy way to reach the site hosting each article, and letting aggregators such as postgenomic.com cluster posts about the same paper.
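The clustering idea can be sketched in a few lines. This is not how postgenomic.com or the CrossRef plugin actually work (neither is described here); it is a minimal illustration that pulls DOIs out of post text with a regular expression based on the pattern Crossref has suggested for most modern DOIs, lower-cases them (DOIs are case-insensitive), and groups post IDs by DOI. The post IDs and text are hypothetical.

```python
import re
from collections import defaultdict

# Based on Crossref's suggested pattern for modern DOIs; it will miss some older ones.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[-._;()/:A-Z0-9]+", re.IGNORECASE)


def extract_dois(text: str) -> set[str]:
    # Trim sentence punctuation the regex swallows, and normalise case.
    return {m.group(0).rstrip(".,;").lower() for m in DOI_PATTERN.finditer(text)}


def cluster_by_doi(posts: dict[str, str]) -> dict[str, list[str]]:
    # Map each DOI to the (sorted) IDs of the posts that mention it.
    clusters: dict[str, list[str]] = defaultdict(list)
    for post_id in sorted(posts):
        for doi in extract_dois(posts[post_id]):
            clusters[doi].append(post_id)
    return dict(clusters)


if __name__ == "__main__":
    posts = {
        "post-a": "Discussing doi:10.1186/1751-0473-3-2 today",
        "post-b": "See https://doi.org/10.1186/1751-0473-3-2.",
        "post-c": "No identifier here.",
    }
    print(cluster_by_doi(posts))
```

Trimming trailing full stops, commas and semicolons is a heuristic: they usually belong to the surrounding sentence rather than the DOI, but a production matcher would need more care.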

"data Wars"GoogleMashupMicrosoftScrapingInformatikEnglisch
Veröffentlicht

Wired 16.01 has an article entitled "The Data Wars" by Josh McHugh. A quote from the printed version:

It's a sobering read for those of us who advocate harvesting data from as many sources as possible, more so in light of Microsoft's bid to buy Yahoo. Yahoo provides free access to many of its tools via an API (such as the image search I use in iSpecies), and in this sense is much more open than Google. Might this change under Microsoft...?

Tags: Citation, PageRank

Came across the paper "Using incomplete citation data for MEDLINE results ranking" (pmid:16779053, full text available in PMC). The authors applied PageRank (the algorithm Google use to rank search results) to papers in MEDLINE and found that PageRank is robust to information loss. In other words, even if a citation database is incomplete, PageRank will still do a good job of ranking results.
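The paper's experimental setup isn't reproduced here. As a toy illustration only, this is standard power-iteration PageRank over a tiny, made-up citation graph (papers A to D are hypothetical), run once on complete data and once with one citation missing:

```python
def pagerank(citations: dict[str, list[str]], damping: float = 0.85,
             tol: float = 1e-10, max_iter: int = 200) -> dict[str, float]:
    """Power-iteration PageRank over a citation graph (citing paper -> cited papers)."""
    nodes = set(citations)
    for cited in citations.values():
        nodes.update(cited)
    n = len(nodes)
    rank = {p: 1.0 / n for p in nodes}
    for _ in range(max_iter):
        # Papers with no outgoing citations spread their score evenly over all papers.
        dangling = sum(rank[p] for p in nodes if not citations.get(p))
        new = {p: (1 - damping) / n + damping * dangling / n for p in nodes}
        for paper, cited in citations.items():
            if cited:
                share = damping * rank[paper] / len(cited)
                for target in cited:
                    new[target] += share
        delta = sum(abs(new[p] - rank[p]) for p in nodes)
        rank = new
        if delta < tol:
            break
    return rank


def ranking(scores: dict[str, float]) -> list[str]:
    # Highest score first.
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    complete = {"A": ["C"], "B": ["C", "A"], "D": ["C", "A"]}
    # Same graph with the citation D -> C missing, mimicking incomplete data.
    incomplete = {"A": ["C"], "B": ["C", "A"], "D": ["A"]}
    print(ranking(pagerank(complete))[:2], ranking(pagerank(incomplete))[:2])
```

Both runs put C first and A second: losing one citation lowers C's score but does not change the top of the ranking, which is the kind of robustness the paper reports at much larger scale.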

Tags: EAV, MD5, RDF

Nothing like a little hubris first thing Monday morning... After various experiments, such as a triple store for ants (documented on the Semant blog) and bioGUID (documented on the bioGUID blog), I'm starting from scratch and working on a "database of everything". Put another way, I'm working on a database that aggregates metadata about specimens, sequences, literature, images, taxonomic names, etc.
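The post doesn't say how the database is built, but its tags (EAV, MD5, RDF) hint at an entity-attribute-value layout. Below is a minimal sketch under that assumption only: every fact is one (entity, attribute, value) row in a single SQLite table, and each entity's key is the MD5 hash of its original identifier. The table name `eav`, the helper functions, and the `specimen:EXAMPLE-1`-style identifiers are all hypothetical.

```python
import hashlib
import sqlite3


def make_id(source_identifier: str) -> str:
    # Hypothetical keying scheme: MD5 of the object's original identifier,
    # so the same specimen or paper always maps to the same key.
    return hashlib.md5(source_identifier.encode("utf-8")).hexdigest()


def open_store() -> sqlite3.Connection:
    # One generic table holds every fact, whatever kind of object it describes.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE eav (entity TEXT, attribute TEXT, value TEXT, "
        "PRIMARY KEY (entity, attribute, value))"
    )
    return conn


def add_fact(conn: sqlite3.Connection, source_identifier: str,
             attribute: str, value: str) -> str:
    entity = make_id(source_identifier)
    # INSERT OR IGNORE drops exact duplicates harvested from several sources.
    conn.execute("INSERT OR IGNORE INTO eav VALUES (?, ?, ?)",
                 (entity, attribute, value))
    return entity


def describe(conn: sqlite3.Connection, entity: str) -> dict[str, list[str]]:
    # Reassemble all attribute/value pairs recorded for one entity.
    facts: dict[str, list[str]] = {}
    rows = conn.execute(
        "SELECT attribute, value FROM eav WHERE entity = ? ORDER BY attribute, value",
        (entity,),
    )
    for attribute, value in rows:
        facts.setdefault(attribute, []).append(value)
    return facts


if __name__ == "__main__":
    conn = open_store()
    specimen = add_fact(conn, "specimen:EXAMPLE-1", "type", "specimen")
    sequence = add_fact(conn, "sequence:EXAMPLE-1", "type", "sequence")
    # A link between objects is just a value that happens to be another entity's key.
    add_fact(conn, "sequence:EXAMPLE-1", "voucher", specimen)
    print(describe(conn, sequence))
```

The appeal of this layout for a "database of everything" is that adding a new kind of object (an image, a taxonomic name) needs no schema change; the cost is that every query has to reassemble objects from many rows.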

"rock Pools""sea Level"InformatikEnglisch
Veröffentlicht

Spent the week in Portugal at the EDIT Future Trends of Taxonomy meeting, held at the Hotel Tivoli Almansor, Carvoeiro.

[Photo: view from a cave on the beach in front of the Hotel Tivoli Almansor, Carvoeiro.]

Tags: Algorithm, Classification, Transitive Reduction

Quick note to self, having stumbled on the Wikipedia page on transitive reduction. Given a graph like this:

[Figure: original graph]

the transitive reduction is:

[Figure: transitive reduction of the graph]

Note that the original graph has an edge a -> d, but this is absent after the reduction because we can get from a to d via b (or c). What's the point?
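The original figures aren't preserved, so the graph below is an assumption consistent with the text: a diamond a -> b -> d and a -> c -> d, plus the shortcut a -> d. This sketch drops an edge u -> v whenever v can also be reached from some other direct successor of u; that test is only valid for acyclic graphs, such as classification hierarchies.

```python
def descendants(graph: dict[str, list[str]], start: str) -> set[str]:
    """All nodes reachable from start by following one or more edges."""
    seen: set[str] = set()
    stack = list(graph.get(start, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, []))
    return seen


def transitive_reduction(graph: dict[str, list[str]]) -> dict[str, list[str]]:
    """Transitive reduction of a DAG given as adjacency lists."""
    reduced = {}
    for u, successors in graph.items():
        # Keep u -> v only if no other child of u already leads to v.
        reduced[u] = [
            v for v in successors
            if not any(v in descendants(graph, w) for w in successors if w != v)
        ]
    return reduced


if __name__ == "__main__":
    graph = {"a": ["b", "c", "d"], "b": ["d"], "c": ["d"], "d": []}
    print(transitive_reduction(graph))
```

On this graph the result keeps a -> b, a -> c, b -> d and c -> d and drops a -> d, matching the reduction described above. Recomputing descendants for every edge is wasteful, but it keeps the sketch short and is fine for small hierarchies.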