Each year the grandly titled International Institute for Species Exploration (IISE) publishes a list of the top 10 species described in the previous year.
My article describing BioStor, "Extracting scientific articles from a large digital archive: BioStor and the Biodiversity Heritage Library", has finally seen the light of day in BMC Bioinformatics (doi:10.1186/1471-2105-12-187; the DOI is not working at the moment, so give it a little while to go live. In the meantime you can access the article here). Getting this article published was more work than I expected.
One of the biggest challenges I've faced with the BioStor project, apart from dealing with messy metadata, has been handling page images. At present I get these from the Biodiversity Heritage Library. They are big (typically 1 MB in size), and have the caramel colour of old paper.
How to cite: Page, R. (2011). Dark taxa: GenBank in a post-taxonomic world.
Interest in archiving data and data publication is growing, as evidenced by projects such as Dryad, and earlier tools such as TreeBASE. But I can't help wondering whether this is a little misguided. I think the issues are granularity and reuse. Taking the second issue first, how much reuse do data sets get? I suspect the answer is "not much". I think there are two clear use cases: repeatability of a study, and benchmarks.
Quick, poorly thought out idea. I've argued before that Mendeley seems the obvious tool to build a "bibliography of life." It has pretty much all the features we need: nice editing tools, support for DOIs, PubMed identifiers, social networking, etc. But there's one thing it lacks. There's no easy way to transmit updates from Mendeley to another database.
My paper describing the mapping between NCBI and Wikipedia has been published in PLoS Currents: Tree of Life. You can see the paper here. It's only just gone live, so it's yet to get a PubMed Central number (one of the nice features of PLoS Currents is that the articles get archived in PMC). Publishing in PLoS Currents: Tree of Life was a pleasant experience. The Google Knol editing environment was easy to use, and the reviewing process quick.
A few weeks ago I spent some time mapping pages from the BBC Wildlife Finder to the equivalent taxa in the NCBI taxonomy. This seemed a useful exercise because the Wildlife Finder pages have some wonderful picture, video, and audio content, as well as other nice features, such as reusing Wikipedia page titles as "slugs" in the BBC page URLs.
One side effect of playing with ways to visualise and integrate biology databases is that you stumble across the weird and wonderful stuff that living organisms get up to. My earliest papers were on crustacean taxonomy, so I thought I'd try my latest toy on them. What lives on crustaceans? The "symbiome" graph for crustacea shows a range of associations, including marine bacteria (Vibrio), fungi (microsporidians), and other
Back in 2006 in a short post entitled "Building the encyclopedia of life" I wrote that GenBank is a potentially rich source of information on host-parasite relationships.
Déjà vu is a scary thing. Four years ago I released a mapping between names in TreeBASE and other databases called TBMap (described here: doi:10.1186/1471-2105-8-158). Today I find myself releasing yet another mapping, as part of my NCBI to Wikipedia project. By embedding the mapping in a wiki, it can be edited, so the kinds of problems I encountered with TBMap, recounted here, here, and here.