I've been playing recently with the Biodiversity Heritage Library (BHL), and am starting to get a sense for the complexities (and limitations) of the metadata BHL stores about publications.
At the start of this week I took part in a biodiversity informatics workshop at the Naturhistoriska riksmuseet, organised by Kevin Holston. It was a fun experience, and Kevin was a great host, going out of his way to make sure the other contributors and I were looked after.
Stumbled across Alex Wild's post Pyramica vs Strumigenys: why does it matter?, which takes as its starting point a minor edit war on the Wikipedia page for Pyramica. Alex gives the background to the argument about whether Pyramica is a synonym of Strumigenys, and investigates the issue using the surprisingly small amount of data available in GenBank.
Andrew Su has posted an analysis of Gene Wiki, a project to provide Wikipedia pages on every human gene: This result is interesting in that an existing resource (Gene Cards) beats Wikipedia, but only just.
Quick post (really should be doing something else). Reading Jeff Atwood's post Mixing Oil and Water: Authorship in a Wiki World led me to IBM's wonderful history flow tool to visualise the edit history of a Wikipedia page. There's a nice paper describing history flow (doi:10.1145/985692.985765, free PDF here). Inspired by this I decided to try to implement history flow in PHP and SVG.
One response to my post on Fungi in Wikipedia was to say that fungi are also charismatic, so maybe I should try [insert unsexy taxon name here]. So, I've now looked at all the species I extracted from Wikipedia (nearly 72,000), run the Google searches, and here are the results: Site | How many times is it the top
One response to the analysis I did of the Google rank of mammal pages in Wikipedia is to suggest that Wikipedia does well for mammals because these are charismatic. It's been suggested that for other groups of taxa Wikipedia might not be so prominent in the search results. As a quick test I extracted the 1552 fungal species I could find in Wikipedia and repeated the analysis.
Playing a bit more with the Wikipedia mammal data, there are some interesting patterns to note.
One assumption I've been making so far is that when people search for information on an organism using its scientific name, Wikipedia will dominate the search results (see my earlier post for an example of this assumption). I've decided to quantify this by doing a little experiment. I grabbed the Mammal Species of the World taxonomy and extracted the 5416 species names. I then used Google's AJAX search API to look up each name in Google.
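The tallying step of an experiment like this can be sketched roughly as follows. This is only an illustration, not the code used for the post: it does not call any search API, the function name `top_site_counts` is made up, and the result lists are placeholder data standing in for whatever a search for each scientific name would return.

```python
from collections import Counter
from urllib.parse import urlparse


def top_site_counts(results):
    """Count how often each site is the first hit.

    `results` maps a species name to its ranked list of result URLs
    (hypothetical input; obtaining the URLs is left out of this sketch).
    """
    counts = Counter()
    for name, urls in results.items():
        if urls:  # skip names with no hits at all
            counts[urlparse(urls[0]).netloc] += 1
    return counts


# Placeholder data, not real search results.
sample_results = {
    "Panthera leo": ["https://en.wikipedia.org/wiki/Lion", "https://example.org/lion"],
    "Mus musculus": ["https://en.wikipedia.org/wiki/House_mouse"],
    "Sorex araneus": ["https://example.org/sorex", "https://en.wikipedia.org/wiki/Common_shrew"],
    "Nomen dubium": [],
}

counts = top_site_counts(sample_results)
for site, n in counts.most_common():
    print(site, n)
```

Grouping by host name is a simplification; a fuller analysis might also want to record the rank at which a site first appears rather than only the top hit.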
Continuing the saga of making sense of the mammal classification in Wikipedia, I've done a quick comparison with the Mammal Species of the World (third edition) classification. MSW is the default taxonomic reference used by WikiProject Mammals.
Following on from my previous post about visualising the mammalian classification in Wikipedia, I've extracted the largest component from the graph for all mammal taxa in Wikipedia, and it is a tree. This wasn't apparent in the previous diagram, where the component appeared as a big ball due to the layout algorithm used.
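For readers who want to try the same check, here is a minimal sketch of pulling out the largest connected component and testing whether it is a tree. It assumes the classification is available as a list of (child, parent) pairs; the function names and the toy edge list are invented for illustration and are not taken from the Wikipedia data.

```python
from collections import defaultdict, deque


def connected_components(edges):
    """Return the connected components of an undirected graph as sets of nodes."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen = set()
    components = []
    for start in adj:
        if start in seen:
            continue
        # Breadth-first search from an unvisited node.
        component = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in adj[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    component.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return components


def is_tree(component, edges):
    """A connected component is a tree when it has exactly n - 1 distinct edges."""
    unique_edges = {
        frozenset((a, b)) for a, b in edges if a in component and b in component
    }
    return len(unique_edges) == len(component) - 1


# Toy (child, parent) pairs, not real Wikipedia data.
toy_edges = [
    ("Panthera leo", "Panthera"),
    ("Panthera tigris", "Panthera"),
    ("Panthera", "Felidae"),
    ("Felidae", "Carnivora"),
    ("Mus", "Muridae"),
]

largest = max(connected_components(toy_edges), key=len)
print(len(largest), is_tree(largest, toy_edges))
```

The edge-count test relies on the component already being connected, which the breadth-first search guarantees; a taxon listed under two different parents would add an extra edge and make the check fail.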