
iPhylo

Rants, raves (and occasionally considered opinions) on phyloinformatics, taxonomy, and biodiversity informatics. For more ranty and less considered opinions, see my Twitter feed. ISSN 2051-8188. Written content on this site is licensed under a Creative Commons Attribution 4.0 International license.
Citation, Data Quality, Google Scholar, Mendeley, Metadata

Hot on the heels of Geoffrey Nunberg's essay about the train wreck that is Google Books metadata (see my earlier post) comes "Google Scholar's Ghost Authors, Lost Authors, and Other Problems" by Péter Jacsó. It's a fairly scathing look at some of the problems with the quality of Google Scholar's metadata. Now, Google Scholar isn't perfect, but it's come to play a key role in a variety of bibliographic tools, such as Mendeley and Papers.

Mediawiki, Semantic Web, TreeBASE, Wiki, Workshop

At the start of this week I took part in a biodiversity informatics workshop at the Naturhistoriska riksmuseet, organised by Kevin Holston. It was a fun experience, and Kevin was a great host, going out of his way to make sure the other contributors and I were looked after.

Ants, History Flow, Pyramica, Strumigenys, Wikipedia

Stumbled across Alex Wild's post "Pyramica vs Strumigenys: why does it matter?", which takes as its starting point a minor edit war on the Wikipedia page for Pyramica. Alex gives the background to the argument about whether Pyramica is a synonym of Strumigenys, and investigates the issue using the surprisingly small amount of data available in GenBank.

Gene Wiki, Google, Wikipedia

Andrew Su has posted an analysis of Gene Wiki, a project to provide Wikipedia pages on every human gene. The result is interesting in that an existing resource (GeneCards) beats Wikipedia, but only just.

History Flow, SVG, Visualisation, Wikipedia

Quick post (really should be doing something else). Reading Jeff Atwood's post "Mixing Oil and Water: Authorship in a Wiki World" led me to IBM's wonderful history flow tool for visualising the edit history of a Wikipedia page. There's a nice paper describing history flow (doi:10.1145/985692.985765, free PDF here). Inspired by this, I decided to try to implement history flow in PHP and SVG.

Google, Wikipedia

One response to my post on Fungi in Wikipedia was to say that fungi are also charismatic, so maybe I should try [insert unsexy taxon name here]. So I've now taken all the species I extracted from Wikipedia (nearly 72,000), run the Google searches, and here are the results:
Site | How many times is it the top

Fungi, Google, Search, Wikipedia

One response to the analysis I did of the Google rank of mammal pages in Wikipedia is to suggest that Wikipedia does well for mammals because these are charismatic. It's been suggested that for other groups of taxa Wikipedia might not be so prominent in the search results. As a quick test I extracted the 1552 fungal species I could find in Wikipedia and repeated the analysis.

Clay Shirky, EOL, Google, Power Law, Search

One assumption I've been making so far is that when people search for information on an organism using its scientific name, Wikipedia will dominate the search results (see my earlier post for an example of this assumption). I've decided to quantify this by doing a little experiment. I grabbed the Mammal Species of the World taxonomy and extracted the 5416 species names. I then used Google's AJAX search API to look up each name in Google.
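
The tallying step of an experiment like this can be sketched roughly as follows. This is not the code used for the post: it assumes the search results have already been collected into a dictionary mapping each scientific name to its result URLs in rank order, and the names `top_site`, `tally_top_sites`, and `example_results` are hypothetical, as is the hand-made sample data. It makes no calls to any search API.

```python
from collections import Counter
from urllib.parse import urlparse


def top_site(result_urls):
    """Return the host of the first (top-ranked) URL, or None if there are no results."""
    if not result_urls:
        return None
    host = urlparse(result_urls[0]).netloc.lower()
    # Treat "www.example.org" and "example.org" as the same site.
    return host[4:] if host.startswith("www.") else host


def tally_top_sites(results_by_name):
    """Count how often each site is the top hit across all searched names."""
    counts = Counter()
    for urls in results_by_name.values():
        site = top_site(urls)
        if site is not None:
            counts[site] += 1
    return counts


# Hypothetical, hand-made results for illustration only (not real search output).
example_results = {
    "Panthera leo": ["https://en.wikipedia.org/wiki/Lion", "https://www.example.org/lion"],
    "Mus musculus": ["https://www.example.org/mouse", "https://en.wikipedia.org/wiki/House_mouse"],
    "Sorex araneus": ["https://en.wikipedia.org/wiki/Common_shrew"],
    "Aus bus": [],  # placeholder name with no search results
}

counts = tally_top_sites(example_results)
for site, n in counts.most_common():
    print(site, n)
```

Grouping by host name is only one possible choice; different language editions of Wikipedia would be counted as separate sites under this scheme.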