Computer Science, English, Blogger

iPhylo

Rants, raves (and occasionally considered opinions) on phyloinformatics, taxonomy, and biodiversity informatics. For more ranty and less considered opinions, see my Twitter feed. ISSN 2051-8188. Written content on this site is licensed under a Creative Commons Attribution 4.0 International license.
Catalogue Of Life, NCBI, Wiki, Wikispecies

Next few weeks will be busy with term starting, kids visiting, and other commitments, so time to jot down some ideas. The first is to have a Wiki for taxonomic names. Bit like Wikispecies, but actually useful, by which I mean useful for working biologists. This would mean links to digital literature (DOIs, Handles, etc.), use of identifiers for names and taxa (such as NCBI taxids, LSIDs, etc.), and having it pre-populated with data.

Challenge, Elsevier, MPE

Just to provide a sense of how much data I want to analyse for the Challenge, I have the XML, PDF, and images for 1687 articles from Molecular Phylogenetics and Evolution to play with.

NESCent, Phylogeny, Postphylogenetics

Last week I was at NESCent's 2008 Community Summit. As part of that meeting a few of us had a breakout group on "Biodiversity and phylogenetics". Brian O'Meara took some spectacularly thorough notes, including the pithy: "S[wofford]: What?" Julia Clarke and I were advocating data mining, not entirely successfully.

Crossref, DOI, Identifiers, ISSN, WorldCat

I've been using ISSNs (International Standard Serial Numbers) to uniquely identify journals, both to generate article identifiers and as a parameter to send to CrossRef's OpenURL resolver. Recently I've come across journals that change their ISSN, which has fairly catastrophic effects on my lookup tools.
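One way to soften the blow of a changed ISSN is to normalise and validate every ISSN on the way in, then map known old ISSNs to a single canonical one before doing any lookup. The following is a minimal sketch of that idea, not the code behind my tools: the function names and the alias table are made up for illustration, and the table starts empty because which ISSNs belong together has to come from a source you trust. The check-digit rule (weights 8 down to 2, modulus 11, with X standing for 10) is the standard ISSN one.

```python
import re


def normalise_issn(raw):
    """Return an ISSN in NNNN-NNNC form, or raise ValueError if malformed."""
    compact = re.sub(r"[^0-9Xx]", "", raw).upper()
    if not re.fullmatch(r"[0-9]{7}[0-9X]", compact):
        raise ValueError(f"not an ISSN: {raw!r}")
    return f"{compact[:4]}-{compact[4:]}"


def issn_check_digit(first_seven):
    """Compute the check character from the first seven digits."""
    total = sum(int(d) * w for d, w in zip(first_seven, range(8, 1, -1)))
    value = (11 - total % 11) % 11
    return "X" if value == 10 else str(value)


def is_valid_issn(raw):
    """True if the string is well formed and its check digit is correct."""
    try:
        digits = normalise_issn(raw).replace("-", "")
    except ValueError:
        return False
    return issn_check_digit(digits[:7]) == digits[7]


# Hypothetical alias table: old ISSN -> ISSN used as the lookup key.
# Deliberately empty here; populate it from a source you trust.
ISSN_ALIASES = {}


def canonical_issn(raw, aliases=ISSN_ALIASES):
    """Map an ISSN to its canonical form, following one alias if present."""
    issn = normalise_issn(raw)
    return aliases.get(issn, issn)
```

With this in place, article identifiers and resolver queries can be keyed on `canonical_issn(...)` rather than on whatever ISSN happens to be printed on a given issue.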

Elsevier, GrandChallenge, JSON, XML

Starting to get serious about the Grand Challenge. First step is to parse the XML data Elsevier made available. Sadly this is only for Molecular Phylogenetics and Evolution for 2007; I would have liked the whole journal in XML to avoid hassles with parsing PDF. However, XML is not without its own problems.
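To give a flavour of the XML-to-JSON step, here is a minimal sketch using Python's standard library. The element names in the sample are invented for illustration and are not Elsevier's schema, and the conversion is deliberately naive: it keeps element text only, so attributes and mixed content (text interleaved with child elements) are silently dropped, which is exactly the sort of thing that makes real article XML awkward.

```python
import json
import xml.etree.ElementTree as ET

# Invented sample document; NOT the Elsevier article schema.
SAMPLE = """<article>
  <title>An example title</title>
  <author>A. Author</author>
  <author>B. Author</author>
</article>"""


def element_to_dict(element):
    """Convert an element to nested dicts, lists and strings (text only)."""
    children = list(element)
    if not children:
        return (element.text or "").strip()
    result = {}
    for child in children:
        value = element_to_dict(child)
        if child.tag in result:
            # A repeated tag becomes a list of values.
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result


def xml_to_json(xml_text):
    """Parse an XML string and return a JSON string of its element tree."""
    root = ET.fromstring(xml_text)
    return json.dumps({root.tag: element_to_dict(root)},
                      ensure_ascii=False, indent=2)


if __name__ == "__main__":
    print(xml_to_json(SAMPLE))
```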

Encoding, Japanese, OpenURL, Programming, UTF8

In case I forget how to do this, and as an example of how easy it is to get sucked into a black hole of programming micro-details, I spent an hour or more trying to figure out how to handle Japanese characters. I'm building a database of publications linked to taxonomic names, and I'm interested in linking to electronic versions of those publications.
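For the record, the general shape of the fix in Python looks something like the sketch below. It is an illustration rather than what I actually ran: the resolver URL and the parameter names are placeholders, and the legacy encoding (Shift JIS here) is an assumption about where the bytes came from. The idea is to decode whatever bytes you have into Unicode first, then let the library percent-encode the query string as UTF-8.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Placeholder resolver address, not a real service.
BASE_URL = "https://resolver.example.org/openurl"


def to_text(raw, encoding):
    """Decode bytes from a known source encoding into a Unicode string."""
    return raw.decode(encoding)


def build_openurl(base_url, **fields):
    """Build a query URL with every field percent-encoded as UTF-8."""
    return f"{base_url}?{urlencode(fields, encoding='utf-8')}"


if __name__ == "__main__":
    # Pretend these bytes arrived from a Shift JIS source (an assumption).
    raw_title = "日本".encode("shift_jis")
    title = to_text(raw_title, "shift_jis")

    url = build_openurl(BASE_URL, title=title)
    print(url)

    # Round trip: decoding the query string gives back the original text.
    assert parse_qs(urlsplit(url).query)["title"] == [title]
```

The important part is doing the decode exactly once, at the boundary, and knowing which encoding the source really used; guessing wrong produces mojibake that then gets faithfully percent-encoded.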

ITIS, Perceptive Pixel, Taxonomy, Visualisation, Wow

Found this while Googling. Demo by Perceptive Pixel of browsing the ITIS classification using their multi-touch technology. I want one...

False Positive, Genbank, Millipedes, Regular Expression, Tasmania

OMG. Playing with extracting identifiers from text, I have a regular expression for GenBank accession numbers that looks something like this: ((A[A-Z])[0-9]{6} | (U)[0-9]{5} | (D[A-Z])[0-9]{6} | (E[A-Z])[0-9]{6} | (NC_)[0-9]{6}). OK, it won't get everything, but what is more worrying are the things it will pick up that aren't GenBank accession numbers.
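To make the false-positive problem concrete, here is a small Python sketch using a tidied version of the pattern above (whitespace removed, word boundaries added, groups made non-capturing). The sample strings are invented: "AB123456" merely has the shape of an accession number, and "EN123456" stands in for any other code, such as a grid reference or a catalogue number, that happens to have the same shape.

```python
import re

# Tidied version of the pattern in the text; still far from exhaustive.
ACCESSION = re.compile(
    r"\b(?:A[A-Z][0-9]{6}"
    r"|U[0-9]{5}"
    r"|D[A-Z][0-9]{6}"
    r"|E[A-Z][0-9]{6}"
    r"|NC_[0-9]{6})\b"
)


def find_candidates(text):
    """Return every substring shaped like one of the accession patterns."""
    return ACCESSION.findall(text)


if __name__ == "__main__":
    # Invented sentence: one accession-shaped string, one look-alike code.
    sample = "Sequence AB123456 came from a specimen at locality EN123456."
    print(find_candidates(sample))
```

Both strings match, because the pattern only describes shape. Telling them apart needs context (is the text talking about sequences?) or a lookup against GenBank itself.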