
iPhylo

Rants, raves (and occasionally considered opinions) on phyloinformatics, taxonomy, and biodiversity informatics. For more ranty and less considered opinions, see my Twitter feed. ISSN 2051-8188. Written content on this site is licensed under a Creative Commons Attribution 4.0 International license.
Tags: GenBank, NCBI, Type Specimens

Scott Federhen told me about a nice new feature in GenBank that he's described in a piece for NCBI News. The NCBI taxonomy database now shows a list of type material (where known), and the GenBank sequence database "knows" about types. Here's the summary:

You can query for sequences from type using the query "sequence from type"[filter]. This could lead to some nice automated tools.
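
The same filter should also be usable programmatically through NCBI's E-utilities. Below is a minimal sketch, assuming the "sequence from type"[filter] term is still accepted by ESearch, and combining it with an [Organism] restriction. The function names `type_sequence_query_url` and `fetch_ids` are hypothetical, and NCBI asks regular users to also pass `tool` and `email` (or an `api_key`) parameters, which are omitted here for brevity.

```python
import urllib.request
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Base URL of NCBI's E-utilities ESearch service.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def type_sequence_query_url(taxon: str, retmax: int = 20) -> str:
    """Build an ESearch URL for nucleotide sequences from type material of `taxon`.

    Hypothetical helper: restricts by organism and applies the
    "sequence from type"[filter] described in the NCBI News piece.
    """
    term = f'"{taxon}"[Organism] AND "sequence from type"[filter]'
    params = {"db": "nuccore", "term": term, "retmax": retmax}
    return f"{ESEARCH}?{urlencode(params)}"


def fetch_ids(url: str) -> list:
    """Run the query (needs network access) and return the matching UIDs."""
    with urllib.request.urlopen(url) as response:
        root = ET.fromstring(response.read())
    # ESearch returns <eSearchResult><IdList><Id>...</Id></IdList>...
    return [element.text for element in root.findall("IdList/Id")]
```

For example, `fetch_ids(type_sequence_query_url("Aedes"))` should return up to `retmax` nuccore UIDs, which could then be handed to EFetch to retrieve the actual records.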

Tags: 2014, BioNames, GBIF, Google, Knowledge Graph

More for my own benefit than anything else, I've decided to list some of the things I plan to work on this year. If nothing else, it may make sobering reading this time next year.

A knowledge graph for biodiversity

Google's introduction of the "knowledge graph" gives us a happy phrase to use when talking about linking stuff together. It doesn't come with all the baggage of the "semantic web", or the ambiguity of "knowledge base".

Tags: Annotation, Editing, Filtered-push, GBIF, Identifiers

Given that it's the start of a new year, and I have a short window before teaching kicks off in earnest (and I have to revise my phyloinformatics course), I'm playing with a few GBIF-related ideas. One topic that comes up a lot is annotating and correcting errors. There has been some work in this area [1][2], but it strikes me as somewhat complicated.

Tags: DNA Barcoding, GenBank, GPS, Guest Post

The following is a guest blog post by David Schindel and colleagues, and is a response to the paper by Antonio Marques et al. in Science (doi:10.1126/science.341.6152.1341-a).

Marques, Maronna and Collins (1) rightly call on the biodiversity research community to include latitude/longitude data in database and published records of natural history specimens.
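
Once coordinates are included, a basic sanity check is that both values parse as numbers and fall within the valid decimal-degree range. A minimal sketch, assuming coordinates arrive as strings in Darwin Core-style `decimalLatitude`/`decimalLongitude` fields; `parse_coordinates` is a hypothetical helper, and it will not catch subtler errors such as swapped or sign-flipped values.

```python
from typing import Optional, Tuple


def parse_coordinates(lat: str, lon: str) -> Optional[Tuple[float, float]]:
    """Return (latitude, longitude) as floats if both are present and in range, else None.

    Hypothetical helper for checking Darwin Core-style decimalLatitude and
    decimalLongitude values before a record is published.
    """
    try:
        lat_value, lon_value = float(lat), float(lon)
    except (TypeError, ValueError):
        return None  # missing (None) or non-numeric value
    # Range checks also reject NaN, because comparisons with NaN are False.
    if not (-90.0 <= lat_value <= 90.0 and -180.0 <= lon_value <= 180.0):
        return None  # outside the valid decimal-degree range
    return lat_value, lon_value
```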

Tags: BHL, Code, DjVu, HOCR, JATS

A while ago I posted BHL to PDF workflow, which was a sketch of a workflow to generate clean, searchable PDFs from Biodiversity Heritage Library (BHL) content. I've made some progress on putting this together, and have also expanded the goal somewhat. In fact, there are several goals:

BioStor articles need to be archived somewhere. At the moment they live on my server, and metadata is also served by BHL (as the "parts" you see in a scanned volume).

Tags: ICZN, Nature, Taxonomy, Zootaxa

There is a fairly scathing editorial in Nature [The new zoo. (2013). Nature, 503(7476), 311–312. doi:10.1038/503311b] that reacts to a recent paper by Dubois et al.:

To quote the editorial:

Ouch! But Dubois et al.'s paper pretty much deserves this reaction: it's a reactionary rant that is breathtaking in its lack of perspective.

Tags: Index Fungorum, ION, IPNI, LSIDs, Nomenclators

Quick notes on taxonomic names (again). It's a continuing source of bafflement that the biodiversity community is making a dog's breakfast of names. It seems we are forever making it more complicated than it needs to be, forever minting new acronyms that pollute the landscape without actually contributing anything useful, and forever promising shiny new tools and services without ever actually delivering them.

Tags: GBIF, GitHub, ZooKeys

Here's another example of a Darwin Core Archive that is "broken" such that GBIF is missing some information. The GBIF data set A checklist to the wasps of Peru (Hymenoptera, Aculeata) comes from Pensoft, and corresponds to the paper:

As with the previous example, GBIF says there are 0 georeferenced records in this dataset. This is odd, because the ZooKeys page for this article lists three supplementary files, including KML files for Google Earth.
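
One way to recover the missing localities would be to read them straight out of the KML supplementary files. A minimal sketch, assuming the files use the KML 2.2 namespace and store each locality as a `Placemark` containing a `Point` (the structure of these particular files is not verified here); `placemark_points` is a hypothetical helper. KML orders each coordinate as longitude, latitude, optional altitude, so the values need swapping before they go into `decimalLatitude` and `decimalLongitude`.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

KML = "http://www.opengis.net/kml/2.2"
NS = {"kml": KML}


def placemark_points(kml_text: str) -> List[Tuple[str, float, float]]:
    """Return (name, latitude, longitude) for every Placemark that has a Point.

    Hypothetical helper; assumes KML 2.2 with one Point per Placemark.
    """
    root = ET.fromstring(kml_text)
    points = []
    for placemark in root.iter(f"{{{KML}}}Placemark"):
        name = placemark.findtext("kml:name", default="", namespaces=NS)
        coords = placemark.findtext("kml:Point/kml:coordinates", namespaces=NS)
        if not coords:
            continue  # skip placemarks without a point geometry
        # KML stores "longitude,latitude[,altitude]"; keep the first two values.
        lon, lat = [float(v) for v in coords.strip().split(",")[:2]]
        points.append((name, lat, lon))
    return points
```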

Tags: Darwin Core Archive, GBIF, GitHub

Following on from Annotating and cleaning GBIF data: Darwin Core Archive, GitHub, ORCID, and DataCite, here's a quick and dirty example of using GitHub to help clean up a Darwin Core Archive. The dataset 3i - Cicadellinae Database has 2,152 species and 4,749 taxa, but GBIF says it has no georeferenced data.
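
Before and after editing an archive, it helps to check how many records actually carry coordinates. A minimal sketch, assuming a tab-delimited data file whose first row holds Darwin Core term names and whose fields are not quoted (in a real archive the column layout and quoting are declared in `meta.xml`, and a checklist may keep localities in an extension file rather than the core); `count_georeferenced` is a hypothetical helper.

```python
import csv
import io


def count_georeferenced(data_text: str) -> int:
    """Count rows with both decimalLatitude and decimalLongitude filled in.

    Hypothetical helper; assumes a tab-delimited file with a header row of
    Darwin Core term names and no field enclosure (quotes taken literally).
    """
    reader = csv.DictReader(
        io.StringIO(data_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    count = 0
    for row in reader:
        # Missing columns yield None, which is treated like an empty value.
        lat = (row.get("decimalLatitude") or "").strip()
        lon = (row.get("decimalLongitude") or "").strip()
        if lat and lon:
            count += 1
    return count
```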