Rogue Scholar

Published August 9, 2015

Following on from "Testing the GBIF taxonomy with the graph database Neo4J", I've added a more complex test that relies on linking taxa to names. In this case I've picked some legume genera (Coursetia and Poissonia) where there have been frequent changes of name.

Note To Self, Possible Project, Computer and Information Sciences

Possible project: #itaxonomist, combining taxonomic names, DOIs, and ORCID to measure taxonomic impact

https://doi.org/10.59350/xe6ry-h7t08

Published August 9, 2015

Author Roderic Page

Imagine a web site where researchers can go, log in (easily) and get a list of all the species they have described (with pretty pictures and, say, a GBIF map), and a list of all DNA sequences/barcodes (if any) that they've published. Imagine that this is displayed in a colourful way (e.g., badges), and the results tweeted with the hashtag #itaxonomist.

GBIF, Graph Database, Neo4J, RDF, Taxonomy, Computer and Information Sciences

Testing the GBIF taxonomy with the graph database Neo4J

https://doi.org/10.59350/tedm1-m9476

Published August 7, 2015

Author Roderic Page

I've been playing with the graph database Neo4J to investigate aspects of the classification of taxa in GBIF's backbone classification. A number of people in biodiversity informatics have been playing with Neo4J: Nicky Nicolson at Kew has a nice presentation on using graph databases to handle names ("Building a names backbone"), and the Open Tree of Life project use it in their tree machine.

Folksonomy, Machine Learning, Note To Self, Possible Project, Tags, Computer and Information Sciences

Possible project: extract taxonomic classification from tags (folksonomy)

https://doi.org/10.59350/n20vx-a4930

Published August 4, 2015

Author Roderic Page

Note to self about a possible project. This PLoS ONE paper describes a method for inferring a hierarchy from a set of tags (and cites related work that is of interest). I've grabbed the code and data from http://hiertags-beta.elte.hu/home/ and put it on GitHub. Possible project: use the Tibély et al. method (or others) on taxonomic names extracted from BHL text (or other sources) and see if we can reconstruct taxonomic classifications.

BioStor, Cloud, Cloudant, CouchDB, Pagodabox, Computer and Information Sciences

Towards a new BioStor

https://doi.org/10.59350/sn3f9-3tg25

Published July 31, 2015

Author Roderic Page

One of my pet projects is BioStor, which has been running since 2009 (gulp). BioStor extracts articles from the Biodiversity Heritage Library (details here: http://dx.doi.org/10.1186/1471-2105-12-187), and currently has over 110,000 articles, all open access. The site itself is showing its age, both in terms of performance and design, so I've wanted to update it for a while now.

Darwin Core Archive, Database, GBIF, LSID, Names, Computer and Information Sciences

Modelling taxonomic names in databases

https://doi.org/10.59350/f565q-5bh48

Published July 28, 2015

Author Roderic Page

Quick notes on modelling taxonomic names in databases, as part of an ongoing discussion elsewhere about this topic. Simple model: one model that is widely used (e.g., ITIS, WoRMS), and which is explicit in Darwin Core Archive, is something like this. We have a table for taxa, and we don't distinguish between taxa and their names. The taxonomic hierarchy is represented by the parentID field, which points to a taxon's parent.
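As a rough illustration of the simple model described in this excerpt, here is a minimal Python sketch of a single taxa table in which each row carries its own name and a parentID. The identifiers, the `Taxon` type, and the `lineage` and `children` helpers are all hypothetical names made up for this example, and the three rows are illustrative only, not a complete classification or an actual ITIS, WoRMS, or Darwin Core Archive record.

```python
from typing import Dict, List, NamedTuple, Optional


class Taxon(NamedTuple):
    """One row of the taxa table: the taxon and its name are not separated."""
    taxon_id: int             # hypothetical identifier
    parent_id: Optional[int]  # parentID; None marks the root of this toy table
    name: str


# Illustrative rows only (assumed ids, ranks omitted).
TAXA: Dict[int, Taxon] = {
    t.taxon_id: t
    for t in [
        Taxon(1, None, "Fabaceae"),
        Taxon(2, 1, "Coursetia"),
        Taxon(3, 1, "Poissonia"),
    ]
}


def lineage(taxon_id: int, taxa: Dict[int, Taxon] = TAXA) -> List[str]:
    """Follow parentID pointers upward and return names from root to taxon."""
    path: List[str] = []
    current: Optional[int] = taxon_id
    while current is not None:
        taxon = taxa[current]
        path.append(taxon.name)
        current = taxon.parent_id
    return list(reversed(path))


def children(parent_id: int, taxa: Dict[int, Taxon] = TAXA) -> List[str]:
    """Return the names of all taxa whose parentID points at parent_id."""
    return sorted(t.name for t in taxa.values() if t.parent_id == parent_id)


if __name__ == "__main__":
    print(lineage(2))
    print(children(1))
```

Because the name lives in the same row as the taxon, a change of name in this sketch means editing or replacing the row itself, which is the limitation the simple model carries.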

Computer and Information Sciences

The Biodiversity Data Journal is not machine readable

https://doi.org/10.59350/n86f3-zxv91

Published July 27, 2015

Author Roderic Page

In my previous post I discussed the potential for the Biodiversity Data Journal (BDJ) to be a venue for nano (or near-nano) publications. In this post I want to draw attention to what I think is a serious stumbling block, which is the lack of machine readable statements in the journal.

Annotation, Biodiversity Data Journal, Nanopublication, Computer and Information Sciences

Nanopublications and annotation: a role for the Biodiversity Data Journal?

https://doi.org/10.59350/rtx14-m7596

Published July 27, 2015

Author Roderic Page

I stumbled across this intriguing paper. The authors argue that there is scope for a unit of publication between a full-blown journal article (often not machine readable, but readable by people) and the nanopublication (a single, machine readable statement, not intended for people to read), namely the Single Figure Publication (SFP). It seems to me that this is something that the Biodiversity Data Journal is potentially heading towards.

BHL, Games, OCR, Computer and Information Sciences

Purposeful Games and the Biodiversity Heritage Library

https://doi.org/10.59350/g6rvg-1kz27

Published July 23, 2015

Author Roderic Page

These are some quick thoughts on the games on the BHL site, part of the Purposeful Gaming and BHL Project. As mentioned on Twitter, I had a quick play of the Beanstalk game and got bored pretty quickly.

Altmetric, Annotation, Disqus, GBIF, JSTOR, Computer and Information Sciences

Altmetrics, Disqus, GBIF, JSTOR, and annotating biodiversity data

https://doi.org/10.59350/xqb1n-xh905

Published July 22, 2015

Author Roderic Page

Browsing JSTOR's Global Plants database I was struck by the number of comments people have made on individual plant specimens. For example, for the Holotype of Scorodoxylum hartwegianum Nees (K000534285) there is a comment from Håkan Wittzell that the "Collection number should read 1269 according to Plantae Hartwegianae". In JSTOR the collection number is 1209. Now, many (if not all) of these specimens will also be in GBIF.

iPhylo

More Neo4J tests of GBIF taxonomy: Using IPNI to find objective synonyms
