Rogue Scholar

BioStor, Cloud, Cloudant, CouchDB, Pagodabox, Computer and Information Sciences

Towards a new BioStor

Published July 31, 2015

One of my pet projects is BioStor, which has been running since 2009 (gulp). BioStor extracts articles from the Biodiversity Heritage Library (details here: http://dx.doi.org/10.1186/1471-2105-12-187), and currently has over 110,000 articles, all open access. The site itself is showing its age, both in terms of performance and design, so I've wanted to update it for a while now.

Darwin Core Archive, Database, GBIF, LSID, Names, Computer and Information Sciences

Modelling taxonomic names in databases

https://doi.org/10.59350/f565q-5bh48

Published July 28, 2015

Author Roderic Page

Quick notes on modelling taxonomic names in databases, as part of an ongoing discussion elsewhere about this topic. Simple model: One model that is widely used (e.g., ITIS, WoRMS) and which is explicit in Darwin Core Archive is something like this: we have a table for taxa, and we don't distinguish between taxa and their names. The taxonomic hierarchy is represented by the parentID field, which points to the taxon's parent.
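As a rough illustration of the parent-pointer model described in this excerpt, here is a minimal Python sketch. The `Taxon` type, the example rows, and the `lineage` helper are all hypothetical names invented for this sketch; only the idea of a single taxa table with a parentID field comes from the post.

```python
from typing import Dict, List, NamedTuple, Optional


class Taxon(NamedTuple):
    """One row of a hypothetical taxa table: taxon and name are not separated."""
    taxon_id: int
    name: str
    parent_id: Optional[int]  # None marks the root of the hierarchy


# Illustrative rows only (ranks are skipped to keep the example short).
TAXA: Dict[int, Taxon] = {
    1: Taxon(1, "Animalia", None),
    2: Taxon(2, "Chordata", 1),
    3: Taxon(3, "Mammalia", 2),
    4: Taxon(4, "Homo sapiens", 3),
}


def lineage(taxon_id: int, taxa: Dict[int, Taxon] = TAXA) -> List[str]:
    """Follow parent_id pointers up to the root; return names from root to taxon."""
    path: List[str] = []
    seen = set()
    current: Optional[int] = taxon_id
    while current is not None:
        if current in seen:
            # Guard against malformed data where parent pointers form a loop.
            raise ValueError(f"cycle detected at taxon {current}")
        seen.add(current)
        taxon = taxa[current]
        path.append(taxon.name)
        current = taxon.parent_id
    return path[::-1]


if __name__ == "__main__":
    print(" > ".join(lineage(4)))
```

Because each row stores only a pointer to its parent, recovering a full classification means walking the chain one step at a time, which is the trade-off this simple model makes.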

Computer and Information Sciences

The Biodiversity Data Journal is not machine readable

https://doi.org/10.59350/n86f3-zxv91

Published July 27, 2015

Author Roderic Page

In my previous post I discussed the potential for the Biodiversity Data Journal (BDJ) to be a venue for nano (or near-nano) publications. In this post I want to draw attention to what I think is a serious stumbling block, which is the lack of machine readable statements in the journal.

Annotation, Biodiversity Data Journal, Nanopublication, Computer and Information Sciences

Nanopublications and annotation: a role for the Biodiversity Data Journal?

https://doi.org/10.59350/rtx14-m7596

Published July 27, 2015

Author Roderic Page

I stumbled across this intriguing paper. The authors are arguing that there is scope for a unit of publication between a full-blown journal article (often not machine readable, but readable by people) and the nanopublication (a single, machine readable statement, not intended for people to read), namely the Single Figure Publication (SFP). It seems to me that this is something that the Biodiversity Data Journal is potentially heading towards.

BHL, Games, OCR, Computer and Information Sciences

Purposeful Games and the Biodiversity Heritage Library

https://doi.org/10.59350/g6rvg-1kz27

Published July 23, 2015

Author Roderic Page

These are some quick thoughts on the games on the BHL site, part of the Purposeful Gaming and BHL Project. As mentioned on Twitter, I had a quick play of the Beanstalk game and got bored pretty quickly.

Altmetric, Annotation, Disqus, GBIF, JSTOR, Computer and Information Sciences

Altmetrics, Disqus, GBIF, JSTOR, and annotating biodiversity data

https://doi.org/10.59350/xqb1n-xh905

Published July 22, 2015

Author Roderic Page

Browsing JSTOR's Global Plants database I was struck by the number of comments people have made on individual plant specimens. For example, for the Holotype of Scorodoxylum hartwegianum Nees (K000534285) there is a comment from Håkan Wittzell that the "Collection number should read 1269 according to Plantae Hartwegianae". In JSTOR the collection number is 1209. Now, many (if not all) of these specimens will also be in GBIF.

Challenge, JSON, JSON-LD, RDF, Semantic Web, Computer and Information Sciences

Steve Baskauf on RDF and the "Rod Page Challenge"

https://doi.org/10.59350/qxmdp-zjq98

Published July 22, 2015

Author Roderic Page

Steve Baskauf has concluded a thoughtful series of blog posts on RDF and biodiversity informatics with http://baskauf.blogspot.co.uk/2015/07/confessions-of-rdf-agnostic-part-7.html. In this post he discussed the "Rod Page Challenge", which was a series of grumpy posts I wrote (starting with this one) where I claimed RDF basically sucked, and to illustrate this I issued a challenge for people to do something interesting with some RDF I provided.

Biodiversity Data Journal, Darwin Core Archive, EOL, GBIF, Pensoft, Computer and Information Sciences

Biodiversity Data Journal data lost on the way to GBIF and EOL

https://doi.org/10.59350/m9y3y-bpc11

Published June 25, 2015

Author Roderic Page

Two ongoing challenges in biodiversity informatics are getting data into a form that is usable, and linking that data across different projects and platforms. A recent and interesting approach to this problem is the "data journal", as exemplified by the Biodiversity Data Journal. I've been exploring some data from this journal that has been aggregated by GBIF and EOL, and have come across a few issues.

DOI, GBIF, Github, ORCID, Computer and Information Sciences

Thoughts on ReCon 15: DOIs, GitHub, ORCID, altmetric, and transitive credit

https://doi.org/10.59350/cgdvk-qhq18

Published June 24, 2015

Author Roderic Page

I spent last Friday and Saturday at Research in the 21st Century: Data, Analytics and Impact (hashtag #ReCon_15) in Edinburgh. Friday 19th was conference day, followed by a hackday at CodeBase. There's a Storify archive of the tweets so you can get a sense of the meeting. Sitting in the audience a few things struck me. No identifier wars: DOIs have won and are everywhere.

Creative Commons, GeoJSON, Geophylogeny, Github, PLoS, Computer and Information Sciences

Visualising Geophylogenies in Web Maps Using GeoJSON

https://doi.org/10.59350/q7mg0-yq203

Published June 24, 2015

Author Roderic Page

I've published a short note on my work on geophylogenies and GeoJSON in PLoS Currents Tree of Life. At the time of writing the DOI hasn't registered, so the direct link is here. There is a GitHub repository for the manuscript and code. I chose PLoS Currents Tree of Life because it is (supposedly) quick and cheap.

Ross Mounce, Specimen Codes, Text Mining, Computer and Information Sciences

Text mining for museum specimen identifiers

https://doi.org/10.59350/xvdw8-nc818

Published May 19, 2015

Author Roderic Page

This post is a response to Ross Mounce's post Text mining for museum specimen identifiers. As Ross notes in that post, mining literature for specimen codes is something I've been interested in for a while (search for specimen codes on iPhylo), and @Aime Rankin (formerly an undergraduate student at Glasgow) did some work on this as well. It's great to see progress in this area.