Over the weekend, out of the blue, Dan Whaley commented on an earlier blog post of mine (Altmetrics, Disqus, GBIF, JSTOR, and annotating biodiversity data). Dan is the project lead for hypothes.is, a tool to annotate web pages.
A little over a week ago I was at the 6th International Barcode of Life Conference, held in Guelph, Canada. It was my first barcoding conference, and quite an experience. Here are a few random thoughts, starting with the attendees: it was striking how diverse the conference crowd was. Apart from a few ageing systematists (including veterans of the cladistics wars), most people were young(ish) and from all over the world.
Yet another barely thought-out project, although this one has some crude code. If some 16,000 new taxonomic names are published each year, then that is roughly 40 per day. We don't have a single place that aggregates these, so any major biodiversity project is by definition out of date. GBIF itself hasn't had an updated list of fungal or plant names for several years, and at present doesn't have an up-to-date list of animal names.
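A crude sketch of what such an aggregator could look like: poll feeds of newly published names and merge them into one stream. The feed URLs below are placeholders, not real endpoints; most nomenclators don't expose anything this convenient, which is rather the point.

```python
# A crude sketch of an aggregator for newly published names, assuming each
# nomenclator exposes an RSS/Atom feed. The URLs below are placeholders,
# not real endpoints.
import feedparser  # pip install feedparser

FEEDS = [
    "https://example.org/zoology/new-names.rss",  # hypothetical feed
    "https://example.org/botany/new-names.rss",   # hypothetical feed
]

def latest_names():
    """Yield (title, link) pairs from all feeds, newest first."""
    entries = []
    for url in FEEDS:
        entries.extend(feedparser.parse(url).entries)
    # feedparser normalises publication dates into 'published_parsed'
    def sort_key(e):
        t = e.get("published_parsed")
        return tuple(t) if t else ()
    entries.sort(key=sort_key, reverse=True)
    for e in entries:
        yield e.get("title", ""), e.get("link", "")

for title, link in latest_names():
    print(title, link)
```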
I need more time to sketch this out fully, but I think a case can be made for a taxonomy-centric (or, perhaps more usefully, a biodiversity-centric) clone of PubMed Central. Why? We already have PubMed Central, and a European version, Europe PubMed Central, and the content of Open Access journals such as ZooKeys appears in both. So, again, why?
One of the limitations of the Biodiversity Heritage Library (BHL) is that, unlike, say, Google Books, its search functions are limited to searching metadata (e.g., book and article titles) and taxonomic names. It doesn't support full-text search, by which I mean you can't just type in the name of a locality, a specimen code, or a phrase and expect to get back much in the way of results.
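To make the idea concrete, here is a minimal sketch of full-text indexing using SQLite's FTS5 extension, standing in for whatever search engine a real BHL index would use; the page texts are invented examples, not actual BHL OCR.

```python
# Minimal full-text search sketch using SQLite's FTS5 extension. The rows
# are invented stand-ins for OCR text of BHL pages, and FTS5 stands in for
# whatever engine a production index would use.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE pages USING fts5(page_id, body)")
db.executemany(
    "INSERT INTO pages VALUES (?, ?)",
    [
        ("page-1", "Holotype BMNH 1901.2.3.4, collected at Lake Tanganyika."),
        ("page-2", "Notes on a collection of fishes from the Congo basin."),
    ],
)
# A phrase query over the full text, best matches first
for page_id, body in db.execute(
    "SELECT page_id, body FROM pages WHERE pages MATCH ? ORDER BY rank",
    ('"Lake Tanganyika"',),
):
    print(page_id, body)
```

This is exactly the kind of query (a locality name as a phrase) that BHL's metadata-and-names search can't answer today.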
One of the less glamorous but necessary tasks of data cleaning is mapping "strings to things", that is, taking strings such as "George A. Boulenger" and mapping them to identifiers, such as ISNI: 0000 0001 0888 841X. In the case of authors such as George Boulenger, one way to do this would be through Wikipedia, which has entries for many scientists, often linked to identifiers for those people (see the bottom of the Wikipedia page for George A. Boulenger).
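One way to automate the mapping, sketched below, is to query Wikidata's SPARQL endpoint for people with a matching label and pull back their ISNI (property P213). Exact label matching won't disambiguate every author, so treat the results as candidates to verify rather than final mappings.

```python
# Sketch: look up candidate ISNIs for an author name via Wikidata's SPARQL
# endpoint. Label matching alone won't disambiguate every "G. Boulenger",
# so results are candidates to verify, not final mappings.
import json
import urllib.parse
import urllib.request

def isni_candidates(name):
    query = """
    SELECT ?person ?isni WHERE {
      ?person rdfs:label "%s"@en ;
              wdt:P213 ?isni .   # P213 = ISNI
    }
    """ % name
    url = ("https://query.wikidata.org/sparql?format=json&query="
           + urllib.parse.quote(query))
    req = urllib.request.Request(
        url, headers={"User-Agent": "strings-to-things-demo"}
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return [(b["person"]["value"], b["isni"]["value"])
            for b in data["results"]["bindings"]]

print(isni_candidates("George Albert Boulenger"))
```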
Following on from Testing the GBIF taxonomy with the graph database Neo4J, I've added a more complex test that relies on linking taxa to names. In this case I've picked some legume genera (Coursetia and Poissonia) where there have been frequent changes of name.
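Below is a rough sketch of how the taxon/name split might be modelled in Neo4J using the official Python driver; the node labels, relationship types, and example records are my own choices for illustration, not GBIF's schema, and a local Neo4J server is assumed.

```python
# Sketch: modelling the taxon/name distinction in Neo4j via the official
# Python driver. Labels, relationship types, and the example species are
# illustrative choices, not GBIF's schema. Assumes a local Neo4j server.
from neo4j import GraphDatabase  # pip install neo4j

driver = GraphDatabase.driver("bolt://localhost:7687",
                              auth=("neo4j", "password"))

with driver.session() as session:
    # Two names applied to the same taxon at different times
    session.run("""
        MERGE (t:Taxon {id: 'example-taxon'})
        MERGE (n1:Name {value: 'Coursetia weberbaueri'})
        MERGE (n2:Name {value: 'Poissonia weberbaueri'})
        MERGE (t)-[:HAS_NAME {accepted: false}]->(n1)
        MERGE (t)-[:HAS_NAME {accepted: true}]->(n2)
    """)
    # Find taxa that have carried more than one name
    result = session.run("""
        MATCH (t:Taxon)-[:HAS_NAME]->(n:Name)
        WITH t, collect(n.value) AS names
        WHERE size(names) > 1
        RETURN t.id AS taxon, names
    """)
    for record in result:
        print(record["taxon"], record["names"])

driver.close()
```

Separating Taxon and Name nodes is what makes "frequent changes of name" easy to query: the history of a taxon's names is just the set of HAS_NAME edges attached to it.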
Imagine a web site where researchers can go, log in (easily), and get a list of all the species they have described (with pretty pictures and, say, a GBIF map), and a list of all the DNA sequences/barcodes (if any) that they've published. Imagine that this is displayed in a colourful way (e.g., badges), and the results tweeted with the hashtag #itaxonomist.
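As a rough sketch of the plumbing behind the first of those lists, GBIF's species search API can be queried for an author's name and the results filtered client-side on the authorship field; author strings are messy, so this yields candidates rather than a clean list.

```python
# Rough sketch: a first pass at "species described by this person", using
# GBIF's species search API with client-side filtering on the authorship
# field. Author strings are messy, so these are candidates, not a clean list.
import json
import urllib.parse
import urllib.request

def species_by_author(author, limit=100):
    url = ("https://api.gbif.org/v1/species/search?"
           + urllib.parse.urlencode({"q": author, "limit": limit}))
    with urllib.request.urlopen(url) as resp:
        data = json.load(resp)
    return [r["scientificName"]
            for r in data.get("results", [])
            if author.lower() in (r.get("authorship") or "").lower()
            and r.get("rank") == "SPECIES"]

for name in species_by_author("Boulenger"):
    print(name)
```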
I've been playing with the graph database Neo4J to investigate aspects of the classification of taxa in GBIF's backbone. A number of people in biodiversity informatics have been experimenting with graph databases: Nicky Nicolson at Kew has a nice presentation on using them to handle names (Building a names backbone), and the Open Tree of Life project uses Neo4J in its treemachine.
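As a small, self-contained example of the kind of thing Neo4J makes easy, the sketch below loads a fragment of a classification and walks from a genus up to the root. The labels and relationship types are my own; a real import would stream the whole GBIF backbone from its checklist dump.

```python
# Sketch: loading a fragment of a backbone classification into Neo4j and
# walking from a taxon up to the root. Labels and relationship types are
# illustrative; a real import would stream the GBIF backbone dump.
from neo4j import GraphDatabase  # pip install neo4j

driver = GraphDatabase.driver("bolt://localhost:7687",
                              auth=("neo4j", "password"))

with driver.session() as session:
    session.run("""
        MERGE (k:Taxon {name: 'Plantae', rank: 'kingdom'})
        MERGE (f:Taxon {name: 'Fabaceae', rank: 'family'})
        MERGE (g:Taxon {name: 'Coursetia', rank: 'genus'})
        MERGE (f)-[:CHILD_OF]->(k)
        MERGE (g)-[:CHILD_OF]->(f)
    """)
    # Variable-length path: all ancestors of the genus
    result = session.run("""
        MATCH (g:Taxon {name: 'Coursetia'})-[:CHILD_OF*]->(a:Taxon)
        RETURN a.name AS name, a.rank AS rank
    """)
    for record in result:
        print(record["rank"], record["name"])

driver.close()
```

The variable-length `CHILD_OF*` pattern is the payoff: queries like "give me every ancestor" or "find taxa attached to two different parents" are one-liners in Cypher but painful in SQL.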
Note to self about a possible project. This PLoS ONE paper describes a method for inferring a hierarchy from a set of tags (and cites related work that is of interest). I've grabbed the code and data from http://hiertags-beta.elte.hu/home/ and put it on GitHub. Possible project: use Tibély et al.'s method (or others) on taxonomic names extracted from BHL text (or elsewhere) and see if we can reconstruct taxonomic classifications.
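The flavour of these methods is easy to convey: treat tags that occur on the same document as related, then attach each tag to the more frequent tag it co-occurs with most often. The snippet below implements that simplified heuristic (not Tibély et al.'s exact algorithm) on toy documents standing in for BHL pages.

```python
# Simplified tag-hierarchy heuristic (not Tibély et al.'s exact algorithm):
# each tag becomes the child of the tag it co-occurs with most often among
# tags that are more frequent overall. Documents here are toy sets of
# taxonomic names, as might be extracted from BHL pages.
from collections import Counter
from itertools import combinations

documents = [
    {"Animalia", "Chordata", "Mammalia"},
    {"Animalia", "Chordata", "Aves"},
    {"Animalia", "Arthropoda"},
]

freq = Counter(tag for doc in documents for tag in doc)
cooc = Counter()
for doc in documents:
    for a, b in combinations(sorted(doc), 2):
        cooc[(a, b)] += 1
        cooc[(b, a)] += 1

parent = {}
for tag in freq:
    # Candidate parents: more frequent tags this tag co-occurs with.
    # On ties, prefer the most specific (least frequent) parent.
    candidates = [(cooc[(tag, other)], -freq[other], other)
                  for other in freq
                  if freq[other] > freq[tag] and cooc[(tag, other)]]
    if candidates:
        parent[tag] = max(candidates)[2]

for child, p in sorted(parent.items()):
    print(f"{child} -> {p}")
```

On this toy input the heuristic recovers Mammalia and Aves under Chordata, and Chordata and Arthropoda under Animalia, which is the sort of signal we'd hope to see at scale on names from BHL.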
One of my pet projects is BioStor, which has been running since 2009 (gulp). BioStor extracts articles from the Biodiversity Heritage Library (details here: http://dx.doi.org/10.1186/1471-2105-12-187) and currently has over 110,000 articles, all open access. The site itself is showing its age, in terms of both performance and design, so I've wanted to update it for a while now.