Computer and information sciences, English, Blogger

iPhylo

Rants, raves (and occasionally considered opinions) on phyloinformatics, taxonomy, and biodiversity informatics. For more ranty and less considered opinions, see my Twitter feed. ISSN 2051-8188. Written content on this site is licensed under a Creative Commons Attribution 4.0 International license.
Bibliography Of Life, CSL, ElasticSearch, JSON, JSON-LD

I've released a simple search engine for publications in Wikidata. Wikicite Search takes its name from the WikiCite project, which was an initiative to create a bibliographic database in Wikidata. Since bibliographic data is a core component of taxonomic research (arguably taxonomy is mostly tracing the fate of the "tags" we call taxonomic names) I've spent some time getting taxonomic literature into Wikidata.

Citation, CSL, Machine Learning, Parsing

Quick note on a tool I've been working on to parse citations, that is, to take a series of strings such as:

Möllendorff O (1894) On a collection of land-shells from the Samui Islands, Gulf of Siam. Proceedings of the Zoological Society of London, 1894: 146–156.

de Morgan J (1885) Mollusques terrestres & fluviatiles du royaume de Pérak et des pays voisins (Presqúile Malaise). Bulletin de la Société Zoologique de France, 10: 353–249.
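To make the task concrete, here is a minimal sketch of turning strings like these into structured data. It is not the tool itself: it uses a single hand-written regular expression rather than a machine-learning model, and it assumes every citation follows the "Author (Year) Title. Journal, Volume: pages." pattern of the two examples. The function name `parse_citation` is hypothetical, and the output keys borrow from CSL-JSON (`container-title`, `issued`, `page`).

```python
import re

# Assumed pattern: "Authors (Year) Title. Journal, Volume: first–last."
CITATION_RE = re.compile(
    r"""
    ^(?P<authors>.+?)\s+                   # author string, up to the year
    \((?P<year>\d{4})\)\s+                 # four-digit year in parentheses
    (?P<title>.+?)\.\s+                    # title, ending at a full stop
    (?P<journal>[^.]+?),\s+                # journal name, ending at a comma
    (?P<volume>[^:]+):\s*                  # volume, ending at a colon
    (?P<first>\d+)\s*[–-]\s*(?P<last>\d+)  # page range (en dash or hyphen)
    \.?$
    """,
    re.VERBOSE,
)


def parse_citation(text):
    """Return a CSL-JSON-like dict for one citation string, or None if it does not fit."""
    match = CITATION_RE.match(text.strip())
    if match is None:
        return None
    return {
        "type": "article-journal",
        # Keep the author string whole; splitting names is a separate problem.
        "author": [{"literal": match.group("authors")}],
        "issued": {"date-parts": [[int(match.group("year"))]]},
        "title": match.group("title"),
        "container-title": match.group("journal"),
        "volume": match.group("volume").strip(),
        "page": f"{match.group('first')}-{match.group('last')}",
    }


if __name__ == "__main__":
    example = (
        "Möllendorff O (1894) On a collection of land-shells from the Samui "
        "Islands, Gulf of Siam. Proceedings of the Zoological Society of "
        "London, 1894: 146–156."
    )
    print(parse_citation(example))
```

A regular expression like this breaks as soon as the citation style changes (abbreviated journal names containing full stops, missing volumes, books), which is the motivation for learning the structure from labelled examples instead.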

C++, Cloud, Compiling, Heroku

TL;DR: Use a buildpack and set "LDFLAGS=--static" --disable-shared

I use Heroku to host most of my websites, and since I mostly use PHP for web development this has worked fine. However, every so often I write an app that calls an external program written in, say, C++. Up until now I've had to host these apps on my own web servers. Today I finally bit the bullet and learned how to add a C++ program to a Heroku-hosted site.

ALA, BHL, BioStor, GBIF, Plazi

If you compare the impact that BHL and Plazi have on GBIF, then it's clear that BHL is almost invisible. Plazi has successfully carved out a niche where they generate tens of thousands of datasets from text mining the taxonomic literature, whereas BHL is a participant in name only. It's not as if BHL lacks geographic data.

Citation, CRF, Identifiers, Machine Learning, Specimens

Note to self. The challenge of finding specimen citations in papers keeps coming around. It seems that this is basically the same problem as finding citations to papers, and can be approached in much the same way. If you want to build a database of references from scratch, one way is to scrape citations from papers (e.g., from the "literature cited" section), convert those strings into structured data, and add those to your database.
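The scrape-and-store part of that workflow can be sketched roughly as follows. This is only an illustration under simplifying assumptions: the paper is already plain text, the reference list starts at a line reading "Literature cited" or "References", and each reference sits on its own line. The function names and the `cited_reference` table are hypothetical, and the middle step (converting each string into structured data) is left out.

```python
import re
import sqlite3

# Assumed heading text; real papers use many variants.
HEADING_RE = re.compile(r"^\s*(literature cited|references)\s*$", re.IGNORECASE)


def extract_reference_strings(text):
    """Return the non-empty lines that follow the reference-list heading."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if HEADING_RE.match(line):
            return [ref.strip() for ref in lines[i + 1:] if ref.strip()]
    return []  # no heading found


def store_references(conn, source_id, references):
    """Save raw reference strings, keeping their order within the source paper."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cited_reference "
        "(source TEXT, position INTEGER, raw TEXT)"
    )
    conn.executemany(
        "INSERT INTO cited_reference (source, position, raw) VALUES (?, ?, ?)",
        [(source_id, n, ref) for n, ref in enumerate(references, start=1)],
    )
    conn.commit()


if __name__ == "__main__":
    paper = (
        "Some body text.\n"
        "\n"
        "Literature cited\n"
        "Möllendorff O (1894) On a collection of land-shells from the Samui "
        "Islands, Gulf of Siam. Proceedings of the Zoological Society of "
        "London, 1894: 146–156.\n"
    )
    conn = sqlite3.connect(":memory:")
    store_references(conn, "example-paper", extract_reference_strings(paper))
    print(conn.execute("SELECT position, raw FROM cited_reference").fetchall())
```

Keeping the raw string next to its source and position means the parsing step can be rerun later without scraping the paper again. The same shape would apply to specimen citations, with a different rule for locating the candidate strings.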

Catalogue Of Life, Graphviz, Summary Trees, Visualisation

How to cite: Page, R. (2021). Maximum entropy summary trees to display higher classifications. https://doi.org/10.59350/af01t-6sw74

A challenge in working with large taxonomic classifications is how you display them to the user, especially if the user probably doesn't want all the gory details.

Bibliography Of Life, Preprint, WikiCite, Wikidata

Last week I submitted a manuscript entitled "Wikidata and the bibliography of life". I've been thinking about the "bibliography of life" (AKA a database of every taxonomic publication ever published) for a while, and this paper explores the idea that Wikidata is the place to create this database.

BHL, BioStor, Visualisation

It's funny how some images stick in the mind. A few years ago Chris Freeland (@chrisfreeland), then working for the Biodiversity Heritage Library (BHL), created a visualisation of BHL content relevant to the African continent. It's a nice example of small multiples. For more than a decade (gulp) I've been extracting articles from the BHL and storing them in BioStor.

Challenge, DNA Barcoding, GBIF

Somewhat stunned that the DNA barcode browser I described earlier was one of the (minor) prizewinners in this year's GBIF Ebbe Nielsen Challenge. For details on the winner and other place getters see "ShinyBIOMOD wins 2020 GBIF Ebbe Nielsen Challenge". Obviously I'm biased, but it's nice to see the challenge inspiring creativity in biodiversity informatics. Congratulations to everyone who took part.