Computer and Information Sciences, English, Blogger

iPhylo

Rants, raves (and occasionally considered opinions) on phyloinformatics, taxonomy, and biodiversity informatics. For more ranty and less considered opinions, see my Twitter feed. ISSN 2051-8188. Written content on this site is licensed under a Creative Commons Attribution 4.0 International license.

Just some thoughts as I work through some datasets linking taxonomic names to the literature. In the diagram above I've tried to capture the different situations I encounter. Much of the work I've done on this has focussed on case 1 in the diagram: I want to link a taxonomic name to an identifier for the work in which that name was published. In practice this means linking names to DOIs.

Tags: Catalogue of Life, Citation, Crossref, DataCite, DOI

Quick notes to self following on from a conversation about linking taxonomic names to the literature. There are different sorts of citation:

1. Paper cites another paper
2. Paper cites a dataset
3. Dataset cites a paper

Citation type (1) is largely a solved problem (although there are issues of the ownership and use of this data, see e.g. "Zootaxa has no impact factor").

Tags: Citation, GBIF, Material Examined, Specimen Codes

Note to self (basically rewriting last year's Finding citations of specimens). Bibliographic data supports going from identifier to citation string and back again, so we can do a "round trip." 1. Given a DOI we can get structured data with a simple HTTP fetch, then use a tool such as citation.js to convert that data into a human-readable string in a variety of formats.
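The first step of that round trip can be sketched in Python. This is a minimal sketch, not the approach used in the post: it assumes the DOI resolver honours content negotiation for CSL-JSON, and it swaps citation.js for a small hand-written formatter (`format_citation`, a hypothetical helper) that produces one simple citation style. The DOI and the record in the usage note are made up for illustration.

```python
import json
import urllib.parse
import urllib.request

# Media type for CSL-JSON, requested through DOI content negotiation.
CSL_JSON = "application/vnd.citationstyles.csl+json"


def build_doi_request(doi: str) -> urllib.request.Request:
    """Build (but do not send) a request asking doi.org for CSL-JSON."""
    url = "https://doi.org/" + urllib.parse.quote(doi, safe="/")
    return urllib.request.Request(url, headers={"Accept": CSL_JSON})


def fetch_csl(doi: str) -> dict:
    """Fetch structured metadata for a DOI (needs network access)."""
    with urllib.request.urlopen(build_doi_request(doi), timeout=30) as response:
        return json.load(response)


def format_citation(csl: dict) -> str:
    """Turn a CSL-JSON record into a simple human-readable string.

    Hypothetical stand-in for a real formatter such as citation.js.
    """
    authors = ", ".join(
        f"{a.get('family', '')} {a.get('given', '')[:1]}".strip()
        for a in csl.get("author", [])
    )
    date_parts = csl.get("issued", {}).get("date-parts") or [[]]
    year = date_parts[0][0] if date_parts[0] else "n.d."
    pieces = [
        f"{authors} ({year})".strip(),
        csl.get("title", ""),
        csl.get("container-title", ""),
    ]
    text = ". ".join(p for p in pieces if p)
    if "DOI" in csl:
        return f"{text}. https://doi.org/{csl['DOI']}"
    return f"{text}."
```

With a made-up record such as `{"author": [{"family": "Smith", "given": "Jane"}], "issued": {"date-parts": [[2020]]}, "title": "An example title", "DOI": "10.1234/example"}`, `format_citation` returns a single citation string ending in the DOI link. Going back from a string to an identifier is the harder half of the round trip and is not covered here.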

Tags: Phylogeny, TreeBASE

So it looks like TreeBASE is in trouble, its legacy Java code a victim of security issues. Perhaps this is a chance to rethink TreeBASE, assuming that a repository of published phylogenies is still considered a worthwhile thing to have (and I think that question is open). Here's what I think could be done.

Tags: Markdown, Obsidian

Returning to the subject of personal knowledge graphs, Kyle Scheer has an interesting repository of Markdown files that describe academic disciplines at https://github.com/kyletscheer/academic-disciplines (see his blog post for more background). If you add these files to Obsidian you get a nice visualisation of a taxonomy of academic disciplines.
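Obsidian builds its graph view from links between notes. As a rough sketch of how the same graph could be extracted outside Obsidian, the Python below pulls `[[wikilink]]` targets out of Markdown text. It assumes the files use Obsidian-style wikilinks, which I have not checked against the repository; `extract_links` and `build_graph` are hypothetical helper names.

```python
import re
from pathlib import Path

# Matches [[Target]], [[Target|alias]] and [[Target#heading]],
# capturing only the target note name.
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")


def extract_links(markdown: str) -> list[str]:
    """Return the note names that a piece of Markdown links to."""
    return [target.strip() for target in WIKILINK.findall(markdown)]


def build_graph(folder: str) -> dict[str, list[str]]:
    """Map each note (file name without .md) to the notes it links to."""
    return {
        path.stem: extract_links(path.read_text(encoding="utf-8"))
        for path in sorted(Path(folder).rglob("*.md"))
    }
```

Calling `build_graph` on a local clone of the repository would give an adjacency list that could be passed to any graph drawing tool.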

Tags: Crossref, DOI, Duplicates

This blog post provides some background to a recent tweet where I expressed my frustration about the duplication of DOIs for the same article. I'm going to document the details here.

Tags: Google Maps, Graph, Mammal Species of the World, Mammals, Taxonomy

I keep returning to the problem of viewing large graphs and trees, which means my hard drive has accumulated lots of failed prototypes. Inspired by some recent discussions on comparing taxonomic classifications I decided to package one of these (wildly incomplete) prototypes up so that I can document the idea and put the code somewhere safe.

Tags: GraphQL, SPARQL, WikiCite, Wikidata

I've released a very crude GraphQL endpoint for Wikidata. More precisely, the endpoint is for a subset of the entities that are of interest to WikiCite, such as scholarly articles, people, and journals. There is a crude demo at https://wikicite-graphql.herokuapp.com. The endpoint itself is at https://wikicite-graphql.herokuapp.com/gql.php.
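As a minimal sketch of how a client might talk to such an endpoint, the Python below builds a GraphQL POST request. It assumes the endpoint accepts the common JSON-over-HTTP convention (a JSON body with `query` and `variables`), which the post does not confirm, and it may no longer be online. Because the schema is not described here, the example uses only the standard GraphQL introspection query rather than guessing at field names.

```python
import json
import urllib.request

# Endpoint URL as given in the post.
ENDPOINT = "https://wikicite-graphql.herokuapp.com/gql.php"

# Standard GraphQL introspection: asks the server for the name of its root query type.
INTROSPECTION_QUERY = "{ __schema { queryType { name } } }"


def build_graphql_request(query, variables=None, endpoint=ENDPOINT):
    """Build (but do not send) a JSON POST request carrying a GraphQL query."""
    payload = json.dumps({"query": query, "variables": variables or {}})
    return urllib.request.Request(
        endpoint,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


def run_query(query, variables=None):
    """Send the query and decode the JSON response (needs network access)."""
    request = build_graphql_request(query, variables)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

If the endpoint is reachable, `run_query(INTROSPECTION_QUERY)` should return a JSON object describing the schema's entry point, which is a reasonable first step before writing queries for articles, people, or journals.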

Tags: AI, Business Model, Text Mining

Markus Strasser (@mkstra) wrote a fascinating article entitled "The Business of Extracting Knowledge from Academic Publications". His TL;DR: After recounting the many problems of knowledge extraction - including a swipe at nanopubs which "are ... dead in my view (without admitting it)" - he concludes: Well worth a read, and much food for thought.

Tags: Geocoding, NoCode, RSS

Over a decade ago RSS (RDF Site Summary or Really Simple Syndication) was attracting a lot of interest as a way to integrate data across various websites. Many science publishers would provide a list of their latest articles in XML in one of three flavours of RSS (RDF, RSS, Atom). This led to tools such as uBioRSS [1] and my own e-Biosphere Challenge: visualising biodiversity digitisation in real time.
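To make the three flavours concrete, here is a minimal Python sketch that reads article titles and links from any of them using only the standard library. It is not how uBioRSS or the e-Biosphere demo worked; `parse_feed` is a hypothetical helper and the sample feeds are made up for illustration.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"  # Atom namespace
RSS1 = "{http://purl.org/rss/1.0/}"     # RSS 1.0 (RDF) namespace


def _text(element, tag):
    """Text of a child element, or an empty string if it is missing."""
    return (element.findtext(tag) or "").strip()


def parse_feed(xml_text: str) -> list[dict]:
    """Extract title/link pairs from an Atom, RSS 1.0 (RDF) or RSS 2.0 feed."""
    root = ET.fromstring(xml_text)
    items = []
    # Atom: <entry> elements, with the link held in an href attribute.
    for entry in root.iter(f"{ATOM}entry"):
        link = entry.find(f"{ATOM}link")
        href = link.get("href", "") if link is not None else ""
        items.append({"title": _text(entry, f"{ATOM}title"), "link": href})
    # RSS 1.0 (RDF): namespaced <item> elements.
    for item in root.iter(f"{RSS1}item"):
        items.append({"title": _text(item, f"{RSS1}title"), "link": _text(item, f"{RSS1}link")})
    # RSS 2.0: <item> elements without a namespace.
    for item in root.iter("item"):
        items.append({"title": _text(item, "title"), "link": _text(item, "link")})
    return items


# Made-up sample feeds, for illustration only.
SAMPLE_ATOM = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example journal</title>
  <entry>
    <title>A new species of example</title>
    <link href="https://doi.org/10.1234/example.1"/>
  </entry>
</feed>"""

SAMPLE_RSS2 = """<rss version="2.0"><channel>
  <title>Example journal</title>
  <item>
    <title>Another example article</title>
    <link>https://doi.org/10.1234/example.2</link>
  </item>
</channel></rss>"""
```

Normalising every flavour to the same title/link records is what makes it practical to merge feeds from many publishers into one stream.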