Rogue Scholar

English

Model Context Protocol (MCP) and triple stores: natural language queries for knowledge graphs

Published 19 November 2025

Some quick notes based on experiments with Model Context Protocol (MCP) and Claude (https://claude.ai). MCP is all the rage right now, and I’ve been slow to take a look. Kingsley Idehen recently wrote “The Semantic Web Project Didn’t Fail — It Was Waiting for AI (The Yin of its Yang)”, where he argued that Large Language Models (LLMs) finally provide a user-friendly way to query triple stores (i.e., knowledge graphs).

Computer and Information Sciences, English

Make Data Count Kaggle Competition

https://doi.org/10.59350/1kk3z-yq870

Published 7 August 2025

Author Roderic Page

I’ve written several times here about the Make Data Count project and its major output to date, the Data Citation Corpus, currently at version 4 (see The fourth release of the Data Citation Corpus incorporates data citations from Europe PMC and additions to affiliation metadata). In June Make Data Count launched a Kaggle Competition with the goal of developing a tool that will process articles (in either PDF or XML format), extract data

Computer and Information Sciences, English

How many times are DNA barcoding datasets cited?

https://doi.org/10.59350/s0c6z-2m608

Published 8 July 2025

Author Roderic Page

This note accompanies a dataset that I uploaded to Zenodo (https://doi.org/10.5281/zenodo.15824274). My goal in creating this dataset is to link data created on the Barcode of Life Data Systems to the DOIs for those datasets, and then to link those data DOIs to DOIs for the papers (if any) that created those datasets, and/or cited them.

Computer and Information Sciences, English

A metabarcoding mess and the importance of just looking at the data

https://doi.org/10.59350/q2v8n-wc488

Published 5 June 2025

Author Roderic Page

Here I summarise a few posts on Bluesky where I raised concerns about some metabarcoding datasets that were highlighted by GBIF. Looking at these datasets, it’s clear that something is wrong. Data: The datasets discussed are for CO1 Amplicon Sequence Variants from Madagascar, which are part of the Insect Biome Atlas project.

Computer and Information Sciences, English

Tracking changes in DNA barcode BINs

https://doi.org/10.59350/h97dq-dat02

Published 16 May 2025

Author Roderic Page

Following on from releasing BOLD View, I’ve started to explore how the classification of DNA barcodes changes over time. BOLD uses the RESL algorithm described in Ratnasingham & Hebert (2013, 2016) to cluster barcodes into “BINs”. As the number of DNA barcodes grows over time, these clusters may change.

Computer and Information Sciences, English

Future interfaces for the Biodiversity Heritage Library

https://doi.org/10.59350/gvfg4-cw420

Published 11 April 2025

Author Roderic Page

On Wednesday this week (April 9th, 2025) I gave a talk entitled “Future interface(s) for BHL” (the slides are on FigShare) at BHL Day 2025.

Computer and Information Sciences, English

BOLD View: exploring DNA barcodes

https://doi.org/10.59350/81kzw-qy18

Published 26 February 2025

Author Roderic Page

For a while now I’ve been exploring ways to navigate through DNA barcodes. Over the years I’ve built various “toys” to explore barcodes, such as “Displaying a million DNA barcodes on Google Maps using CouchDB”, built a small-scale browser using Elasticsearch that had some success, and discovered that Postgres can search for DNA sequences, and that it’s really fast.

Computer and Information Sciences, English

Internet Archive as a single point of failure

https://doi.org/10.59350/1r3m1-c5e22

Published 29 October 2024

Author Roderic Page

How to cite: Page, R. (2024). Internet Archive as a single point of failure. https://doi.org/10.59350/1r3m1-c5e22

Just a placeholder to mark the ongoing impact of the Internet Archive being attacked (see here, here and here for details). The impact of this on the Biodiversity Heritage Library (BHL) has been huge, and reveals the extent to which BHL depends on the Archive.

Computer and Information Sciences, English

Exploring BOLD's DNA barcode data releases: there's a fraction too much friction

https://doi.org/10.59350/6qepn-ge510

Published 18 October 2024

Author Roderic Page

How to cite: Page, R. (2024). Exploring BOLD's DNA barcode data releases: there's a fraction too much friction. https://doi.org/10.59350/6qepn-ge510

Recently I’ve been exploring data downloaded from BOLD. Part of this was motivated by work done with David Schindel for a recent book. In this blog post I record some struggles I’ve had with the supposedly “Frictionless” data provided by BOLD.

Computer and Information Sciences, English

The Data Citation Corpus revisited

https://doi.org/10.59350/wvwva-v7125

Published 8 October 2024

Author Roderic Page

How to cite: Page, R. (2024). The Data Citation Corpus revisited. https://doi.org/10.59350/wvwva-v7125

TL;DR: These are some brief notes on the latest version (v. 2) of the Data Citation Corpus, released shortly before the Make Data Count Summit 2024, which also included a discussion on the practical uses of the corpus. I downloaded version 2 from Zenodo (doi:10.5281/zenodo.13376773).

Computer and Information Sciences, English

Why do museum and gallery displays ignore the web?

https://doi.org/10.59350/a83tn-c6t14

Published 13 August 2024

Author Roderic Page

How to cite: Page, R. (2024). Why do museum and gallery displays ignore the web? https://doi.org/10.59350/a83tn-c6t14

This post is inspired by the Pharaoh exhibition at the NGV in Melbourne, Australia. This is a beautifully displayed exhibition of objects from the British Museum, London. It has all the trappings of a modern exhibition: beautiful lighting, a custom sound track, and lots of social media coverage.

iPhylo
