Natural Sciences, English, Jekyll

Biopragmatics

Unraveling complex biology with biological knowledge graphs. Content licensed under CC BY 4.0.
Ontology, OWL, Genes, HGNC, Natural Sciences, English
Published
Author: Charles Tapley Hoyt

This is the second of a two-part post about encoding databases as ontologies. In the first part, I gave background on how I started working on this problem and the software stack I developed along the way. In this post, I explain the philosophy and design behind how I encoded the HGNC (HUGO Gene Nomenclature Committee) database as an ontology using PyOBO.

Ontology, OWL, Genes, Natural Sciences, English
Published
Author: Charles Tapley Hoyt

This is the first of a two-part post about encoding databases as ontologies. In this post, I give background on the problems in biocuration that led me to start encoding databases as ontologies, the software I have written to do it, and the repository I have created to store the resulting artifacts in a FAIR, open, and sustainable way.

Knowledge Graphs, Sparql, Chemistry, Culture, Cultural Heritage, Natural Sciences, English
Published
Author: Charles Tapley Hoyt

At the sixth NFDI4Chem consortium meeting, Torsten Schrade from the NFDI4Culture consortium gave a lovely and whimsical talk entitled “A Data Alchemist’s Journey through NFDI”, which explored ways that we might federate and jointly query both consortia’s knowledge via their respective SPARQL endpoints. He proposed a toy example in which he linked paintings depicting alchemists trying to make gold to compounds containing gold.

ROR, Wikidata, Organization, Organizations, Bibliometrics, Natural Sciences, English
Published
Author: Charles Tapley Hoyt

I was looking at the different NFDI consortia in the Research Organization Registry (ROR), and found that the only two that have a parent relation to the NFDI (ror:05qj6w324) are NFDI4DS (ror:00bb4nn95) and MaRDI (ror:04ncnzm65). This felt strange to me, so I started looking around Wikidata to see if I could automatically make a curation sheet to send along to them.

Packaging, Python, Tox, Just, Cookiecutter-snekpack, Natural Sciences, English
Published
Author: Charles Tapley Hoyt

I became aware of just while watching Hynek’s second video on uv a few months ago. I immediately fell in love with its elegance and simplicity, so I have begun replacing task running in my repositories that relied on tox with just. This post gives a bit of background and context, then walks through making the switch in one of my repositories that has some annoying dependencies.

NFDI, SPARQL, Bioregistry, Natural Sciences, English
Published
Author: Charles Tapley Hoyt

Earlier this week at the sixth NFDI4Chem consortium meeting, Torsten Schrade from the NFDI4Culture consortium gave a lovely and whimsical talk entitled “A Data Alchemist’s Journey through NFDI”, which explored ways that we might federate and jointly query both consortia’s knowledge via their respective SPARQL endpoints.

CURIE, URI, URN, IRI, Identifiers, Natural Sciences, English
Published
Author: Charles Tapley Hoyt

Using standard CURIE prefixes and URI prefixes in semantic web artifacts such as Resource Description Framework (RDF) files promotes interoperability, enables reuse in downstream data integration, and makes data more FAIR. The Bioregistry defines a set of standard CURIE prefixes and URI prefixes against which RDF files can be validated and standardized.

ChEMBL, Cheminformatics, Chemoinformatics, Chemistry, Bibliometrics, Natural Sciences, English
Published
Author: Charles Tapley Hoyt

I’ve recently submitted an article to the Journal of Open Source Software (JOSS) describing chembl-downloader, a Python package for automating downloading and using ChEMBL data in a reproducible way. In this post, I use chembl-downloader to show how the numbers of compounds, assays, activities, and other entities in ChEMBL have changed over time.

CURIE, URI, URN, IRI, Identifiers, Natural Sciences, English
Published
Author: Charles Tapley Hoyt

The Bioregistry is a database and toolchain for standardization of prefixes, CURIEs, and URIs that appear in linked (open) data. While I created it in 2019 as a component of PyOBO in order to support parsing database cross-references appearing in biomedical ontologies, it has since become an independent project with a community-driven governance model and much broader applications. This post is a first attempt to quantify its usage and impact.

Biomarker, Semantic Spaces, Bioregistry, BiomarkerKB, Natural Sciences, English
Published
Author: Charles Tapley Hoyt

The Bioregistry is a community-driven registry of semantic spaces and their metadata. When I learned about BiomarkerKB at the International Society for Biocuration’s 18th Annual International Biocuration Conference, I was excited to curate new records (and prefixes) in the Bioregistry to cover BiomarkerKB’s semantic spaces on biomarkers.

Ontology, Embeddings, Bert, Sbert, Similarity, Natural Sciences, English
Published
Author: Charles Tapley Hoyt

The Ontology Lookup Service (OLS) is now indexing dense embeddings for ontology terms constructed from term labels, synonyms, and descriptions using LLMs. I maintain a Python client library for the OLS (ols-client) and was recently asked to implement a wrapper around the OLS API endpoint that exposes these embeddings.