Computer and Information Sciences, English, Blogger

iPhylo

Rants, raves (and occasionally considered opinions) on phyloinformatics, taxonomy, and biodiversity informatics. For more ranty and less considered opinions, see my Twitter feed. ISSN 2051-8188. Written content on this site is licensed under a Creative Commons Attribution 4.0 International license.

How to cite: Page, R. (2024). Problems with the DataCite Data Citation Corpus. https://doi.org/10.59350/t80g1-xys37

DataCite have released the Data Citation Corpus, together with a dashboard that summarises the corpus. This is billed as: “The goal is to build a citation database between scholarly articles and data, such as datasets in repositories, sequences in GenBank, protein structures in PDB, etc.”

How to cite: Page, R. (2023). It’s 2023 - why are we still not sharing phylogenies? https://doi.org/10.59350/n681n-syx67

A quick note to support a recent Twitter thread: https://twitter.com/rdmpage/status/1729816558866718796?s=61&t=nM4XCRsGtE7RLYW3MyIpMA

The article “Diversification of flowering plants in space and time” by Dimitrov et al. describes a genus-level phylogeny for 14,244 flowering plant genera.

How to cite: Page, R. (2023). Where are the plant type specimens? Mapping JSTOR Global Plants to GBIF. https://doi.org/10.59350/m59qn-22v52

This blog post documents my attempts to create links between two major resources for plant taxonomy: JSTOR’s Global Plants and GBIF, specifically between type specimens in JSTOR and the corresponding occurrences in GBIF.

ABBYY, CRF, DjVu, Document Layout, HOCR

How to cite: Page, R. (2023). Document layout analysis. https://doi.org/10.59350/z574z-dcw92

Some notes to self on document layout analysis. I’m revisiting the problem of taking a PDF or a scanned document and determining its structure (for example, where the title, abstract, and bibliography are, and where the figures and their captions are). There are lots of papers on this topic, and lots of tools.

How to cite: Page, R. (2023). The problem with GBIF’s Phylogeny Explorer. https://doi.org/10.59350/v0bt3-zp114

GBIF recently released the Phylogeny Explorer, using legumes as an example dataset. The goal is to enable users to “view occurrence data from the GBIF network aligned to legume phylogeny.” The screenshot below shows the legume phylogeny side-by-side with GBIF data.

How to cite: Page, R. (2023). Adventures in machine learning: iNaturalist, DNA barcodes, and Lepidoptera. https://doi.org/10.59350/5q854-j4s23

Recently I’ve been working with a master’s student, Maja Nagler, on a project using machine learning to identify images of Lepidoptera. This has been something of an adventure, as I am new to machine learning and have only minimal experience with the Python programming language.

How to cite: Page, R. (2023). A taxonomic search engine. https://doi.org/10.59350/r3g44-d5s15

Tony Rees commented on my recent post “Ten years and a million links”. I’ve responded to some of his comments, but I think the bigger question deserves more space, hence this blog post. I think there are several ways to approach this.

As trailed on a Twitter thread last week, I’ve been working on a manuscript describing the efforts to map taxonomic names to their original descriptions in the taxonomic literature. The preprint is on bioRxiv: doi:10.1101/2023.05.29.542697. Much of the work has been linking taxa to names, which still has huge gaps.