Computer and Information Sciences, English, Blogger

iPhylo

Rants, raves (and occasionally considered opinions) on phyloinformatics, taxonomy, and biodiversity informatics. For more ranty and less considered opinions, see my Twitter feed. ISSN 2051-8188. Written content on this site is licensed under a Creative Commons Attribution 4.0 International license.
GBIF, Linking, Linkout, NCBI, TreeBASE

In response to Rutger Vos's question I've started to add GBIF taxon ids to the iPhylo Linkout website. If you've not come across iPhylo Linkout, it's a Semantic Mediawiki-based site where I maintain links between the NCBI taxonomy and other resources, such as Wikipedia and the BBC Nature Wildlife finder. For more background see Page, R. D. M. (2011). Linking NCBI to Wikipedia: a wiki-based approach. PLoS Currents, 3, RRN1228.

EOL, Error, Leptograpsus, Trust

There's a recent thread on the Encyclopedia of Life concerning erroneous images for the crab Leptograpsus. This is a crab I used to chase around rocks on stormy west-coast beaches near Auckland, so I was a little surprised to see the EOL page for Leptograpsus looks like this: the name and classification are the crab's, but the image is of a fish (Lethrinus variegatus). Perhaps at some point in aggregating the images

BioStor, Classification, Data Cleaning, Error, GBIF

This post arose from an ongoing email conversation with Tony Rees about extracting and annotating taxonomic names. In BioStor I use the GBIF classification to display the taxonomic names found in the OCR text in the form of a tree. The idea is to give the reader a sense of "what the paper is about". I also use the classification to help link to GBIF occurrence records.

Challenge, Data Integration, EOL, Taxonomy

In the spirit of the "Would you give me a grant experiment?" [1], here's the draft of a proposal I'm working on for the Computable Data Challenge. It's an attempt to merge taxonomic names, the primary literature, and phylogenetics into one all-singing, all-dancing website that makes it easy to browse names, see the publications relevant to those names, and see what, if anything, we know about the phylogeny of those taxa.

Dark Taxa, DNA Barcoding, NCBI

Dark taxa have become even darker. NCBI has pulled the plug on large numbers of DNA barcode sequences that lack scientific names. For example, taxon Cyclopoida sp. BOLD:AAG9771 (tax_id 818059) now has a sparse page that has no associated sequences. From an earlier download of EMBL I know that this taxon is associated with at least 5 sequences, such as GU679674. But if you go to that sequence you get this: So the sequence is hidden.

Crossref, DataCite, DOI, Identifiers, Specimen Codes

Based on recent discussions my sense is that our community will continue to thrash the issue of identifiers to death, repeating many of the debates that have gone on (and will go on) in other areas. To be trite, it seems to me we have three criteria: cheap, resolvable, and persistent. We get to pick two.

Challenge, Data, EOL

Now we are awash in challenges! EOL has announced its Computable Data Challenge. Some US$50,000 is on offer. "Challenge" is perhaps a misnomer, as EOL is offering this money not as a prize at the end, but rather to fund one or more proposals (submitted by 22 May) that are accepted.

BHL, Biomedical, GBIF, Linking, Mekong River Schistosomiasis

When I think of the Biodiversity Heritage Library (BHL) or GBIF I tend to think of taxonomy and biodiversity. Folk wisdom has it that BHL is full of old books, mostly pre-1923. Great for finding old taxonomic names, or nice artwork, but not exactly "modern" biology. GBIF is mainly about displaying organism distributions based on museum specimens, the primary data of taxonomic research.

The iEvoBio 2012 Challenge has been announced, and the topic is synthesizing phylogenies. The task: The rules of this challenge are: The set of trees you use must have at least 10,000 leaves in total. Acceptable entries could be a set comprising 2,500 distinct trees covering the same four taxa, a single tree with 10,000+ leaves, or anything in between. Your results must be scientifically new.
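As a rough check on the 10,000-leaf rule, here is a minimal Python sketch that totals the leaves across a set of Newick strings. The function names are hypothetical, and it assumes simple Newick without quoted labels or comments, so that every comma separates sibling subtrees and a tree's leaf count is its comma count plus one. The example repeats a single four-taxon tree purely to check the arithmetic; a real entry would need 2,500 distinct trees.

```python
def count_leaves(newick: str) -> int:
    """Count leaves in a simple Newick string.

    Assumption: no quoted labels or bracketed comments, so each comma
    separates two sibling subtrees and leaves = commas + 1.
    """
    text = newick.strip()
    if not text or text == ";":
        return 0
    return text.count(",") + 1


def meets_leaf_threshold(trees, minimum=10_000):
    """Return (total leaves, whether the total reaches the minimum)."""
    total = sum(count_leaves(tree) for tree in trees)
    return total, total >= minimum


# 2,500 four-taxon trees give 2,500 * 4 = 10,000 leaves in total.
four_taxon_tree = "((A,B),(C,D));"
total, ok = meets_leaf_threshold([four_taxon_tree] * 2500)
print(total, ok)  # prints: 10000 True
```

A proper Newick parser would be the safer choice for real data, since quoted labels may legitimately contain commas.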

Annotation, Error, GBIF, Genbank, Identifiers

One reason I'm pursuing the theme of specimen identifiers (and identifiers in general) is the central role they play in annotating databases. To give a concrete example, I (among others) have argued for a wiki-style annotation layer on top of GenBank to capture things such as sequencing errors, updated species names, etc. Annotation is a lot easier if we have consistent identifiers for the things being annotated.

Integration, Links, Velcro

Sometimes I need to remind myself just why I'm spending so much time trying to make sense of other people's data, and why I go on (and on) about identifiers. One reason for my obsession is I want data to be "sticky", like the burrs shown in the photo above (Who invented velcro? by A-dep). Shared identifiers are like the hooks on the burrs: if two pieces of data have the same identifier they will stick together.
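To make the "sticky" idea concrete, here is a minimal Python sketch in which records from separate sources merge whenever they carry the same identifier. Everything in it is an assumption for illustration: the function name, the `id` field, and the made-up identifiers and values do not come from any real database.

```python
from collections import defaultdict


def stick_together(*sources):
    """Merge records from several sources, keyed by their shared identifier.

    Each record is a dict with an "id" key (a hypothetical field name).
    Records with the same id are combined into one dict; records whose
    id appears in only one source simply stay on their own.
    """
    merged = defaultdict(dict)
    for records in sources:
        for record in records:
            fields = {k: v for k, v in record.items() if k != "id"}
            merged[record["id"]].update(fields)
    return dict(merged)


# Made-up example data: two sources that both mention "specimen:1".
sequences = [{"id": "specimen:1", "sequence": "seq-A"}]
occurrences = [
    {"id": "specimen:1", "locality": "site-X"},
    {"id": "specimen:2", "locality": "site-Y"},
]

linked = stick_together(sequences, occurrences)
print(linked["specimen:1"])
```

Only the record whose identifier appears in both sources ends up with fields from both; without a shared identifier, nothing sticks.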