Computer and Information Sciences, English, Blogger

iPhylo

Rants, raves (and occasionally considered opinions) on phyloinformatics, taxonomy, and biodiversity informatics. For more ranty and less considered opinions, see my Twitter feed. ISSN 2051-8188. Written content on this site is licensed under a Creative Commons Attribution 4.0 International license.

Annotation, Error, GBIF, Genbank, Identifiers

One reason I'm pursuing the theme of specimen identifiers (and identifiers in general) is the central role they play in annotating databases. To give a concrete example, I (among others) have argued for a wiki-style annotation layer on top of GenBank to capture things such as sequencing errors, updated species names, etc. Annotation is a lot easier if we have consistent identifiers for the things being annotated.

Integration, Links, Velcro

Sometimes I need to remind myself just why I'm spending so much time trying to make sense of other people's data, and why I go on (and on) about identifiers. One reason for my obsession is that I want data to be "sticky", like the burrs shown in the photo above (Who invented velcro? by A-dep). Shared identifiers are like the hooks on the burrs: if two pieces of data have the same identifier, they will stick together.

BHL, BioStor, GBIF, Identifiers, Linking

Following on from exploring links between GBIF and GenBank, here I'm going to look at links between GBIF and the primary literature, in this case articles scanned by the Biodiversity Heritage Library (BHL). The OCR text in BHL can be mined for a variety of entities. BHL itself has used uBio's tools to identify taxonomic names in the OCR text, and in my BioStor project I've extracted article-level metadata and geographic co-ordinates.
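To make "mining OCR text for co-ordinates" concrete, here is a minimal sketch. It is not the code BioStor uses: the regular expression, the function names, and the assumption that co-ordinates appear as degrees and minutes with a hemisphere letter (latitude first, longitude second) are all illustrative, and real OCR text is much noisier than this pattern allows for.

```python
import re

# Assumed pattern: degrees, minutes, hemisphere letter, e.g. 12°30'S.
# Real OCR output often mangles the degree and minute symbols.
COORD = re.compile(r"(\d{1,3})\s*°\s*(\d{1,2})\s*['′]\s*([NSEW])")


def to_decimal(degrees, minutes, hemisphere):
    """Convert degrees and minutes to signed decimal degrees."""
    value = int(degrees) + int(minutes) / 60
    return -value if hemisphere in "SW" else value


def find_points(text):
    """Return (latitude, longitude) pairs found in a block of text."""
    hits = [(to_decimal(*m.groups()), m.group(3)) for m in COORD.finditer(text)]
    points = []
    # Pair each latitude (N/S) with the longitude (E/W) that follows it.
    for (first, h1), (second, h2) in zip(hits, hits[1:]):
        if h1 in "NS" and h2 in "EW":
            points.append((first, second))
    return points


# Hypothetical snippet of OCR text
print(find_points("Collected at 12°30'S, 45°15'E in leaf litter."))
```

Each point extracted this way could then be compared with the localities of GBIF occurrences, which is the kind of link the post goes on to explore.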

Clustering, Data Cleaning, Graphviz, Taxonomy

Revisiting an old idea (Clustering taxonomic names), I've added code to the phyloinformatics course site that clusters strings into sets of similar strings. This service (available at http://iphylo.org/~rpage/phyloinformatics/services/clusterstrings.php) takes a list of strings, one per line, and returns a list of clusters.
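As a rough illustration of what such a service does, the sketch below takes strings one per line and groups them into clusters. It does not reproduce the service's own code: the similarity measure (Python's `difflib.SequenceMatcher`), the 0.8 threshold, and the function name `cluster_strings` are assumptions made for the example. Strings whose similarity meets the threshold are linked, and each connected group of linked strings becomes one cluster.

```python
from difflib import SequenceMatcher
from itertools import combinations


def cluster_strings(text, threshold=0.8):
    """Group the non-empty lines of `text` into clusters of similar strings."""
    strings = [line.strip() for line in text.splitlines() if line.strip()]
    parent = list(range(len(strings)))  # union-find forest, one node per string

    def find(i):
        # Follow parent links to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Link every pair of strings that is similar enough (threshold is assumed).
    for i, j in combinations(range(len(strings)), 2):
        ratio = SequenceMatcher(None, strings[i].lower(), strings[j].lower()).ratio()
        if ratio >= threshold:
            parent[find(i)] = find(j)

    # Collect the strings under each root to form the clusters.
    clusters = {}
    for i, s in enumerate(strings):
        clusters.setdefault(find(i), []).append(s)
    return list(clusters.values())


# Hypothetical input: name variants, one per line
names = "Rana temporaria\nRana temporara\nBufo bufo\nBufo buffo\nXenopus laevis"
for cluster in cluster_strings(names):
    print(cluster)
```

Comparing every pair is quadratic in the number of strings, which is fine for a short list pasted into a form but would need a smarter approach for large name lists.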

LSID, Rant

I'll keep this short: LSIDs suck because they are so hard to set up that many LSIDs don't actually work. Because of this there seems to be no shame in publishing "fake" LSIDs (LSIDs that look like LSIDs but which don't resolve using the LSID protocol). Hey, it's hard work, so let's just stick them on a web page but not actually make them resolvable.

Frogs, GBIF, Genbank, Geophylogeny, KML

As part of my mantra that it's not about the data, it's all about the links between the data, I've started exploring matching GenBank sequences to GBIF occurrences using the specimen_voucher codes recorded in GenBank sequences. It's quickly becoming apparent that this is not going to be easy.
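A minimal sketch of the matching idea follows, under stated assumptions. It treats a voucher either as a colon-delimited string (institution first, specimen number last) or as letters followed by digits, and it matches on the Darwin Core terms `institutionCode` and `catalogNumber`. The function names, the dictionary layout, and the sample records are hypothetical, and this is not the approach the post itself settles on.

```python
import re


def parse_voucher(voucher):
    """Split a specimen_voucher string into (institution, number), or None."""
    voucher = voucher.strip()
    parts = voucher.split(":")
    if len(parts) >= 2:
        # Structured form, e.g. "MVZ:Herp:12345": first and last fields.
        return parts[0].strip().upper(), parts[-1].strip()
    # Unstructured form, e.g. "MVZ 12345" or "KU-204567".
    m = re.fullmatch(r"([A-Za-z]+)[\s\-_]*(\d+)", voucher)
    if m:
        return m.group(1).upper(), m.group(2)
    return None  # e.g. "personal collection"


def match_vouchers(sequences, occurrences):
    """Map each sequence accession to the IDs of matching occurrences."""
    index = {}
    for occ in occurrences:
        key = (occ["institutionCode"].upper(), occ["catalogNumber"])
        index.setdefault(key, []).append(occ["occurrenceID"])
    return {
        seq["accession"]: index.get(parse_voucher(seq["specimen_voucher"]), [])
        for seq in sequences
    }


# Hypothetical records, made up for illustration
sequences = [
    {"accession": "SEQ1", "specimen_voucher": "MVZ 12345"},
    {"accession": "SEQ2", "specimen_voucher": "personal collection"},
]
occurrences = [
    {"occurrenceID": "occ-1", "institutionCode": "MVZ", "catalogNumber": "12345"},
]
print(match_vouchers(sequences, occurrences))
```

Even this toy version shows where things go wrong: a voucher with no institution code, a catalogue number that carries a prefix in one database but not the other, or one acronym shared by two institutions will all produce missing or spurious links.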

Challenge, EOL, Tree Of Life

The Encyclopedia of Life have announced the EOL Phylogenetic Tree Challenge. The contest has two purposes: First prize is a trip to iEvoBio 2012, this year in Ottawa, Canada. For more details visit the challenge website. There is also an EOL community devoted to this challenge. Challenges are great things, especially ones with worthwhile tasks and decent prizes.

BLAST, Dark Taxa, Phyloinformatics

I've updated the "BLAST a sequence and get a tree" tool described in a previous post to output additional details, such as a list of the sequences used to build the tree and some basic metadata (such as the taxon name, name of any associated host, publication, and geographic coordinates). If the sequences are geotagged, then you will also see a little map showing the localities.