Blasted Bioinformatics!?

Bioinformatics lessons learned the hard way, bugs, gripes, and maybe topical paper reviews too...
Labels: BLAST, NCBI, Biology, English

Dear Santa,

Please could you ask the Elves at the NCBI to deliver the following BLAST+ feature requests for Christmas 2014?

Thank you,
Peter

P.S. Do they think I have been naughty or nice with my BLAST blog posts?

These are roughly in order of increasing priority, starting with some relatively minor issues.

Labels: BLAST, NCBI, Biology, English

In the last couple of years, my preferred BLAST output format has switched from BLAST XML to plain tabular output. The main reason is that tabular output is easier to parse, and it now gives easy access to more fields: BLAST+ 2.2.28 added descriptions and taxonomy output to the tabular and CSV formats. The cumulative effect is that BLAST XML has been lagging behind.
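To illustrate why the tabular format is easy to work with, here is a minimal Python sketch that turns tab-separated BLAST+ output into dictionaries keyed by column name. The column list is an assumption: it must match whatever was requested on the command line, for example `-outfmt "6 qseqid sseqid pident evalue bitscore staxids stitle"`. The function name `parse_blast_tabular` and the sample line are made up for illustration.

```python
import csv
import io

# Assumed column order; it must match the fields requested via -outfmt.
COLUMNS = ["qseqid", "sseqid", "pident", "evalue", "bitscore", "staxids", "stitle"]


def parse_blast_tabular(handle, columns=COLUMNS):
    """Yield one dict per hit from tab-separated BLAST+ output."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row or row[0].startswith("#"):
            # Skip blank lines and the comment lines written by -outfmt 7.
            continue
        if len(row) != len(columns):
            raise ValueError(
                "Expected %d columns, found %d" % (len(columns), len(row))
            )
        yield dict(zip(columns, row))


# Made-up example line, not real BLAST output.
example = "queryA\tsubjectB\t98.50\t1e-50\t200\t9606\tsubjectB example description\n"
hits = list(parse_blast_tabular(io.StringIO(example)))
print(hits[0]["sseqid"], hits[0]["stitle"])
```

All values come back as strings, so numeric fields such as `evalue` or `bitscore` would still need converting with `float()` before filtering or sorting.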

Labels: BLAST, NCBI, Biology, English

While working on updating the NCBI BLAST+ wrapper for Galaxy for any changes in the new BLAST+ 2.2.30 release, I hit a cryptic error message from deltablast:

    $ deltablast -query rhodopsin_proteins.fasta -subject four_human_proteins.fasta -evalue 1e-08 -outfmt "6 qseqid sseqid score" -rpsdb /data/blastdb/cdd_delta
    BLAST engine error: /data/blastdb/cdd_delta contains no frequency ratios needed for composition-based statistics.

Labels: BLAST, NCBI, Biology, English

For some time I had thought that the best option for computer parsing of BLAST+ output was BLAST XML. It had all the key bits of information, and XML is designed for automated parsing. However, with the extra fields added to the tabular and comma-separated output in BLAST+ 2.2.28, such as the long-overdue hit descriptions and the taxonomy fields, I think those formats are now preferable. BLAST XML is now lagging behind!

Labels: BLAST, NCBI, Biology, English

This is in a sense a continuation of my previous BLAST blog post, "My IDs not good enough for NCBI BLAST+". My core complaint is that makeblastdb currently ignores the user's own identifiers and automatically assigns its own (gnl|BL_ORD_ID|0, gnl|BL_ORD_ID|1, gnl|BL_ORD_ID|2, etc.), and that the BLAST+ suite as a whole is inconsistent about hiding these in its output.
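One possible workaround when parsing tabular output is sketched below in Python. It rests on an assumption that is not stated in the post itself: that the database was built without `-parse_seqids`, so the whole original FASTA title line, starting with the user's own identifier, is reported in the `stitle` field. The helper name `user_identifier` and the sample values are hypothetical.

```python
BL_ORD_PREFIX = "gnl|BL_ORD_ID|"


def user_identifier(sseqid, stitle):
    """Return the user's own ID where BLAST reported an auto-assigned one.

    Assumption: stitle holds the original FASTA title line, so its first
    whitespace-delimited word is the identifier the user supplied.
    """
    if sseqid.startswith(BL_ORD_PREFIX) and stitle.strip():
        return stitle.split(None, 1)[0]
    # Otherwise trust the identifier BLAST gave us.
    return sseqid


# Made-up values for illustration only.
print(user_identifier("gnl|BL_ORD_ID|0", "contig_1 example description"))  # contig_1
print(user_identifier("contig_2", "contig_2 another description"))  # contig_2
```

This only hides the symptom on the parsing side; it does nothing about the inconsistency within the BLAST+ tools themselves.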

Labels: BLAST, NCBI, Biology, English

Back in 2009, I wrote some Python scripts to use the NCBI Entrez Utilities to search for and download all known complete virus genomes in GenBank format, which I then processed to make FASTA files and BLAST databases. Recently I updated them and ran into some problems... false positives like entire bacterial genomes!

Labels: Galaxy, LaTeX, Biology, English

This summer I submitted a paper to the innovative new open access journal PeerJ, where it was published this week (Cock et al. 2013). I decided to write up the experience in the style of PeerJ's "Interview with an Author" blog posts. I've copied the questions they normally ask and written up my own replies. Other than some rough edges in their current submission system, it was all good.

Labels: Galaxy, Biology, English

Travis CI is one of the best things to happen to GitHub in some time. It adds automated testing to your source code repository, running as changes are committed and even on pull requests, to help ensure new work doesn't break existing functionality. We've been using this for Biopython for over a year, but this month I've started using Travis CI for testing my add-ons for the Galaxy Project as well.

Labels: Biology, English

Yesterday I attended the annual "Potatoes in Practice" meeting for the first time, mainly to see the finished display which I helped produce. Here it is, showing the twelve chromosomes of potato, drawn as stylized uniform green 'X' shapes, with different colour LEDs marking traits of interest for potato breeding.