Rogue Scholar

Ion Torrent, Biology, English

Is IonTorrent open or not?

Published 12 December 2011

It seems IonTorrent are trying to present themselves as the open, democratising platform for high throughput sequencing, with their Ion Community, sample datasets and (in theory) open source software.

Biology, English

Random access to BZIP2?

https://doi.org/10.59350/mtk8h-23g61

Published 22 November 2011

Author Peter Cock

In my last post I looked at how the GZIP variant BGZF (Blocked GNU Zip Format, used in BAM files) allowed efficient random access to large compressed files. This time I'm looking at bzip2 (bz2) which offers better compression than GZIP, but is also block based so in theory the same random access strategy can be employed.
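As a rough illustration of the block idea, here is a minimal sketch that swaps in a simpler technique than true bzip2 block access: it writes each chunk as its own independent bzip2 stream and keeps an index of where each stream starts. Real bzip2 blocks inside a single stream are bit-aligned rather than byte-aligned, so seeking into an ordinary .bz2 file is harder than this. The helper names `compress_chunks` and `read_range` and the chunk size are hypothetical, chosen only for this example.

```python
import bisect
import bz2


def compress_chunks(data, chunk_size):
    """Compress data as concatenated, independent bzip2 streams.

    Returns the compressed bytes and an index of
    (uncompressed offset, compressed offset) pairs, one per chunk.
    """
    parts, index, pos = [], [], 0
    for start in range(0, len(data), chunk_size):
        part = bz2.compress(data[start:start + chunk_size])
        index.append((start, pos))
        parts.append(part)
        pos += len(part)
    return b"".join(parts), index


def read_range(compressed, index, start, size):
    """Read bytes from one chunk, decompressing only that chunk.

    For simplicity this sketch does not follow reads across chunks.
    """
    # Find the last chunk whose uncompressed offset is <= start.
    k = bisect.bisect_right([u for u, _ in index], start) - 1
    u_off, c_off = index[k]
    # The decompressor stops at the end of the first stream it sees.
    chunk = bz2.BZ2Decompressor().decompress(compressed[c_off:])
    return chunk[start - u_off:start - u_off + size]


data = b"ACGT" * 50000
compressed, index = compress_chunks(data, 50000)
fragment = read_range(compressed, index, 123456, 8)
```

The concatenated output is still a valid multi-stream bzip2 file, so `bz2.decompress(compressed)` recovers the whole input; the cost is slightly worse compression than a single stream.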

Compression, SAM/BAM, Biology, English

BGZF - Blocked, Bigger & Better GZIP!

https://doi.org/10.59350/na97p-0pe24

Published 8 November 2011

Author Peter Cock

BAM files are compressed using a variant of GZIP (GNU ZIP), called BGZF (Blocked GNU Zip Format). Anyone who has read the SAM/BAM Specification will have seen the terms BGZF and virtual offsets, but what you may not realise is how general purpose this is for random access to any large compressed file.
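To make the virtual offset idea concrete, here is a minimal sketch, not a real BGZF reader or writer. It assumes the layout described in the SAM/BAM Specification, where a virtual offset packs the compressed start of a block into the upper bits and the position within the decompressed block into the lower 16 bits. The blocks here are plain concatenated gzip members without the extra header field that genuine BGZF adds, and the function names are hypothetical.

```python
import gzip
import io
import zlib

# Each block holds at most 64 KiB of uncompressed data,
# so a within-block offset always fits in 16 bits.
BLOCK_SIZE = 65536


def write_blocks(data, block_size=BLOCK_SIZE):
    """Compress data as concatenated gzip members.

    Returns the compressed bytes and the start offset of each block.
    """
    out = io.BytesIO()
    starts = []
    for i in range(0, len(data), block_size):
        starts.append(out.tell())
        out.write(gzip.compress(data[i:i + block_size]))
    return out.getvalue(), starts


def make_virtual_offset(block_start, within_block):
    """Pack a block start and a within-block offset into one integer."""
    return (block_start << 16) | within_block


def split_virtual_offset(voffset):
    """Unpack a virtual offset into (block start, within-block offset)."""
    return voffset >> 16, voffset & 0xFFFF


def read_at(compressed, voffset, size):
    """Read bytes at a virtual offset, decompressing only one block.

    For simplicity this sketch does not follow reads across blocks.
    """
    block_start, within_block = split_virtual_offset(voffset)
    # wbits=31 expects a gzip header; decompression stops at the
    # end of the first member, leaving later blocks untouched.
    block = zlib.decompressobj(wbits=31).decompress(compressed[block_start:])
    return block[within_block:within_block + size]


data = bytes(range(256)) * 1000
compressed, starts = write_blocks(data)
voffset = make_virtual_offset(starts[2], 100)
fragment = read_at(compressed, voffset, 10)
```

Because the output is just a series of gzip members, `gzip.decompress(compressed)` still returns the full input, which is what lets ordinary gzip tools read such a file from start to finish.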

FASTQ, SAM/BAM, Biology, English

FASTQ must die! Long live SAM/BAM!

https://doi.org/10.59350/25zt9-en439

Published 21 October 2011

Author Peter Cock

I think it is time to retire the FASTQ file format in favour of storing unaligned reads in SAM/BAM format.

SAM/BAM, Biology, English

SAM/BAM without gapped reference

https://doi.org/10.59350/pxdq2-4hk44

Published 3 October 2011

Author Peter Cock

In my last post I talked about SAM/BAM with a gapped reference, and how this makes it much easier to work with inserted bases relative to the reference/consensus - especially for visualisation. I should point out that some viewers do actually manage to show the inserts as columns even with the traditional ungapped/unpadded reference sequence - notably Gap5, Bambino, and the text based samtools tview, as shown in these tview screenshots.

SAM/BAM, Biology, English

SAM/BAM with gapped reference

https://doi.org/10.59350/tr3d0-87z86

Published 22 September 2011

Author Peter Cock

A lot of my time this week has gone into thinking and "talking" on the samtools-devel mailing list about the SAM/BAM file format and how it might be improved for (de novo) assemblies. Anyone working with high throughput sequencing data (formerly known as Next Generation Sequencing, NGS) should be well versed in the SAM/BAM file format.

GFF, NCBI, Biology, English

Why are NCBI GFF3 files still broken?

https://doi.org/10.59350/ejx9r-6qw23

Published 15 August 2011

Author Peter Cock

For the early part of my career in Bioinformatics I was able to avoid GFF3 files. Initially I focused on finished annotated genomes from the NCBI in plain text GenBank format (which has complications of its own), but with genome sequencing becoming widespread, so too are genome assembly and annotation. And for this, you will have to learn about GFF3 files.

BLAST, NCBI, Biology, English

Opening up NCBI BLAST?

https://doi.org/10.59350/ybb4r-91n60

Published 11 August 2011

Author Peter Cock

The BLAST chapter of the Biopython Tutorial (PDF) starts with these lines by Brad Chapman. I know what he meant, but it turns out things could be easier, especially once you start running "standalone BLAST" on your own machines, rather than using the NCBI's ever-improving BLAST website. Part of the problem is that setting up BLAST and its databases can be complicated (especially on a cluster), but also, inevitably, BLAST has bugs.

Blasted Bioinformatics!?
