Rogue Scholar

Productivity, Biology, English

Copy Text to the Local Clipboard from a Remote SSH Session

https://doi.org/10.59350/jbbpv-ey569

Published November 26, 2012

Author Stephen Turner

This is an issue that has bugged me for years, and I've finally found a good solution on osxdaily and Stack Overflow.

Annotation, Bioinformatics, Databases, ENCODE, Search, Biology, English

RegulomeDB: Identify DNA Features and Regulatory Elements in Non-Coding Regions

https://doi.org/10.59350/m96bt-rdz09

Published November 7, 2012

Author Stephen Turner

Many papers have noted the challenges associated with assigning function to non-coding genetic variation, and since the majority of GWAS hits for common traits are non-coding, resources for providing some mechanism for these associations are desperately needed.

Bioinformatics, RNA-Seq, Sequencing, Software, Biology, English

STAR: ultrafast universal RNA-seq aligner

https://doi.org/10.59350/rfxks-m8r39

Published November 2, 2012

Author Stephen Turner

There's a new kid on the block for RNA-seq alignment. Dobin, Alexander, et al. "STAR: ultrafast universal RNA-seq aligner." Bioinformatics (2012). Aligning RNA-seq data is challenging because reads can overlap splice junctions. Many other RNA-seq alignment algorithms (e.g., TopHat) are built on top of DNA sequence aligners.

Bioinformatics, Python, R, Tutorials, Biology, English

Learn R and Python, and Have Fun Doing It

https://doi.org/10.59350/g4dpz-2cx92

Published September 25, 2012

Author Stephen Turner

If you need to catch up on all those years you spent not learning how to code (you need to know how to code), here are a few resources to help you quickly learn R and Python, and have a little fun doing it. First, the free online Coursera course Computing for Data Analysis just started.

Bioinformatics, R, RNA-Seq, Sequencing, Biology, English

DESeq vs edgeR Comparison

https://doi.org/10.59350/nmhm2-jp919

Published September 18, 2012

Author Stephen Turner

Update (Dec 18, 2012): Please see this related post I wrote about differential isoform expression analysis with Cuffdiff 2. DESeq and edgeR are two methods and R packages for analyzing quantitative readouts (in the form of counts) from high-throughput experiments such as RNA-seq or ChIP-seq.

R, Statistics, Visualization, Biology, English

More on Exploring Correlations in R

https://doi.org/10.59350/9ed93-wm925

Published August 28, 2012

Author Stephen Turner

About a year ago I wrote a post about producing scatterplot matrices in R. These are handy for quickly getting a sense of the correlations that exist in your data. Recently someone asked me to pull out some relevant statistics (correlation coefficient and p-value) into tabular format to publish beside a scatterplot matrix.

Bioinformatics, ENCODE, Recommended Reading, Sequencing, Web Apps, Biology, English

Cscan: Finding Gene Expression Regulators with ENCODE ChIP-Seq Data

https://doi.org/10.59350/xsb19-my410

Published August 1, 2012

Author Stephen Turner

Recently published in Nucleic Acids Research: F. Zambelli, G. M. Prazzoli, G. Pesole, G. Pavesi, "Cscan: finding common regulators of a set of genes by using a collection of genome-wide ChIP-seq datasets," Nucleic Acids Research 40, W510–5 (2012). This paper presents a methodology and software implementation that allows users to discover a set of transcription factors or epigenetic modifications

Bioinformatics, Ggplot2, R, Biology, English

Plotting the Frequency of Twitter Hashtag Usage Over Time with R and ggplot2

https://doi.org/10.59350/nz33d-3a519

Published July 17, 2012

Author Stephen Turner

The 20th annual ISMB meeting was held over the last week in Long Beach, CA. It was an incredible meeting with lots of interesting and relevant talks, and lots of folks were tweeting the conference, usually with at least a few people in each concurrent session. I wrote the code below that uses the twitteR package to pull all the tweets about the meeting under the #ISMB hashtag. You can download that raw data here.

R, Visualization, Biology, English

Fix Overplotting with Colored Contour Lines

https://doi.org/10.59350/3vgfs-mf860

Published July 6, 2012

Author Stephen Turner

I saw this plot in the supplement of a recent paper comparing microarray results to RNA-seq results. Nothing earth-shattering in the paper - you've probably seen a similar comparison many times before - but I liked how they solved the overplotting problem using heat-colored contour lines to indicate density. I asked how to reproduce this figure using R on Stack Exchange, and my question was quickly answered by Christophe Lalanne.

Bioinformatics, Databases, DbGaP, GWAS, Web Apps, Biology, English

Browsing dbGAP Results

https://doi.org/10.59350/gqs2e-8wv84

Published June 27, 2012

Author Stephen Turner

Thanks to the excellent work of Lucia Hindorff and colleagues at NHGRI, the GWAS catalog provides a great reference for the cumulative results of GWAS for various phenotypes. Anyone familiar with GWAS also likely knows about dbGaP – the NCBI repository for genotype-phenotype relationships – and the wealth of data it contains.

Biology, English

Identifying Pathogens in Sequencing Data

https://doi.org/10.59350/208td-dr225

Published June 21, 2012

Author Stephen Turner

I just read an interesting paper on pathogen discovery using next-generation sequencing data, recommended to me by Nick Loman. A previously described algorithm (PathSeq, Kostic et al) for discovering microbes by deep-sequencing human tissue uses computational subtraction, whereby the initial collection of reads is depleted of human DNA by consecutive alignment to the human reference using MAQ and BLAST.

Getting Genetics Done
