Rogue Scholar

Git, Github, Linux, Biological Sciences

Useful Unix/Linux One-Liners for Bioinformatics

Published October 21, 2013

Author Stephen Turner

Much of the work that bioinformaticians do is munging and wrangling massive amounts of text. While there are some "standardized" file formats (FASTQ, SAM, VCF, etc.) and some tools for manipulating them (fastx toolkit, samtools, vcftools, etc.), there are still times when a little knowledge of Unix/Linux tools such as awk, sed, cut, grep, and GNU parallel is extremely helpful.

Bioinformatics, Recommended Reading, RNA-Seq, Tutorials, Biological Sciences

De Novo Transcriptome Assembly with Trinity: Protocol and Videos

https://doi.org/10.59350/9yme1-bcx95

Published October 10, 2013

Author Stephen Turner

One of the clearest advantages RNA-seq has over array-based technology for studying gene expression is not needing a reference genome or a pre-existing oligo array. De novo transcriptome assembly allows you to study non-model organisms, cancer cells, or environmental metatranscriptomes.

Bioinformatics, Software, Biological Sciences

Utility script for launching bare JAR files

https://doi.org/10.59350/y8r2z-rmk71

Published August 21, 2013

Author Stephen Turner

Torsten Seemann compiled a list of minimum standards for bioinformatics command-line tools, such as printing help when no commands are specified, reporting version information, and avoiding hardcoded paths. These should be obvious to any seasoned software engineer, but many of these standards are not followed in bioinformatics.
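As a rough illustration of three of these standards, here is a minimal Python sketch using the standard-library `argparse` module. It prints help (and exits with a non-zero status) when run with no arguments, reports its version with `--version`, and takes its input path from the command line instead of hardcoding it. The tool name `mytool`, the version string, and the single `input` argument are hypothetical placeholders, not part of any real tool.

```python
import argparse
import sys
from typing import List, Optional

__version__ = "0.1.0"  # hypothetical version string


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog="mytool",  # hypothetical tool name
        description="Example tool following basic command-line conventions.",
    )
    # --version prints "mytool 0.1.0" and exits with status 0
    parser.add_argument("--version", action="version",
                        version=f"%(prog)s {__version__}")
    # The input path comes from the user rather than being hardcoded
    parser.add_argument("input", help="path to the input file")
    return parser


def main(argv: Optional[List[str]] = None) -> int:
    argv = sys.argv[1:] if argv is None else argv
    parser = build_parser()
    if not argv:
        # No arguments given: show help instead of failing cryptically
        parser.print_help()
        return 1
    args = parser.parse_args(argv)
    print(f"Processing {args.input}")  # placeholder for real work
    return 0
```

To use it as a script, call `sys.exit(main())` from the program's entry point. Returning a non-zero status when no arguments are given is a design choice here; it lets wrapper scripts detect that nothing was processed.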

Bioinformatics, Databases, SQL, Tutorials, Biological Sciences

Understanding the ENSEMBL Schema

https://doi.org/10.59350/pecdt-k4c24

Published August 12, 2013

ENSEMBL is a frequently used resource for various genomics and transcriptomics tasks. The ENSEMBL website and MART tools provide easy access to its rich database, but ENSEMBL also provides flat-file downloads of the entire database and a public MySQL portal. You can access the MySQL portal with MySQL Workbench using the following: Once inside, you can get a sense of what the ENSEMBL schema (or data model) looks like.

R, Tutorials, Biological Sciences

Google Developers R Programming Video Lectures

https://doi.org/10.59350/7w827-zc641

Published August 5, 2013

Author Stephen Turner

Google Developers recognized that most developers learn R in bits and pieces, which can leave significant knowledge gaps. To help fill these gaps, they created a series of introductory R programming videos that provide a solid foundation in programming tools, data manipulation, and functions in the R language.

Bioinformatics, Conferences, R, Software, Visualization, Biological Sciences

Archival, Analysis, and Visualization of #ISMBECCB 2013 Tweets

https://doi.org/10.59350/g3wgz-sa186

Published July 24, 2013

Author Stephen Turner

As the 2013 ISMB/ECCB meeting winds down, I've archived and analyzed the 2000+ tweets from the meeting using a set of bash and R scripts I previously blogged about. The archive of all tweets tagged #ISMBECCB from July 19-24, 2013, is and will forever remain on GitHub. The same repository contains R code to parse this text and run the analyses below, explained in more detail in my previous blog post.

Bioinformatics, R, Sequencing, Tutorials, Biological Sciences

Course Materials from useR! 2013 R/Bioconductor for Analyzing High-Throughput Genomic Data

https://doi.org/10.59350/zptat-w3044

Published July 12, 2013

Author Stephen Turner

At last week's 2013 useR! conference in Albacete, Spain, Martin Morgan and Marc Carlson led a course on using R/Bioconductor to analyze next-gen sequencing data, covering alignment, RNA-seq, ChIP-seq, and sequence annotation. The course materials are online, including R code for running the examples, the PDF vignette tutorial, and the course material itself as a package.

R, Biological Sciences

Customize your .Rprofile and Keep Your Workspace Clean

https://doi.org/10.59350/t4q7h-bc226

Published July 2, 2013

Author Stephen Turner

Like your .bashrc, .vimrc, or many other dotfiles you may have in your home directory, your .Rprofile is sourced every time you start an R session. On Mac and Linux, this file is usually located at ~/.Rprofile. On Windows, it's buried somewhere in the R program files. Over the years I've grown and pruned my .Rprofile to set various options and define various "utility" functions I use frequently at the interactive prompt.

Bioinformatics, RNA-Seq, Software, Web Apps, Biological Sciences

ENCODE ChIP-Seq Significance Tool: Which TFs Regulate my Genes?

https://doi.org/10.59350/12a34-6vg56

Published June 7, 2013

Author Stephen Turner

I collaborate with several investigators on gene expression projects using both microarray and RNA-seq. After I show a collaborator which genes are dysregulated in a particular condition or tissue, the most common question I get is "What are the transcription factors regulating these genes?" This isn't the easiest question to answer.

Bioinformatics, GWAS, Productivity, Software, Statistics, Biological Sciences

PLATO, an Alternative to PLINK

https://doi.org/10.59350/54v8t-vth68

Published May 30, 2013

Since nearly the beginning of genome-wide association studies, the PLINK software package (developed by Shaun Purcell’s group at the Broad Institute and MGH) has been the standard for manipulating the large-scale data produced by these studies.

Bioinformatics, Conferences, Metagenomics, R, RNA-Seq, Biological Sciences

Automated Archival and Visual Analysis of Tweets Mentioning #bog13, Bioinformatics, #rstats, and Others

https://doi.org/10.59350/2fhng-b8j54

Published May 15, 2013

Author Stephen Turner

Automatically Archiving Twitter Results

Ever since Twitter gamed its own API and killed off great services like IFTTT triggers, I've been looking for a way to automatically archive tweets containing certain search terms of interest to me. Twitter's built-in search is limited, and I wanted to archive interesting tweets for future reference and to start playing around with some basic text/trend analysis.

Getting Genetics Done