I recently stumbled across this collection of computational biology primers in Nature Biotechnology. Many of these are old, but they're still great resources to get a fundamental understanding of the topic. Here they are in no particular order.
I came across this awesome gist explaining how to syntax highlight code in Keynote. The same trick works for PowerPoint. Mac only. Install Homebrew if you don’t have it already, then run brew install highlight. Running highlight -O rtf myfile.ext | pbcopy converts the code to syntax-highlighted text in RTF format and copies the result to the system clipboard. Paste into Keynote or PowerPoint.
How many reads do I need? What's my sequencing depth? I get these questions all the time. Calculating how much sequence data you need to hit a target depth of coverage, or the inverse, the depth of coverage you get from a set amount of sequencing, is easy with some basic algebra. Given one, plus the genome size and read length/configuration, you can calculate the other.
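As a rough illustration of that algebra, here is a minimal Python sketch. It assumes the usual mean-coverage relationship, coverage = (number of reads × read length) / genome size, and treats a paired-end fragment as two reads of the same length. The function names, the 3.1 Gb genome size, and the 2x150 configuration are assumptions made for the example, not values from the original post.

```python
import math


def coverage_depth(n_fragments, read_length, genome_size, paired=False):
    """Expected mean coverage, C = N * L / G.

    n_fragments: number of sequenced fragments (read pairs if paired=True)
    read_length: length of a single read, in bases
    genome_size: haploid genome size, in bases
    """
    reads_per_fragment = 2 if paired else 1
    total_bases = n_fragments * reads_per_fragment * read_length
    return total_bases / genome_size


def fragments_needed(target_coverage, read_length, genome_size, paired=False):
    """Inverse calculation: fragments required to reach a target mean coverage."""
    reads_per_fragment = 2 if paired else 1
    bases_per_fragment = reads_per_fragment * read_length
    return math.ceil(target_coverage * genome_size / bases_per_fragment)


# Hypothetical example: 30x coverage of a ~3.1 Gb genome with 2x150 reads.
GENOME_SIZE = 3_100_000_000  # assumed genome size, in bases
pairs = fragments_needed(30, 150, GENOME_SIZE, paired=True)
depth = coverage_depth(pairs, 150, GENOME_SIZE, paired=True)
print(f"{pairs:,} read pairs give about {depth:.1f}x coverage")
```

This gives only the expected average depth. It ignores duplicates, unmapped reads, and uneven coverage, so real experiments usually budget some extra sequencing on top of the number it returns.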
This is a guest post from VP Nagraj, a data scientist embedded within UVA’s Health Sciences Library, who runs our Data Analysis Support Hub (DASH) service. Last weekend I was fortunate enough to be able to participate in the first ever Shiny Developer Conference hosted by RStudio at Stanford University. I’ve built a handful of apps, and have taught an introductory workshop on Shiny.
A while back I showed you how to make volcano plots in base R for visualizing gene expression results. This is just one of many genome-scale plots where you might want to show all individual results but highlight or call out important results by labeling them, for example, with a gene name. But if you want to annotate lots of points, the annotations usually get so crowded that they overlap one another and become illegible.
This is a guest post from VP Nagraj, a data scientist embedded within UVA’s Health Sciences Library, who runs our Data Analysis Support Hub (DASH) service. The What: GRUPO (Gauging Research University Publication Output) is a Shiny app that provides side-by-side benchmarking of American research university publication activity.
Background: This tutorial shows an example of RNA-seq data analysis with DESeq2, followed by KEGG pathway analysis using GAGE. It uses data from GSE37704, with processed data available on Figshare (DOI: 10.6084/m9.figshare.1601975).
I work with gene lists on a nearly daily basis. Lists of genes near ChIP-seq peaks, lists of genes closest to a GWAS hit, lists of differentially expressed genes or transcripts from an RNA-seq experiment, lists of genes involved in certain pathways, etc. And lots of times I’ll need to convert these gene IDs from one identifier to another. There’s no shortage of tools to do this. I use Ensembl Biomart.
I just returned from the Genome Informatics meeting at Cold Spring Harbor. This was, hands down, the best scientific conference I've been to in years. The quality of the talks and posters was excellent, and it was great meeting in person many of the scientists and developers whose tools and software I use on a daily basis.
The problem: I was looking for a way to compile an RMarkdown document and have the filename of the resulting PDF or HTML document contain the name of the input data that it processed. That is, if I compiled the analysis.Rmd file, and that file did some analysis and reporting on data001.txt, I’d want the resulting filename to look something like data001.txt.analysis.html.
I forgot where I originally found the code to do this, but I recently had to dig it out again to remind myself how to draw two different y axes on the same plot to show the values of two different features of the data. This is somewhat distinct from the typical use case of aesthetic mappings in ggplot2 where I want to have different lines/points/colors/etc. for the same feature across multiple subsets of data.