My new blog/newsletter ("Paired Ends") is now at blog.stephenturner.us. I'll be posting semi-regular updates and literature highlights in bioinformatics, computational biology, and data science, along with the occasional post on programming.
A while back I wrote this post about how I stay current in bioinformatics & genomics. That was nearly five years ago. A lot has changed since then, and a few links are dead.
The first ever RStudio conference was held January 11-14, 2017 in Orlando, FL. For anyone else like me who spends hours each working day staring into an RStudio session, the conference was truly excellent. The speaker lineup was diverse and covered lots of areas related to development in R, including the tidyverse, the RStudio IDE, Shiny, htmlwidgets, and authoring with RMarkdown.
How many reads do I need? What's my sequencing depth? These are questions I get all the time. Both are easy to answer with some basic algebra: how much sequence data you need to hit a target depth of coverage, or, inversely, what depth of coverage a given amount of sequencing yields. Given one, plus the genome size and read length/configuration, you can calculate the other.
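As a rough illustration of that algebra, here is a minimal Python sketch. It assumes the standard expected-coverage relation, coverage = (number of reads × read length × ends per read) / genome size, where "ends" is 2 for paired-end and 1 for single-end sequencing. The function names, parameters, and the example genome size are hypothetical and chosen only for illustration; the sketch ignores duplicates, unmapped reads, and uneven coverage, so treat the results as idealized estimates.

```python
import math


def coverage_depth(num_reads, read_length, genome_size, paired=False):
    """Expected mean depth of coverage.

    num_reads counts fragments: for paired-end data, one read pair = 1.
    """
    ends = 2 if paired else 1
    return num_reads * read_length * ends / genome_size


def reads_needed(target_depth, read_length, genome_size, paired=False):
    """Number of reads (or read pairs, if paired) to hit a target depth."""
    ends = 2 if paired else 1
    return math.ceil(target_depth * genome_size / (read_length * ends))


# Illustrative values only: a 3.1 Gb genome, 2x150 bp paired-end reads.
genome_size = 3_100_000_000
pairs = reads_needed(30, 150, genome_size, paired=True)
print(pairs)                                                  # 310000000
print(coverage_depth(pairs, 150, genome_size, paired=True))   # 30.0
```

The two functions are inverses of each other, so feeding the output of one into the other recovers the starting value, up to the rounding introduced by taking the ceiling of the read count.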
This is a guest post from VP Nagraj, a data scientist embedded within UVA’s Health Sciences Library, who runs our Data Analysis Support Hub (DASH) service. Last weekend I was fortunate enough to be able to participate in the first ever Shiny Developer Conference hosted by RStudio at Stanford University. I’ve built a handful of apps, and have taught an introductory workshop on Shiny.
A while back I showed you how to make volcano plots in base R for visualizing gene expression results. This is just one of many genome-scale plots where you might want to show all individual results but highlight or call out important results by labeling them, for example, with a gene name. But if you want to annotate lots of points, the annotations usually get so crowded that they overlap one another and become illegible.
This is a guest post from VP Nagraj, a data scientist embedded within UVA’s Health Sciences Library, who runs our Data Analysis Support Hub (DASH) service. The What: GRUPO (Gauging Research University Publication Output) is a Shiny app that provides side-by-side benchmarking of American research university publication activity.
Background: This tutorial shows an example of RNA-seq data analysis with DESeq2, followed by KEGG pathway analysis using GAGE. It uses data from GSE37704, with processed data available on Figshare (DOI: 10.6084/m9.figshare.1601975).
I work with gene lists on a nearly daily basis. Lists of genes near ChIP-seq peaks, lists of genes closest to a GWAS hit, lists of differentially expressed genes or transcripts from an RNA-seq experiment, lists of genes involved in certain pathways, etc. And lots of times I’ll need to convert these gene IDs from one identifier to another. There’s no shortage of tools to do this. I use Ensembl Biomart.
I just returned from the Genome Informatics meeting at Cold Spring Harbor. This was, hands down, the best scientific conference I've been to in years. The quality of the talks and posters was excellent, and it was great meeting in person many of the scientists and developers whose tools and software I use on a daily basis.