
Getting Genetics Done

Getting Things Done in Genetics & Bioinformatics Research
R, Statistics
Author: Stephen Turner

I talked a little bit about tidy data in my recent post about dplyr, but you should really go check out Hadley’s paper on the subject. R expects inputs to data analysis procedures to be in a tidy format, but the model output objects that you get back aren’t always tidy. The reshape2, tidyr, and dplyr packages are meant to take data frames, munge them around, and return a data frame.

R
Author: Stephen Turner

TL;DR? We started an R Users group, awesome community, huge turnout at first meeting, lots of potential. --- I've sat through many hours of meetings where faculty lament the fact that their trainees (and the faculty themselves!) are woefully ill-prepared for our brave new world of computing- and data-intensive science.

Linux
Author: Stephen Turner

GNU datamash is a command-line utility that offers simple calculations (e.g. count, sum, min, max, mean, stdev, string coalescing) as well as a rich set of statistical functions, to quickly assess information in textual input files or from a UNIX pipe.

Data Science, R, Tutorials
Author: Stephen Turner

Data “janitor-work”

The New York Times recently ran a piece on wrangling and cleaning data: “For Big-Data Scientists, ‘Janitor Work’ Is Key Hurdle to Insights”. Whether you call it “janitor-work,” wrangling/munging, cleaning/cleansing/scrubbing, tidying, or something else, the article above is worth a read (even though it implicitly denigrates the important work that your housekeeping staff does). It’s one of the few “Big Data” pieces that

Bioinformatics, R, RNA-Seq, Tutorials
Author: Stephen Turner

Last week I taught a three-hour introduction to R workshop for life scientists at UVA's Health Sciences Library. I broke the workshop into three sections: In the first half hour or so I presented slides giving an overview of R and why R is so awesome. During this session I emphasized reproducible research and gave a demonstration of using knitr + rmarkdown in RStudio to produce a PDF that can easily be recompiled when data updates.

Bioinformatics, R, Tutorials
Author: Stephen Turner

A couple of months ago I posted about how to visualize exome coverage with bedtools and R. But if you're looking to get a basic handle on genome arithmetic, take a look at Aaron Quinlan's bedtools tutorials from the 2013 CSHL course.

Git, Github, R
Author: Stephen Turner

If you're doing any kind of scientific computing and not using version control, you're doing it wrong. The git version control system and GitHub, a web-based service for hosting and collaborating on git-controlled projects, have both become wildly popular over the last few years.

Bioinformatics, R, RNA-Seq, Visualization
Author: Stephen Turner

I've been asked a few times how to make a so-called volcano plot from gene expression results. A volcano plot shows some measure of effect on the x-axis (typically the fold change) against statistical significance on the y-axis (typically the -log10 of the p-value). Genes that are highly dysregulated sit farther toward the left and right edges, while highly significant changes appear higher on the plot.
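
To make the axes concrete, here is a minimal Python sketch of how the plotted coordinates could be computed. It is not the code from the original post (which works in R). The function name `volcano_coordinates`, the cutoffs, and the three example genes are all hypothetical, and the effect measure is assumed to be a log2 fold change.

```python
import math

def volcano_coordinates(results, fc_cutoff=1.0, p_cutoff=0.05):
    """Turn (gene, log2_fold_change, p_value) rows into volcano plot points.

    Returns a list of (gene, x, y, flagged) tuples where x is the log2 fold
    change, y is -log10(p-value), and flagged marks genes that pass both
    (hypothetical) cutoffs. P-values must be greater than zero.
    """
    points = []
    for gene, log2_fc, p_value in results:
        y = -math.log10(p_value)  # smaller p-value -> higher on the plot
        flagged = abs(log2_fc) >= fc_cutoff and p_value < p_cutoff
        points.append((gene, log2_fc, y, flagged))
    return points

# Made-up example rows, for illustration only.
example = [
    ("geneA", 2.5, 0.001),
    ("geneB", -0.2, 0.5),
    ("geneC", -3.0, 0.0001),
]

points = volcano_coordinates(example)
for gene, x, y, flagged in points:
    print(f"{gene}\tx={x:+.2f}\ty={y:.2f}\tflagged={flagged}")
```

Passing the resulting x and y values to any scatter plot routine gives the characteristic shape: large effects spread out horizontally, and small p-values rise vertically.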