
Getting Genetics Done

Getting Things Done in Genetics & Bioinformatics Research
Published
Author Stephen Turner

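Sometimes you need to run a UNIX command on a file but only want it to operate on the body of the file, not the header. Create a file called body somewhere in your $PATH, make it executable, and add this to it:

    #!/bin/bash
    # Read the first line (the header) from standard input and print it unchanged.
    IFS= read -r header
    printf '%s\n' "$header"
    # Run the arguments as a command on the remaining lines of standard input.
    eval "$@"

Now, when you need to run something but ignore the header, pipe the file into body and pass the command as its arguments, e.g. cat file.txt | body sort -k2.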
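Here is a minimal, self-contained sketch that shows the header staying on top after a sort. For the demo it defines body as a shell function instead of a file in $PATH (same three steps), and the gene/count table is made-up sample data.

```bash
#!/usr/bin/env bash
# Same logic as the body script above, written as a function so this
# demo does not need a file in $PATH.
body() {
    IFS= read -r header          # consume the first line (the header)
    printf '%s\n' "$header"      # print it back out unchanged
    eval "$@"                    # run the given command on the remaining lines
}

# Hypothetical tab-separated input with a header row.
data=$(printf 'gene\tcount\ng3\t5\ng1\t12\ng2\t7\n')

# Sort the body numerically on column 2; the header is not sorted.
sorted=$(printf '%s\n' "$data" | body sort -k2,2n)
printf '%s\n' "$sorted"
```

Because body uses eval, it also accepts a whole pipeline as one quoted argument, e.g. body "sort -k2,2n | head -n 3". The flip side is that eval re-splits its arguments on whitespace, so an argument that itself contains a space or tab (such as a tab passed to sort -t) needs an extra level of quoting.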

Published
Author Stephen Turner

I talked a little bit about tidy data in my recent post about dplyr, but you should really go check out Hadley’s paper on the subject. R expects inputs to data analysis procedures to be in a tidy format, but the model output objects that you get back aren’t always tidy. The reshape2, tidyr, and dplyr packages are meant to take data frames, munge them around, and return a data frame.

Published
Author Stephen Turner

TL;DR? We started an R users group: awesome community, huge turnout at the first meeting, lots of potential.

I've sat through many hours of meetings where faculty lament the fact that their trainees (and the faculty themselves!) are woefully ill-prepared for our brave new world of computing- and data-intensive science.

Published
Author Stephen Turner

GNU datamash is a command-line utility that offers simple calculations (e.g. count, sum, min, max, mean, stdev, string coalescing) as well as a rich set of statistical functions, to quickly assess information in textual input files or from a UNIX pipe.

Published
Author Stephen Turner

Data “janitor-work”

The New York Times recently ran a piece on wrangling and cleaning data: “For Big-Data Scientists, ‘Janitor Work’ Is Key Hurdle to Insights.” Whether you call it “janitor-work,” wrangling/munging, cleaning/cleansing/scrubbing, tidying, or something else, the article above is worth a read (even though it implicitly denigrates the important work that your housekeeping staff does). It’s one of the few “Big Data” pieces that

Published
Author Stephen Turner

Last week I taught a three-hour introductory R workshop for life scientists at UVA's Health Sciences Library. I broke the workshop into three sections. In the first half hour or so, I presented slides giving an overview of R and why R is so awesome. During this session I emphasized reproducible research and demonstrated using knitr + rmarkdown in RStudio to produce a PDF that can easily be recompiled when the data change.