Your Rprofile is a script that R executes every time you launch an R session.
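As a rough illustration of what such a script might contain, here is a minimal sketch of a `.Rprofile`. The specific settings (the digit count, the CRAN mirror URL, and the startup message) are assumptions chosen for the example, not recommendations from the original post.

```r
# Minimal .Rprofile sketch; all values below are illustrative assumptions.

# Print fewer significant digits by default.
options(digits = 4)

# Set a default CRAN mirror so install.packages() does not prompt for one.
local({
  r <- getOption("repos")
  r["CRAN"] <- "https://cloud.r-project.org"
  options(repos = r)
})

# .First() is called by R at the end of startup, after the profile is sourced.
.First <- function() {
  if (interactive()) message("Session started: ", date())
}
```

R looks for this file in the current working directory first and then in your home directory, so a project-specific profile can stand in for the user-level one.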
I haven't posted much here recently, but here is a roundup of a few of the links I've shared on Twitter (@genetics_blog) over the last two weeks. Here is a nice tutorial on accessing high-throughput public data (from NCBI) using R and Bioconductor. Cloudnumbers.com, a startup that allows you to run high-performance computing (HPC) applications in the cloud, now supports the previously mentioned R IDE, RStudio.
I just discovered a handy function in R to produce a scatterplot matrix of selected variables in a dataset. The base graphics function is pairs(). Producing these plots can be helpful in exploring your data, especially using the second method below. Try it out on the built-in iris dataset.
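For a quick feel of how pairs() behaves on iris, here is a small sketch. It is not necessarily either of the methods the full post shows; the colour palette and the variable names `palette_cols` and `species_cols` are assumptions introduced for illustration.

```r
# Scatterplot matrix of the four numeric columns of the built-in iris data.
pairs(iris[, 1:4])

# The same matrix with points coloured by species.
# palette_cols and species_cols are hypothetical names used only in this sketch.
palette_cols <- c("red", "green3", "blue")
species_cols <- palette_cols[as.integer(iris$Species)]
pairs(iris[, 1:4], col = species_cols, pch = 19,
      main = "iris: scatterplot matrix by species")
```

Indexing the palette by `as.integer(iris$Species)` works because Species is a factor, so each level maps to one colour.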
Sequencing company Complete Genomics recently made available 69 ethnically diverse complete human genome sequences: a Yoruba trio; a Puerto Rican trio; a 17-member, 3-generation pedigree; and a diversity panel representing 9 different populations. Some of the samples partially overlap with HapMap and the 1000 Genomes Project. The data can be downloaded directly from the FTP site.
There are several tools available for conducting a post-hoc analysis of GWAS data, looking for enrichment of significant SNPs using literature- or pathway-based resources. Examples include GRAIL, ALLIGATOR, and WebGestalt, among others (see the SNPath R package). Since gene enrichment and pathway analysis essentially evolved from methods for analyzing gene expression data, many of these tools require specific gene identifiers as input.
Nucleic Acids Research just published its Web Server Issue, featuring new web servers and applications for genomics and proteomics research, along with updates to existing ones. In case you missed it, be sure to check out the Database Issue that came out earlier this year. This Web Server Issue has lots of papers on tools for microRNA analysis and for protein/RNA secondary structure analysis and annotation.
I wanted to contribute any content and code I post here to the R Programming Wikibook so I made a slight change to the Creative Commons license on this blog. All the written content is now cc-by-sa and all the code here is still open source BSD.
Genome-wide association studies have produced a wealth of new genetic associations to numerous traits over the last few years. As such, new studies of these phenotypes often attempt to replicate previous associations in their samples, or examine how the effects of these SNPs are altered by environmental factors or clinical subtypes.
DNA genotyping and sequencing are getting cheaper every day. As Oxford Nanopore CTO Clive Brown recently discussed at Genomes Unzipped, when the cost of a full DNA sequence begins to fall below $1000, the value of having that information far outweighs the cost of data generation.
Agilent Technologies is fostering integrated, whole-systems approaches to biological research through two $75,000 grants. The application deadline is August 12, 2011. Funds will support academic or nonprofit research projects covering the development of open source Agilent-compatible software tools for integrating data from different omics platforms—genomics, transcriptomics, proteomics, and metabolomics.
I just read a helpful paper on pathway analysis and interactome reconstruction: Tieri, P., Fuente, A. D., Termanini, A., & Franceschi, C. (2011). Integrating Omics Data for Signaling Pathways, Interactome Reconstruction, and Functional Analysis. In Bioinformatics for Omics Data, Methods in Molecular Biology, vol.