At our most recent R user group meeting we were delighted to have presentations from Mark Lawson and Steve Hoang, both bioinformaticians at Hemoshear. All of the code used in both demos is in our Meetup’s GitHub repo.
At our most recent R user group meeting we were delighted to have presentations from Mark Lawson and Steve Hoang, both bioinformaticians at Hemoshear. All of the code used in both demos is in our Meetup’s GitHub repo.
For over 15 years, members of the computer science, machine learning, and data mining communities have gathered in a beautiful European location each spring to share ideas about biologically-inspired computation. Stemming from the work of John Holland who pioneered the field of genetic algorithms, multiple approaches have been developed that exploit the dynamics of natural systems to solve computational problems.
In general, the standard practice for correcting for population stratification in genetic studies is to use principal components analysis (PCA) to categorize samples along different ethnic axes . Price et al. published on this in 2006, and since then PCA plots are a common component of many published GWAS studies.
The authors here invited ACM KDD Innovation Award and IEEE ICDM Research Contributions Award winners to each nominate up to 10 best-known algorithms in data mining, including the algorithm name, justification for nomination, and a representative publication reference. The list was voted on by other IEEE and ACM award winners to narrow this down to a top 10 list.
Revolutions blog recently posted a link to R code by Joshua Reich with self-contained examples of using machine learning techniques in R, including various clustering methods (k-means, nearest neighbor, and kernel), recursive partitioning (CART), principle components analysis, linear discriminant analysis, and support vector machines. This post also links to some slides that go over the basics of machine learning.
Hierarchical clustering is a technique for grouping samples/data points into categories and subcategories based on a similarity measure. Being the powerful statistical package it is, R has several routines for doing hierarchical clustering. The basic command for doing HC is Nearly all clustering approaches use a concept of distance.