Getting Genetics Done

Getting Things Done in Genetics & Bioinformatics Research
Labels: Annotation, Bioinformatics, GWAS, PLINK, SQL
Author: Unknown

One of the most powerful tools you can learn to use in genomics research is a relational database system, such as MySQL. These systems are fairly easy to set up and use, and they let users organize and manipulate data and statistical results with simple commands. When I was a graduate student (at the height of GWAS), this single skill quickly turned me into an “expert”.
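As a concrete illustration, here is a minimal sketch of the kind of query that makes a database useful for GWAS results. It uses Python's built-in `sqlite3` module as a stand-in for MySQL. The table names (`results`, `snp_annot`), the toy rows, and the conventional 5e-8 genome-wide significance cutoff are illustrative assumptions, not the author's setup.

```python
import sqlite3

# In-memory SQLite database standing in for a MySQL server (assumption:
# the SQL below is generic enough to run on either system).
conn = sqlite3.connect(":memory:")

# Hypothetical tables: association results and a SNP-to-gene annotation.
conn.execute("CREATE TABLE results (snp TEXT PRIMARY KEY, chr INTEGER, bp INTEGER, p REAL)")
conn.execute("CREATE TABLE snp_annot (snp TEXT PRIMARY KEY, gene TEXT)")

conn.executemany(
    "INSERT INTO results VALUES (?, ?, ?, ?)",
    [("rs1", 1, 1000, 3e-9), ("rs2", 1, 2000, 0.04), ("rs3", 2, 500, 1e-10)],
)
conn.executemany(
    "INSERT INTO snp_annot VALUES (?, ?)",
    [("rs1", "GENE_A"), ("rs3", "GENE_B")],
)

# Genome-wide significant hits with their annotated gene, strongest first.
# LEFT JOIN keeps significant SNPs that lack an annotation (gene is NULL).
hits = conn.execute(
    """
    SELECT r.snp, r.chr, r.bp, r.p, a.gene
    FROM results AS r
    LEFT JOIN snp_annot AS a ON r.snp = a.snp
    WHERE r.p < 5e-8
    ORDER BY r.p
    """
).fetchall()

for row in hits:
    print(row)
```

The same filter-join-sort pattern scales to millions of rows once the join columns are indexed, which is where a database pays off over spreadsheets or ad hoc scripts.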

Labels: GWAS, R, Visualization
Author: Unknown

Manhattan plots have become the standard way to visualize results for genetic association studies, allowing the viewer to instantly see significant results in the rough context of their genomic position.  Manhattan plots are typically shown on a linear X-axis (although the circos package can be used for radial plots), and this is consistent with the linear representation of the genome in online genome browsers.
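The x-axis layout of a Manhattan plot comes down to a small calculation. Each chromosome's positions are shifted by the cumulative length of the chromosomes before it, and the y-value is -log10(p). The sketch below assumes numeric chromosome labels and approximates each chromosome's length by the largest position observed in the results. A real plot would typically use actual chromosome lengths and then pass the points to a plotting library.

```python
import math


def manhattan_coords(results):
    """Convert (chrom, bp, p) tuples into Manhattan-plot (x, y) points.

    x is a cumulative genomic position: each chromosome is shifted right by
    the total span of the chromosomes before it (approximated here by the
    largest bp observed per chromosome). y is -log10(p).
    Assumes integer chromosome labels; X/Y/MT would need to be mapped first.
    """
    # Largest observed position on each chromosome.
    max_bp = {}
    for chrom, bp, _ in results:
        max_bp[chrom] = max(max_bp.get(chrom, 0), bp)

    # Offset for each chromosome = sum of spans of all earlier chromosomes.
    offsets = {}
    running = 0
    for chrom in sorted(max_bp):
        offsets[chrom] = running
        running += max_bp[chrom]

    return [(offsets[chrom] + bp, -math.log10(p)) for chrom, bp, p in results]


# Toy input: two SNPs on chromosome 1, one on chromosome 2.
pts = manhattan_coords([(1, 100, 1e-3), (1, 300, 1e-8), (2, 50, 0.5)])
print(pts)
```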

Labels: Conferences, Git, R, Twitter, Visualization
Author: Stephen Turner

I archived and analyzed all Tweets with the hashtag #ASHG2013 using my previously mentioned code. The number of Tweets by date shows that Wednesday was the most-Tweeted day. I also looked at the most-used hashtags other than #ASHG2013 and at the most prolific users. And what Twitter analysis would be complete without the widely loved, and more widely hated, word cloud?

Edit 8:24am: I have received notes that some Tweets were not captured in this archive.
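Here is a minimal sketch of the three tallies described above: Tweets per day, top hashtags, and most prolific users. It assumes each archived Tweet has already been reduced to a (timestamp, user, text) tuple. The records are made up for illustration, and the hashtag regex is a simplification of Twitter's own hashtag rules.

```python
import re
from collections import Counter
from datetime import datetime

# Hypothetical archived Tweets as (ISO timestamp, user, text); the real
# archive format depends on the collection script that was used.
tweets = [
    ("2013-10-23T09:15:00", "alice", "Great talk! #ASHG2013 #GWAS"),
    ("2013-10-23T11:40:00", "bob", "Packed room #ASHG2013 #RNAseq"),
    ("2013-10-24T08:05:00", "alice", "Poster session #ASHG2013 #GWAS"),
]

# Simplified hashtag pattern: '#' followed by word characters.
HASHTAG = re.compile(r"#(\w+)")

# Tweets per calendar day.
by_date = Counter(datetime.fromisoformat(ts).date().isoformat() for ts, _, _ in tweets)

# Hashtag frequencies (case-insensitive), excluding the conference tag itself.
tags = Counter(
    tag.lower()
    for _, _, text in tweets
    for tag in HASHTAG.findall(text)
    if tag.lower() != "ashg2013"
)

# Tweets per user.
users = Counter(user for _, user, _ in tweets)

print(by_date.most_common(1))  # busiest day
print(tags.most_common(2))     # top hashtags
print(users.most_common(1))    # most prolific user
```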

Labels: PubMed
Author: Stephen Turner

Several post-publication peer review forums, such as Faculty of 1000 (F1000) and PubPeer, already facilitate discussion of papers after they have been published. F1000 allows only a small number of "faculty" to comment on articles, and reading the commentary requires a paid subscription. PubPeer and similar startup services lack the critical mass of participants needed to make such a community truly useful.

Labels: Git, Github, Linux
Author: Stephen Turner

Much of the work that bioinformaticians do is munging and wrangling massive amounts of text. While there are some "standardized" file formats (FASTQ, SAM, VCF, etc.) and some tools for manipulating them (fastx toolkit, samtools, vcftools, etc.), there are still times when knowing a little Unix/Linux is extremely helpful, especially tools such as awk, sed, cut, grep, and GNU parallel.
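As an example of the kind of task these tools handle, the shell pipeline `grep -v '^#' in.vcf | cut -f1 | sort | uniq -c` counts variant records per chromosome in a VCF file. The sketch below is a rough Python equivalent of that pipeline, run on a tiny inline VCF. The file name in the pipeline is hypothetical, and unlike the pipeline, the sketch also skips blank lines.

```python
import io
from collections import Counter


def variants_per_chrom(vcf_lines):
    """Count data lines per CHROM value in VCF text.

    Roughly mirrors: grep -v '^#' in.vcf | cut -f1 | sort | uniq -c
    """
    counts = Counter()
    for line in vcf_lines:
        # Skip header/meta lines (grep -v '^#') and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        # First tab-delimited field is CHROM (cut -f1).
        chrom = line.split("\t", 1)[0]
        counts[chrom] += 1  # tally per value (sort | uniq -c)
    return dict(sorted(counts.items()))


# Tiny inline VCF; in practice you would pass an open file handle instead.
example = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\n"
    "1\t100\t.\tA\tG\n"
    "1\t200\t.\tC\tT\n"
    "2\t50\t.\tG\tA\n"
)
counts = variants_per_chrom(example)
print(counts)
```

For a one-off question on a large file, the shell pipeline is usually the quicker tool. The Python version becomes worthwhile when the logic grows beyond what fits comfortably on one line.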

Labels: Bioinformatics, Recommended Reading, RNA-Seq, Tutorials
Author: Stephen Turner

One of the clearest advantages RNA-seq has over array-based technology for studying gene expression is that it does not require a reference genome or a pre-existing oligo array. De novo transcriptome assembly allows you to study non-model organisms, cancer cells, or environmental metatranscriptomes.
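Many de novo assemblers, including Trinity, are built on de Bruijn graphs, in which each k-mer from the reads becomes an edge between its two overlapping (k-1)-mers. The toy sketch below shows only that core idea. It ignores sequencing errors, reverse complements, coverage, and the branching caused by alternative isoforms, and the reads, k, and starting node are made up for illustration.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Build a de Bruijn graph: each distinct k-mer adds an edge from its
    (k-1)-mer prefix to its (k-1)-mer suffix."""
    kmers = {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}
    graph = defaultdict(set)
    for kmer in kmers:
        graph[kmer[:-1]].add(kmer[1:])
    return graph


def walk_unambiguous(graph, start):
    """Extend a contig from `start` while each node has exactly one successor.

    Stops at a branch, a dead end, or a node already visited (a cycle).
    """
    contig, node, seen = start, start, {start}
    # dict.get does not trigger defaultdict's default factory, so dead ends
    # return the empty tuple instead of inserting a new node.
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:
            break
        contig += nxt[-1]  # each step adds one new base
        seen.add(nxt)
        node = nxt
    return contig


# Three overlapping toy reads drawn from one short "transcript".
reads = ["ATGGCGT", "GGCGTGC", "CGTGCAA"]
graph = de_bruijn(reads, k=4)
contig = walk_unambiguous(graph, "ATG")
print(contig)
```

Real transcriptome assemblers spend most of their effort on the parts this toy skips, especially resolving branches into separate isoforms.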

Labels: Bioinformatics, Software
Author: Stephen Turner

Torsten Seemann compiled a list of minimum standards for bioinformatics command line tools, such as printing help when no options or arguments are supplied, reporting version information, and avoiding hardcoded paths. These should be obvious to any seasoned software engineer, yet many of these standards are not followed in bioinformatics.
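Here is a minimal sketch of a few of these standards using Python's `argparse`. It prints help when run with no arguments, provides a `--version` flag, and takes input and output paths from the command line instead of hardcoding them. The tool name `mytool`, its version string, and its line-counting behavior are hypothetical placeholders.

```python
import argparse
import sys

__version__ = "0.1.0"  # hypothetical version string


def build_parser():
    parser = argparse.ArgumentParser(
        prog="mytool",  # hypothetical tool name
        description="Count lines in an input file (toy example).",
    )
    # Standard --version flag: prints "mytool 0.1.0" and exits.
    parser.add_argument("--version", action="version", version=f"%(prog)s {__version__}")
    # Paths come from the user, never hardcoded in the script.
    parser.add_argument("input", help="path to the input file")
    parser.add_argument("-o", "--output", default="-", help="output path, or '-' for stdout")
    return parser


def main(argv=None):
    argv = sys.argv[1:] if argv is None else argv
    parser = build_parser()
    if not argv:
        # No arguments at all: show usage on stderr and signal an error.
        parser.print_help(sys.stderr)
        return 1
    args = parser.parse_args(argv)
    with open(args.input) as fh:
        n_lines = sum(1 for _ in fh)
    out = sys.stdout if args.output == "-" else open(args.output, "w")
    try:
        print(n_lines, file=out)
    finally:
        if out is not sys.stdout:
            out.close()
    return 0
```

In a real tool, `main` would be wired up with `sys.exit(main())` under an `if __name__ == "__main__":` guard or through a console-script entry point. Returning an exit code instead of calling `sys.exit` inside `main` keeps the function easy to test.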

Labels: Bioinformatics, Databases, SQL, Tutorials
Author: Unknown

ENSEMBL is a frequently used resource for various genomics and transcriptomics tasks. The ENSEMBL website and MART tools provide easy access to its rich database, but ENSEMBL also provides flat-file downloads of the entire database and a public MySQL portal, which you can connect to with MySQL Workbench. Once inside, you can get a sense of what the ENSEMBL schema (or data model) is like.
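Connecting to the public server requires a MySQL client and network access. The sketch below instead uses Python's built-in `sqlite3` with a tiny, hypothetical subset of two tables loosely modeled on the Ensembl core schema (`gene` and `transcript`, linked by `gene_id`) to show the kind of join the schema is organized around. Real Ensembl tables have many more columns, and the identifiers here are placeholders, not real Ensembl IDs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Toy tables loosely modeled on Ensembl core's `gene` and `transcript`
# tables (heavily simplified; placeholder IDs, not real Ensembl data).
conn.executescript(
    """
    CREATE TABLE gene (
        gene_id   INTEGER PRIMARY KEY,
        stable_id TEXT,
        biotype   TEXT
    );
    CREATE TABLE transcript (
        transcript_id INTEGER PRIMARY KEY,
        gene_id       INTEGER REFERENCES gene(gene_id),
        stable_id     TEXT
    );
    INSERT INTO gene VALUES (1, 'GENE0001', 'protein_coding'), (2, 'GENE0002', 'lncRNA');
    INSERT INTO transcript VALUES (10, 1, 'TX0001'), (11, 1, 'TX0002'), (12, 2, 'TX0003');
    """
)

# Number of transcripts per gene, most transcripts first.
rows = conn.execute(
    """
    SELECT g.stable_id, g.biotype, COUNT(t.transcript_id) AS n_transcripts
    FROM gene AS g
    LEFT JOIN transcript AS t ON t.gene_id = g.gene_id
    GROUP BY g.gene_id
    ORDER BY n_transcripts DESC
    """
).fetchall()

for row in rows:
    print(row)
```

On the live server, the same style of query would run against the real tables, after you inspect their full column lists.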

Labels: R, Tutorials
Author: Stephen Turner

Google Developers recognized that most developers learn R in bits and pieces, which can leave significant knowledge gaps. To help fill these gaps, they created a series of introductory R programming videos. These videos provide a solid foundation for programming tools, data manipulation, and functions in the R language and software.