Biological SciencesSubstack

Paired Ends

Bioinformatics, computational biology, and data science updates from the field. Occasional posts on programming.
Home PageRSS FeedMastodon
language
R NextflowBiological Sciences
Published

I joined Twitter1 way back in 2009. For nearly 10 years “scitwitter” was an amazing place for discussion, discovery, and engagement with the scientific community. The #Rstats and #pydata hashtags were great places to learn about something new in programming, #icanhazpdf was great for getting papers you didn’t have access to, and conference live-tweeting was common and useful for those of us with FOMO not able to make it in person.

PapersBiological Sciences
Published
Author Stephen Turner

This week’s recap highlights an AI agent for automated multi-omic analysis (AutoBA), rapid species-level metagenome profiling and containment (sylph), a review on genome-wide association analysis beyond SNPs, private information leakage from scRNA-seq count matrices, and a method to “unlearn” viral knowledge in protein language models as a means to develop safe PLM-based variant effect analysis (PROEDIT). Others that caught my attention include

TILPythonBiological Sciences
Published

In the spirit of Learning in Public, I wanted an excuse to explore (1) click for creating command line interfaces, (2) Cookiecutter project templates, and (3) modern tools in the Python packaging ecosystem. If you’re primarily an R developer like me, I recently wrote about resources for getting better at Python for R users.

PapersBiological Sciences
Published
Author Stephen Turner

This week’s recap highlights protein design with RoseTTAFold, surveillance with wastewater sequencing, T2T human genomes, Vitessce for visualization of multimodal spatial single-cell data, and Taxometer for taxonomic classification of metagenomics contigs.

PapersBiological Sciences
Published
Author Stephen Turner

This week’s recap highlights a new Nextflow workflow for calculating polygenic scores with adjustments for genetic ancestry, a paper demonstrating that whole exome plus imputation on more samples is more powerful than whole genome sequencing for finding more trait associated variants, a new deep-learning-based splice site predictor that improves spliced alignments, a new method for accurate community profiling of large metagenomic datasets, and

PapersBiological Sciences
Published
Author Stephen Turner

This week’s recap highlights a new method for gene-level alignment of single-cell trajectories, an R package for integrating gene and protein identifiers across biological sequence databases, characterization of SVs across humans and apes, universal prediction of cellular phenotypes, a method to quantify cell state heritability versus plasticity and infer cell state transition with single cell data, and a new AI-driven, natural language-oriented

R TILBiological Sciences
Published

Yesterday I wrote about base R vs. dplyr vs. duckdb for a simple summary analysis. In that post I simulated 100 million rows of a dataset and wrote to disk as CSV. I then benchmarked how long it took to read in and compute a simple grouped mean. One thing I didn’t do here was separate the time it took to read data into memory (for base R and dplyr) versus computing the actual summary.