Data science, AI, bioinformatics, and computational biology updates from the field. Occasional posts on programming.

WIRED published a special issue with 17 readings from the furthest reaches of the AI age

WIRED: AI of a Thousand Faces

Over 100 recorded talks from posit::conf(2025) are now available on YouTube

posit::conf(2025) talks on YouTube

59 new software packages, plus updates for genomics, single-cell transcriptomics, spatial omics, QC tooling, core I/O, GFF/CIGAR handling, and much more.

Bioconductor 3.22 Released

The most recent release of Positron (2025.10.0 build 199) has a few updates to the data explorer. Among other things, the actions you take in the data explorer like filtering and sorting can be converted to code: dplyr code for data frames, and SQL for DuckDB datasets.

Positron: data explorer exports dplyr code

In cybersecurity, defenders have to get everything right. Cyber defense must patch every vulnerability, secure every endpoint, anticipate every exploit. Attackers, by contrast, only need to find one overlooked flaw. That imbalance, which has shaped decades of cyber conflict, has a biological analogue.

Asymmetries in Biosecurity

Over the weekend I stumbled across an interesting conference paper: Evaluating and Improving Navigability of Wikipedia. The article shows that if you follow the first link in the main text of an English Wikipedia article, and then repeat the process for subsequent articles, it’ll lead you to Philosophy 97% of the time.

I had my doubts, so I gave this a try with a few topics close to me. From Data science

All Roads Lead to Philosophy
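The first-link procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `first_link_walk` and the toy `toy_links` mapping are hypothetical, the article names are placeholders rather than real Wikipedia data, and nothing here fetches or parses live pages.

```python
def first_link_walk(first_link, start, target="Philosophy", max_steps=100):
    """Follow first links from `start` until `target`, a dead end, or a loop.

    `first_link` maps an article title to the title of the first link in
    its main text. Returns (path, reached_target).
    """
    path = [start]
    seen = {start}
    current = start
    while current != target and len(path) <= max_steps:
        nxt = first_link.get(current)
        # Stop on a dead end (no link known) or when we revisit an article.
        if nxt is None or nxt in seen:
            return path, False
        path.append(nxt)
        seen.add(nxt)
        current = nxt
    return path, current == target


# Hypothetical toy mapping, NOT real Wikipedia data.
toy_links = {
    "Article A": "Article B",
    "Article B": "Article C",
    "Article C": "Philosophy",
    "Loop X": "Loop Y",
    "Loop Y": "Loop X",
}

if __name__ == "__main__":
    print(first_link_walk(toy_links, "Article A"))
    print(first_link_walk(toy_links, "Loop X"))
```

Tracking visited articles matters because a first-link chain can cycle between a small set of pages without ever reaching the target, which is one way a walk ends up outside the reported 97%.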

Today I discovered the constructive package and the construct() function for creating R objects with idiomatic R code to make human-readable reproducible examples.

CRAN: https://cran.r-project.org/package=constructive

Source: https://github.com/cynkra/constructive/

Docs &

Construct objects with idiomatic R code

Today I learned you can create and customize ggplot2 visualizations using your voice alone.

Voice control ggplot2 with ggbot2

Happy Friday, colleagues. September has gone by at warp speed.

Weekly recap (Sep 26, 2025)

Last night we had our first event in the newly (re-)launched Charlottesville R Users (CRU) group. We had about 30 or so attendees — about half from academia and half from industry, with a few from local government organizations.

Charlottesville R Users

Happy Friday, friends. This is my regular attempt to close out the browser tabs I’ve accumulated over the past week with blog posts, podcasts, papers, etc. in data science, genomics, public health, programming, scicomm, and other miscellany. Enjoy!

A new study in the European Heart Journal on Accelerated vascular ageing after COVID-19 infection, and Eric Topol’s coverage on Ground Truths, COVID and our arteries.

Closing my tabs (Aug 22, 2025)

Happy Friday, colleagues. Lots going on this week. Once again I’m going through my long list of idle browser tabs trying to catch up where I can. ElevenReader (no affiliation) has been helpful to catch up this week!

The Arc Institute reported the first ever viable genomes with genome language models.

Weekly recap (Sep 19, 2025)

I recently came across this wonderful position paper by Olivia Guest and colleagues, where they pick apart the tech industry’s marketing, hype, & harm, arguing for safeguarding higher education, critical thinking, expertise, academic freedom, & scientific integrity.

 Guest, O., et al. Against the Uncritical Adoption of 'AI' Technologies in Academia. Zenodo, 5 Sept.

70 years of AI hype

I’ve spent years writing bioinformatics tools (I’ve spent the last 6 years in industry and 90% of these tools I’ll never publish or open-source) and building infectious disease forecasting models (most of these are open-source, like FOCUS for COVID-19 [paper, code], FIPHDE for influenza [paper, code], PLANES for forecast plausibility analysis [paper, code, blog post]). Anyone who has

Single-cell analysis and infectious disease forecasting: Google's new AI scientist

Happy Friday, colleagues. Somehow it’s September (I did not approve of this). Lots going on this week, and this is my regular attempt to close out the browser tabs I’ve accumulated over the past week with blog posts, podcasts, papers, etc. in AI, data science, genomics, public health, programming, scicomm, and other miscellany.

Closing my tabs (Sep 5 2025)

This week’s recap highlights analysis of human de novo mutation rates from a four-generation pedigree reference, how LLMs internalize scientific literature and citation practices, the py_ped_sim forward pedigree and genetic simulator for complex family pedigree analysis, and a review on predicting gene expression from DNA sequence using deep learning models like Enformer and Borzoi.

Weekly Recap (Aug 2025, part 3)

This week’s recap highlights Variant-EFFECTS for rewriting regulatory DNA to dissect and reprogram gene expression, zero-shot evaluation revealing the limitations of single-cell foundation models, EcoWeaver for large-scale prediction of gene functional associations from coevolutionary signals, and how assemblies of long-read metagenomes suffer from diverse errors.

Weekly Recap (Aug 2025, Part 1)

Weekly Recap (Nov 21, 2025)

I recently wrote a piece about leaving academia for biotech. I left academia for industry in 2019. I spent four years at a consulting firm before joining Colossal Biosciences.

 This week I’m returning to the University of Virginia School of Data Science as a tenured associate professor and dean of research.

The transition from academia to industry can be tricky, but it’s also increasingly common.

Moving from biotech to academia

I’ve written a lot about Ollama here. Ollama lets you run open-weight models like Llama, Gemma, Mistral, Qwen, DeepSeek, etc. on your own computer. You don’t have to pay for a frontier model like ChatGPT, Claude, or Gemini, and all the inputs and outputs stay on your computer, minimizing any privacy and security concerns. Until recently Ollama was a command-line only tool.

Ollama now has a GUI

I liked Steve Krouse’s essay, “Vibe code is legacy code.” It helped crystallize some half-baked thoughts I have on vibe coding. Here’s an excerpt.

 Maintainability and vibe are inversely correlated

I’ve been using GitHub Copilot and chatbots for code for years, and I’ve written about them a lot here.

Vibe code is legacy code

I covered Autocycler (paper, code, docs) in last week’s recap.

 Demo

I wanted to try this tool out myself. I followed the demo dataset described in the Autocycler docs, which contains ONT reads from a few E. coli plasmids, and mostly used the same code provided in the docs to run Autocycler on this data.

Autocycler: long read consensus assembly

I’m thrilled to share the publication of our new paper published today in Nature Reviews Biodiversity. You can read the paper (free) here: https://rdcu.be/ewG5R. This Perspective paper was a global collaboration between Colossal Biosciences, the University of East Anglia, the Globe Institute at the University of Copenhagen, the Mauritian Wildlife Foundation, Durrell Wildlife Conservation Trust, the government of

Genome engineering in biodiversity conservation and restoration

I started my academic career in biomedical research as faculty in the University of Virginia (UVA) School of Medicine. After eight years I jumped to industry &

Returning to Academia: UVA Data Science

Note: After I wrote this post last week, the Tidyverse team released ragnar 0.2.0 on July 12. Everything here should still work, but take a look at the release notes to learn about some nice new features that aren’t covered here.

I’ve written a little about retrieval-augmented generation (RAG) here before.

Tidy RAG in R with ragnar

This week’s recap highlights the Rust-based wgatools for manipulating alignments and visualizing in the terminal, the nf-core scnanoseq Nextflow pipeline for ONT scRNA-seq, sawfish for better SV discovery and genotyping with long reads, the BINSEQ high-performance binary formats for nucleotide sequence data, and a unified analysis of atlas single-cell data.

Weekly Recap (July 2025, part 1)

karyoploteR is an R package that’s been in Bioconductor for nearly a decade. It lets you create linear chromosomal representations of any genome with genomic annotations and experimental data plotted along them.

Bioconductor: https://bioconductor.org/packages/karyoploteR/

Tutorial: https://bernatgel.github.io/karyoploter_tutorial/

Paper: Bernat Gel & Eduard Serra. (2017).

Plot Data Along a Genome with karyoploteR

I originally wrote and published this essay at The Connected Ideas Project, an excellent newsletter by my good friend and colleague Alexander Titus. If you’re not reading TCIP you’re missing out.

After I finished my postdoc I was faculty in academia for eight years before moving to a consulting firm for five years, then joined a biotech startup two years ago.

Moving from academia to biotech

There was a time in late 2023 to early 2024 when I and probably many others in the R community felt like R was falling woefully behind Python in tooling for development using AI and LLMs. This is no longer the case. The R community, and Posit in particular, have been on an absolute tear bringing new packages online to take advantage of all the capabilities that LLMs provide.

The Modern R Stack for Production AI

This week’s recap highlights the new Datavzrd tool for interactive visualization and communication of tabular data (I’m genuinely really looking forward to trying this one), tracing the shared foundations of gene expression and chromatin structure, PISA for visualizing cis-regulatory rules in genomic data, fast protein structure searching using structure graph embeddings, and a review/perspective on intrinsically disordered regions as

Weekly Recap (June 2025, part 2)

This is part 4 of a series on uv. Other posts in this series:


 uv, part 1: running scripts and tools


 uv, part 2: building and publishing packages


 uv, part 3: Python in R with reticulate

I’ve never been a big fan of notebooks, and I’m not the only one. Out of order code execution, hidden state, difficulty diffing in version control, output bloat, etc.

uv, part 4: uv with Jupyter

One of my previous employers was a Google Cloud partner, which gave me full and free access to all of Google Cloud’s certification programs, where I took the Professional Cloud Architect and Professional Data Engineer programs. It shouldn’t surprise anyone that, with Google leaning hard into GenAI, they have new certification programs and learning paths, like this Generative AI Leader certification.

Free AI courses from Google

This week’s recap highlights polars-bio for fast, scalable, out-of-core operations on large genomic interval datasets, combining DNA and protein alignments to improve genome annotation with LiftOn, feature selection methods for scRNA-seq, STRkit for read-level genotyping of short tandem repeats using long reads and single-nucleotide variation, and nf-core/detaxizer for decontamination of human sequences in metagenomics data.

Weekly Recap (May 2025, part 3)

In the spirit of learning in public, I wanted an excuse to dive into Quarto to learn more about publishing formats beyond simple PDF and HTML documents. If you’re not familiar, Quarto (quarto.org) is the successor to RMarkdown, the next-generation scientific publishing system that works natively with Python, R, and OJS. If you already have RMarkdown you probably don’t have to do anything to it to get it to render with Quarto.

Writing a book with Quarto

This week’s recap highlights FLAMES for prioritizing genes at trait-associated GWAS hits, integrating protein language models and an automatic biofoundry for enhanced protein evolution, benchmarking DNA sequence models for causal regulatory variant prediction, and the doubletrouble R/Bioconductor package for identifying and classifying gene and genome duplications.

Weekly Recap (April 2025, part 2)

This week’s recap highlights Evo2 for variant effect analysis and genome design, a preprint showing that pretraining doesn’t necessarily increase performance on genomic foundation models, a new R package ggalign for making complex biological data visualizations with ggplot2, and an ancestral reconstruction method for ancient DNA. I also highlight a few reviews in biodiversity genomics.

Weekly Recap (April 2025, part 1)

Last year I wrote a post describing an R package I put together that fetches recent bioRxiv preprints from a given subject and summarizes them in a couple of sentences using a local LLM running through Ollama. That tool has a limitation: it uses the bioRxiv RSS feed to pull recent paper titles and abstracts, and the RSS feeds currently only provide the 30 most recent preprints in each subject area.

Exploring the bioRxiv API with R, httr2, rvest, tidytext, and Datawrapper

Anyone reading this newsletter has surely used the frontier models like ChatGPT, Claude, and Gemini. I’ve written a few posts about using local models but haven’t really talked much about the tools I use to directly interact with these models. Those previous posts interact with local models using tools like ellmer in R or my own biorecap package which interacts with a locally running Ollama server.

GUIs for Local LLMs with RAG

This is part 1 of a series on uv. Other posts in this series:


 This post


 uv, part 2: building and publishing packages


 uv, part 3: Python in R with reticulate


 uv, part 4: uv with Jupyter

Lately I’ve heard a lot of great things about uv, an extremely fast Python package and project manager, written in Rust.

uv, part 1: running scripts and tools

It’s been a few weeks since I wrote a recap about what I’m reading. It’s been difficult watching helplessly as the institutions and financial infrastructure underpinning my profession are being systematically and irreversibly dismantled, with brilliant scientists I know personally having their careers destroyed and lives upturned.

Weekly Recap (Feb 2025, part 2)

I'm still catching up on papers from my late 2024 backlog. This week’s recap highlights a browser application for visualizing pathogen dispersal, a DNA language model evaluation benchmark on regulatory DNA, regularized ensemble polygenic risk prediction with GWAS summary statistics, multimodal analysis of RNA-seq data for complex trait genetics, and a deep dive on blastp’s E-value.

Weekly Recap (Jan 2025 part 3)

Something a little different for this week’s recap. I’ve been thinking a lot lately about the practice of data science education in this era of widely available (and really good!) LLMs for code. Commentary at the top based on my own data science teaching experience, with a deep dive into a few recent papers below.

AI in data science education

The majority of developers use LLMs to help write code, present company included. When I’m working in languages I know well, they're fantastic at handling the grunt work: generating boilerplate, suggesting completions, and writing tedious tests and documentation.

Write code in unfamiliar territory with AI

OpenAI introduced the ability to create custom GPTs back in November 2023. I wanted to try to create one of these, and in the spirit of learning in public this post describes how I made it. But first, what does it do?

 Gene Info custom GPT

The Gene Info custom GPT takes a list of human gene symbols as input.

Gene Info Custom GPT

Background


 Bluesky, atrrr, local LLMs

I’ve written a few posts lately about Bluesky — first, Bluesky for Science, about Bluesky as a home for Science Twitter expats after the mass eXodus, another on using the atrrr package to expand your Bluesky network. I’ve also spent some time looking at R packages to provide an interface to Ollama.

Bluesky conversation analysis with local and frontier LLMs with R/Tidyverse

The Baader–Meinhof phenomenon (aka the frequency illusion) is the name for that thing that happens when you buy a new car, and suddenly you notice that same model car everywhere you drive.

What I'm reading: de-extinction edition

A few days ago I wrote about translating R package help documentation using a local LLM (e.g. llama3.x), and Mick Watson commented in response. I was already thinking of wiring up something like this using local AI models: something to summarize podcasts, conference recordings, etc. The relatively new (as of this writing) Gemini 2.0 Flash model will do this for you for YouTube videos. But what if you wanted to do this offline using a local LLM?

Video to audio to transcript to summary using local AI: whisperfile and llama3.3

I had good intentions to give NaNoWriMo a try this year but didn’t get very far. Instead I gave OpenAI’s Creative Writing Coach GPT a try for a (very) short story I had in mind, inspired by my frustration trying to access closed-access research articles for a review article I’m preparing.

The Enlightenment Conservatory

This week’s recap highlights a new way to turn Nextflow pipelines into web apps, DRAGEN for fast and accurate variant calling, machine-guided design of cell-type-targeting cis-regulatory elements, a Nextflow pipeline for identifying and classifying protein kinases, a new language model for single cell perturbations that integrates knowledge from literature, GeneCards, etc., and a new method for scalable protein design in a relaxed sequence

Weekly Recap (Dec 2024, part 2)

This week’s recap highlights the WorkflowHub registry for computational workflows, building a virtual cell with AI, a review on bioinformatics methods for prioritizing causal genetic variants in candidate regions, a benchmarking study showing deep learning methods are best for variant calling in bacterial nanopore sequencing, and a new ML model from researchers at Genentech for predicting cell-type- and condition-specific gene expression across

Weekly Recap (Dec 2024, part 1)

Happy Friday, colleagues. It’s the end of the week and once again I’m going through my long list of idle browser tabs trying to catch up where I can. Lots of R and AI-related news this week.

 Strengthening nucleic acid biosecurity screening against generative protein design tools.

This was a really cool paper published in Science this week from a team at Microsoft, Battelle, IDT, Twist, and others.

Weekly recap (Oct 10, 2025)

In the spirit of learning in public, today I learned about the .keep argument in dplyr. This doesn’t add anything you can’t do with a select or transmute, but might help simplify some of your dplyr pipelines. In the examples below I’m using a few rows from the built-in iris dataset to demonstrate how to use the .keep argument by creating a new ratio variable that’s the ratio of the sepal length to width.

TIL: dplyr::mutate()'s .keep argument

This week’s recap highlights pangenome graph construction with nf-core/pangenome, building pangenome graphs with PGGB, benchmarking algorithms for single-cell multi-omics prediction and integration, RNA foundation models, and a Nextflow pipeline for characterizing B cell receptor repertoires from non-targeted bulk RNA-seq data.

Weekly Recap (Nov 2024, part 3)

This week’s recap highlights an AI agent for automated multi-omic analysis (AutoBA), rapid species-level metagenome profiling and containment (sylph), a review on genome-wide association analysis beyond SNPs, private information leakage from scRNA-seq count matrices, and a method to “unlearn” viral knowledge in protein language models as a means to develop safe PLM-based variant effect analysis (PROEDIT).  Others that caught my attention include

Weekly Recap (Nov 2024, part 2)

I just returned from a week in Barcelona where I attended the Nextflow Summit and nf-core hackathon, and I can hardly contain my excitement for the near term future of bioinformatics, computational biology, and open science in general.

Nextflow Summit Barcelona 2024

I'm not going to listen to opinions about AI from people who don't use AI.

Quoting Kevin Roose (NYT / Hard Fork)

This week’s recap highlights protein design with RoseTTAFold, surveillance with wastewater sequencing, T2T human genomes, Vitessce for visualization of multimodal spatial single-cell data, and Taxometer for taxonomic classification of metagenomics contigs.

Weekly Recap (Oct 2024, part 4)

A Google search for “R vs Python” returns thousands of hits across sites like Reddit, IBM, Datacamp, Coursera, Kaggle, and many others. A quick Google Trends analysis shows that this search query has grown steadily over the last decade. Any real data scientist would agree that this argument is silly, that the right answer is to use the best tool for the job. What’s “best” isn’t always easy to answer.

Python for R users

TL;DR: Codeium offers a free Copilot-like experience in Positron. You can install it from the Open VSX registry directly within the extensions pane in Positron.

AI code completion in Positron

This week’s recap highlights a new method for gene-level alignment of single-cell trajectories, an R package for integrating gene and protein identifiers across biological sequence databases, characterization of SVs across humans and apes, universal prediction of cellular phenotypes, a method to quantify cell state heritability versus plasticity and infer cell state transition with single cell data, and a new AI-driven, natural language-oriented

Weekly Recap (Oct 2024, part 2)

Yesterday I wrote about base R vs. dplyr vs. duckdb for a simple summary analysis. In that post I simulated 100 million rows of a dataset and wrote to disk as CSV. I then benchmarked how long it took to read in and compute a simple grouped mean. One thing I didn’t do here was separate the time it took to read data into memory (for base R and dplyr) versus computing the actual summary.

Use nanoparquet instead of readr/CSV

TL;DR: For a very simple analysis (means by group on 100M rows), duckdb was 125x faster than base R, and 28x faster than readr+dplyr, without having to read data from disk into memory. The duckplyr package wraps DuckDB's analytical query processing techniques in a dplyr-compatible API. Learn more at duckdb.org/docs/api/r and duckplyr.tidyverse.org.

I wanted to see for myself what the fuss was about with DuckDB.

DuckDB vs dplyr vs base R

This week’s recap highlights a new multispecies codon optimization method, personalized pangenome references with vg, a commentary on the wild west of spike-in normalization, a new pipeline for comprehensive and scalable polygenic scoring across ancestrally diverse populations, a paper showing deep learning / transformer-based methods don’t outperform simple linear models for predicting gene expression after genetic perturbations, and finally, a

Weekly Recap (Oct 2024, part 1)

This week’s recap highlights a Nextflow pipeline for eQTL detection, an end-to-end pipeline for spatial transcriptomics (visium) data analysis, a method for identification of perturbed cell types in single cell RNA-seq data, a method for guide assignment in single-cell CRISPR screens, a tool for on-target/off-target analysis of gene editing outcomes, and “digital microbes” for collaborative team science on emerging microbes.

Weekly Recap (Sep 2024, part 5)

Last month I published a paper and an R package for summarizing preprints from bioRxiv using a local LLM, and I wrote a post about it. Llama 3.2 was just released today (announcement). The biggest news is the addition of a multimodal vision model, but I was intrigued by the reasonably good performance of the tiny 3B text model. I used this as an excuse to update the biorecap R package.

Llama 3.2 summaries of bioRxiv and medRxiv preprints with biorecap

Cite: Stephen Turner. “Learning in Public.” Paired Ends (2024). DOI: https://doi.org/10.59350/xwgsf-nj906.

I wrote my first public blog post in 2009. I started Getting Genetics Done to share what I was learning at the end of my PhD/postdoc through my first few years as faculty.

Learning in Public

This week’s recap highlights a new nf-core workflow for multi-omics trait association studies, a new tool for linking genotype to phenotype (G2P) by directly sequencing alleles from CRISPR base editing experiments, the SplitsTree app for interactive analysis and visualization using phylogenetic trees and networks, mapping cellular interactions from spatially resolved transcriptomics data, a study of marine microbial diversity and bioprospecting

Weekly Recap (Sep 2024, part 4)

Update March 2025: The preprint described in this post is now peer-reviewed and published in PLoS ONE.

VP (Pete) Nagraj is a longtime friend, colleague, and collaborator, and is the author of this post. Pete and I have co-authored over a dozen publications, and have taught several graduate courses in data science together.

PLANES: Plausibility Analysis of Epidemiological Signals

This week’s recap highlights a new tool from Wei Shen and Zamin Iqbal for efficient sequence alignment against millions of prokaryotic genomes (LexicMap), a new tool from Heng Li for efficiently constructing and querying a sequence index at scale, an R/Bioconductor package for detecting and correcting DNA contamination in RNA-seq data, a method for dating gene age using synteny, how AlphaFold predictions for some types of conformations are

Weekly Recap (Sep 2024, part 3)

It’s been a big week in the genomics+bioinformatics space. This post expands on a few of the recent papers I posted in a Twitter thread. I highlight a few in the deep dive at the top, then link to a few other papers of note later on. Subscribe (free!) to Paired Ends to get summaries like this delivered to your e-mail as soon as I write them.

What I'm reading (Aug 2024, part 1)

Here are a few papers that caught my attention recently. I summarize a few in the deep dive at the top, then link to a few other papers of note later on. Subscribe to Paired Ends to get periodic summaries like this delivered to your e-mail.

What I'm reading (Aug 2024, part 4)

I’ve been using the llama3.1:70b model just released by Meta using Ollama running on my MacBook Pro. Ollama makes it easy to talk to a locally running LLM in the terminal (ollama run llama3.1:70b) or via a familiar GUI with the open-webui Docker container. Here I’ll demonstrate using the ollamar package on CRAN to talk to an LLM running locally on my Mac.

Use R to prompt a local LLM with ollamar

What I'm reading (Aug 2024, part 2)

This post is about the R package development experience with Positron, the new IDE from Posit based on VS Code. This is not a tutorial on R package development in general; there are great resources for that elsewhere. Read on. Subscribe to Paired Ends to get future posts like this delivered to your e-mail.

 RStudio, VS Code, and Positron

Back in 2011 I wrote a blog post about a relatively new IDE for R called RStudio.

R package development in Positron

This post expands on a few of the papers I posted in this Twitter thread. I highlight a few in the deep dive at the top, then link to a few other papers of note later on. Subscribe to Paired Ends to get summaries like this in your e-mail as soon as I write them.

What I'm reading (July 2024)

NSF reorg, Science in 2050, 2025 LLM recap, R+Python, R Data Scientist, virtual cells, genomics in 2026, Claude Code course, AI and labor, how uv got so fast, Anthropic/biotech, papers+preprints

Weekly Recap (January 2, 2026)

Happy Friday, colleagues. It’s the end of the week and once again I’m going through my long list of idle browser tabs trying to catch up where I can. Lots of R and AI-related news this week.

Emil Hvitfeldt: Slidecrafting (slidecrafting-book.com). This is a really wonderful one-stop shop for tips on making beautiful slides with reveal.js and Quarto.

Weekly recap (Oct 3, 2025)

R + AI, uv, RAG+Zotero, Quarto books, Codex in Positron, Positron assistant &

Paired Ends Wrapped: Top 10 Posts From 2025

29 talks from the 2025 Nextflow Summit are now available on YouTube

Nextflow Summit 2025 Talks on YouTube

Not behind, but at the forefront: On feeling overwhelmed by AI progress, and why that means you're exactly where you need to be.

My First Look at Claude Code

I'm still catching up on papers from my late 2024 backlog. This week’s recap highlights autonomous microbial sensors for detecting TNT in soil, genome size estimation from long reads, STABIX for indexing and compressing GWAS summary statistics, and Clair3-RNA for deep learning-based small variant calling on long-read RNA-seq data.

Weekly Recap (Jan 2025 part 2)

AIxBio is here: Navigating the pacing problem and the future of global biosecurity

Biotechnology and AI: Technological Convergence and Information Hazards (Part 1)

An AI Manhattan Project for Science and Biotech

The Genesis Mission

R updates (R Data Scientist, R+AI conference, R weekly), Claude 4.5 Opus, Genesis Mission, AI+science, AI updates (Posit, AI Data Scientist), missing heritability, AI+edu, biotech, AIxBio, new papers

Weekly Recap (Nov 26, 2025)

A laboratory safety benchmark finds retrieval augmented generation (RAG) can make strong models worse.

Contextual Distraction: RAG isn't a Seatbelt

This week’s recap highlights a compendium of human gene functions derived from evolutionary modelling from the Gene Ontology Consortium, an AI reasoning model applied to rare disease diagnosis, an agentic AI for scRNA-seq data exploration, and applying FAIR principles to scientific workflows.

Weekly Recap (May 2025, part 2)

I recommend subscribing to Claus Wilke’s newsletter, Genes, Minds, Machines. I’ve linked to many of his essays in recent weeks. This one was a good read. Now that I’m back in academia and will be taking on Ph.D. students in my lab, this was good advice for me to read, as a future mentor.

Most graduate students propose to do too much

If you use ChatGPT, Claude, or even some local model through Ollama or HuggingFace Assistants, you’ll know that the chat interface makes it challenging to feed in an entire repo like a Python or R package, because functions, tests, etc. can be scattered across many files throughout a repo.

Turn a GitHub repo into a single text file for LLM-friendly input
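To make the idea concrete, here is a minimal sketch of flattening a local checkout of a repo into one text blob. It is not the tool or method from the post: the function name `flatten_repo`, the default extension list, and the `=====` header format are all assumptions chosen for illustration, and it only uses Python's standard library.

```python
from pathlib import Path


def flatten_repo(root, extensions=(".py", ".R", ".md")):
    """Concatenate matching text files under `root` into a single string.

    Each file is preceded by a header line with its path relative to `root`,
    so an LLM can tell where one file ends and the next begins.
    """
    root = Path(root)
    parts = []
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root)
        # Skip directories, version-control internals, and other file types.
        if not path.is_file() or ".git" in rel.parts:
            continue
        if path.suffix not in extensions:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        parts.append(f"===== {rel.as_posix()} =====\n{text}")
    return "\n\n".join(parts)


if __name__ == "__main__":
    # Print the flattened contents of the current directory.
    print(flatten_repo("."))
```

Filtering by extension keeps binary files and large data out of the prompt. A real tool would likely also respect `.gitignore` and report a token count, which this sketch does not attempt.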

A new study in Nature finds that AI adoption correlates with faster career ascent and higher citation impact, while the semantic spread of entire fields subtly contracts around fewer topical regions.

AI Amplifies Careers and Compresses Fields

If you read Paired Ends because you care about how biology, technology, and society co-evolve, this story was written for you.

Synthetic Eden: The Tools That Save Elephants and Reshape Us

I am in the middle of writing a review / perspectives paper. One that I’m confident will be exciting once we get it published. Some sections of the review cover subject matter at the outer periphery of my expertise.

Inciteful+Zotero to find relevant literature

It’s a short week here in the US. As I reflect on the tools that shape modern bioinformatics and data science it’s striking to see how far we’ve come in the 20 years I’ve been in this field. Today’s ecosystem is rich with tools that make our work faster, better, enjoyable, and increasingly accessible.

Tech I'm thankful for (2024)

This week’s recap highlights biobank-scale relatedness estimation, SNP calling and haplotype phasing with long RNA-seq reads, predicting expression-altering promoter mutations with deep learning, and cross-species filtering for reducing alignment bias in comparative genomics studies.

Weekly Recap (Sep 2025 part 1)

This post expands on a few of the papers I posted in this Twitter thread. I highlight a few in the deep dive at the top, then link to a few other papers of note later on. Subscribe to Paired Ends to get summaries like this delivered to your e-mail as soon as I write them. You might remember me from Getting Genetics Done where I blogged about genetics, statistics, and bioinformatics from 2008-2017.

What I'm reading (July 2024, part 2)

This week’s recap highlights nanoMDBG for metagenome assembly from nanopore reads, the SCassist AI-based workflow for single-cell analysis, discovery and characterization of GxE and GxG effects in a vertebrate model, the PIGEON framework for estimating gene-environment interaction for polygenic traits, and long-read alignment with multi-level parallelism.

Weekly Recap (Aug 2025, part 2)

Thoughts on Ronald Purser's December 2025 essay in Current Affairs, "AI is Destroying the University and Learning Itself"

The False Choice Between Meaning and Accountability in Higher Education

Whether this is your first conference talk or your fiftieth, we’re looking for speakers from a variety of backgrounds and experience levels

Call for Proposals Now Open: 2026 Applied Machine Learning Conference

This post is inspired by the Bluesky Network Analyzer made by @theo.io.

I’m encouraging everyone I know online to join the scientific community on Bluesky. In that post I link to several starter packs — lists of accounts posting about a topic that you can follow individually or all at once to start filling out your network. I started following accounts of people I knew from X and from a few starter packs I came across.

Expand your Bluesky network with R

This week's recap highlights a new pipeline for metagenome quality assessment and taxonomic annotation (MAGFlow &

Weekly Recap (Nov 2024, part 1)

Resignations at Anthropic's safeguards research team and xAI, Opus 4.6 evaluation awareness, Bytedance AI video generation, US doesn't back Global AI Safety Report, "heinous crimes" and $1T selloffs

One week in AI safety

Diagrams in Claude with Excalidraw and draw.io

Tiered Access for AIxBio Governance

AI-Enabled Biological Design and the Risks of Synthetic Biology

Weekly Recap (February 6, 2026)

Ten simple rules for teaching data science

OpenAI Codex App

Biological risk and the AI capability shift

Tacit Knowledge and Biosecurity

How AI assistance impacts the formation of coding skills

Weekly Recap (January 30, 2026)