Biological Sciences, Substack

Paired Ends

Bioinformatics, computational biology, and data science updates from the field. Occasional posts on programming.
Biological Sciences
Published
Author Stephen Turner

There was a time in late 2023 to early 2024 when I and probably many others in the R community felt like R was falling woefully behind Python in tooling for development using AI and LLMs. This is no longer the case. The R community, and Posit in particular, have been on an absolute tear bringing new packages online to take advantage of all the capabilities that LLMs provide.

Papers, Biological Sciences
Published
Author Stephen Turner

This week’s recap highlights polars-bio for fast, scalable, out-of-core operations on large genomic interval datasets, LiftOn for combining DNA and protein alignments to improve genome annotation, feature selection methods for scRNA-seq, STRkit for read-level genotyping of short tandem repeats using long reads and single-nucleotide variation, and nf-core/detaxizer for decontamination of human sequences in metagenomics data.

R TIL, Biological Sciences
Published
Author Stephen Turner

In the spirit of learning in public, I wanted an excuse to dive into Quarto to learn more about publishing formats beyond simple PDF and HTML documents. If you’re not familiar, Quarto (quarto.org) is the successor to RMarkdown: a next-generation scientific publishing system that works natively with Python, R, and OJS. If you already have an RMarkdown document, you probably don’t have to do anything to it to get it to render with Quarto.

Papers, Biological Sciences
Published
Author Stephen Turner

This week’s recap highlights a compendium of human gene functions derived from evolutionary modelling from the Gene Ontology Consortium, an AI reasoning model applied to rare disease diagnosis, an agentic AI for scRNA-seq data exploration, and applying FAIR principles to scientific workflows.

R Python, Biological Sciences
Published
Author Stephen Turner

This is part 3 of a series on uv. Other posts in this series:

uv, part 1: running scripts and tools
uv, part 2: building and publishing packages
This post
Coming soon…

Python and R

I get the same question all the time from up and coming data scientists in training: “should I use Python or R?” My answer is always the same: it’s not Python versus R, it’s python

Papers, Biological Sciences
Published
Author Stephen Turner

This week’s recap highlights new methods in genetic epidemiology, mostly centered around genomic data sharing and privacy-preserving methods: a short commentary on genomic data sharing highlighting how new challenges complicate large-scale data sharing practices, a privacy-preserving method for QTL mapping, privacy-preserving methods for federated biobank-scale GWAS analysis, a Nextflow pipeline for polygenic score QC and construction, and new

Biological Sciences
Published
Author Stephen Turner

I get asked a lot how I have the time to read all these papers and articles I post about here. I don’t. Not all of them, at least. I listen to many of them. Lately I’ve been using an app called ElevenReader, made by the popular text-to-speech service ElevenLabs. It’s free for iOS and Android.

Papers, Biological Sciences
Published
Author Stephen Turner

This week’s recap highlights FLAMES for prioritizing genes at trait-associated GWAS hits, integrating protein language models and an automatic biofoundry for enhanced protein evolution, benchmarking DNA sequence models for causal regulatory variant prediction, and the doubletrouble R/Bioconductor package for identifying and classifying gene and genome duplications.

Papers, Biological Sciences
Published
Author Stephen Turner

This week’s recap highlights Evo2 for variant effect analysis and genome design, a preprint showing that pretraining doesn’t necessarily increase the performance of genomic foundation models, a new R package, ggalign, for making complex biological data visualizations with ggplot2, and an ancestral reconstruction method for ancient DNA. I also highlight a few reviews in biodiversity genomics.

AI, Biological Sciences
Published
Author Stephen Turner

In a previous post I demonstrated how to set up a local LLM that you can run through either a command line interface (Ollama) or a graphical user interface (Open WebUI and others), and quickly demonstrated how to “chat with your documents” with a local model using LMStudio. In that previous post I simply attached a few documents to a one-off chat.