Biology, English, Substack

Paired Ends

Bioinformatics, computational biology, and data science updates from the field. Occasional posts on programming.
Papers, Biology, English
Author: Stephen Turner

This week’s recap highlights the Evo model for sequence modeling and design, biomedical discovery with AI agents, improving bioinformatics software quality through teamwork, a new tool from Brent Pedersen and Aaron Quinlan (vcfexpress) for filtering and formatting VCFs with Lua expressions, a new paper about the NHGRI-EBI GWAS Catalog, and a review paper on designing and engineering synthetic genomes.

AI, Biology, English
Author: Stephen Turner

A few days ago I wrote about translating R package help documentation using a local LLM (e.g. llama3.x), and Mick Watson commented on that post. I was already thinking of wiring up something like this using local AI models: something to summarize podcasts, conference recordings, etc. The relatively new (as of this writing) Gemini 2.0 Flash model will do this for you for YouTube videos. But what if you wanted to do this offline using a local LLM?

TIL, AI, Biology, English
Author: Stephen Turner

Last week I posted about a web app that turns a GitHub repo into a single text file for LLM-friendly input. This is great for capturing LLM-friendly text from a GitHub repo, but what about any other arbitrary website or PDF? I was catching up on Simon Willison’s newsletter reading about an app he made with Claude artifacts that uses the Jina Reader API to generate Markdown from a website. You don’t need to use the API to do this.

R, AI, Biology, English
Author: Stephen Turner

Using LLMs in R

Most of the developer tooling for AI/LLM training and evaluation is Python-centric, but over the past few months we’ve seen a surge of new tooling for AI/LLM applications in the R ecosystem. ollamar and rollama provide wrappers around the Ollama API, allowing you to run LLMs locally on your machine.
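For readers curious what such wrappers do under the hood, here is a minimal Python sketch of a request to a locally running Ollama server. It is not how ollamar or rollama are implemented; it only assumes Ollama's default local address and its `/api/generate` endpoint. The model name `llama3.2` and the helper names `build_request` and `generate` are placeholders chosen for illustration.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt, model="llama3.2"):
    """Build (but do not send) a POST request for a single completion.

    The model name is a placeholder; use whichever model you have pulled.
    """
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, model="llama3.2"):
    """Send the request and return the generated text.

    Requires a running Ollama server; raises URLError otherwise.
    """
    with urllib.request.urlopen(build_request(prompt, model)) as resp:
        return json.loads(resp.read())["response"]
```

With a server running, `generate("Explain a VCF file in one sentence.")` would return the model's reply as a string. Setting `stream` to `False` asks for one JSON object instead of a stream of chunks, which keeps the parsing simple.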

Papers, Biology, English
Author: Stephen Turner

This week’s recap highlights a new way to turn Nextflow pipelines into web apps, DRAGEN for fast and accurate variant calling, machine-guided design of cell-type-targeting cis-regulatory elements, a Nextflow pipeline for identifying and classifying protein kinases, a new language model for single cell perturbations that integrates knowledge from literature, GeneCards, etc., and a new method for scalable protein design in a relaxed sequence

Papers, Biology, English
Author: Stephen Turner

This week’s recap highlights the WorkflowHub registry for computational workflows, building a virtual cell with AI, a review on bioinformatics methods for prioritizing causal genetic variants in candidate regions, a benchmarking study showing deep learning methods are best for variant calling in bacterial nanopore sequencing, and a new ML model from researchers at Genentech for predicting cell-type- and condition-specific gene expression across

R, TIL, Python, Biology, English
Author: Stephen Turner

In the spirit of learning in public, today I learned about the .keep argument in dplyr. This doesn’t add anything you can’t do with a select or transmute, but might help simplify some of your dplyr pipelines. In the examples below I’m using a few rows from the built-in iris dataset to demonstrate how to use the .keep argument by creating a new ratio variable that’s the ratio of the sepal length to width.
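The post's own examples are in R and are not reproduced in this excerpt. As a rough illustration of the idea only, the plain-Python sketch below emulates what the four documented `.keep` values ("all", "used", "unused", "none") retain when a ratio column is derived. The helper `mutate_ratio` and the snake_case column names are hypothetical, and this is an analogue rather than dplyr itself.

```python
# Three iris-like rows, typed in by hand for illustration.
ROWS = [
    {"sepal_length": 5.1, "sepal_width": 3.5, "species": "setosa"},
    {"sepal_length": 4.9, "sepal_width": 3.0, "species": "setosa"},
    {"sepal_length": 4.7, "sepal_width": 3.2, "species": "setosa"},
]

# Columns consumed when computing the new variable.
USED = ("sepal_length", "sepal_width")


def mutate_ratio(rows, keep="all"):
    """Add ratio = sepal_length / sepal_width, then keep columns per `keep`.

    keep="all"    -> every original column plus ratio
    keep="used"   -> only the columns used in the computation, plus ratio
    keep="unused" -> only the columns not used in the computation, plus ratio
    keep="none"   -> ratio only
    """
    if keep not in ("all", "used", "unused", "none"):
        raise ValueError(f"unknown keep mode: {keep!r}")
    result = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are not modified
        new_row["ratio"] = row["sepal_length"] / row["sepal_width"]
        if keep == "used":
            new_row = {k: v for k, v in new_row.items() if k in USED or k == "ratio"}
        elif keep == "unused":
            new_row = {k: v for k, v in new_row.items() if k not in USED}
        elif keep == "none":
            new_row = {"ratio": new_row["ratio"]}
        result.append(new_row)
    return result
```

For example, `mutate_ratio(ROWS, keep="unused")` leaves each row with just `species` and `ratio`, which is the kind of result that would otherwise need a separate column-selection step.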

R, Nextflow, Python, Biology, English
Author: Stephen Turner

It’s a short week here in the US. As I reflect on the tools that shape modern bioinformatics and data science, it’s striking to see how far we’ve come in the 20 years I’ve been in this field. Today’s ecosystem is rich with tools that make our work faster, better, more enjoyable, and increasingly accessible.

Papers, Biology, English
Author: Stephen Turner

This week’s recap highlights pangenome graph construction with nf-core/pangenome, building pangenome graphs with PGGB, benchmarking algorithms for single-cell multi-omics prediction and integration, RNA foundation models, and a Nextflow pipeline for characterizing B cell receptor repertoires from non-targeted bulk RNA-seq data.

R, Biology, English
Author: Stephen Turner

This post is inspired by the Bluesky Network Analyzer made by @theo.io. I’m encouraging everyone I know online to join the scientific community on Bluesky. In that post I link to several starter packs — lists of accounts posting about a topic that you can follow individually or all at once to start filling out your network. I started following accounts of people I knew from X and from a few starter packs I came across.