BiologiaIngleseSubstack

Paired Ends

Bioinformatics, computational biology, and data science updates from the field. Occasional posts on programming.
Pagina inizialeRSS ForaggioMastodon
language
AIBiologiaInglese
Pubblicato
Autore Stephen Turner

OpenAI introduced the ability to create custom GPTs back in November 2023. I wanted to try to create one of these, and in the spirit of learning in public this post describes how I made it. But first, what does it do?Gene Info Custom GPT Gene Info custom GPT The Gene Info custom GPT takes a list of human gene symbols as input.

R AIBiologiaInglese
Pubblicato
Autore Stephen Turner

Background Bluesky, atrrr, local LLMs I’ve written a few posts lately about Bluesky — first, Bluesky for Science, about Bluesky as a home for Science Twitter expats after the mass eXodus, another on using the atrrr package to expand your Bluesky network. I’ve also spent some time looking at R packages to provide an interface to Ollama.

AIBiologiaInglese
Pubblicato
Autore Stephen Turner

I had good intentions to give NaNoWriMo a try this year but didn’t get very far. Instead I gave OpenAI’s Creative Writing Coach GPT a try for a (very) short story I had in mind, inspired by my frustration trying to access closed-access research articles for a review article I’m preparing.

PapersBiologiaInglese
Pubblicato
Autore Stephen Turner

This week’s recap highlights the Evo model for sequence modeling and design, biomedical discovery with AI agents, improving bioinformatics software quality through teamwork, a new tool from Brent Pedersen and Aaron Quinlan (vcfexpress) for filtering and formatting VCFs with Lua expressions, a new paper about the NHGRI-EBI GWAS Catalog, and a review paper on designing and engineering synthetic genomes.

AIBiologiaInglese
Pubblicato
Autore Stephen Turner

A few days ago I wrote about translating R package help documentation using a local LLM (e.g. llama3.x)… …when Mick Watson commented: I was already thinking of wiring up something like this using local AI models — something to summarize podcasts, conference recordings, etc. The relatively new (as of this writing) Gemini 2.0 Flash model will do this for you for YouTube videos. But what if you wanted to do this offline using a local LLM?

TILAIBiologiaInglese
Pubblicato
Autore Stephen Turner

Last week I posted about a web app that turns a GitHub repo into a single text file for LLM-friendly input. This is great for capturing LLM-friendly text from a GitHub repo, but what about any other arbitrary website or PDF? I was catching up on Simon Willison’s newsletter reading about an app he made with Claude artifacts that uses the Jina Reader API to generate Markdown from a website. You don’t need to use the API to do this.

R AIBiologiaInglese
Pubblicato
Autore Stephen Turner

Using LLMs in R Most of the developer tooling for AI/LLM training and evaluation is Python-centric, but just over the past few months we’ve seen a surge of new tooling for AI/LLM applications for the R ecosystem. ollamar and rollama provide wrappers around the Ollama API allowing you to run LLMs locally on your machine.

PapersBiologiaInglese
Pubblicato
Autore Stephen Turner

This week’s recap highlights a new way to turn Nextflow pipelines into web apps, DRAGEN for fast and accurate variant calling, machine-guided design of cell-type-targeting cis-regulatory elements, a Nextflow pipeline for identifying and classifying protein kinases, a new language model for single cell perturbations that integrates knowledge from literature, GeneCards, etc., and a new method for scalable protein design in a relaxed sequence