Rogue Scholar posts

Other Social Sciences, English
Published in Aaron Tay's Musings about librarianship
Author: Aaron Tay

Introduction. In my last post, I argued that Deep Search (iterative retrieval that blends keyword, semantic, and citation chasing with LLM-based relevance judgments) is the real breakthrough behind today’s “Deep Research” tools. It consistently beats one-shot embedding search in recall/precision, and in hindsight, it’s what I loved all along (the “generation” step just came bundled). The price?

Openstreetmap, Commons, Peer Production, Mapping, Osm, Other Social Sciences, English
Published in Bastian Greshake Tzovaras

Last weekend was the 21st birthday of OpenStreetMap (OSM), and with some friends we celebrated the occasion with a little mapping party. Our plan was to combine flying drones to collect aerial imagery and collecting street-level imagery with more traditional field mapping. Due to high winds, we mostly ended up with street-level imagery and doing field mapping though, using a variety of tools.

Publishing, Wikidata, Scholia, Chemistry, English
Published in chem-bla-ics

The Internet Journal of Chemistry (IJC, issn:1099-8292) was one of the first scientific journals to get published on the world wide web (part of the Internet), see doi:10.1080/00987913.2000.10764578. Issues were published from 1998 to 2004. But because it predates systematic archiving of webpages by libraries, a lot is lost.

AI, GPT-5, LLMs, Natural Sciences, English
Published in Chris von Csefalvay
Author: Chris von Csefalvay

I’ve spent the better part of this weekend putting OpenAI’s latest offerings through their paces - both the newly released open-weight models and GPT-5 itself. Armed with a selection of coding challenges, mathematical problems, and the sort of esoteric research queries that usually separate the wheat from the chaff, I’ve been conducting what amounts to a weekend-long torture test of these systems.

Conferences, DinoCon 2025, Fictional People, People We Like, Stinkin' Mammals, Earth and Related Environmental Sciences, English
Published in Sauropod Vertebra Picture of the Week
Author: Matt Wedel

Mike and I are working on our respective talks for DinoCon 2025 — a timely concern, since Mike presents next Saturday and I’m on next Sunday. My talk will be an adapted and upgraded version of the keynote talk I gave at the Tate Geological Museum’s Annual Summer Conference last summer.

Biology, English
Published in Paired Ends

I recently wrote a piece about leaving academia for biotech. I left academia for industry in 2019. I spent four years at a consulting firm before joining Colossal Biosciences. This week I’m returning to the University of Virginia School of Data Science as a tenured associate professor and dean of research. The transition from academia to industry can be tricky, but it’s also increasingly common.

Iupac, Beilstein, Chembl, Chemistry, English
Published in chem-bla-ics

A lot is happening. If you have been following this project more closely, you may have already seen some interesting updates, but I will post them here too. First, a quick recap. In March I started a new Blue Obelisk project to collect CCZero IUPAC names from primary literature (paper still pending). It turned out we can automate that without violating any laws or licenses.

Large Language Model, Ai Search, Other Social Sciences, English
Published in Aaron Tay's Musings about librarianship
Author: Aaron Tay

Back in 2022, I was hyped about Retrieval-Augmented Generation (RAG). The novelty of seeing a search engine spit out a direct answer (with citations!) in tools like Elicit and Perplexity felt like the future. I even predicted that this “answers-with-citations” model could become the prominent paradigm for academic search. Three years later, that prediction has partly come true.

Papers, Biology, English
Published in Paired Ends

This week’s recap highlights nanoMDBG for metagenome assembly from nanopore reads, the SCassist AI-based workflow for single-cell analysis, discovery and characterization of GxE and GxG effects in a vertebrate model, the PIGEON framework for estimating gene-environment interaction for polygenic traits, and long-read alignment with multi-level parallelism.