Some quick notes on OCR. Revisiting my DjVu viewer experiments, it really struck me how "dirty" the OCR text is. It's readable, but if we were to display the OCR text rather than the images, it would be a little off-putting.
One year ago I released BioStor, which scratched my itch regarding finding articles in the Biodiversity Heritage Library. This anniversary seems to be a good time to think about where next with this project, but also to ask whether it's been successful.
My views on TreeBASE are pretty well known. Lately I've been thinking a lot about how to "fix" TreeBASE, or indeed, move beyond it. I've made a couple of baby steps in this direction. The first step is that I've created a group for TreeBASE papers on Mendeley.
One of my pet projects is to build a "Universal Article Reader" for the iPad (or similar mobile device), so that a reader can seamlessly move between articles from different publishers, follow up citations, and get more information on entities mentioned in those articles (e.g., species, molecules, localities, etc.). I've made various toys towards this, the latest being an HTML5 clone of Nature's iPhone app.
Over the last few months I've been exploring different ways to view scientific articles on the iPad, summarised here. I've also made a few prototypes, either from scratch (such as my response to the PLoS iPad app) or using Sencha Touch (see Touching citations on the iPad). Today, it's time for something a little different. The Sencha Touch framework I used earlier is huge and wasn't easy to get my head around.
This week saw the release of two tools from the Biodiversity Heritage Library, CiteBank and the BHL-Europe portal. Both have actually been quietly around for a while, but were only publicly announced last week. In developing a new tool there are several questions to ask. Does something already exist that meets my needs? If it doesn't exist, can I build it using an existing framework, or do I need to start from scratch?
Continuing my hobby horse of linking taxonomic databases to digitised literature, I've been working for the last couple of weeks on linking names in the Australian Faunal Directory (AFD) to articles in the Biodiversity Heritage Library (BHL). AFD is a list of all animals known to occur in Australia, and it provides much of the data for the recently released Atlas of Living Australia.
One issue I'm running into with Mendeley is that it can create spurious documents, mangling my references in the process. This appears to be due to some over-zealous attempts to de-duplicate documents.
Quick notes to self on fulltext search and CouchDB. Note that links to CouchDB are local to my machine(s), and won't work unless you are me or have a copy of the same database running on your machine. CouchDB and Lucene adds fulltext indexing to CouchDB. After a few false starts I now have this working.
David ("Paddy") Patterson, Jerry Cooper, Paul Kirk, Rich Pyle, and David Remsen have published an article in TREE entitled "Names are key to the big new biology" (doi:10.1016/j.tree.2010.09.004). The abstract states: Do we need names? Reading this (full disclosure, I was a reviewer) I can't help wondering whether the assumption that names are key really needs to be challenged.