Computer and Information Sciences, English, Blogger

Syntaxus baccata

Thoughts about bibliographic metadata, programming, statistics, taxonomy, and biology.
ContentMine, Computer and Information Sciences, English

This week I wanted to extend my program, the one with the lists of cards containing information. In the past few weeks, I made examples with topics such as conifers and Zika. In the previous report I explained how I got the facts, and how other people could get them as well. But I felt it was a bit too complex, and above all too messy.

ContentMine, Computer and Information Sciences, English

Last week I wanted to look into extracting more facts, and into the relation between the species and compounds that were found. This would be done by extending ami. However, it became clear that there will be big improvements to ami in the future, and tools like ChemicalTagger and OSCAR are planned to be integrated anyway. It's better to wait for that work to be completed before extending ami for my own purposes. Instead I improved the card page for future use.

ContentMine, Computer and Information Sciences, English

This weekly report covers the past two weeks. I blogged twice last week, and I figured that was enough: once about word clouds from ContentMine output, and once about ctj. This week, I have combined both into interactive lists, as seen here and in the images below.

List overview. From left to right: articles, genera, and species that were mentioned in the articles.

Search results.

ContentMine, Taxonomy, Computer and Information Sciences, English

Continuation of this post. I got an answer quite quickly (but after posting the previous post): The Plant List marks which species belong to which genus and family, and groups families into Major Groups, e.g. gymnosperms. It also marks synonyms. With a list of conifer species and the ContentMine output, I can determine which species are not conifers, and find how they interact with each other.
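
The filtering step can be sketched roughly as follows. This is only an illustration in Python, not the code used for the post: the function name `split_conifers`, the shape of the inputs, and the example names are all assumptions, and the small sets stand in for data that would really come from The Plant List and the ContentMine output.

```python
def split_conifers(mentioned, conifer_species, synonyms=None):
    """Split mentioned species names into conifers and non-conifers.

    mentioned:       iterable of species names found in the articles
    conifer_species: set of accepted conifer species names (hypothetical input)
    synonyms:        optional dict mapping a synonym to its accepted name
    """
    synonyms = synonyms or {}
    conifers, others = [], []
    for name in mentioned:
        # Resolve a synonym to its accepted name before the lookup.
        accepted = synonyms.get(name, name)
        if accepted in conifer_species:
            conifers.append(name)
        else:
            others.append(name)
    return conifers, others


# Illustrative data only, not taken from the actual datasets.
conifer_species = {"Pinus sylvestris", "Picea abies", "Taxus baccata"}
synonyms = {"Picea excelsa": "Picea abies"}
mentioned = ["Pinus sylvestris", "Ips typographus", "Picea excelsa"]

conifers, others = split_conifers(mentioned, conifer_species, synonyms)
```

The species that end up in `others` are the candidates for organisms that interact with conifers in the articles.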

ContentMine, Computer and Information Sciences, English

Yesterday I published a blogpost, where I talked about ctj and how and why to convert ContentMine's CProjects to JSON. At the end, I mentioned this post, where I would talk about how to use it in different programs, and with d3.js. So here we go. For starters, let's make the data about word frequencies look nice. Not readable (then we would use a table), but visually pleasing. Let's make a word cloud.

ContentMine, Computer and Information Sciences, English

Last week I used a "JSON maker" to convert CProjects to JSON. I have now changed it to serve a more general use case: it is more user-friendly and has more options, like grouping by species and minifying. It's now called ctj (CProject to JSON), although that name may be changed to something clearer or more appropriate. The GitHub repository can be found here. CProjects are the output of the ContentMine tools.
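
To give an idea of what "grouping by species" and "minifying" mean here, below is a small sketch in Python. It is not how ctj itself works, and it does not use ctj's option names: the record layout (`species` and `article` keys) and both function names are assumptions made for the example.

```python
import json
from collections import defaultdict


def group_by_species(records):
    """Turn a flat list of facts into a mapping of species -> article IDs.

    Each record is assumed to be a dict with "species" and "article" keys.
    """
    grouped = defaultdict(list)
    for record in records:
        grouped[record["species"]].append(record["article"])
    return dict(grouped)


def to_json(data, minify=False):
    """Serialize data as JSON, optionally without any extra whitespace."""
    if minify:
        # Compact separators drop the spaces after commas and colons.
        return json.dumps(data, separators=(",", ":"))
    return json.dumps(data, indent=2)


# Illustrative records only.
records = [
    {"species": "Pinus sylvestris", "article": "article1"},
    {"species": "Taxus baccata", "article": "article1"},
    {"species": "Pinus sylvestris", "article": "article2"},
]
grouped = group_by_species(records)
minified = to_json(grouped, minify=True)
```

Minifying matters mostly for file size when the JSON is loaded in a browser; the indented form is easier to read while debugging.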

ContentMine, Computer and Information Sciences, English

The "small program" proved more of a challenge than it seemed. Making a program to generate the JSON (link) was fairly easy: loop through directories, find files, loop through files, collect XML data, and save all collected data as JSON in a file. It took a while, but I think I spent most of that time setting up the logistics, i.e. a nice logger, a file system reader and an argument processor.
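
The loop described above could look roughly like this. It is a minimal sketch in Python, not the program from the post. It assumes that the data sits in `<result>` elements inside `*.xml` files somewhere below the project directory, and the function names are made up for the example.

```python
import json
import xml.etree.ElementTree as ET
from pathlib import Path


def collect_results(project_dir):
    """Walk a directory tree and gather the attributes of <result> elements.

    Assumption: every *.xml file below project_dir may contain <result>
    elements whose attributes hold the extracted data.
    """
    collected = {}
    for xml_path in sorted(Path(project_dir).rglob("*.xml")):
        try:
            root = ET.parse(xml_path).getroot()
        except ET.ParseError:
            continue  # skip files that are not well-formed XML
        results = [dict(element.attrib) for element in root.iter("result")]
        if results:
            # Key each file by its path relative to the project directory.
            key = xml_path.relative_to(project_dir).as_posix()
            collected[key] = results
    return collected


def save_as_json(data, out_file):
    """Write the collected data to a single JSON file."""
    with open(out_file, "w", encoding="utf-8") as handle:
        json.dump(data, handle, indent=2, ensure_ascii=False)
```

A call such as `save_as_json(collect_results("my-project"), "data.json")` would then write everything to one file, where `my-project` is a placeholder directory name.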

ContentMine, Taxonomy, Computer and Information Sciences, English

Recently, I tried to find out the exact taxonomy of conifers. I knew that a few years earlier, when I was actively working with it, there were a few issues on Wikipedia concerning the grouping of the main conifer families, namely Araucariaceae, Cephalotaxaceae, Cupressaceae, Pinaceae, Podocarpaceae, Sciadopityaceae and Taxaceae, and the grouping of genera into families as well. Guess what changed: not much, not on Wikipedia anyway.

Citation.js, Computer and Information Sciences, English

Today I updated my GitHub repository of Citation.js. The main difference in the source files was the correction of a typo, but I updated the docs and restructured the directories as well.

ContentMine, Computer and Information Sciences, English

(yes, three days late) This week I wanted to catch up on all the things that had happened while I was on holiday. I have finished my introductory blogpost on ContentMine's blog, and I made this blog and transferred all the weekly reports from the GitHub Wiki to here. Next week will be more interesting.

ContentMine, Computer and Information Sciences, English

Originally posted on the ContentMine blog. I am Lars, and I am from the Netherlands, where I currently live. I applied to this fellowship to learn new things and to combine the ContentMine with two previous projects I never got to finish, and I got really excited by the idea and by the ContentMine at large.