Excel stores dates in a very odd way: a serial number of days since 1900.
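A minimal sketch of decoding that serial number, using the common trick of a 30 December 1899 epoch to absorb Excel's phantom 29 February 1900 (Excel wrongly treats 1900 as a leap year, so serial 60 maps to a date that never existed):

```python
from datetime import datetime, timedelta

def excel_serial_to_date(serial: int) -> datetime:
    # Excel's day 1 is 1 January 1900, but Excel also (wrongly) treats
    # 1900 as a leap year, so serial 60 is the nonexistent 29 Feb 1900.
    # Using 30 Dec 1899 as the epoch absorbs that off-by-one for
    # serials from 61 onwards.
    if serial >= 61:
        epoch = datetime(1899, 12, 30)
    else:
        epoch = datetime(1899, 12, 31)
    return epoch + timedelta(days=serial)
```

So serial 1 is 1 January 1900, serial 59 is 28 February 1900, and serial 61 is 1 March 1900; the phantom serial 60 rounds forward to 1 March here, which is one defensible choice among several.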
In recent days, several signatories to the Principles on Open Scholarly Infrastructure have taken to performing self-audits of their compliance with the principles. Of course, holding oneself to account in this way is a welcome development. Without some form of self-appraisal it is not possible to know how close one is to fulfilling the goals of POSI.
As noted previously, I am vacating my martineve.com domain. To do so has been a painful process that involves changing every account that uses martin@martineve.com to a new email address. This is painful because it turned out to be about 350 accounts. Different sites categorise the email differently.
It is sometimes easy, when discussing openness, to get bogged down in the technical weeds. People often want detail and specifics: what open license should I use? Precisely how much revenue do I need to keep in reserve to wind down an organization safely? When does advocacy become lobbying?
This is a quick note to say that, in the near future, I will be abandoning the martineve.com domain name. For quite some time now, the primary address for this blog and site has been https://eve.gd. I know that cool URLs don’t change. But I guess I am not totally cool.
I have read, with some dismay, the draft of Ithaka S+R’s most recent report. I offer here some critical remarks that I hope will allow for revision of the work, which I believe offers an insular, digital-nationalist, exclusionary vision for the future of scholarly communications. The views herein are my personal take, not those of any organization for which I work.
I have been thinking, this week, about the observability of AWS Lambda functions in API Gateway contexts. The major challenge is that Prometheus metrics are pull-based: an application exposes them on a scraping endpoint, a Prometheus server periodically collects them, and tools such as Grafana then query that server. A short-lived Lambda invocation may be torn down before any scrape ever happens.
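One common workaround is to invert the flow and push metrics to a Prometheus Pushgateway before the function's execution environment is frozen or destroyed. Here is a stdlib-only sketch (in practice you would more likely use the official `prometheus_client` library's `push_to_gateway`; the gateway host, job name, and metric names below are my illustrative assumptions, not anything from a real deployment):

```python
import urllib.request

def format_metric(name, value, labels=None):
    # Prometheus text exposition format: name{label="value"} value
    label_str = ""
    if labels:
        inner = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
        label_str = "{" + inner + "}"
    return f"{name}{label_str} {value}\n"

def push_metric(gateway, job, name, value, labels=None):
    # The Pushgateway accepts metrics POSTed to /metrics/job/<job>.
    # The gateway host passed in by the caller is an assumption here.
    url = f"http://{gateway}/metrics/job/{job}"
    data = format_metric(name, value, labels).encode()
    req = urllib.request.Request(url, data=data, method="POST")
    urllib.request.urlopen(req)

# In a Lambda handler you would push just before returning, since the
# invocation may not live long enough to be scraped, e.g.:
# push_metric("pushgateway.internal:9091", "api_lambda",
#             "lambda_invocations_total", 1, {"route": "/search"})
```

Prometheus then scrapes the Pushgateway on its usual schedule, so the ephemeral function never needs to be a scrape target itself.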
LocalStack is a great cloud emulation layer. It lets you simulate interaction with AWS, which is great for writing integration tests. However, I wanted a setup that, when run locally, would spin up the LocalStack server and then destroy it when done, but that, when run on GitLab CI, would instead connect to a container started through the “service” provision of their continuous integration system.
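That dual-mode behaviour can be sketched roughly as follows (the function names, port, and service hostname are my illustrative assumptions; `GITLAB_CI` is an environment variable GitLab sets on its runners, and `localstack` is the hostname a GitLab service container would typically get from its image name):

```python
import os
import subprocess

def localstack_endpoint():
    # On GitLab CI, connect to the service container the runner has
    # already started; locally, talk to our own Docker container.
    if os.environ.get("GITLAB_CI"):
        return "http://localstack:4566"
    return "http://localhost:4566"

def start_localstack_locally():
    # Local runs only: spin up LocalStack in Docker and return the
    # container id so the caller can destroy it when tests finish.
    result = subprocess.run(
        ["docker", "run", "-d", "-p", "4566:4566", "localstack/localstack"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()

def stop_localstack(container_id):
    # Tear the container down again once the test session is over.
    subprocess.run(["docker", "rm", "-f", container_id], check=True)
```

Wrapped in a session-scoped test fixture, this gives the same test code a working endpoint in both environments.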
In my new role at Crossref I work on a series of data pipelines for research and development projects. These are resource-intensive data processing tasks that need to be executed periodically on a schedule, with good observability, but also with parallel processing capacity. Amazon’s Managed Workflows for Apache Airflow (MWAA) seems like an ideal solution for this.
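The shape of such a pipeline, as a DAG definition of the kind MWAA schedules, might look something like this (the DAG id, schedule, and fan-out of four shards are illustrative assumptions on my part, not Crossref's actual code):

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator

with DAG(
    dag_id="monthly_data_pipeline",
    start_date=datetime(2022, 1, 1),
    schedule_interval="@monthly",
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
) as dag:
    def process_shard(shard: int) -> None:
        # Placeholder for one resource-intensive chunk of the job.
        print(f"processing shard {shard}")

    # Independent tasks fan out, which is where the parallel processing
    # capacity comes from: the executor runs them concurrently.
    tasks = [
        PythonOperator(
            task_id=f"process_shard_{shard}",
            python_callable=process_shard,
            op_kwargs={"shard": shard},
        )
        for shard in range(4)
    ]
```

MWAA then supplies the scheduler, workers, and the observability (logs and task-state UI) around a definition like this.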
I am currently conducting a research project at Crossref that requires me to build a database using large backend files (e.g. building a relational database from a 3GB XML file). We need to rebuild this monthly, so Apache Airflow seemed a good tool to run these periodic tasks. There are, however, lots of “gotchas” in this framework that can trip up a newcomer and I thought it might be helpful to document some of these.
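To give a flavour of the sort of gotcha in question (this particular example is mine, not necessarily one from the list that follows): the scheduler re-parses DAG files constantly, so anything evaluated at the top level of the file runs on every parse, not once per DAG run. A dynamic `start_date` is the classic victim:

```python
from datetime import datetime

# Gotcha sketch: Airflow re-evaluates the DAG file on every scheduler
# parse, so a moving start_date means no schedule interval is ever
# "complete" and the DAG silently never runs.
bad_start = datetime.now()          # re-evaluated on every parse: broken
good_start = datetime(2022, 1, 1)   # fixed date: intervals can complete
```

The same principle applies to expensive work, like opening that 3GB XML file: it must live inside a task callable, never at module level.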