Natural Sciences · English · Hugo

Donny Winston

Made as simple as possible, but not simpler.

In response to this note, a reader asked how data unification differs from data fusion. I re-read the Stonebraker whitepaper I had linked to in that note, and what it describes does not seem meaningfully different from data fusion. I propose that data unification is more conservative than data fusion: it stops short of the lossy reduction often required for decision-support systems.

Unification is a process of combining partial-information structures. First used in computing for theorem proving, it is now widely used for type inference in programming-language compilers and in logic-programming systems. Data unification is described well in this whitepaper by Stonebraker.
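
To make "combining partial-information structures" concrete, here is a minimal sketch in Python. It assumes the structures are plain nested dictionaries with atomic leaf values and no logic variables, so it is simpler than the unification used in theorem provers or type inference. The names `unify`, `UnificationError`, and the two sample records are hypothetical, chosen only for illustration.

```python
class UnificationError(ValueError):
    """Raised when two structures assign conflicting values to the same key."""


def unify(a, b):
    """Unify two partial-information structures (nested dicts or atoms).

    Keys present in only one structure are carried over unchanged.
    Keys present in both must unify recursively. Atoms unify only if equal.
    """
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)
        for key, b_value in b.items():
            if key in result:
                result[key] = unify(result[key], b_value)
            else:
                result[key] = b_value
        return result
    if a == b:
        return a
    raise UnificationError(f"cannot unify {a!r} with {b!r}")


# Two hypothetical partial records about the same sample.
lab_record = {"sample": {"id": "S-1", "mass": {"value": 2.5, "unit": "g"}}}
inventory_record = {"sample": {"id": "S-1", "location": "freezer 3"}}

unified = unify(lab_record, inventory_record)
```

Nothing is discarded here: the result holds everything both records say, and a genuine conflict raises an error rather than being resolved by picking a winner.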

Do you repeatedly define the same field/attribute across different classes / entity types? For example, you may have many different entities with an “id”, a “name”, etc. When an attribute “belongs to” an entity, you need to repeatedly register specifications for each (re)definition: it’s a string, it needs to pass these tests to be considered valid, etc. What if attributes were top-level?
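
One way to picture top-level attributes is a registry in which each attribute is specified once and entity types merely list the attributes they use. The sketch below is an assumption about what such a design could look like, not a description of any particular system; `Attribute`, `ATTRIBUTES`, `ENTITY_TYPES`, `validate`, and the validity rules are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Attribute:
    """A top-level attribute: specified once, reused by any entity type."""
    name: str
    type_: type
    is_valid: Callable[[object], bool]


# Each attribute is registered exactly once (illustrative rules only).
ATTRIBUTES = {
    "id": Attribute("id", str, lambda v: re.fullmatch(r"[a-z]+-\d+", v) is not None),
    "name": Attribute("name", str, lambda v: len(v.strip()) > 0),
}

# Entity types only reference attributes by name; they do not redefine them.
ENTITY_TYPES = {
    "Sample": ["id", "name"],
    "Instrument": ["id", "name"],
}


def validate(entity_type, record):
    """Return the names of attributes that are missing or invalid in record."""
    errors = []
    for attr_name in ENTITY_TYPES[entity_type]:
        attr = ATTRIBUTES[attr_name]
        value = record.get(attr_name)
        if not isinstance(value, attr.type_) or not attr.is_valid(value):
            errors.append(attr_name)
    return errors
```

For example, `validate("Sample", {"id": "sample-1", "name": "Quartz"})` returns an empty list, and both entity types pick up any later change to the "id" rule without being touched themselves.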

Consider “basic theories” that are particularly simple in two ways. First, they describe selected aspects of material objects, abstracting from all other properties: homogeneous samples, thermally isolated containers, points, rigid solids, infinitely thin layers, etc. Second, they provide particularly simple expressions and means of combination for their simple objects.

In order to integrate quantitative data, you need to know (a) if units are commensurate, and (b) if so, how to do conversions. The Quantities, Units, Dimensions, and Types (QUDT) ontology serves three major purposes. First, it provides a global reference for units via URIs; this helps avoid tacit conventions that are prone to misinterpretation. Second, it provides for dimensional analysis via so-called “quantity kind” dimensional vectors;
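
The two questions, whether units are commensurate and how to convert, can be sketched with a small unit table. This is only loosely modeled on the idea of a per-unit dimension vector and conversion multiplier; it does not read the QUDT ontology, the URIs are hypothetical placeholders, and it ignores offsets (so it would not handle temperature scales).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    uri: str            # global identifier for the unit (hypothetical here)
    dimension: tuple    # exponents of (length, mass, time)
    multiplier: float   # factor that converts a value to the SI base unit


# Illustrative table; a real system would load these from an ontology.
UNITS = {
    "M": Unit("http://example.org/unit/M", (1, 0, 0), 1.0),
    "KiloM": Unit("http://example.org/unit/KiloM", (1, 0, 0), 1000.0),
    "SEC": Unit("http://example.org/unit/SEC", (0, 0, 1), 1.0),
    "HR": Unit("http://example.org/unit/HR", (0, 0, 1), 3600.0),
}


def commensurate(a, b):
    """Two units are commensurate when their dimension vectors match."""
    return a.dimension == b.dimension


def convert(value, source, target):
    """Convert value from source to target by going through the SI base unit."""
    if not commensurate(source, target):
        raise ValueError(f"{source.uri} and {target.uri} are not commensurate")
    return value * source.multiplier / target.multiplier


distance_m = convert(2.5, UNITS["KiloM"], UNITS["M"])
duration_hr = convert(7200, UNITS["SEC"], UNITS["HR"])
```

Comparing dimension vectors before converting is what prevents a silent kilometers-to-seconds mistake: `convert(1, UNITS["KiloM"], UNITS["SEC"])` raises a `ValueError`.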

One of my favorite features of the PyCharm code editor is go-to-declaration: you can hold the control key and hover your mouse over a usage of a symbol, and you’ll see a tooltip with a preview of the declaration/definition of the symbol. Click it, and you’ll jump to the definition, perhaps in another file. After you’ve reviewed the definition, a keyboard shortcut gets you back to the usage point.

The RDF data model is quite flexible: Anybody can say Anything about Any topic (aka the “AAA slogan”). However, I recommend, and describe here, a particular modeling strategy when it comes to entering new facts about research activities into a data management system. Once facts are entered this way, workflows may add derived facts to suit the needs of downstream applications.

Have you ever given or gotten data as CSV? Are the meanings of the columns always clear? How are they made clear? Are the given column labels/names and the given file/sheet names always enough? If additional information beyond the CSV file is needed, how is that facilitated? A separate README file that travels with the CSV as part of a zipped archive file?

If you provide JSON, either as files or as API responses, you might be one step away from ensuring that anyone encountering that JSON has a portal to what it means. This step is to provide a single extra key-value pair in each JSON document: the key is “@context”, and the value is a URL. JSON-LD is “a JSON-based format to serialize Linked Data.”
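
As a minimal sketch of that one step, the following Python adds an “@context” key to an existing JSON object before it is written out or returned from an API. The helper name `with_context`, the sample record, and the context URL are hypothetical; the URL would need to point at a real JSON-LD context document that defines the keys used.

```python
import json


def with_context(document, context_url):
    """Return a copy of a JSON object with an "@context" key placed first.

    If the document already has an "@context", its existing value is kept.
    """
    return {"@context": context_url, **document}


# Hypothetical record and context URL, for illustration only.
record = {"name": "Sample 1", "mass": 2.5}
linked = with_context(record, "https://example.org/contexts/sample.jsonld")

print(json.dumps(linked, indent=2))
```

The original record is left unchanged, so the same helper can wrap responses at the edge of an API without touching the code that builds them.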