Walkthrough: stellar distances from Gaia parallax
·14 mins
Incrementally builds a research object from a bare script to a tracked, portable, reproducible pipeline — motivated by real problems, not by acronym order.
Version information must be recorded for all components, ideally using the same content-addressed version control system. The primary value is not version numbering (“v1” vs “v2”) but content-addressed identification – two datasets with identical content hashes are provably identical.
Tracking encompasses not only version history but also provenance: what actions produced or modified each component, what inputs were consumed, and what versions of code and environment were involved. For code-driven modifications, provenance should be captured programmatically rather than by manual annotation.
See the STAMPED paper for the full treatment.