Cosmogenic radionuclide (CRN) exposure dating (Granger et al., 2013; Schaefer et al., 2022), luminescence dating (Rhodes, 2011; Murray et al., 2021), and radiocarbon dating (Hajdas et al., 2021) are the geochronological techniques most widely applicable to the recent geological past. All three techniques can determine the deposition age of sediments and associated materials, and cosmogenic radionuclides can also be used to quantify the rate at which landforms or landscapes are lowered by physical and chemical erosion processes. The three techniques have made important contributions to the reconstruction of past environments (Roberts et al., 2001; Singhvi and Porat, 2008; Balco, 2019; Hocknull et al., 2020), and CRN and luminescence dating have revolutionised the field of quantitative geomorphology (Granger and Schaller, 2014; Dixon and Riebe, 2014; Guralnik et al., 2015; King et al., 2016). Radiocarbon dating, luminescence dating, and (to some extent) CRN exposure dating have also made substantial contributions to archaeology (Akcar et al., 2008; Renfrew, 2011; Roberts et al., 2015), including to the debates on the timing of human evolution and migration (Granger et al., 2015; Clarkson et al., 2017; Jacobs et al., 2019; Zilhão et al., 2020; Crabtree et al., 2021).
Like most geochronological techniques, these three techniques require specialised training, laboratories, and equipment, and they involve lengthy and costly sample preparation procedures. As a result, studies relying on CRN, luminescence, or radiocarbon techniques often produce relatively small datasets (n<100) that address very specific research questions and focus on relatively small study areas. Furthermore, formal reporting standards are lacking (Schaefer et al., 2022; Murray et al., 2021; Hajdas et al., 2021), and in some cases the researchers who collect the samples and interpret the ages and/or rates are disconnected from the researchers who prepare the samples and undertake the measurements. Together, these factors mean that the resulting datasets are often unmanaged. Such datasets may be forgotten once the study has been completed and the results published, and they may not include enough supporting information for the quality of the raw data to be easily determined or for the raw data to be reused with confidence, for example when data need to be recalculated following updates to measurement standards and/or data reduction protocols. Because of these limitations, carefully curated compilations of CRN, luminescence, and radiocarbon data are necessary for larger-scale synoptic studies and for applications in which a quality rating of ages or denudation rates is desirable; moreover, such compilations are critical to ensuring the longevity and value of often irreplaceable legacy data.
Here, we describe the upgraded and updated version of the OCTOPUS database, OCTOPUS v.2. The application part of the database has been extensively rewritten and now runs on the Google Cloud Platform (https://cloud.google.com, last access: 13 August 2023). The data are stored in a relational database, and the data collections have been extended to include a global collection of CRN exposure ages on glacial landforms; an Australian collection of optically stimulated luminescence (OSL) and thermoluminescence (TL) ages from aeolian and lacustrine sedimentary archives; OSL, TL, and radiocarbon ages from archaeological records in Sahul (Australia, New Guinea, and the Aru Islands joined by lower sea levels); and a collection of late-Quaternary fossil ages of non-human vertebrate fauna from Sahul. Supporting data are comprehensive and include bibliographic and contextual information as well as information on sample preparation and measurement. In the case of fluvial sediment CRN data, the database also includes all necessary information and input files for the recalculation of denudation rates using CAIRN, an open-source program for calculating basin-wide denudation rates from Be-10 and Al-26 data (Mudd et al., 2016). Further, all CRN data have been recalculated and harmonised using the same program. OCTOPUS v.2 can be accessed at https://octopusdata.org (last access: 4 May 2023).
The above section is a modified version of Section 1 from Codilean et al. 2022
The software architecture behind OCTOPUS v.2 is illustrated in Fig. Sys1. Both software and data are deployed on the Google Cloud Platform (GCP) and follow a modular set-up designed to make the best use of the cloud services available within the GCP. Although migrating the OCTOPUS platform to a cloud-hosted infrastructure such as the GCP adds complexity to the system architecture, Google Cloud offers extensive infrastructure and software solutions that are constantly updated with the latest technologies and architectures. This constant evolution ensures that any future work on, or redesign of, the OCTOPUS platform has access to best-in-class solutions. Further, the OCTOPUS platform can be fully reproduced by anyone with access to a GCP environment, because the source code contains the entire project and the required documentation, including infrastructure definitions, application definitions, and deployment steps.
Fig. Sys1 Schematic of the OCTOPUS v.2 Google Cloud Platform (GCP) set-up. See the text for more details.
The above section is a modified version of Section 2 from Codilean et al. 2022
Semantic data model
Unlike the prior version of the OCTOPUS database, which stored data in a series of flat data tables (Codilean et al., 2018), OCTOPUS v.2 builds on a fully relational PostgreSQL database that, using PostGIS spatial extensions, organises data following a two-pronged conceptual model (Fig. Sdm1). First, data are organised hierarchically, from a broadly defined agglomeration of “sites” sharing common properties (referred to as a “metasite”) down to “observations”, namely the actual Be-10, Al-26, OSL, TL, or radiocarbon age or rate data. Second, data are also organised thematically into (i) “local” data, spatial features, and parent tables, all of which serve a single data collection; (ii) “thematic” parent tables serving multiple data collections that are thematically linked (e.g. are based on the same method); and (iii) “global” parent tables that serve all data collections (Fig. Sdm1).
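The thematic prong of the model can be sketched in a few lines of Python. This is only an illustration of the local/thematic/global distinction described above: the collection labels and table names are hypothetical and are not taken from the OCTOPUS v.2 schema, which is documented in Munack and Codilean (2022).

```python
from enum import Enum
from typing import Dict, FrozenSet, List, Tuple


class Scope(Enum):
    LOCAL = "local"        # serves exactly one data collection
    THEMATIC = "thematic"  # serves several thematically linked collections
    GLOBAL = "global"      # serves every data collection


# Hypothetical collection labels, for illustration only.
COLLECTIONS: FrozenSet[str] = frozenset(
    {"crn_denudation", "sahul_sed", "sahul_arch"}
)

# Hypothetical table names mapped to (scope, collections served).
TABLES: Dict[str, Tuple[Scope, FrozenSet[str]]] = {
    "site": (Scope.GLOBAL, COLLECTIONS),
    "luminescence_method": (
        Scope.THEMATIC,
        frozenset({"sahul_sed", "sahul_arch"}),
    ),
    "sahul_arch_sample": (Scope.LOCAL, frozenset({"sahul_arch"})),
    "crn_denudation_sample": (Scope.LOCAL, frozenset({"crn_denudation"})),
}


def check_scopes(tables: Dict[str, Tuple[Scope, FrozenSet[str]]]) -> None:
    """Raise ValueError if a table's scope contradicts the collections it serves."""
    for name, (scope, served) in tables.items():
        if scope is Scope.LOCAL and len(served) != 1:
            raise ValueError(f"{name}: a local table serves one collection")
        if scope is Scope.THEMATIC and len(served) < 2:
            raise ValueError(f"{name}: a thematic table serves several collections")
        if scope is Scope.GLOBAL and served != COLLECTIONS:
            raise ValueError(f"{name}: a global table serves all collections")


def tables_for(collection: str) -> List[str]:
    """Return the names of all tables that serve the given collection."""
    return sorted(
        name for name, (_, served) in TABLES.items() if collection in served
    )


check_scopes(TABLES)
print(tables_for("sahul_arch"))
```

Under these assumed names, a collection draws on its own local tables, on any thematic tables for the methods it uses, and on the global tables shared by all collections.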
Fig. Sdm1 Representation of the OCTOPUS v.2 semantic data model. The full database schema along with HTML documentation is available in Munack and Codilean (2022). The inset refers to the “Glen Lossie” metasite. See the text for more details.
A visual, interactive database schema can be found at octopus-db.github.io.
In terms of hierarchy, the OCTOPUS v.2 data model includes four levels: metasite, site, sample, and observation. Whilst sites, samples, and observations apply to all data collections, metasites do not apply to the CRN Denudation and Sahul Sedimentary Archives (SahulSed) collections. A site, the hierarchical level subordinate to metasite, is a geographic point entity from which n≥1 samples have been collected; sites without associated samples therefore do not exist. A site is defined predominantly by geographic attributes, including georeferencing information (e.g. country, region, island, river basin, coordinates, and elevation) and other addressing/identification information (e.g. site name, alternative name, and type of site). All site description data are stored in one global table. Samples represent the material (for example, shell, bone, rock fragment, or river sand) that was collected and used for the age/denudation rate determination; samples are (or were) therefore tangible entities. In OCTOPUS v.2, samples are described by sets of data-collection-specific attributes; thus, each data collection has its own dedicated sample table that links records to sites via unique site identifiers. Typical sample table attributes describe physical sample properties (e.g. grain size, material dated, sample thickness, or density) and their very local depositional contexts (e.g. facies, shielding, depth below surface, and excavation square or unit). Finally, observations (i.e. the actual age/denudation rate data) are stored in dedicated method-specific tables that include fields aimed at capturing any meaningful auxiliary data that help evaluate the quality of the age/denudation rate and, where necessary, allow it to be recalculated/reproduced.
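The four hierarchical levels can be sketched as nested Python data classes. This is a minimal illustration, not the OCTOPUS v.2 schema itself: the class and field names are assumptions, and only a handful of the attributes mentioned above are shown. The sketch enforces the rule that a site must have at least one sample and treats the metasite as optional, since it does not apply to every collection.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Observation:
    obs_id: str   # identifier of one age or denudation rate record
    method: str   # e.g. "Be-10", "Al-26", "OSL", "TL", or "C-14"


@dataclass
class Sample:
    sample_id: str
    material: str  # e.g. "shell", "bone", "river sand"
    observations: List[Observation] = field(default_factory=list)


@dataclass
class Site:
    site_id: str
    name: str
    latitude: float
    longitude: float
    samples: List[Sample] = field(default_factory=list)

    def __post_init__(self) -> None:
        # A site is a point from which n >= 1 samples were collected.
        if not self.samples:
            raise ValueError(f"site {self.site_id} has no samples")


@dataclass
class Metasite:
    # Optional grouping level; some collections have no metasites.
    name: str
    sites: List[Site] = field(default_factory=list)


def count_observations(site: Site) -> int:
    """Count all observations attached to the samples of one site."""
    return sum(len(sample.observations) for sample in site.samples)


# Hypothetical identifiers and coordinates, for illustration only.
demo_sample = Sample(
    "X1", "shell", [Observation("O1", "C-14"), Observation("O2", "C-14")]
)
demo_site = Site("S1", "demo site", 0.0, 0.0, [demo_sample])
print(count_observations(demo_site))
```

In the relational database these links are expressed through identifiers in separate tables rather than through nested objects; the nesting here is only meant to make the parent-child direction of the hierarchy explicit.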
We illustrate how the above hierarchical semantic data model is implemented in OCTOPUS v.2, using the example of a South Australian shell midden cluster (Wilson et al., 2012) (Fig. Sdm1, inset). A cluster of shell middens that share contextual similarities forms a metasite, “Glen Lossie”, whose footprint may be defined by a bounding box. Individual middens belonging to Glen Lossie are considered sites (point geometry) and are assigned unique OCTOPUS site identifiers (Fig. Sdm1, inset). Shell fragments are samples from those midden sites. In the Glen Lossie case, a repeat measurement was made on a shell fragment with the original ID “GLM3-ss14”. As a result, OCTOPUS considers “GLM3-ss14” and “GLM3-ss14(r)” as a single sample with two associated observations, i.e. two separate radiocarbon ages (Obs. IDs ARCH0171C14001 and ARCH0171C14002 respectively; Fig. Sdm1, inset).
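The repeat-measurement case can be reproduced with a small relational sketch. The example below uses Python's built-in SQLite module as a stand-in for PostgreSQL, and its table names, column names, and the site identifier "SITE_X" are hypothetical placeholders rather than values from OCTOPUS v.2. Only the metasite name, the sample labels, and the two observation IDs come from the text above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Simplified, hypothetical tables: one per hierarchical level below metasite.
conn.executescript(
    """
    CREATE TABLE site (
        site_id  TEXT PRIMARY KEY,
        metasite TEXT
    );
    CREATE TABLE sample (
        sample_id TEXT PRIMARY KEY,
        site_id   TEXT NOT NULL REFERENCES site(site_id)
    );
    CREATE TABLE observation (
        obs_id     TEXT PRIMARY KEY,
        sample_id  TEXT NOT NULL REFERENCES sample(sample_id),
        orig_label TEXT
    );
    """
)

# "SITE_X" is a placeholder; the real OCTOPUS site identifier is not given here.
conn.execute("INSERT INTO site VALUES (?, ?)", ("SITE_X", "Glen Lossie"))
conn.execute("INSERT INTO sample VALUES (?, ?)", ("GLM3-ss14", "SITE_X"))

# The original and the repeat measurement attach to the same sample.
conn.executemany(
    "INSERT INTO observation VALUES (?, ?, ?)",
    [
        ("ARCH0171C14001", "GLM3-ss14", "GLM3-ss14"),
        ("ARCH0171C14002", "GLM3-ss14", "GLM3-ss14(r)"),
    ],
)

# One sample row, two observation rows.
rows = conn.execute(
    """
    SELECT s.sample_id, COUNT(o.obs_id)
    FROM sample AS s
    JOIN observation AS o ON o.sample_id = s.sample_id
    GROUP BY s.sample_id
    """
).fetchall()
print(rows)
```

Keeping the repeat measurement as a second observation rather than a second sample means that counts of physical samples are not inflated, while both radiocarbon ages remain individually addressable by their observation IDs.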
The above section is a modified version of Section 3 from Codilean et al. 2022