Introduction

Cosmogenic radionuclide (CRN) exposure dating (Granger et al., 2013; Schaefer et al., 2022), luminescence dating (Rhodes, 2011; Murray et al., 2021), and radiocarbon dating (Hajdas et al., 2021) are geochronological techniques that are the most widely applicable to the recent geological past. All three of the techniques allow determination of the deposition age of sediments and associated materials, and cosmogenic radionuclides can also be used to quantify the rate at which landforms or landscapes are lowered by physical and chemical erosion processes. The three techniques have made important contributions to the reconstruction of past environments (Roberts et al., 2001; Singhvi and Porat, 2008; Balco, 2019; Hocknull et al., 2020), and CRN and luminescence dating have revolutionised the field of quantitative geomorphology (Granger and Schaller, 2014; Dixon and Riebe, 2014; Guralnik et al., 2015; King et al., 2016). Radiocarbon dating, luminescence dating, and (to some extent) CRN exposure dating have also made substantial contributions to archaeology (Akcar et al., 2008; Renfrew, 2011; Roberts et al., 2015), including to the debates on the timing of human evolution and migration (Granger et al., 2015; Clarkson et al., 2017; Jacobs et al., 2019; Zilhão et al., 2020; Crabtree et al., 2021).

Like most geochronological techniques, the three dating techniques require specialised training, laboratories, and equipment, and they involve lengthy and costly sample preparation procedures. As a result, studies relying on CRN, luminescence, or radiocarbon techniques will often produce relatively small datasets (n<100) that address very specific research questions and focus on relatively small study areas. Furthermore, the lack of formal reporting standards (Schaefer et al., 2022; Murray et al., 2021; Hajdas et al., 2021) coupled with the disconnect that exists in some cases between the researchers collecting the samples and interpreting the ages and/or rates and the researchers preparing the samples and undertaking the measurements means that the techniques often produce datasets that are unmanaged. These datasets may become forgotten once the study has been completed and results are published, and they may not include sufficient levels of supporting information for the quality of the raw data to be easily determined or for the raw data to be reusable with confidence – for example, in instances where data need to be recalculated due to updated measurement standards and/or data reduction protocols. The above limitations mean that carefully curated compilations of CRN, luminescence, and radiocarbon data are necessary to allow for larger-scale synoptic studies and instances where the quality rating of ages/denudation rates is desirable; moreover, carefully curated compilations of these data are critical to ensuring the longevity and value of often irreplaceable legacy data.

Here, we describe the upgraded and updated version of the database – OCTOPUS v.2. The application part of the database has been extensively rewritten, and it is now running on the Google Cloud Platform (https://cloud.google.com, last access: 13 August 2023). The data are stored in a relational database, and the data collections have been extended to include a global collection of CRN exposure ages on glacial landforms; an Australian collection of OSL and TL ages from aeolian and lacustrine sedimentary archives; OSL, TL, and radiocarbon ages from Sahul (Australia, New Guinea, and the Aru Islands joined by lower sea levels) archaeological records; and a collection of late-Quaternary records of non-human vertebrate fauna fossil ages from Sahul. Supporting data are comprehensive and include bibliographic, contextual, and sample-preparation- and measurement-related information. In the case of fluvial sediment CRN data, the database also includes all necessary information and input files for the recalculation of denudation rates using CAIRN, an open-source program for calculating basin-wide denudation rates from Be-10 and Al-26 data (Mudd et al., 2016). Further, all CRN data have been recalculated and harmonised using the same program. OCTOPUS v.2 can be accessed at https://octopusdata.org (last access: 4 May 2023).

Note

The above section is a modified version of Section 1 from Codilean et al. 2022

System architecture

The software architecture behind OCTOPUS v.2 is illustrated in Fig. Sys1. Both software and data are deployed on the Google Cloud Platform (GCP) and follow a modular set-up aimed at optimal leveraging of cloud services available within the GCP. Although migration of the OCTOPUS platform to a cloud-hosted infrastructure such as the GCP adds complexity to the system architecture, Google Cloud offers extensive infrastructure and software solutions which are constantly updated with the latest technologies and architectures. This constant evolution ensures that any future work and redesigns of the OCTOPUS platform have access to best-in-class solutions. Further, the OCTOPUS platform is completely reproducible with access to a GCP environment, as the source code contains the entire project and required documentation, including infrastructure definitions, application definitions, and deployment steps.

_images/system_architecture.png

Fig. Sys1 Schematic of the OCTOPUS v.2 Google Cloud Platform (GCP) set-up. See the text for more details.

Most components of OCTOPUS v.2 run natively on the GCP apart from GeoServer and Tomcat, which are deployed within a Google Compute Engine using a single bespoke Docker container (https://www.docker.com, last access: 04 May 2023). Tabular data as well as the point and polygon geometries associated with each observation (see below) are stored in a PostgreSQL/PostGIS (https://postgis.net, last access: 04 May 2023) relational database running in Cloud SQL. The latter is a SaaS (Software as a Service) meaning that installation, set-up, and running activities of the database are automatically managed by the GCP, decreasing maintenance overhead and providing a monthly uptime/availability of 99.95 %. Raster data and all CAIRN 1 input and output files are stored separately within a Cloud Storage bucket in .zip archives. Unlike the first version of OCTOPUS (Codilean et al., 2018), the .zip archives in OCTOPUS v.2 no longer include the tabular and vector data that are now hosted in the PostgreSQL/PostGIS relational database. Thus, we avoid duplication and make future maintenance of the data more straightforward. The relational database is linked to a GeoServer instance (Fig. Sys1). GeoServer (http://geoserver.org, last access: 04 May 2023) implements a range of OGC data-sharing standards, including the widely used Web Feature Service (WFS) and the Web Map Service (WMS) standards that allow, in addition to connections from a web browser, direct connections to the database from a variety of desktop geographic information system (GIS) applications, including ArcGIS and QGIS (via WFS; see below) and Google Earth (via WMS). GeoServer exports data to various formats, including GML, JSON, Google Earth KML and KMZ, and Esri shapefile. GeoServer (along with Tomcat) is hosted in a Google Compute Engine, an IaaS (Infrastructure as a Service) that allows for a virtualised environment to be run on Google hardware. Geoserver and Tomcat currently exist as a single bespoke Docker container due to limitations of the deployed Geoserver and Tomcat versions that cannot run with separate runtimes. More recent Geoserver and Tomcat versions, however, exist as standard Docker containers that can be run independently aligned with a microservice architecture. Utilising these dockerised versions would permit the applications to be run on managed server-less platforms such as Google Cloud Run, allowing modular horizontal scaling. Further, Tomcat’s Common Gateway Interface (CGI) that provides functionality to the OCTOPUS frontend, such as downloading files and retrieving study bounding boxes, could also be separated into independent resources that run on Google Cloud Functions and allow for near-infinite horizontal scalability to meet any fluctuations in traffic volume. Next, the OCTOPUS web frontend is deployed in a Cloud Storage bucket and uses the OpenLayers (https://openlayers.org, last access: 04 May 2023) JavaScript library to display the geospatial data served by the GeoServer instance in a web browser (Fig. Sys1). Finally Cloud Load Balancing is used to distribute traffic and to separate connections to the web interface from those directed to GeoServer directly via WFS/WMS from third-party applications.

Note

The above section is a modified version of Section 2 from Codilean et al. 2022

Semantic data model

Unlike the prior version of the OCTOPUS database that stored data in a series of flat data tables (Codilean et al., 2018), OCTOPUS v.2 builds on a fully relational PostgreSQL database that, using PostGIS spatial extensions, organises data following a two-pronged conceptual model (Fig. Sdm1). First, data are organised hierarchically going from a broader defined agglomeration of “sites” sharing common properties (referred to as a “metasite”) down to “observations”, namely the actual Be-10, Al-26, OSL, TL, or radiocarbon age or rate data. Second, data are also organised thematically into (i) “local” data, spatial features, and parent tables – with all of these serving a single data collection; (ii) “thematic” parent tables serving multiple data collections that are thematically linked (e.g. are based on the same method); and (iii) “global” parent tables that serve all data collections (Fig. Sdm1).

_images/data_model.png

Fig. Sdm1 Representation of the OCTOPUS v.2 semantic data model. The full database schema along with HTML documentation is available in Munack and Codilean (2022). The inset refers to the “Glen Lossie” metasite. See the text for more details.

See also

🪰 A visual, interactive database schema can be found at octopus-db.github.io.

In terms of hierarchy, the OCTOPUS v.2 data model includes four levels: metasite, site, sample, and observation. Whilst sites, samples, and observations apply to all data collections, metasites do not apply to the CRN Denudation and Sahul Sedimentary Archives (SahulSed) collections. A site, the hierarchical level subordinate to metasite, is a geographic point entity from which n≥1 samples have been collected. Therefore, sites without associated samples do not exist. A site is predominantly defined by geographic attributes, including georeferencing information (e.g. country, region, island, river basin, coordinates, and elevation) and other addressing/identification information (e.g. site name, alternative name, and type of site). All site description data are stored in one global table. Samples represent the material – for example, shell, bone, rock fragment, river sand – that was collected and used for the age/denudation rate determination. Therefore, samples are (or were) a tangible entity. In OCTOPUS v.2, samples are described by sets of data-collection-specific attributes; thus, each data collection will have its dedicated sample table that links records to sites via unique site identifiers. Typical sample table attributes deal with physical sample properties (e.g. grain size, material dated, sample thickness, or density) and their very local depositional contexts (e.g. facies, shielding, depth below surface, and excavation square or unit). Finally, observations (i.e. the actual age/denudation rate data) are stored in dedicated method-specific tables that include fields aimed at capturing any meaningful auxiliary data that help evaluate the quality of the age/denudation rate and, where necessary, further allow for the latter to be recalculated/reproduced.

We illustrate how the above hierarchical semantic data model is implemented in OCTOPUS v.2, using the example of a South Australian shell midden cluster (Wilson et al., 2012) (Fig. Sdm1, inset). A cluster of shell middens that share contextual similarities form a metasite – “Glen Lossie” – that has a footprint that may be defined by a bounding box. Individual middens belonging to Glen Lossie are considered sites (point geometry) and have unique OCTOPUS site identifiers assigned (Fig. Sdm1, inset). Shell fragments are samples from those midden sites. In the Glen Lossie case, a repeat measurement was done on a shell fragment with the original ID “GLM3-ss14”. As a result, OCTOPUS considers “GLM3-ss14” and “GLM3-ss14(r)” as a single sample with two associated observations, i.e. two separate radiocarbon ages (Obs. IDs ARCH0171C14001 and ARCH0171C14002 respectively; Fig. Sdm1, inset).

Note

The above section is a modified version of Section 3 from Codilean et al. 2022

Footnotes

1

Catchment-Averaged denudatIon Rates from cosmogenic Nuclides: CAIRN https://doi.org/10.5194/esurf-4-655-2016