Page Last Updated: April 8, 2026

HBCD Data Processing Workflows🔗

This section provides an overview of the complete HBCD processing workflows for both tabulated data and file-based data, detailing key processing steps, data storage locations (on S3 and other systems), and the responsible teams (see HDCC Structure & Organizational Charts).

Term	Definition
Tabulated data	In standardized HBCD table format, includes behavior & biology, demographics, visit data, and tabulated derivatives. See Data Structure Overview on the main Docs site for an overview of tabulated vs. file-based data.
File-based data	In varied formats, includes raw BIDS and processed derivatives for MRI, MRS, EEG, and wearable sensor data. See Data Structure Overview on the main Docs site for an overview of tabulated vs. file-based data.
Release Candidate ID	The anonymized ID that will be used as the BIDS subject label in any public releases.
DCCID and/or Candidate ID	The original BIDS participant ID prior to de-identification (e.g. `sub-1234` where `1234` is the DCCID) in LORIS and other internal data sources.
PSCID	Additional ID used in LORIS and during data collection. This ID begins with a five character sequence where the first two characters indicate participant status and the last three characters indicate the recruitment site.
de-identification/de-id	The process/outputs associated with replacing DCCIDs/PSCIDs with Release Candidate IDs.
re-identification/re-id	The process/outputs associated with replacing Release Candidate IDs with DCCIDs
SCE	Secure computing environment at the UMN Health Sciences Technology Office
Third party	Refers to external organizations or companies that provide proprietary assessments, scoring tools, or data systems used to collect standardized behavioral, cognitive, and developmental data across study sites
Post-processing pipelines, BIDS Apps	Terms used to denote pipelines whose goal is to take imaging, eeg, or other data organized in BIDS and run numerical algorithms to create outputs that can be used for further processing or for statistical analyses. The outputs of these pipelines are referred to as derivatives or imaging-derived phenotypes depending on the context.
derivatives	Any files produced by a post-processing pipeline. In other words, the outputs of containerized pipelines or BIDS Apps (such as Nibabies) that are run in CBRAIN.
Imaging-derived phenotypes	Scalar values that are output from a pipeline (such as brain volume) that can be concatenated across subjects and used for statistical analyses

S3 Bucket Descriptions🔗

Diagram Key	S3 Object `s3://midb-hbcd*`	Description
Main PR	`-main-pr/`	LORIS production bucket that receives all tabulated and file-based data for the full HBCD study prior to staging and Lasso ingestion, including: `phenotype/`: Tabulated data `assembly_bids/`: Raw BIDS curated by LORIS (DCCIDs used for subject labels) `derivatives/`: Re-identified derivatives `reid_brainswipes/`: Re-identified BrainSwipes data
De-ID	`-main-deid/`	De-identified/anonymized data (Release Candidate IDs used for subject labels), including: `assembly_bids/`: Raw BIDS `derivatives/`: Derivatives `brainswipes/`: BrainSwipes data
De-Id-List	`-main-pr-deidentification-list/`	ID mapping file `release_identifiers_YYYYMMDD.csv`, re-created daily, showing relationships between the various ID types used in HBCD (e.g. contains de-identified participant list information used for de-identification).
Lasso Staging	`-staging/`	Where LORIS deposits tabulated data after running data release script for each BR
Lasso PR*	`-lasso-hdcc-qc-br/`	Lasso Pre-Release contains release version-specific data (de-identified) housed under `br{BETA RELEASE#}/hbcd/` to be ingested into Lasso
QC Env*	`-lasso-hdcc-qc-ongoing-dccid/`	Lasso HDCC environment for ongoing QC; mimics the structure of the release data deposits, but excludes the br{BETA RELEASE#} prefix. The data contains DCCIDs only and is updated with both release and non-release main study participant data regularly, including: Tabulated data provided by LORIS to Lasso Raw BIDS copied from s3://midb-hbcd-main-pr/assembly_bids Re-identified derivatives from s3://midb-hbcd-main-pr/reid_derivatives
JCVI	`-ucsd-main-pr-dicoms/`	JCVI DICOMs and raw data QC results.
MRS BIDS	`-main-pr-mrs/`	MRS data post-BIDS conversion.
Sandbox	`main-sb/`	LORIS Bucket for non-production system to test data flows on pilot data

* Lasso PR and QC Env S3 buckets collectively form the Lasso HDCC QC environment

HBCD VM Staging🔗

See the full diagram for HBCD Staging VMs and associated buckets here. For the Data Release stream, de-identified data flows into the older BR directory (after a copy is pushed to the final BR), and checksum is run at this stage between the previous and new BR data.