Page Last Updated: May 22, 2026
File-Based Data Processing Workflow
Part 1: Site Capture & BIDS Conversion
Data are collected at sites into LORIS (EEG, Axivity, and GABI) or FIONA (MRI and MRS). LORIS data are then transferred directly into the central S3 Main PR bucket, which serves as the source for CBRAIN processing. MRI and MRS data must first be converted to BIDS format, and MRI data also undergo extensive raw data QC (see details).
| Key Name in Diagram | S3 URL |
|---|---|
| JCVI | s3://midb-hbcd-ucsd-main-pr-dicoms/ |
| MRS BIDS | s3://midb-hbcd-main-pr-mrs/ |
| Main PR / BIDS | s3://midb-hbcd-main-pr/assembly_bids/ |
| QC Env | s3://midb-hbcd-lasso-hdcc-qc-ongoing-dccid/ |
Part 2: De-Identification, CBRAIN Processing, & Lasso Ingestion
For a detailed breakdown of de-identification, CBRAIN pipeline processing, re-identification, etc., see the UMN De-Identification & Pipeline Processing section below.
All of the processes below are performed by UMN/MSI:
| Key Name in Diagram | S3 URL |
|---|---|
| De-ID-List | s3://midb-hbcd-main-pr-deidentification-list/ |
| De-ID / BIDS | s3://midb-hbcd-main-deid/assembly_bids/ |
| De-ID / Derivatives | s3://midb-hbcd-main-deid/derivatives/ |
| De-ID / BrainSwipes | s3://midb-hbcd-main-deid/v2/brainswipes/ |
| Main PR / Derivatives | s3://midb-hbcd-main-pr/reid_derivatives/ |
| Main PR / BrainSwipes | s3://midb-hbcd-main-pr/reid_brainswipes/ |
| Lasso PR | s3://midb-hbcd-lasso-hdcc-qc-br/br{BETA RELEASE#}/hbcd/ |
| QC Environment | s3://midb-hbcd-lasso-hdcc-qc-ongoing-dccid/ |
The BrainSwipes workflow differs from that of other data because quality control (QC) is performed after CBRAIN processing and therefore requires additional steps. Some details to note:
- After the visual reports used for QC are transferred to the Prerelease Derivatives S3 URL (s3://midb-hbcd-prerelease-bids/derivatives/ses-V02/xcp_d/{{SUBJECT}}/figures/{{FILENAME}}.png), a query is run to identify out-of-date outputs and to remove or archive records related to those files.
- TBD: Participant sessions that fail structural QC (based on XCP-D derivative visual reports) are flagged for manual correction of the corresponding BIBSNet brain segmentations. The corrected segmentations are not fed back into the main processing workflows; instead, they are integrated into the training set for future BIBSNet models.
Responsibility Assignment Matrices By Modality
| Study Stage | Step | Location | Responsible | Accountable | Consulted [C] / Informed [I] |
|---|---|---|---|---|---|
| Data Collection | Participant Source Data Acquisition: DCMs, eCRF population (MRI Acquisition Form) | FIONA, LORIS | Site Staff (Varies by site) | Site Staff (Varies by site) | -/- |
| Data QC + Action | QC at Source Data Acquisition: eCRF populated properly, DCM header checks, naming convention checks | FIONA | Site Staff (Varies by site) | Site Staff (Varies by site) | -/- |
| Data QC + Action | QC at Source: Check acquisition/protocol adherence | JCVI | Josh Kuperman | Anders Dale | -/- |
| Data QC + Action | Transfer acquisition/protocol adherence (both DCM acquisition and MRI Acquisition Form) to DCC | JCVI | Ron Yang, Don Hagler | Don Hagler | -/- |
| Data Collection | Convert DICOMs to BIDS/NIfTI | UMN MSI | Cecile Madjar | Cecile Madjar | -/- |
| Data Collection | Convert MRS data to BIDS | UMN HST, UMN MSI | Reed McEwan, Cecile Madjar | Reed McEwan | [C] Helge Zoellner [C] Erik Lee [C] Georg Oeltzschner |
| Data QC + Action | QC of the DCM-to-BIDS conversion: Correct BIDS errors | UMN MSI | Cecile Madjar | Cecile Madjar | [C] Erik Lee [C] Lucille Moore [C] Tim Hendrickson |
| Data QC + Action | Check for protocol deviations (based on BIDS) | JCVI | MRI QC Workgroup | Don Hagler | -/- |
| Data Ingestion | Ingest and catalogue DICOMs in Lasso | -/- | | | |
| Data QC + Action | QC of ingestion | JCVI, HST | -/- | | |
| Data QC + Action | Initial QC raw data (e.g. manual, automated) | JCVI | MRI QC Workgroup | Don Hagler | -/- |
| Study Stage / Step | Location | Responsible | Accountable | Consulted [C] / Informed [I] |
|---|---|---|---|---|
| Monthly Net Inventory/Equipment QC | Site | Site Staff (Varies Per Site), EEG Core, HDCC, LORIS | Site Staff (Varies Per Site) | -/- |
| EEG Acquisition | Site | Site Staff (Varies Per Site), EEG Core, HDCC, LORIS | Site Staff (Varies Per Site) | EEG Core |
| Populate EEG Acquisition Form | Site | Site Staff (Varies Per Site), EEG Core, HDCC, LORIS | Site Staff (Varies Per Site) | -/- |
| QC Acquisition form population | Site | Site Staff (Varies Per Site), EEG Core, HDCC, LORIS | Site Staff (Varies Per Site) | EEG Core |
| De-Identification & Flags (pre-LORIS) | Site | Site Staff (Varies Per Site), EEG Core, HDCC, LORIS | Site Staff (Varies Per Site) | -/- |
| BIDS Wizard Population and Execution | LORIS | Site Staff (Varies Per Site), EEG Core, HDCC, LORIS | Site Staff (Varies Per Site) | [C] Laetitia Fesselier |
| Convert EEG data to BIDS | UMN MSI | Laetitia Fesselier | Laetitia Fesselier | EEG Core |
| Run MADE Pipeline | UMN MSI | Erik Lee | Erik Lee | EEG Core |
| QC pre-processed EEG | UMD EEG Core | Kira Ashton, Dylan Gilbreath, Trisha Maheswari, Elise Harris | Santiago Morales | EEG Core |
| Study Stage / Step | Location | Responsible | Accountable | Consulted [C] / Informed [I] |
|---|---|---|---|---|
| Acquisition of sample | Site | Site Staff | Site Staff | -/- |
| Population of metadata form | LORIS | Site Staff | Site Coordinator | -/- |
| QC of form population | OHSU (Oregon Health and Science University) | WG Co-Chairs | Elinor Sullivan (Co-Chair) | -/- |
| Shipment of sample | Site | Site Staff | Site Staff | -/- |
| QC: Ensuring sample was received | Sampled/USDTL | Charles Hevi (Sampled), Priti Soni (USDTL) | Charles Hevi (Sampled), Priti Soni (USDTL) | -/- |
| QC: deviation code of sample | Sampled/USDTL | Charles Hevi (Sampled), Priti Soni (USDTL) | Charles Hevi (Sampled), Priti Soni (USDTL) | -/- |
| Analysis of sample | Sampled/USDTL | Charles Hevi (Sampled), Priti Soni (USDTL) | Charles Hevi (Sampled), Priti Soni (USDTL) | -/- |
| Study Stage / Step | Location | Responsible | Accountable | Consulted [C] / Informed [I] |
|---|---|---|---|---|
| Data Collection | Site | |||
| Convert Axivity data to BIDS | UMN MSI | Cecile Madjar | Cecile Madjar | [C] Jinseok Oh [C] Beth Smith [C] Erik Lee |
| Run Axivity Pipeline | UMN MSI | Erik Lee | Erik Lee | [C] Jinseok Oh [C] Beth Smith |
| QC preprocessed data | | | | |
UMN De-Identification & Pipeline Processing
This documentation outlines how UMN processes imaging data after it has been curated by LORIS into BIDS format.
The workflow consists of eight interdependent components that handle de-identification, pipeline processing, synchronization, and cleanup of imaging data.
Workflow Summary
| # | Workflow | Core Function | Frequency |
|---|---|---|---|
| 1 | Release Candidate ID Creation | Uploads updated release ID mappings for new subjects | Daily |
| 2 | Raw BIDS De-Identification | Removes identifiers and uploads anonymized data to de-ID bucket | Daily |
| 3 | CBRAIN Subject Registration | Registers de-identified subjects in CBRAIN for processing | Daily |
| 4 | Post-Processing of De-ID Data | Runs pipelines (e.g., Nibabies, QSIPrep) on de-identified BIDS | Daily |
| 5 | CBRAIN Log Preservation | Archives failed task logs for permanent tracking | Daily |
| 6 | Raw BIDS Sync Cleanup | Removes outdated data when LORIS and de-ID buckets diverge | Daily |
| 7 | Re-ID for LORIS | Replaces Release Candidate IDs with DCCIDs for LORIS ingestion | Daily |
| 8 | Derivative Sync Cleanup | Removes outdated re-ID derivatives from LORIS bucket | Daily |
Primary Goals
- Ensure only anonymized data (using Release Candidate IDs) is released publicly
- Prevent overlap of Release Candidate IDs and DCCIDs/PSCIDs within the same dataset
- Accommodate periodic updates to the raw BIDS data curated by LORIS, including updates to BIDS metadata and QC that may affect processing pipelines
- Ensure that derived processing outputs released to the public come from the same processing stream that internal HBCD investigators use for QC
- Provide LORIS with access to derived outputs for facilitating internal QC with Workgroups and for tabulation of derivatives
- Limit unnecessary reprocessing while still maintaining integrity between inputs and outputs. For example, if ses-V03 becomes available for a given subject, this should not trigger reprocessing of ses-V02 data. However, if new files or updated QC become available for ses-V02, then ses-V02 should be reprocessed.
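The reprocessing rule in the last goal amounts to tracking a fingerprint of each session's inputs. The sketch below is a minimal illustration, assuming each session's inputs can be summarized as a mapping of file path to (version ID, QC status); `session_fingerprint` and `sessions_to_reprocess` are hypothetical names, not part of the UMN code.

```python
import hashlib
import json


def session_fingerprint(files):
    """Hash one session's inputs, given {path: (version_id, qc_status)}.

    Adding or removing a file, uploading a new version, or changing a QC
    status changes the hash; other sessions of the subject do not affect it.
    """
    canonical = json.dumps(sorted(files.items()))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def sessions_to_reprocess(processed, available):
    """Return sessions that are new or whose inputs changed.

    processed: {session: fingerprint recorded when it was last processed}
    available: {session: current {path: (version_id, qc_status)}}
    """
    return sorted(
        session
        for session, files in available.items()
        if processed.get(session) != session_fingerprint(files)
    )


# A new ses-V03 does not cause ses-V02 to be reprocessed.
v02 = {"anat/sub-01_ses-V02_T1w.nii.gz": ("v1", "pass")}
processed = {"sub-01/ses-V02": session_fingerprint(v02)}
available = {
    "sub-01/ses-V02": v02,
    "sub-01/ses-V03": {"anat/sub-01_ses-V03_T1w.nii.gz": ("v1", "pass")},
}
print(sessions_to_reprocess(processed, available))  # ['sub-01/ses-V03']
```

Because each fingerprint covers only one session, a new session never invalidates an existing one, while a QC change inside ses-V02 does.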
General Limitations
Incoming session data (MRI including initial scans and rescans, EEG, Axivity, GABI, and manual QC ratings) often arrive over several weeks. Automated QC and processing routines may be delayed until all expected elements for a session are available.
Individual Workflows
Workflow 1: Release Candidate ID Creation
Goal: Maintain an up-to-date mapping of identifiers for de-identification workflows.
Contacts: Reed McEwan, Dan Duhon
Frequency: Daily (<1 hour)
Inputs: N/A
Outputs: s3://midb-hbcd-main-pr-deidentification-list/release_identifiers.csv
Notes: Phantom data may not yet be included.
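Because the primary goals forbid Release Candidate IDs from overlapping DCCIDs or PSCIDs, the mapping can be sanity-checked whenever it is regenerated. A minimal sketch, assuming a CSV with hypothetical `dccid` and `release_id` columns (the actual header of `release_identifiers.csv` is not documented here):

```python
import csv
import io


def load_release_ids(csv_text):
    """Parse the mapping CSV into {dccid: release_id}.

    ASSUMPTION: hypothetical column names 'dccid' and 'release_id'; the
    real release_identifiers.csv header may differ.
    """
    mapping = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        dccid, release_id = row["dccid"].strip(), row["release_id"].strip()
        # A DCCID listed twice must always map to the same release ID.
        if mapping.get(dccid, release_id) != release_id:
            raise ValueError(f"conflicting release IDs for DCCID {dccid}")
        mapping[dccid] = release_id
    return mapping


def validate_mapping(mapping, pscids=()):
    """Raise if release IDs are reused or collide with internal IDs."""
    release_ids = list(mapping.values())
    if len(set(release_ids)) != len(release_ids):
        raise ValueError("a release ID is assigned to more than one DCCID")
    overlap = (set(mapping) | set(pscids)) & set(release_ids)
    if overlap:
        raise ValueError(f"release IDs overlap DCCIDs/PSCIDs: {sorted(overlap)}")
```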
Workflow 2: Raw BIDS De-Identification
Goal: De-identify and upload raw BIDS sessions.
Criteria for De-identification:
- Subject is listed in the release ID mapping
- No existing session files in the de-ID bucket
- Session files are available in the LORIS bucket and are ≥1 day old
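The three criteria can be checked per session as in the minimal sketch below. The argument shapes and the function name are hypothetical, and interpreting "≥1 day old" as applying to the newest file in the session is an assumption.

```python
from datetime import datetime, timedelta, timezone

MIN_AGE = timedelta(days=1)


def eligible_for_deid(dccid, loris_files, deid_files, mapping, now=None):
    """Apply the three de-identification criteria to one session.

    loris_files: {key: LastModified (timezone-aware datetime)} from LORIS
    deid_files:  keys already present for this session in the de-ID bucket
    mapping:     {dccid: release_id} from release_identifiers.csv
    ASSUMPTION: the one-day age check uses the newest file, so a session
    that is still being uploaded is skipped.
    """
    now = now or datetime.now(timezone.utc)
    if dccid not in mapping:
        return False  # subject not yet in the release ID mapping
    if deid_files:
        return False  # session already present in the de-ID bucket
    if not loris_files:
        return False  # nothing available in the LORIS bucket
    return now - max(loris_files.values()) >= MIN_AGE
```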
De-identification Procedure Overview:
- De-identify and upload all supported session files to the de-ID bucket
- Update session metadata (`sessions.<tsv|json>`) in the de-ID bucket
- Tag each file with its `loris-versionid` (corresponds to `VersionId` in the original LORIS files) for traceability
| Identifiers | Details |
|---|---|
| Removed | PSCIDs, DCCIDs, and Site IDs, as well as manually populated fields (prone to typos) that commonly contain these identifiers |
| Retained | Jittered patient age at acquisition, acquisition dates/times, and acquisition device serial numbers |

| File Coverage |
|---|
| BIDS metadata (scans/sessions TSV & JSON files) |
| EEG |
| MRS NIfTI files: |
Contacts: Sriharshitha Anuganti, Erik Lee
Frequency: Daily (PM CST; ensure completion within 24 hours)
Inputs: s3://midb-hbcd-main-pr/assembly_bids (raw BIDS data with DCCIDs)
Outputs: s3://midb-hbcd-main-deid/assembly_bids (with Release Candidate IDs)
Notes: EEG sourcedata files eventlogs.edat3 and eeg_flags.json are not yet supported.
Workflow 3: CBRAIN Subject Registration
Goal: Make CBRAIN aware of subjects available for processing.
Contacts: Monalisa Bilas, Erik Lee
Frequency: Daily (<1 hour)
Inputs: s3://midb-hbcd-main-deid/assembly_bids
Outputs: Internal CBRAIN records indicating existence of subject folder within BIDS directory
Notes: Each subject has a single CBRAIN BidsSubject File Collection linking all sessions, though sessions are processed independently.
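Registration only needs the set of subject folders, since one BidsSubject collection covers all of a subject's sessions. A minimal sketch of deriving that set from object keys under the de-ID `assembly_bids` prefix, assuming keys are listed relative to that prefix; it does not call CBRAIN, and `subjects_to_register` is a hypothetical helper.

```python
import re
from collections import defaultdict

# ASSUMPTION: keys look like "sub-<label>/ses-<label>/..." relative to the
# assembly_bids prefix.
SUBJECT_SESSION = re.compile(r"^(sub-[^/]+)/(ses-[^/]+)/")


def subjects_from_keys(keys):
    """Group BIDS object keys into {subject: sorted list of sessions}."""
    sessions = defaultdict(set)
    for key in keys:
        match = SUBJECT_SESSION.match(key)
        if match:  # top-level files such as dataset_description.json are skipped
            sessions[match.group(1)].add(match.group(2))
    return {subject: sorted(found) for subject, found in sessions.items()}


def subjects_to_register(keys, registered):
    """Subjects present in the de-ID bucket but not yet known to CBRAIN."""
    return sorted(set(subjects_from_keys(keys)) - set(registered))
```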
Workflow 4: Post-Processing of De-ID Data
Goal: Run de-identified BIDS data through BIDS App pipelines in CBRAIN (e.g., `Nibabies`, `BIBSNet`, `QSIPrep`) to generate derivatives.
Workflow Steps:
- Detect available sessions in the BIDS directory and CBRAIN
- Check for existing outputs or prior processing attempts
- For sessions without outputs or attempted processing, verify that required prerequisite files exist and pass QC (from `scans.tsv`)
- Select files for processing based on modality-specific rules (e.g., best T1w image only, all fMRI images passing QC)
- Confirm dependencies between pipelines (e.g., BIBSNet outputs are required by Nibabies)
- Launch CBRAIN processing tasks using predefined settings and including only files selected for processing
- CBRAIN uploads successful job outputs to session-specific folders on S3 and records the corresponding processing tasks and generated file collections internally.
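Two of the steps above, modality-specific file selection and the pipeline dependency check, are sketched below. This is a minimal illustration, not the code in the UMN GitHub repository: `filename` is the standard BIDS `scans.tsv` column, but `qc_pass` and `qc_score` are hypothetical stand-ins for the actual HBCD QC columns, and only the BIBSNet to Nibabies dependency comes from this document.

```python
import csv
import io

# Pipeline prerequisites; only BIBSNet -> Nibabies is taken from the docs.
DEPENDS_ON = {"nibabies": ["bibsnet"], "bibsnet": []}


def select_inputs(scans_tsv):
    """Pick files for processing from one session's scans.tsv text.

    ASSUMPTIONS: 'qc_pass' ("1" = pass) and 'qc_score' (higher is better)
    are hypothetical QC columns.
    """
    rows = list(csv.DictReader(io.StringIO(scans_tsv), delimiter="\t"))
    passing = [r for r in rows if r["qc_pass"] == "1"]
    t1w = [r for r in passing if r["filename"].endswith("_T1w.nii.gz")]
    bold = [r["filename"] for r in passing if r["filename"].endswith("_bold.nii.gz")]
    # Best T1w only: the passing T1w image with the highest QC score.
    best_t1w = max(t1w, key=lambda r: float(r["qc_score"]), default=None)
    return {
        "T1w": [best_t1w["filename"]] if best_t1w else [],
        "bold": bold,  # every fMRI run that passes QC
    }


def ready_to_launch(pipeline, finished):
    """True when every prerequisite pipeline has outputs for the session."""
    return all(dep in finished for dep in DEPENDS_ON.get(pipeline, []))
```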
Contacts: Erik Lee, Monalisa Bilas
Frequency: Daily (initial routine <1 hour; processing jobs may take ~1 day)
Inputs: s3://midb-hbcd-main-deid/assembly_bids (raw BIDS data)
Outputs: s3://midb-hbcd-main-deid/derivatives/ses-{V0X} (session-specific subject folders with Release Candidate IDs)
Notes: See the GitHub repository and Documentation for the code that manages processing. CBRAIN logs and file collections are stored internally for traceability.
Workflow 5: CBRAIN Log Preservation
Goal: Preserve CBRAIN processing logs for failed tasks before the jobs are deleted (a few weeks after completion). Logs from successful jobs are already archived in the .cbrain logs included with the S3 outputs. Note that CBRAIN only transfers outputs to S3 for successful jobs.
Contacts: Monalisa Bilas, Erik Lee
Frequency: Daily (<1 hour)
Inputs: CBRAIN task directories stored on MSI under /scratch.global
Outputs: s3://midb-hbcd-main-deid/cbrain_std_logs/ (Files named {CBRAIN_Task_ID}.<out|err>.out)
Notes: CBRAIN task IDs are unique, so duplicates pose no issue.
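Selecting which logs to archive follows directly from the goal above: only failed tasks need their logs copied, and unique task IDs make repeated runs safe. A minimal sketch with hypothetical helper names; the status check is an assumption (any status beginning with `Failed` is treated as a failure), and the key layout simply follows the naming pattern given in Outputs.

```python
PREFIX = "cbrain_std_logs/"


def log_keys(task_id):
    """Keys for one task, following {CBRAIN_Task_ID}.<out|err>.out."""
    return [f"{PREFIX}{task_id}.{stream}.out" for stream in ("out", "err")]


def logs_to_archive(tasks, existing_keys):
    """Return log keys still to upload.

    tasks: {task_id: status}. Successful tasks are skipped because their
    logs already travel with the S3 outputs. Task IDs are unique, so a key
    that already exists can be skipped safely on a later run.
    ASSUMPTION: any status starting with "Failed" counts as a failure.
    """
    pending = []
    for task_id, status in sorted(tasks.items()):
        if status.startswith("Failed"):
            pending += [k for k in log_keys(task_id) if k not in existing_keys]
    return pending
```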
Workflow 6: Raw BIDS Sync Cleanup
Goal: Detect and remove sessions where LORIS and de-ID data diverge.
Process:
- Compare file counts between the de-ID and LORIS session folders
- If file counts are the same, compare the `loris-versionid` of the de-ID files to ensure they match the LORIS files
- If file counts or `loris-versionid` values mismatch, delete all associated derivatives, CBRAIN task records, and raw BIDS data. The next time the query scripts that look for new subjects to process are run, processing will be re-initiated for these subjects.
Contacts: Erik Lee, Monalisa Bilas
Frequency: Daily (runtime varies by data volume)
Inputs:
- Raw BIDS data: s3://midb-hbcd-main-deid/assembly_bids and s3://midb-hbcd-main-pr/assembly_bids
- Derivatives: s3://midb-hbcd-main-deid/derivatives
- CBRAIN records of userfiles and tasks
Outputs: N/A
Notes: Cleanup enables the next de-ID workflow to rerun cleanly for that session.
Workflow 7: Re-ID for LORIS
Goal: Re-identify de-identified derivatives by replacing Release Candidate IDs with DCCIDs, enabling upload to LORIS. Ensures derivatives are accurately linked back to participant DCCIDs to facilitate internal QC with Workgroups and tabulation of derivatives.
Process:
- Download de-identified derivatives from s3://midb-hbcd-main-deid/derivatives.
- Replace all Release Candidate IDs with the corresponding DCCIDs in both filenames and file contents (using file type–specific routines) for (1) text-based files (`.csv`, `.html`, `.json`, `.txt`, `.toml`, `.tsv`, `.log`) and (2) EEG .set files (`.set`, `.mat`).
- Replace anonymized site IDs with real site IDs.
- Upload re-identified files to s3://midb-hbcd-main-pr/reid_derivatives and set the metadata field `cbrain-timestamp` based on the original file's `LastModified` value.
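The filename and text-content substitution can be sketched with a single regular expression built from the release ID mapping. This is a minimal illustration, not the UMN routine: it assumes a `{release_id: dccid}` dictionary, rewrites contents only for the text-based suffixes listed above, and leaves EEG `.set`/`.mat` handling, site ID replacement (which could reuse the same approach with a second mapping), and the `cbrain-timestamp` upload out of scope.

```python
import re
from pathlib import PurePosixPath

TEXT_SUFFIXES = {".csv", ".html", ".json", ".txt", ".toml", ".tsv", ".log"}


def build_pattern(release_to_dccid):
    """Match whole Release Candidate IDs only (longest first), so an ID
    cannot match inside a longer ID or another alphanumeric token."""
    ids = sorted(release_to_dccid, key=len, reverse=True)
    alternation = "|".join(map(re.escape, ids))
    return re.compile(rf"(?<![A-Za-z0-9])(?:{alternation})(?![A-Za-z0-9])")


def reidentify(text, release_to_dccid, pattern):
    return pattern.sub(lambda m: release_to_dccid[m.group(0)], text)


def reidentify_file(path, content, release_to_dccid):
    """Return (new_path, new_content) for one derivative file.

    Only text-based files have their contents rewritten here; other files
    (e.g. PNG figures) keep their bytes and only get a new path.
    """
    pattern = build_pattern(release_to_dccid)
    new_path = reidentify(path, release_to_dccid, pattern)
    if PurePosixPath(path).suffix in TEXT_SUFFIXES:
        content = reidentify(content, release_to_dccid, pattern)
    return new_path, content
```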
Re-identification is performed on derivatives for the following pipelines:
- `bibsnet`
- `bme_x`
- `hbcd_motion`
- `made`
- `mriqc`
- `mrsqc`
- `nibabies`
- `osprey`
- `qmri_postproc`
- `qsiprep`
- `qsirecon-DIPYDKI`
- `qsirecon-DSIStudio`
- `qsirecon-TORTOISE_model-<MAPMRI|tensor>`
- `symri`
- `xcp_d`
Contacts: Sriharshitha Anuganti, Erik Lee
Frequency: Runs daily
Inputs: s3://midb-hbcd-main-deid/derivatives (de-identified derivatives)
Outputs: s3://midb-hbcd-main-pr/reid_derivatives (re-identified derivatives)
Notes:
- Update re-ID routines whenever pipeline filenames or formats change.
- Previous documentation referenced `VersionId` metadata; this has been replaced with `LastModified` because the de-ID bucket is not versioned.
Workflow 8: Derivative Sync Cleanup
Goal: Remove re-identified derivatives from LORIS when they become out of sync with corresponding de-identified derivatives.
Process:
- For each subject/session/pipeline, compare `LastModified` (de-ID) and `cbrain-timestamp` (re-ID) values between s3://midb-hbcd-main-deid/derivatives and s3://midb-hbcd-main-pr/reid_derivatives
- If the number of files or the timestamps differ, delete the corresponding re-identified data from s3://midb-hbcd-main-pr.
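The comparison above reduces to checking, per subject/session/pipeline group, whether the multiset of de-ID `LastModified` values equals the multiset of re-ID `cbrain-timestamp` values; a differing file count makes them unequal as well. A minimal sketch with a hypothetical function name, assuming both sides are already keyed in one ID space and timestamps are stored as identical strings:

```python
from collections import Counter


def stale_groups(deid, reid):
    """Find subject/session/pipeline groups whose re-ID copy is out of sync.

    deid: {(subject, session, pipeline): [LastModified of each de-ID file]}
    reid: {(subject, session, pipeline): [cbrain-timestamp of each re-ID file]}
    ASSUMPTIONS: group keys are already in one ID space (re-ID subjects
    mapped back via the release ID mapping), and timestamps are stored as
    identical strings on both sides.
    """
    return sorted(
        group
        for group, reid_stamps in reid.items()
        # Also true when the de-ID group has disappeared entirely.
        if Counter(deid.get(group, [])) != Counter(reid_stamps)
    )
```

Groups that exist only on the de-ID side are not flagged; they are picked up by the re-ID workflow instead.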
Contacts: Sriharshitha Anuganti, Monalisa Bilas, Erik Lee
Frequency: Daily
Inputs: s3://midb-hbcd-main-pr/reid_derivatives and s3://midb-hbcd-main-deid/derivatives
Outputs: N/A
Notes: Ensures only synchronized derivatives remain in LORIS and prevents outdated or mismatched data from being used.