Analysis repository for the OpenNeuro study ds007262 version 1.0.6, titled Cognitive Workload 8-level arithmetic (DOI 10.18112/openneuro.ds007262.v1.0.6). The codebase covers dataset acquisition, trial-table construction, QC, preprocessing, epoching, unimodal feature extraction, fused-table assembly, split-aware machine learning, confusion analysis, and publication-oriented reporting for this specific study snapshot.
- `analysis_pipeline/`: executable stage scripts, pipeline configs, and supporting modules.
- `analysis_pipeline/config/`: checked-in YAML profiles for reproducible runs.
- `scripts/`: dataset download, end-to-end execution, and report/manuscript helpers.
- `docs/`: GitHub-facing documentation plus manuscript handoff material.
- `data/`: local BIDS dataset root (ignored).
- `analysis_pipeline/runs/`: run-specific outputs (ignored).
This repository is intentionally separate from the OpenNeuro dataset record itself. Track code, configs, and documentation here; keep raw data and generated outputs local.
python -m venv .venv
.\.venv\Scripts\Activate.ps1
python -m pip install --upgrade pip
python -m pip install -r requirements.txt
`requirements.txt` covers the classic ML and signal-processing stack. Install PyTorch separately if you plan to run deep Stage 6 models such as `lstm1d`, `gru1d`, `cnn1d`, or `transformer`.
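Because PyTorch is an optional dependency, it can help to check for it before requesting deep models. The following is a minimal sketch, not code from this repository: the helper names and the classic model name `logreg` are hypothetical, and the set of deep models contains only the four named above, which may not be exhaustive.

```python
import importlib.util

# Deep Stage 6 models named in this README; the real list may be longer.
DEEP_MODELS = {"lstm1d", "gru1d", "cnn1d", "transformer"}


def torch_available() -> bool:
    """Return True if the `torch` package can be imported in this environment."""
    return importlib.util.find_spec("torch") is not None


def runnable_models(requested, has_torch):
    """Keep classic models always; keep deep models only when PyTorch is present."""
    return [m for m in requested if m not in DEEP_MODELS or has_torch]


if __name__ == "__main__":
    # "logreg" is a hypothetical classic model name used for illustration only.
    print(runnable_models(["logreg", "lstm1d", "transformer"], torch_available()))
```

Passing the availability flag as an argument keeps the filter easy to test without PyTorch installed.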
OpenNeuro dataset snapshot for this repository:
python .\scripts\download_bids.py `
--dataset-id ds007262 `
--snapshot 1.0.6 `
--target .\data\bids_arithmetic
Direct archive URL:
python .\scripts\download_bids.py `
--archive-url https://example.org/your_bids_archive.zip `
--target .\data\bids_arithmetic
One-command download plus pipeline execution for ds007262 v1.0.6:
.\scripts\run_end_to_end.ps1 -DatasetId ds007262 -Snapshot 1.0.6 -ForceDownload
Default fixed-window profile:
python .\analysis_pipeline\run_pipeline.py `
--config .\analysis_pipeline\config\pipeline_unified_classic_nn_baseline_preproc.yaml
Alternative overlap profile:
python .\analysis_pipeline\run_pipeline.py `
--config .\analysis_pipeline\config\pipeline_unified_classic_nn_baseline_overlap3s_50pct_preproc.yaml
The checked-in profiles assume a local copy of OpenNeuro ds007262 v1.0.6 under `data/bids_arithmetic`. They write under `analysis_pipeline/runs/<profile_name>/`. Both set `outputs.clean_start: true`, so rerunning the same profile replaces that profile's run directory only. If you want to preserve an existing run, copy the YAML and change `outputs.root` or set `clean_start: false`.
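The two ways of preserving an existing run can be sketched as a small dictionary transformation. This is a minimal illustration, assuming the profile has already been loaded into a nested dict (for example with PyYAML's `yaml.safe_load`). The helper name `preserve_existing_run` and the example root paths are hypothetical and not part of the repository; only the `outputs.root` and `outputs.clean_start` keys come from the profiles described here.

```python
import copy


def preserve_existing_run(profile, new_root=None):
    """Return a copy of `profile` that will not overwrite an earlier run.

    If `new_root` is given, the copy writes to a different run directory.
    Otherwise `clean_start` is switched off so the existing directory is kept.
    """
    cfg = copy.deepcopy(profile)  # never mutate the checked-in profile
    outputs = cfg.setdefault("outputs", {})
    if new_root is not None:
        outputs["root"] = new_root
    else:
        outputs["clean_start"] = False
    return cfg


if __name__ == "__main__":
    # Hypothetical profile contents, reduced to the two keys discussed above.
    baseline = {"outputs": {"root": "analysis_pipeline/runs/my_profile",
                            "clean_start": True}}
    print(preserve_existing_run(baseline, "analysis_pipeline/runs/my_profile_v2"))
    print(preserve_existing_run(baseline))
```

Writing the modified dict back out as a new YAML file, rather than editing a checked-in profile, keeps the original profiles reproducible.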
Run through feature extraction:
python .\analysis_pipeline\run_pipeline.py `
--config .\analysis_pipeline\config\pipeline_unified_classic_nn_baseline_preproc.yaml `
--only stage0 stage1 stage2 stage3 stage4 stage5
Run Stage 6 only:
python .\analysis_pipeline\run_pipeline.py `
--config .\analysis_pipeline\config\pipeline_unified_classic_nn_baseline_preproc.yaml `
--only stage6
When `--only stage6` is used through the orchestrator, `stage6_confusions` is auto-run unless `--no-auto-stage6-confusions` is passed.
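The auto-run rule can be pictured as a small stage-selection step. The sketch below only illustrates the behavior described above and is not the orchestrator's actual code: the function `select_stages` and its `auto_confusions` parameter are hypothetical stand-ins for `--only` and `--no-auto-stage6-confusions`.

```python
def select_stages(only, auto_confusions=True):
    """Return the stages to run for a given `--only` selection.

    `auto_confusions=False` mimics passing --no-auto-stage6-confusions.
    """
    stages = list(only)
    wants_confusions = (
        auto_confusions
        and "stage6" in stages
        and "stage6_confusions" not in stages
    )
    if wants_confusions:
        # Run the confusion curation right after Stage 6 training.
        stages.insert(stages.index("stage6") + 1, "stage6_confusions")
    return stages


if __name__ == "__main__":
    print(select_stages(["stage6"]))                         # confusions appended
    print(select_stages(["stage6"], auto_confusions=False))  # Stage 6 alone
```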
| Stage | Script | Main purpose | Main outputs |
|---|---|---|---|
| 0 | `build_trial_table.py` | Build the canonical trial table from BIDS events. | `<run_root>/reports/trial_table_bids_arithmetic.tsv` |
| 1 | `stage1_qc_summary.py` | Summarize modality coverage, dropped samples, and participant QC. | `<run_root>/reports/qc_dataset_summary.json`, figures, subject table |
| 2 | `stage2_preprocess.py` | Clean EEG, ECG, and pupil streams and write derivatives. | `<run_root>/derivatives/cleaned/`, preprocess logs |
| 3 | `stage3_epoch_trials.py` | Convert trials into fixed or overlapping epochs with drop accounting. | `<run_root>/derivatives/epochs/`, `epoch_manifest.tsv`, `epoch_summary.json` |
| 4 | `stage4_extract_features.py` | Extract modality-specific engineered features. | `<run_root>/features/features_eeg.tsv`, `features_ecg.tsv`, `features_pupil.tsv` |
| 5 | `stage5_build_fused_table.py` | Build unimodal and fused ML tables plus split manifests. | `<run_root>/features/features_fused_tutorial_baseline.tsv`, `split_manifest_tutorial_baseline.json` |
| 6 | `stage6_train_classic_ml.py` | Benchmark classic and optional deep models across datasets, protocols, and class scenarios. | `<run_root>/reports/ml_results_*.json`, `<run_root>/reports/ml_summary_*.md`, `<run_root>/models/` |
| 6b | `stage6_highlight_confusions.py` | Curate top confusion matrices from Stage 6 results. | `<run_root>/reports/confusion_highlights_*.json`, markdown, PNGs |
| 6c | `stage6_build_publication_report.py` | Assemble a publication-facing run summary. | `<run_root>/reports/publication_full_report.md`, `.json` |
Stage 6 can also emit live confusion PNGs during training and EEG PSD/topomap QC figures when EEG is part of the selected dataset list.
| Profile | File | Intended use |
|---|---|---|
| Baseline fixed-window run | `analysis_pipeline/config/pipeline_unified_classic_nn_baseline_preproc.yaml` | Canonical reproducible run: fixed 6 s calculation windows, classic plus deep model sweep, publication report enabled. |
| Overlap-window run | `analysis_pipeline/config/pipeline_unified_classic_nn_baseline_overlap3s_50pct_preproc.yaml` | Same pipeline family with 3 s windows, 1.5 s step size, and overlap enabled for Stage 3. |
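To make the difference between the two windowing schemes concrete, the sketch below enumerates window boundaries for both settings. It assumes a 6 s calculation period and keeps only windows that lie fully inside that period; `window_bounds` is a hypothetical helper for illustration, not the Stage 3 implementation.

```python
def window_bounds(duration_s, window_s, step_s):
    """Return (start, end) pairs, in seconds, for windows fully inside the period."""
    if window_s <= 0 or step_s <= 0:
        raise ValueError("window_s and step_s must be positive")
    bounds = []
    k = 0
    # Small tolerance guards against floating-point round-off at the last window.
    while k * step_s + window_s <= duration_s + 1e-9:
        start = k * step_s
        bounds.append((start, start + window_s))
        k += 1
    return bounds


if __name__ == "__main__":
    # Fixed-window setting: one 6 s window per calculation period.
    print(window_bounds(6.0, 6.0, 6.0))   # [(0.0, 6.0)]
    # Overlap setting: 3 s windows every 1.5 s, i.e. 50 % overlap.
    print(window_bounds(6.0, 3.0, 1.5))   # [(0.0, 3.0), (1.5, 4.5), (3.0, 6.0)]
```

Under these assumptions the overlap setting yields three epochs per calculation period instead of one, and the overlap fraction is 1 - 1.5/3 = 0.5, matching the `50pct` in the profile file name.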
Both profiles benchmark the baseline_all_bins, baseline_omit_easiest, baseline_omit_hardest, baseline_low_high_omit_hardest, and baseline_grouped_4class_omit_hardest class scenarios.
- `run_pipeline.py` expands output placeholders such as `{reports_dir}`, `{features_dir}`, and `{models_dir}` from `outputs.root`.
- Expected outputs are verified after every step. Use `--no-strict-outputs` only when debugging incomplete runs.
- Stage 1 strict QC carry-forward is propagated automatically into Stages 2 to 5.
- `--dry-run` prints planned commands and expected outputs without executing them.
- Stage 6 resolves both Windows and WSL-style dataset paths stored in split manifests.
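Placeholder expansion and strict output checking, as described above, could look roughly like the sketch below. It is a sketch under stated assumptions rather than the code in `run_pipeline.py`: the subdirectory names `reports`, `features`, and `models` are inferred from the output paths listed in the stage table, and every function name here is hypothetical.

```python
from pathlib import Path


def build_placeholders(outputs_root):
    """Derive placeholder values from outputs.root (subdirectory names assumed)."""
    root = Path(outputs_root)
    return {
        "reports_dir": (root / "reports").as_posix(),
        "features_dir": (root / "features").as_posix(),
        "models_dir": (root / "models").as_posix(),
    }


def expand(template, placeholders):
    """Fill {reports_dir}-style placeholders in a path template."""
    return template.format(**placeholders)


def check_outputs(expected_paths, strict=True):
    """Return missing paths; raise if any are missing and strict checking is on."""
    missing = [p for p in expected_paths if not Path(p).exists()]
    if missing and strict:
        raise FileNotFoundError("Missing expected outputs: " + ", ".join(missing))
    return missing


if __name__ == "__main__":
    ph = build_placeholders("analysis_pipeline/runs/my_profile")  # hypothetical root
    target = expand("{reports_dir}/qc_dataset_summary.json", ph)
    # strict=False mirrors the spirit of --no-strict-outputs: report, do not fail.
    print(target, check_outputs([target], strict=False))
```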
- `docs/pipeline_reference.md`: explicit stage-by-stage and config-by-config pipeline reference.
- `docs/reproducibility.md`: artifact policy, rerun guidance, and Linux-to-Windows handoff instructions.
- `analysis_pipeline/README.md`: package-level map of stage scripts and outputs.
- Local manuscript handoff material can live under `docs/paper_handoff/` without changing the reproducible pipeline entry points.
CC0 1.0 Universal (see LICENSE).