Executive summary
GAUNTLET takes raw transcripts and produces the canonical relational corpus corpus_v1_5_2.db plus a tree of report artifacts. The pipeline is deterministic — re-running on identical inputs produces identical outputs, modulo timestamped backup names.
The integration/ layer adds cross-domain contract checks, boundary enforcement, and adapter-owned health checks for archive, review, and corpus domains. It runs alongside GAUNTLET, not in place of it.
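One way to spot-check the determinism claim is to hash the output tree of two runs and compare the digests while skipping timestamped backups. The sketch below is illustrative only: `tree_digest` is a hypothetical helper rather than part of GAUNTLET, and it assumes outputs are ordinary files and that backups end in `.backup_<timestamp>.db`.

```python
import hashlib
import re
from pathlib import Path

# Assumed backup naming, e.g. corpus.backup_20240101T000000Z.db
BACKUP_RE = re.compile(r"\.backup_\d{8}T\d{6}Z\.db$")


def tree_digest(root):
    """Hash relative paths and file bytes under root, skipping backup files."""
    root = Path(root)
    digest = hashlib.sha256()
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        if BACKUP_RE.search(path.name):
            continue  # backup names carry timestamps, so they are excluded
        digest.update(path.relative_to(root).as_posix().encode("utf-8"))
        digest.update(b"\0")
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()
```

Two runs over identical inputs should then yield equal digests for their output directories; a mismatch points at a non-deterministic step.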
Entrypoints
The pipeline has three primary script entrypoints and three orchestrator-level wrappers. Ordering and naming are fixed; tooling, tests, and CI bind to these names.
Primary scripts
    python3 gauntlet.py
    ORCHESTRATED=1 python3 gauntlet_2.py
    python3 run_full_boat.py
Orchestrator wrappers
    ORCHESTRATED=1 make unified-check
    python3 -m integration.orchestrate_unified
    ORCHESTRATED=1 make deploy-ios
| Entrypoint | Orchestration flag | Behavior |
|---|---|---|
| gauntlet.py | internal | Enforces ORCHESTRATED=1 and PHASE5_PRENORMALIZE=1 across pipeline execution. |
| gauntlet_2.py | caller | Requires ORCHESTRATED=1 in the caller environment. |
| run_full_boat.py | internal | Sets ORCHESTRATED=1 internally for launched steps. |
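The difference between the "caller" and "internal" flag styles can be sketched as two small helpers. This is a minimal illustration, not the scripts' actual code: `require_orchestrated` and `orchestrated_env` are hypothetical names, and the exit message is invented.

```python
import os


def require_orchestrated(env=None):
    """Caller style: refuse to run unless ORCHESTRATED=1 is already set."""
    env = os.environ if env is None else env
    if env.get("ORCHESTRATED") != "1":
        raise SystemExit("ORCHESTRATED=1 is required in the caller environment")


def orchestrated_env(env=None, extra=None):
    """Internal style: return a copy of env with the flag set for child steps."""
    child = dict(os.environ if env is None else env)
    child["ORCHESTRATED"] = "1"
    child.update(extra or {})
    return child
```

Under these assumptions, a script like gauntlet_2.py would call the first helper at startup, while a driver that launches steps would pass the result of the second as the `env` argument of `subprocess.run`.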
Pipeline architecture
    transcripts/
      -> Phase 1: preparation + normalization + markerization + augmentation
      -> Phase 2: DB init + ingest + classification + event extraction
      -> Phase 3: report generation + ingestion audit
      -> Phase 4: verification suite + marker parity tests
      -> corpus_v1_5_2.db + artifacts/

    integration/
      -> shared contracts
      -> validators
      -> boundary enforcement
      -> review/corpus adapters
      -> unified JSON/MD reports
Phase map · 17 steps
The full step sequence. gauntlet.py and run_full_boat.py drive these in order; the list is also the canonical recovery script for hand-runs.
Phase 1 — corpus preparation
1. tools/preprocess_txt_transcripts_v1.py --path transcripts
2. tools/gauntlet_01_sanitize.py
3. tools/gauntlet_02_sequence.py --target-dir transcripts
4. tools/normalize_transcripts_v1.py --path transcripts
5. tools/inject_frontmatter_v1.py --target-dir transcripts
6. tools/inject_turn_markers_v1.py --transcripts-dir transcripts
7. tools/augment_transcripts_v1.py --in transcripts --out transcripts_augmented --phase-d
Phase 2 — semantic substrate
8. tools/init_corpus_db_v1.py
9. tools/ingest_transcripts_v1.py --rebuild-v2 --source-dir transcripts_augmented
10. tools/classify_regimes_v1.py
11. tools/extract_event_markers_v1.py
Phase 3 — reporting
12. tools/generate_gauntlet_reports_v1.py
13. tools/audit_ingestion_v1.py
Phase 4 — verification
14. tools/audit_transcripts_v1.py --target-dir transcripts_augmented
15. tools/audit_speaker_distribution_v1.py --db corpus_v1_5_2.db
16. tools/audit_ingestion_v1.py
17. python3 -m pytest tests/test_inject_turn_markers_v1.py -v --tb=short --no-header
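For hand-run recovery, the ordered list can be driven by a small runner that resumes from a given step and stops at the first failure. The sketch below is an assumption-laden illustration, not how gauntlet.py works: `run_steps` is a hypothetical helper, only Phase 2 is listed to keep it short, each tool is assumed to be launched with the current Python interpreter, and the orchestration flags are assumed to be set in the calling environment already.

```python
import subprocess
import sys

# (step number, argv) pairs copied from the Phase 2 list; other phases omitted.
PHASE_2 = [
    (8, ["tools/init_corpus_db_v1.py"]),
    (9, ["tools/ingest_transcripts_v1.py", "--rebuild-v2",
         "--source-dir", "transcripts_augmented"]),
    (10, ["tools/classify_regimes_v1.py"]),
    (11, ["tools/extract_event_markers_v1.py"]),
]


def run_steps(steps, start=None, run=subprocess.run):
    """Run steps in order from `start`, stopping at the first non-zero exit.

    Returns the number of the failing step, or None when every step passes.
    """
    for number, argv in steps:
        if start is not None and number < start:
            continue  # already completed in an earlier run
        result = run([sys.executable, *argv])
        if result.returncode != 0:
            return number
    return None
```

Calling `run_steps(PHASE_2, start=9)` would skip DB initialization and resume at ingestion; the `run` parameter exists so the runner can be exercised with a stub instead of real subprocesses.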
Unified integration layer
Module inventory:
- integration/contracts/shared_contracts.py
- integration/contracts/validators.py
- integration/boundary_enforcement.py
- integration/adapters/review_adapter.py
- integration/adapters/corpus_review_adapter.py
- integration/orchestrate_unified.py
- integration/db_safety.py
Coverage
- turn / session / registry schema validation
- transcript format and speaker ontology conformance
- namespace and path constraints
- explicit cross-domain allow/deny operations
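To make the last coverage item concrete, an explicit allow/deny check could look like the sketch below. Everything in it is hypothetical: the rule tuples, the deny-by-default policy, and the treatment of same-domain operations are assumptions for illustration and do not describe the contents of integration/boundary_enforcement.py.

```python
# Hypothetical rule tables keyed by (source domain, target domain, operation).
ALLOW = {
    ("review", "corpus", "read"),
    ("corpus", "archive", "read"),
}
DENY = {
    ("review", "corpus", "write"),
}


def is_allowed(source, target, operation, allow=ALLOW, deny=DENY):
    """Deny-by-default check: an explicit deny wins, then an explicit allow."""
    if source == target:
        return True  # assumption: boundary rules only govern cross-domain calls
    rule = (source, target, operation)
    if rule in deny:
        return False
    return rule in allow
```

Keeping both tables explicit means an unlisted cross-domain operation fails closed instead of being silently permitted.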
Delegation model
Review and corpus data-health checks are adapter-owned: ReviewAdapter.check_data_health and CorpusReviewAdapter.check_data_health. The orchestrator consumes adapter results, then applies boundary enforcement.
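The ordering described above (adapters first, boundary enforcement second) can be sketched as follows. Only the method name `check_data_health` comes from this document; the `HealthResult` shape, the stub adapters, and `run_unified` are hypothetical stand-ins, and the real return types may differ.

```python
from dataclasses import dataclass, field


@dataclass
class HealthResult:
    # Hypothetical result shape; the real adapters may return something else.
    domain: str
    issues: list = field(default_factory=list)
    warnings: list = field(default_factory=list)


class StubReviewAdapter:
    def check_data_health(self):
        return HealthResult("review")


class StubCorpusReviewAdapter:
    def check_data_health(self):
        return HealthResult("corpus", warnings=["example warning"])


def run_unified(adapters, enforce_boundaries):
    """Collect adapter-owned health results, then apply boundary enforcement."""
    results = [adapter.check_data_health() for adapter in adapters]
    boundary_issues = list(enforce_boundaries(results))
    issues = [i for r in results for i in r.issues] + boundary_issues
    warnings = [w for r in results for w in r.warnings]
    return {"issues": issues, "warnings": warnings}
```

The point of the split is ownership: each adapter decides what "healthy" means for its own domain, and the orchestrator only aggregates results and checks the boundaries between domains.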
How to run
Core GAUNTLET
    python3 gauntlet.py
    python3 gauntlet.py --rebuild-corpus
    ORCHESTRATED=1 python3 gauntlet_2.py
    python3 run_full_boat.py
Unified checks & reports
    ORCHESTRATED=1 make unified-check
    python3 -m integration.orchestrate_unified
    python3 -m integration.orchestrate_unified --verbose
    python3 -m integration.orchestrate_unified --domain review
    python3 -m integration.orchestrate_unified --strict-warnings
    ORCHESTRATED=1 make unified-test
    /opt/homebrew/bin/pytest integration/tests/ -v
    ORCHESTRATED=1 make unified-report-json
    ORCHESTRATED=1 make unified-check-strict
Strict warning mode
- --strict-warnings upgrades warnings to a failing exit code.
- ORCHESTRATED=1 make unified-check-strict is the Make wrapper.
- Warnings appear in the stdout summary and persist in the JSON report under strict_warnings.
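The effect of strict mode on the exit status can be summarized in a few lines. This is a sketch under assumptions: `exit_code` is a hypothetical helper, and the failing value is taken to be 1, which this document does not specify.

```python
def exit_code(issues, warnings, strict_warnings=False):
    """Return 0 on pass and 1 on failure; strict mode also fails on warnings."""
    if issues:
        return 1
    if strict_warnings and warnings:
        return 1
    return 0
```

Under this model, a run with zero issues and some warnings passes by default and fails once `--strict-warnings` is supplied.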
iOS deployment
Run from /Users/tylermontell/Projects/meetLab_archive/relays. deploy-ios is the gated wrapper that ties CorpusReview installation to the GAUNTLET tier sequence.
Default invocation
    ORCHESTRATED=1 make deploy-ios

    ORCHESTRATED=1 make deploy-ios \
      MAC_LAN_IP=192.168.86.63 \
      IOS_DEVICE_ID=00008120-00067C901E90201E \
      IOS_PROJECT_DIR=/Users/tylermontell/Projects/CorpusReview
Behavior
- Enforced gate sequence: check-drift → tier-0 → tier-1 → tier-2.
- Hard orchestration guard: ORCHESTRATED=1.
- Patches 127.0.0.1 to $(MAC_LAN_IP) in Data/NetworkClient.swift and Data/ReviewStore.swift.
- Builds via xcodebuild, installs via xcrun devicectl device install app.
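The gate ordering and the address patch can be modelled in a few lines. The real logic lives in the Makefile; this sketch only mirrors the behavior listed above, with `first_failing_gate` and `patch_host` as hypothetical helpers and the IP validation added as an extra safety assumption.

```python
import ipaddress

GATES = ["check-drift", "tier-0", "tier-1", "tier-2"]
LOOPBACK = "127.0.0.1"


def first_failing_gate(run_gate, gates=GATES):
    """Run gates in order; return the first that fails, or None if all pass."""
    for gate in gates:
        if not run_gate(gate):
            return gate  # later gates are never attempted
    return None


def patch_host(source_text, lan_ip):
    """Replace the loopback address with a validated LAN IP."""
    ipaddress.ip_address(lan_ip)  # raises ValueError on a malformed address
    return source_text.replace(LOOPBACK, lan_ip)
```

In this model the patch and build would run only when `first_failing_gate` returns None, which matches the intent that validation gates execute before install.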
Report outputs
    /Users/tylermontell/Projects/meetLab_archive/relays/artifacts/unified_integration_report.json
    /Users/tylermontell/Projects/meetLab_archive/relays/artifacts/unified_integration_report.md
DB vs filesystem boundary
GAUNTLET writes to two distinct surfaces. The DB is the relational substrate; the filesystem holds artifacts, reports, and state. They are not interchangeable.
| Surface | Path | Owner | Mutation rule |
|---|---|---|---|
| Canonical corpus DB | corpus_v1_5_2.db | Phase 2 (init_corpus_db_v1.py, ingest_transcripts_v1.py) | Mutate only via tooling. Never hand-edit. |
| Legacy DB | corpus_v1.db | quarantine candidate | Read-only. Deprecated. |
| Augmented transcripts | transcripts_augmented/ | Phase 1 step 7 | Regenerate by re-running Phase 1 steps 1–7. |
| Reports / unified output | artifacts/ | Phase 3 + integration orchestrator | Regenerable. Safe to delete; will be re-emitted. |
| Unified integration report | artifacts/unified_integration_report.{json,md} | integration.orchestrate_unified | Overwritten on each run. |
| Pytest fixtures & outputs | integration/tests/ | tests | Read-only outside test runs. |
DB mutation safety contract
All DB mutations route through integration/db_safety.py. The contract is:
- DBSafetyContext(dry_run=True) opens DBs read-only and blocks backup and mutation helpers.
- backup() creates timestamped backups (.backup_YYYYMMDDTHHMMSSZ.db) before mutation.
- row_counts(), row_count_diff(), and assert_counts_non_decreasing() enforce explicit row-count gates.
- integrity_check() and quick_check() are required pre/post-mutation gates.
- DBSafetyContext.safe_mutate(path) wraps backup + integrity checks around mutating operations.
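The contract above can be approximated with the standard sqlite3 module. The sketch below is not the code in integration/db_safety.py: it assumes the corpus DB is a SQLite file, the function names merely echo the contract, the backup name is assumed to be `<stem>.backup_<timestamp>.db`, and the real `DBSafetyContext` signatures may differ.

```python
import sqlite3
from contextlib import contextmanager
from datetime import datetime, timezone
from pathlib import Path


def open_read_only(path):
    """Dry-run style connection: SQLite rejects writes on a mode=ro URI."""
    return sqlite3.connect(f"{Path(path).resolve().as_uri()}?mode=ro", uri=True)


def row_counts(conn):
    """Return {table_name: row_count} for every user table."""
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type='table' AND name NOT LIKE 'sqlite_%'")]
    return {t: conn.execute(f'SELECT COUNT(*) FROM "{t}"').fetchone()[0]
            for t in tables}


def integrity_ok(conn):
    """True when PRAGMA integrity_check reports a single 'ok' row."""
    return conn.execute("PRAGMA integrity_check").fetchall() == [("ok",)]


def backup(path):
    """Copy the DB to <stem>.backup_YYYYMMDDTHHMMSSZ.db beside the original."""
    path = Path(path)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    target = path.with_name(f"{path.stem}.backup_{stamp}.db")
    src, dst = sqlite3.connect(path), sqlite3.connect(target)
    try:
        src.backup(dst)  # online backup API from the sqlite3 module
    finally:
        dst.close()
        src.close()
    return target


@contextmanager
def safe_mutate(path):
    """Back up, gate on integrity and row counts, commit only if gates pass."""
    backup(path)
    conn = sqlite3.connect(path)
    try:
        if not integrity_ok(conn):
            raise RuntimeError("pre-mutation integrity check failed")
        before = row_counts(conn)
        yield conn
        after = row_counts(conn)
        shrunk = [t for t, n in before.items() if after.get(t, 0) < n]
        if shrunk or not integrity_ok(conn):
            raise RuntimeError(f"post-mutation gate failed: {shrunk}")
        conn.commit()
    except Exception:
        conn.rollback()  # leave the DB as it was before the mutation
        raise
    finally:
        conn.close()
```

A mutation would then run as `with safe_mutate("corpus_v1_5_2.db") as conn: ...`, and any block that shrinks a table or breaks integrity is rolled back instead of committed.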
Operational constraints
- corpus_v1_5_2.db remains canonical for relational corpus queries.
- corpus_v1.db is legacy and must remain read-only until formal quarantine.
- Do not hand-edit DB files directly. All mutation goes through tooling guarded by db_safety.py.
- Re-run GAUNTLET before Tier 3 when transcript content changes.
- Run ORCHESTRATED=1 make unified-check before cross-domain promotion or release.
- Run ORCHESTRATED=1 make deploy-ios for physical iOS deployment so validation gates execute before install.
PASS (exit 0). Findings: 0 issues, 2 warnings. Tests: 90/90 pass. Boundary deny rules verified.
Warnings (non-blocking): corpus_v1.db present but deprecated · 39 UNKNOWN speakers in turns.json.
Failure triage
Map the symptom to a breakpoint, then run the first diagnostic command. Resolve, then re-run the next-broader gate.
| Breakpoint | First diagnostic | Next gate |
|---|---|---|
| Core ingest / schema | python3 gauntlet.py --rebuild-corpus | python3 tools/audit_ingestion_v1.py |
| Ingestion audit fails | python3 tools/audit_ingestion_v1.py | Inspect turns.json; re-run Phase 2 steps 9–11. |
| Speaker distribution drift | python3 tools/audit_speaker_distribution_v1.py --db corpus_v1_5_2.db | Run speaker reconciliation workflow before re-ingestion of affected queues. |
| Marker parity failure | python3 -m pytest tests/test_inject_turn_markers_v1.py -v --tb=short --no-header | Re-run Phase 1 steps 6–7; re-ingest. |
| Unified contract / boundary | python3 -m integration.orchestrate_unified --verbose | Isolate with --domain archive, --domain review, or --domain corpus; re-run ORCHESTRATED=1 make unified-check. |
| Strict-mode warning failure | ORCHESTRATED=1 make unified-check-strict | Inspect strict_warnings in JSON report; resolve upstream. |
| iOS deploy gate stops | Read failing tier from gate sequence (check-drift → tier-0 → tier-1 → tier-2). | Resolve at that tier; re-run ORCHESTRATED=1 make deploy-ios. |
| DB integrity suspect | DBSafetyContext.integrity_check() | If failed: restore from latest .backup_YYYYMMDDTHHMMSSZ.db; re-run mutation under safe_mutate(). |
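For the last row of the table, locating the latest backup can be scripted because the YYYYMMDDTHHMMSSZ stamp sorts chronologically as plain text. The helper below is a hypothetical sketch that assumes backups sit next to the DB and are named `<stem>.backup_<timestamp>.db`; it only finds the file and leaves the restore itself to the operator.

```python
import re
from pathlib import Path

STAMP_RE = re.compile(r"\.backup_(\d{8}T\d{6}Z)\.db$")


def latest_backup(directory, stem):
    """Return the newest backup path for `stem`, or None when none exist."""
    candidates = []
    for path in Path(directory).glob(f"{stem}.backup_*.db"):
        match = STAMP_RE.search(path.name)
        if match:  # ignore files that lack a well-formed timestamp
            candidates.append((match.group(1), path))
    return max(candidates)[1] if candidates else None
```

For example, `latest_backup(".", "corpus_v1_5_2")` would return the most recent backup in the working directory under the naming assumption above.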
Refresh status snapshot
    ORCHESTRATED=1 make unified-check
    ORCHESTRATED=1 make unified-test