Configuration¶
Dift supports reusable configuration files for cleaner, reproducible, and automation-friendly workflows.
Configuration files help teams:
- reduce repetitive CLI commands
- standardize validation rules
- simplify CI/CD workflows
- manage environments
- reuse comparison settings
- centralize drift policies
Supported Configuration Formats¶
Dift supports:
- YAML (.yaml, .yml)
- TOML (.toml)
- JSON (.json)
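As an illustration of how a file extension can select a parser, here is a minimal Python sketch. It is not Dift's loader: the function name `load_config` is hypothetical, the YAML branch assumes the third-party PyYAML package is installed, and the TOML branch assumes Python 3.11 or later for `tomllib`.

```python
import json
from pathlib import Path

SUPPORTED_SUFFIXES = (".yaml", ".yml", ".toml", ".json")


def load_config(path):
    """Parse a config file into a dict, choosing the parser by extension.

    Hypothetical helper for illustration; not part of Dift.
    """
    suffix = Path(path).suffix.lower()
    if suffix not in SUPPORTED_SUFFIXES:
        # Reject unknown formats before touching the file system.
        raise ValueError(f"Unsupported config format '{suffix}'")

    text = Path(path).read_text(encoding="utf-8")
    if suffix == ".json":
        return json.loads(text)
    if suffix == ".toml":
        import tomllib  # standard library from Python 3.11

        return tomllib.loads(text)
    import yaml  # third-party PyYAML, assumed to be installed

    return yaml.safe_load(text)
```

The parser imports are deferred so that a JSON-only setup does not need the YAML or TOML dependencies.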
Basic Configuration Example¶
YAML¶
old_dataset: examples/old.csv
new_dataset: examples/new.csv
key: customer_id
threshold: 0.1
report: html
output: reports/report.html
Run:
dift --config config.yaml
TOML Example¶
old_dataset = "examples/old.csv"
new_dataset = "examples/new.csv"
key = "customer_id"
threshold = 0.1
report = "json"
output = "reports/report.json"
JSON Example¶
{
  "old_dataset": "examples/old.csv",
  "new_dataset": "examples/new.csv",
  "key": "customer_id",
  "threshold": 0.1,
  "report": "csv",
  "output": "reports/report.csv"
}
Dataset Paths in Config Files¶
Dift can read dataset paths directly from a configuration file.
A recurring comparison can then be rerun without retyping the paths each time.
Example¶
old_dataset: examples/old.csv
new_dataset: examples/new.csv
key: customer_id
report: html
Run:
dift --config config.yaml
CLI Override Behavior¶
CLI arguments override configuration values.
Example¶
dift prod.csv staging.csv \
--config config.yaml \
--report json \
--output override.json
With this command, Dift uses:
- the datasets given on the CLI (prod.csv and staging.csv)
- the report format and output path given on the CLI
- all remaining values from config.yaml
Configuration Priority¶
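Dift resolves settings in this order of precedence, highest first:
CLI Arguments > Profiles > Config Files > Defaults
A value set in a higher-priority source replaces the same value from any lower one, so a shared config stays reproducible while a single run can still be adjusted from the command line.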
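The precedence chain can be pictured as a layered lookup. The sketch below is an assumption about how such a merge could work, not Dift's actual code; `resolve_settings` and the sample layer contents are hypothetical.

```python
from collections import ChainMap


def resolve_settings(cli_args, profile, config_file, defaults):
    """Merge four setting layers; earlier arguments win over later ones.

    Hypothetical helper for illustration; not part of Dift.
    """
    # ChainMap searches its mappings left to right, so the first mapping
    # that contains a key supplies the value.
    return dict(ChainMap(cli_args, profile, config_file, defaults))


# Sample layers (illustrative values only).
defaults = {"report": "html", "threshold": 0.1, "quiet": False}
config_file = {"key": "customer_id", "report": "html", "output": "reports/report.html"}
profile = {"threshold": 0.05}
cli_args = {"report": "json", "output": "override.json"}

settings = resolve_settings(cli_args, profile, config_file, defaults)
```

For this to work, options that were not passed on the command line must be absent from `cli_args` rather than present with a `None` value, otherwise they would mask the lower layers.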
Supported Configuration Fields¶
| Field | Description |
|---|---|
| old_dataset | Original dataset |
| new_dataset | New dataset |
| key | Row matching key |
| threshold | Numeric drift threshold |
| report | Report format |
| output | Report output path |
| output_dir | Output directory |
| template | HTML template |
| history | Enable history tracking |
| history_dir | History directory |
| strict_exit_codes | Risk-based exit codes |
| quiet | Suppress non-error output |
| no_color | Disable terminal colors |
Threshold Configurations¶
Dift supports reusable threshold policies.
Example¶
thresholds:
  numeric: 0.1
  categorical: 0.2
  outlier: 0.15
Column-Level Threshold Overrides¶
thresholds:
  numeric: 0.1
  columns:
    revenue:
      numeric: 0.05
    transactions:
      outlier: 0.03
Useful for:
- sensitive metrics
- production monitoring
- anomaly-heavy datasets
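One plausible reading of column-level overrides is "use the column's value when present, otherwise fall back to the global one". The Python sketch below encodes that reading. The function `effective_threshold` is hypothetical, and both the fallback rule and the nesting of `columns` under `thresholds` are assumptions rather than documented Dift behavior.

```python
def effective_threshold(thresholds, column, kind):
    """Return the threshold of the given kind that applies to a column.

    Hypothetical helper for illustration; not part of Dift.
    """
    column_overrides = thresholds.get("columns", {}).get(column, {})
    if kind in column_overrides:
        # A column-specific value takes priority over the global one.
        return column_overrides[kind]
    # Fall back to the global value, or None when none is configured.
    return thresholds.get(kind)


# The override example above, written as a Python dict.
thresholds = {
    "numeric": 0.1,
    "columns": {
        "revenue": {"numeric": 0.05},
        "transactions": {"outlier": 0.03},
    },
}
```

Under these assumptions, `revenue` is checked against a numeric threshold of 0.05, while any column without an override keeps the global 0.1.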
Environment-Based Configurations¶
Dift supports environment-specific workflows.
Useful for:
- development
- staging
- production
- CI/CD pipelines
Example Environment Config¶
key: customer_id
report: html
environments:
  development:
    old_dataset: examples/dev_old.csv
    new_dataset: examples/dev_new.csv
    threshold: 0.2
  staging:
    old_dataset: staging_old.csv
    new_dataset: staging_new.csv
    threshold: 0.15
  production:
    old_dataset: prod_old.csv
    new_dataset: prod_new.csv
    threshold: 0.1
Select Environment¶
dift --config config_env.yaml --env production
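To make the effect of selecting an environment concrete, the following Python sketch merges the chosen environment block over the shared top-level keys. This is an assumed model of the behavior, not Dift's implementation; `select_environment` is a hypothetical name.

```python
def select_environment(config, env):
    """Combine shared top-level settings with one environment's settings.

    Hypothetical helper for illustration; not part of Dift.
    """
    environments = config.get("environments", {})
    if env not in environments:
        raise KeyError(f"Unknown environment '{env}'")
    # Start from the shared keys, then let the environment block override them.
    merged = {k: v for k, v in config.items() if k != "environments"}
    merged.update(environments[env])
    return merged


# A trimmed version of the environment config above, as a Python dict.
config = {
    "key": "customer_id",
    "report": "html",
    "environments": {
        "staging": {
            "old_dataset": "staging_old.csv",
            "new_dataset": "staging_new.csv",
            "threshold": 0.15,
        },
        "production": {
            "old_dataset": "prod_old.csv",
            "new_dataset": "prod_new.csv",
            "threshold": 0.1,
        },
    },
}

production = select_environment(config, "production")
```

In this model, `production` carries the shared `key` and `report` values together with the production dataset paths and threshold.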
Environment Variable Support¶
Dift supports environment variable interpolation.
Example¶
old_dataset: ${OLD_DATASET}
new_dataset: ${NEW_DATASET}
Set Variables¶
Linux / macOS¶
export OLD_DATASET=data/old.csv
export NEW_DATASET=data/new.csv
Windows PowerShell¶
$env:OLD_DATASET="data/old.csv"
$env:NEW_DATASET="data/new.csv"
Run¶
dift --config config_env.yaml
Missing Environment Variables¶
If a variable referenced in the config is not set, Dift reports a validation error that names the missing variable.
Example:
Error: Missing environment variable 'OLD_DATASET'
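The `${NAME}` substitution and the missing-variable check can be sketched in a few lines of Python. This is an illustrative approximation, not Dift's parser: the `interpolate` function is hypothetical, and the accepted variable-name pattern is an assumption.

```python
import os
import re

# Matches ${NAME}, where NAME is a conventional environment variable name.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def interpolate(value, env=None):
    """Replace ${NAME} placeholders in strings, recursing into dicts and lists.

    Hypothetical helper for illustration; not part of Dift.
    """
    env = os.environ if env is None else env
    if isinstance(value, dict):
        return {k: interpolate(v, env) for k, v in value.items()}
    if isinstance(value, list):
        return [interpolate(v, env) for v in value]
    if not isinstance(value, str):
        return value  # numbers and booleans pass through unchanged

    def substitute(match):
        name = match.group(1)
        if name not in env:
            raise ValueError(f"Missing environment variable '{name}'")
        return env[name]

    return _PLACEHOLDER.sub(substitute, value)
```

Passing an explicit `env` mapping instead of relying on `os.environ` keeps the function easy to test.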
Output Directory Support¶
Set output_dir to save reports into a directory:
report: html
output_dir: reports/
Generated filename:
reports/dift_report.html
HTML Templates¶
Specify HTML templates directly inside configs.
Example¶
report: html
template: enterprise
Available templates:
- default
- clean
- compact
- enterprise
- dark
History Tracking¶
Enable persistent comparison history.
Example¶
history: true
history_dir: reports/history
Strict Exit Codes¶
With strict exit codes enabled, the process exit code reflects the detected risk level, so automation can branch on the result:
strict_exit_codes: true
Exit codes:
| Code | Meaning |
|---|---|
| 0 | Low risk |
| 1 | Medium risk |
| 2 | High risk |
| 3 | Runtime error |
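A CI step could act on these exit codes as in the Python sketch below. The policy of tolerating up to medium risk is only an example, and the names `run_dift` and `should_block` are hypothetical. The `dift --config` invocation is the only part taken from this page.

```python
import subprocess

# Meaning of each exit code, copied from the table above.
RISK_BY_EXIT_CODE = {0: "low", 1: "medium", 2: "high", 3: "runtime error"}


def should_block(exit_code, max_allowed=1):
    """Return True when the exit code exceeds the tolerated risk level.

    Hypothetical policy helper; with the default, high risk (2) and
    runtime errors (3) block the pipeline.
    """
    return exit_code > max_allowed


def run_dift(config_path):
    """Run Dift with a config file and return its exit code."""
    result = subprocess.run(["dift", "--config", config_path])
    return result.returncode
```

A pipeline script might call `run_dift("config.yaml")`, look up the label in `RISK_BY_EXIT_CODE` for logging, and stop the build when `should_block` returns True.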
Quiet Mode¶
Suppresses non-error output, which is useful for automation workflows.
quiet: true
Disable Colors¶
no_color: true
Useful for:
- CI logs
- plain-text terminals
- log aggregation systems
Example Full Configuration¶
old_dataset: examples/old.csv
new_dataset: examples/new.csv
key: customer_id
report: html
template: enterprise
output: reports/report.html
thresholds:
  numeric: 0.1
  categorical: 0.2
  outlier: 0.15
history: true
history_dir: reports/history
strict_exit_codes: true
quiet: false
no_color: false
Example Production Workflow¶
environments:
  production:
    old_dataset: ${PROD_OLD}
    new_dataset: ${PROD_NEW}
    report: json
    output: reports/prod_report.json
    threshold: 0.05
strict_exit_codes: true
quiet: true
Run:
dift --config production.yaml --env production
Validation Errors¶
Dift provides actionable validation messages for:
- invalid configs
- missing fields
- unsupported formats
- invalid templates
- missing environment variables
- conflicting options
Example:
Error: Unsupported report format 'xml'
Best Practices¶
Recommended practices:
- store reusable configs in version control
- separate environments clearly
- use profiles for recurring workflows
- use environment variables for secrets
- standardize thresholds across teams
- enable strict exit codes in CI/CD
Common Use Cases¶
CI/CD Validation¶
report: json
strict_exit_codes: true
quiet: true
Executive Reporting¶
report: html
template: enterprise
ML Drift Monitoring¶
thresholds:
  numeric: 0.03
Historical Monitoring¶
history: true
Next Steps¶
Continue with:
- Thresholds
- Profiles
- Batch Comparisons
- Scheduling
- Automation
- Connectors
- Reports