Quick Start¶

This guide helps you get started with Dift quickly.

You will learn how to:

compare datasets
detect drift
generate reports
use configuration files
run batch comparisons
automate workflows

Your First Comparison¶

Compare two CSV files:

dift examples/old.csv examples/new.csv --key customer_id

This command compares:

schema changes
row changes
null spikes
duplicate spikes
drift patterns
outlier changes

Understanding the `--key`¶

The --key option defines the column used to match rows across datasets.

Example:

--key customer_id

Typical keys:

customer_id
order_id
transaction_id
product_id

Example Output¶

╭─────────────────────────╮
│ Dift Dataset Comparison │
│ Risk Level: MEDIUM      │
╰─────────────────────────╯

Warnings

Numeric drift:
'revenue'
mean shift 900.00%
(high, threshold 0.1)

Outlier spike:
'revenue' increased by 100.00%
(high)

Categorical shift:
'segment' max frequency shift 60.00%
(high)

Generate Reports¶

JSON Report¶

dift examples/old.csv examples/new.csv \
  --key customer_id \
  --report json \
  --output report.json

CSV Report¶

dift examples/old.csv examples/new.csv \
  --key customer_id \
  --report csv \
  --output report.csv

Excel Report¶

dift examples/old.csv examples/new.csv \
  --key customer_id \
  --report excel \
  --output report.xlsx

HTML Report¶

dift examples/old.csv examples/new.csv \
  --key customer_id \
  --report html \
  --output report.html

HTML Templates¶

Customize HTML report appearance:

dift examples/old.csv examples/new.csv \
  --report html \
  --template dark \
  --output report.html

Available templates:

default
clean
compact
enterprise
dark

Drift Thresholds¶

Control drift sensitivity using --threshold.

Default threshold:

0.1

Example:

dift examples/old.csv examples/new.csv \
  --key customer_id \
  --threshold 0.2

Lower values detect smaller changes.

Higher values reduce sensitivity.

Output Directory Support¶

Save reports into a directory with auto-generated filenames:

dift examples/old.csv examples/new.csv \
  --report html \
  --output-dir reports/

Generated filenames include:

dift_report.json
dift_report.csv
dift_report.xlsx
dift_report.html

Using Configuration Files¶

Run comparisons using reusable configuration files.

YAML Example¶

old_dataset: examples/old.csv
new_dataset: examples/new.csv
key: customer_id
threshold: 0.1
report: html

Run:

dift --config examples/config_sample.yaml

Environment-Based Configs¶

Select reusable environments:

dift --config examples/config_env.yaml --env production

Useful for:

development
staging
production
CI/CD workflows

Saved Profiles¶

Create reusable comparison workflows.

Create Profile¶

dift profile create nightly-check \
  --old examples/old.csv \
  --new examples/new.csv \
  --key customer_id \
  --report html

Run Profile¶

dift profile run nightly-check

Batch Comparisons¶

Compare multiple dataset pairs automatically.

Example¶

dift batch \
  --old-dir data/old \
  --new-dir data/new \
  --key customer_id

Useful for:

ETL validation
warehouse monitoring
scheduled quality checks
multi-table validation

Comparison History¶

Save historical comparison results:

dift examples/old.csv examples/new.csv \
  --key customer_id \
  --history

View history:

dift history list

Automation-Friendly Mode¶

Use Dift inside:

CI/CD pipelines
Airflow
Jenkins
Prefect
Dagster
cron jobs

Strict Exit Codes¶

dift prod.csv staging.csv \
  --key customer_id \
  --strict-exit-codes

Exit codes:

Code	Meaning
0	Low risk
1	Medium risk
2	High risk
3	Runtime error

Quiet Mode¶

dift old.csv new.csv --quiet

Disable Colors¶

dift old.csv new.csv --no-color

DuckDB Example¶

dift duckdb:///warehouse.duckdb:customers_old \
     duckdb:///warehouse.duckdb:customers_new \
     --key customer_id

BigQuery Example¶

dift bigquery://analytics.sales.orders_old \
     bigquery://analytics.sales.orders_new \
     --key order_id

PostgreSQL Example¶

dift postgresql://user:password@localhost:5432/sales:customers_old \
     postgresql://user:password@localhost:5432/sales:customers_new \
     --key customer_id

MySQL Example¶

dift mysql+pymysql://user:password@localhost:3306/sales:orders_old \
     mysql+pymysql://user:password@localhost:3306/sales:orders_new \
     --key order_id

Snowflake Example¶

dift snowflake://user:password@account/db/schema?warehouse=compute_wh:orders_old \
     snowflake://user:password@account/db/schema?warehouse=compute_wh:orders_new \
     --key order_id

Redshift Example¶

dift redshift+redshift_connector://user:password@cluster.region.redshift.amazonaws.com:5439/dev:orders_old \
     redshift+redshift_connector://user:password@cluster.region.redshift.amazonaws.com:5439/dev:orders_new \
     --key order_id

Common Use Cases¶

ETL Validation¶

dift before.csv after.csv

ML Dataset Drift Detection¶

dift train_v1.csv train_v2.csv --threshold 0.1

Production vs Staging Validation¶

dift prod.csv staging.csv --key id

Historical Monitoring¶

dift prod.csv staging.csv \
  --key customer_id \
  --history

Next Steps¶

Continue with:

Usage Guide
Reports
Configuration
Thresholds
Automation
Connectors
Developer Architecture