Examples¶
This document contains practical examples for using Dift across local datasets, databases, warehouses, automation workflows, reporting, scheduling, and validation pipelines.
Examples are organized by workflow category.
Basic Dataset Comparison¶
Compare two CSV datasets:
dift examples/old.csv examples/new.csv \
--key customer_id
Compare Parquet Files¶
dift examples/old.parquet examples/new.parquet \
--key customer_id
Compare Excel Files¶
dift examples/old.xlsx examples/new.xlsx \
--key customer_id
Compare JSON Files¶
dift examples/old.json examples/new.json \
--key customer_id
Numeric Drift Detection¶
Detect numeric drift using thresholds:
dift examples/old_drift.csv examples/new_drift.csv \
--key id \
--threshold 0.1
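How `--threshold` is evaluated is not spelled out on this page. As a rough sketch, assuming the threshold is compared against the relative change in a column's mean, the check could look like the following. The helper names `relative_mean_shift` and `exceeds_threshold` are hypothetical and not part of Dift.

```python
from statistics import mean


def relative_mean_shift(old_values, new_values):
    """Return |mean(new) - mean(old)| / |mean(old)| as a fraction."""
    old_mean = mean(old_values)
    new_mean = mean(new_values)
    if old_mean == 0:
        # Relative change is undefined for a zero baseline:
        # treat "no change" as 0 and any change as infinite.
        return 0.0 if new_mean == 0 else float("inf")
    return abs(new_mean - old_mean) / abs(old_mean)


def exceeds_threshold(old_values, new_values, threshold=0.1):
    """Assumption: drift is flagged when the shift is strictly above the threshold."""
    return relative_mean_shift(old_values, new_values) > threshold


old_revenue = [90, 100, 110]      # mean 100
new_revenue = [900, 1000, 1100]   # mean 1000
shift = relative_mean_shift(old_revenue, new_revenue)
print(f"mean shift {shift:.2%}, flagged: {exceeds_threshold(old_revenue, new_revenue)}")
```

Under this reading, a threshold of 0.1 corresponds to a 10% change in the mean.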
Generate JSON Report¶
dift examples/old.csv examples/new.csv \
--key customer_id \
--report json \
--output report.json
Generate CSV Report¶
dift examples/old.csv examples/new.csv \
--key customer_id \
--report csv \
--output report.csv
Generate Excel Report¶
dift examples/old.csv examples/new.csv \
--key customer_id \
--report excel \
--output report.xlsx
Generate HTML Report¶
dift examples/old.csv examples/new.csv \
--key customer_id \
--report html \
--output report.html
Use HTML Templates¶
Generate a report using a template:
dift examples/old.csv examples/new.csv \
--key customer_id \
--report html \
--template dark \
--output report.html
Available HTML Templates¶
Supported templates:
- default
- clean
- compact
- enterprise
- dark
Save Reports to Output Directory¶
dift examples/old.csv examples/new.csv \
--report json \
--output-dir reports/
Generated filenames, depending on the report format:
dift_report.json
dift_report.csv
dift_report.xlsx
dift_report.html
Use YAML Config File¶
dift --config examples/config_sample.yaml
Example YAML Config¶
old_dataset: examples/old.csv
new_dataset: examples/new.csv
key: customer_id
threshold: 0.1
report: html
Use TOML Config¶
dift --config examples/config_sample.toml
Example TOML Config¶
old_dataset = "examples/old.csv"
new_dataset = "examples/new.csv"
key = "customer_id"
report = "json"
Use JSON Config¶
dift --config examples/config_sample.json
Example JSON Config¶
{
  "old_dataset": "examples/old.csv",
  "new_dataset": "examples/new.csv",
  "key": "customer_id",
  "report": "csv"
}
Dataset Paths Inside Configs¶
dift --config examples/config_with_datasets.yaml
CLI Override Example¶
dift examples/override_old.csv examples/override_new.csv \
--config examples/config_sample.yaml \
--report json
CLI arguments override config values.
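The precedence rule can be pictured as a simple overlay. This is a minimal sketch and not Dift's actual implementation. The function name `merge_settings` is hypothetical, as is the convention that `None` marks an option that was not passed on the command line.

```python
def merge_settings(config, cli_args):
    """Overlay CLI values on config values; None means 'not given on the CLI'."""
    merged = dict(config)
    for name, value in cli_args.items():
        if value is not None:
            merged[name] = value
    return merged


# Values from the example YAML config.
config = {
    "old_dataset": "examples/old.csv",
    "new_dataset": "examples/new.csv",
    "key": "customer_id",
    "threshold": 0.1,
    "report": "html",
}
# Values from the override command; key was not passed, so it stays None.
cli_args = {
    "old_dataset": "examples/override_old.csv",
    "new_dataset": "examples/override_new.csv",
    "report": "json",
    "key": None,
}
settings = merge_settings(config, cli_args)
print(settings)
```

In this sketch the datasets and report format come from the command line, while `key` and `threshold` keep their config values.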
Threshold Config Example¶
thresholds:
  numeric: 0.1
  categorical: 0.2
  outlier: 0.15
Column-Level Threshold Override¶
thresholds:
  columns:
    revenue:
      numeric: 0.05
      outlier: 0.1
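One way to read the two threshold examples together is that a column-level value, when present, replaces the global value for that column only. The sketch below illustrates that lookup. It assumes both blocks can live in one `thresholds` mapping and that column values win. `effective_threshold` is a hypothetical helper, not a Dift function.

```python
def effective_threshold(thresholds, column, kind):
    """Look up a per-column override first, then fall back to the global value."""
    column_overrides = thresholds.get("columns", {}).get(column, {})
    if kind in column_overrides:
        return column_overrides[kind]
    return thresholds.get(kind)


# Assumption: global and column-level settings combined in a single mapping.
thresholds = {
    "numeric": 0.1,
    "categorical": 0.2,
    "outlier": 0.15,
    "columns": {"revenue": {"numeric": 0.05, "outlier": 0.1}},
}

print(effective_threshold(thresholds, "revenue", "numeric"))      # column override
print(effective_threshold(thresholds, "revenue", "categorical"))  # global fallback
print(effective_threshold(thresholds, "quantity", "numeric"))     # global fallback
```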
Use Threshold Config¶
dift --config examples/config_thresholds.yaml
Environment-Based Configs¶
Run an environment-specific workflow:
dift --config examples/config_env.yaml \
--env production
Example Environment Config¶
environments:
  development:
    threshold: 0.2
  production:
    threshold: 0.05
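A plausible model for `--env` is that the selected block is overlaid on the top-level settings. The following sketch shows that idea. It is an assumption about the behaviour, and `apply_environment` is a hypothetical name.

```python
def apply_environment(config, env_name):
    """Return top-level settings with the chosen environment block applied on top."""
    environments = config.get("environments", {})
    if env_name not in environments:
        raise KeyError(f"unknown environment: {env_name}")
    # Keep everything except the environments table itself.
    resolved = {k: v for k, v in config.items() if k != "environments"}
    resolved.update(environments[env_name])
    return resolved


config = {
    "key": "customer_id",  # illustrative top-level setting
    "environments": {
        "development": {"threshold": 0.2},
        "production": {"threshold": 0.05},
    },
}

print(apply_environment(config, "production"))
```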
Environment Variable Interpolation¶
Example config:
old_dataset: ${OLD_DATASET}
new_dataset: ${NEW_DATASET}
Set variables:
export OLD_DATASET=examples/old.csv
export NEW_DATASET=examples/new.csv
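The `${NAME}` placeholder syntax matches what Python's `string.Template` understands, so the expansion step can be sketched as below. This only illustrates the idea. Dift's own handling of unset variables or defaults may differ, and `interpolate` is a hypothetical helper.

```python
import os
from string import Template


def interpolate(value, env=None):
    """Expand ${NAME} placeholders; raises KeyError if a variable is unset."""
    env = os.environ if env is None else env
    return Template(value).substitute(env)


config = {"old_dataset": "${OLD_DATASET}", "new_dataset": "${NEW_DATASET}"}
# A plain dict stands in for the exported shell variables.
env = {"OLD_DATASET": "examples/old.csv", "NEW_DATASET": "examples/new.csv"}

resolved = {key: interpolate(value, env) for key, value in config.items()}
print(resolved)
```

Failing loudly on an unset variable, as `substitute` does, is usually preferable in validation pipelines to silently comparing against an empty path.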
Create Saved Profile¶
dift profile create nightly-check \
--old examples/old.csv \
--new examples/new.csv \
--key customer_id \
--report html
Run Saved Profile¶
dift profile run nightly-check
List Profiles¶
dift profile list
Show Profile Details¶
dift profile show nightly-check
Delete Profile¶
dift profile delete nightly-check
Generate Cron Schedule¶
dift schedule cron nightly-check
Example output:
0 2 * * * dift profile run nightly-check --history --strict-exit-codes
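The generated line follows the standard five-field crontab layout (minute, hour, day of month, month, day of week), so `0 2 * * *` means every day at 02:00. The helper below is an illustrative sketch for splitting such a line and is not part of Dift.

```python
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def split_cron_entry(line):
    """Split a crontab line into its five schedule fields and the command."""
    parts = line.split(None, 5)  # at most 6 pieces: 5 fields + the rest
    if len(parts) < 6:
        raise ValueError("expected five schedule fields followed by a command")
    schedule = dict(zip(CRON_FIELDS, parts[:5]))
    return schedule, parts[5]


entry = "0 2 * * * dift profile run nightly-check --history --strict-exit-codes"
schedule, command = split_cron_entry(entry)
print(schedule)
print(command)
```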
Create Saved Schedule¶
dift schedule create daily-check \
--profile nightly-check \
--cron "0 2 * * *"
List Schedules¶
dift schedule list
Run Schedule¶
dift schedule run daily-check
Delete Schedule¶
dift schedule delete daily-check
Batch Dataset Comparison¶
dift batch \
--old-dir data/old \
--new-dir data/new \
--key id
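This page does not describe how batch mode matches files between the two directories. A reasonable mental model, stated here purely as an assumption, is that files are paired by identical filename. The sketch below shows that pairing, and `pair_datasets` is a hypothetical helper.

```python
from pathlib import Path


def pair_datasets(old_dir, new_dir):
    """Pair files that share a filename in both directories.

    Returns (pairs, unmatched): pairs is a list of (old_path, new_path),
    unmatched is a sorted list of names present on only one side.
    """
    old_files = {p.name: p for p in Path(old_dir).iterdir() if p.is_file()}
    new_files = {p.name: p for p in Path(new_dir).iterdir() if p.is_file()}
    shared = sorted(old_files.keys() & new_files.keys())
    unmatched = sorted(old_files.keys() ^ new_files.keys())
    pairs = [(old_files[name], new_files[name]) for name in shared]
    return pairs, unmatched
```

Calling `pair_datasets("data/old", "data/new")` before a batch run is a quick way to spot tables that exist in only one snapshot.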
Batch HTML Reports¶
dift batch \
--old-dir data/old \
--new-dir data/new \
--report html \
--output-dir reports/batch
Continue On Error¶
dift batch \
--old-dir data/old \
--new-dir data/new \
--continue-on-error
Stop On Error¶
dift batch \
--old-dir data/old \
--new-dir data/new \
--stop-on-error
Save Comparison History¶
dift examples/old.csv examples/new.csv \
--history
Custom History Directory¶
dift examples/old.csv examples/new.csv \
--history \
--history-dir reports/history
List Comparison History¶
dift history list
Show History Record¶
dift history show 1
Clear History¶
dift history clear
Strict Exit Codes¶
dift prod.csv staging.csv \
--strict-exit-codes
Quiet Mode¶
dift old.csv new.csv \
--quiet
Disable Colored Output¶
dift old.csv new.csv \
--no-color
Full Automation Workflow¶
dift prod.csv staging.csv \
--key customer_id \
--strict-exit-codes \
--quiet \
--no-color
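When the same command is driven from a Python job rather than a shell, the exit code is what the caller acts on. The wrapper below is a minimal sketch. It assumes only that exit code 0 means a clean run and that any nonzero code should fail the job, since the specific codes emitted under `--strict-exit-codes` are not listed on this page. `run_check` and the file names are placeholders.

```python
import subprocess


def run_check(command):
    """Run a command and return (passed, exit_code); passed means exit code 0."""
    completed = subprocess.run(command, capture_output=True, text=True)
    return completed.returncode == 0, completed.returncode


# Only flags shown in the workflow above are used here.
DIFT_COMMAND = [
    "dift", "prod.csv", "staging.csv",
    "--key", "customer_id",
    "--strict-exit-codes", "--quiet", "--no-color",
]
```

A pipeline step would then call `passed, code = run_check(DIFT_COMMAND)` and stop the deployment when `passed` is false.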
DuckDB Comparison¶
dift duckdb:///examples/warehouse.duckdb:customers_old \
duckdb:///examples/warehouse.duckdb:customers_new \
--key customer_id
DuckDB URI Format¶
duckdb:///path/to/database.duckdb:table_name
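The format appends the table name after a final colon. Splitting on the last colon recovers the two parts, as in this sketch. It is an illustration of the format rather than Dift's parser, and `split_table_uri` is a hypothetical name.

```python
def split_table_uri(uri):
    """Split '<connection>:<table>' on the last colon."""
    connection, sep, table = uri.rpartition(":")
    # A '/' in the tail means the last colon belonged to the scheme, not a table.
    if not sep or not table or "/" in table:
        raise ValueError(f"no table name found in {uri!r}")
    return connection, table


connection, table = split_table_uri("duckdb:///examples/warehouse.duckdb:customers_old")
print(connection)
print(table)
```

Splitting from the right matters because the connection part itself contains colons.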
SQLite Comparison¶
dift sqlite:///examples/data.db:customers_old \
sqlite:///examples/data.db:customers_new \
--key customer_id
PostgreSQL Comparison¶
dift postgresql://user:password@localhost:5432/sales_db:customers_old \
postgresql://user:password@localhost:5432/sales_db:customers_new \
--key customer_id
PostgreSQL Psycopg Example¶
dift postgresql+psycopg://user:password@localhost:5432/sales_db:customers_old \
postgresql+psycopg://user:password@localhost:5432/sales_db:customers_new \
--key customer_id
MySQL Comparison¶
dift mysql+pymysql://user:password@localhost:3306/sales_db:customers_old \
mysql+pymysql://user:password@localhost:3306/sales_db:customers_new \
--key customer_id
Redshift Comparison¶
dift redshift+redshift_connector://user:password@cluster.region.redshift.amazonaws.com:5439/dev:orders_old \
redshift+redshift_connector://user:password@cluster.region.redshift.amazonaws.com:5439/dev:orders_new \
--key order_id
Snowflake Comparison¶
dift snowflake://user:password@account/db/schema?warehouse=compute_wh:orders_old \
snowflake://user:password@account/db/schema?warehouse=compute_wh:orders_new \
--key order_id
BigQuery Comparison¶
dift bigquery://analytics.sales.orders_old \
bigquery://analytics.sales.orders_new \
--key order_id
Install SQL Support¶
pip install sqlalchemy
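Install PostgreSQL Driver¶
pip install psycopg2-binary
If Dift resolves these URIs through SQLAlchemy, the postgresql+psycopg:// form selects Psycopg 3, which is a separate package:
pip install "psycopg[binary]"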
Install MySQL Driver¶
pip install pymysql
Install Redshift Dependencies¶
pip install sqlalchemy-redshift redshift-connector
Install Snowflake Support¶
pip install snowflake-sqlalchemy
Install BigQuery Support¶
pip install google-cloud-bigquery db-dtypes
Install DuckDB Support¶
pip install duckdb
GitHub Actions Example¶
name: Dift Validation
on:
  push:
jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install Dift
        run: pip install dift-cli
      - name: Run Validation
        run: |
          dift old.csv new.csv \
            --strict-exit-codes \
            --quiet \
            --no-color
Airflow Example¶
from airflow.operators.bash import BashOperator

# Define this task inside your DAG, for example within a `with DAG(...)` block.
validate = BashOperator(
    task_id="validate_data",
    bash_command="""
    dift old.csv new.csv \
    --strict-exit-codes
    """,
)
ETL Validation Example¶
dift before.csv after.csv
ML Drift Monitoring Example¶
dift train_v1.csv train_v2.csv \
--threshold 0.1
Production vs Staging Example¶
dift prod.csv staging.csv \
--key id
Multi-Table Validation Example¶
dift batch \
--old-dir warehouse_snapshot_1 \
--new-dir warehouse_snapshot_2 \
--report html
Historical Drift Monitoring Example¶
dift prod.csv staging.csv \
--history
Example Console Output¶
╭─────────────────────────╮
│ Dift Dataset Comparison │
│ Risk Level: MEDIUM │
╰─────────────────────────╯
Warnings
Numeric drift:
'revenue'
mean shift 900.00%
(high, threshold 0.1)
Outlier spike:
'revenue' increased by 100.00%
(high)
Categorical shift:
'segment' max frequency shift 60.00%
(high)
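The categorical warning above reports a "max frequency shift". One plausible reading, offered as an assumption rather than Dift's documented definition, is the largest absolute change in any category's share of rows between the two datasets. `max_frequency_shift` below is a hypothetical helper illustrating that calculation.

```python
from collections import Counter


def max_frequency_shift(old_values, new_values):
    """Largest absolute change in any category's share between two samples."""
    if not old_values or not new_values:
        raise ValueError("both samples must be non-empty")
    old_counts, new_counts = Counter(old_values), Counter(new_values)
    old_total, new_total = len(old_values), len(new_values)
    categories = old_counts.keys() | new_counts.keys()
    # Counter returns 0 for categories missing on one side.
    return max(
        abs(new_counts[c] / new_total - old_counts[c] / old_total)
        for c in categories
    )


# Illustrative data: 'retail' drops from 80% to 20% of rows.
old_segment = ["retail"] * 8 + ["enterprise"] * 2
new_segment = ["retail"] * 2 + ["enterprise"] * 8
print(f"max frequency shift {max_frequency_shift(old_segment, new_segment):.2%}")
```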
Example Directory Structure¶
Most examples in this documentation use datasets and configuration files located in the project's examples/ directory.
examples/
├── old.csv
├── new.csv
├── old.parquet
├── new.parquet
├── old.xlsx
├── new.xlsx
├── old.json
├── new.json
├── old_drift.csv
├── new_drift.csv
├── config_sample.yaml
├── config_sample.toml
├── config_sample.json
├── config_thresholds.yaml
├── config_env.yaml
└── warehouse.duckdb
Related Documentation¶
See also:
- configuration.md
- automation.md
- profiles.md
- history.md
- connectors/sql.md
- reports.md