Examples

This document contains practical examples of using Dift with local datasets, databases, and data warehouses, and in automation, reporting, scheduling, and validation pipelines.

Examples are organized by workflow category.

Basic Dataset Comparison

Compare two CSV datasets:

dift examples/old.csv examples/new.csv \
  --key customer_id
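To make the role of --key concrete, here is a minimal Python sketch of a keyed comparison. It assumes the key column is unique in each file and classifies keys as added, removed, or changed. It illustrates the idea only and is not Dift's implementation; the function name compare_by_key and the inline sample data are hypothetical.

```python
import csv
import io


def compare_by_key(old_rows, new_rows, key):
    """Classify keys as added, removed, or changed (assumes `key` is unique)."""
    old = {row[key]: row for row in old_rows}
    new = {row[key]: row for row in new_rows}
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    # Keys present in both snapshots whose row values differ
    changed = sorted(k for k in old.keys() & new.keys() if old[k] != new[k])
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical sample data standing in for examples/old.csv and examples/new.csv
old_csv = "customer_id,segment\n1,retail\n2,enterprise\n3,retail\n"
new_csv = "customer_id,segment\n1,retail\n2,retail\n4,retail\n"

result = compare_by_key(
    csv.DictReader(io.StringIO(old_csv)),
    csv.DictReader(io.StringIO(new_csv)),
    key="customer_id",
)
print(result)  # {'added': ['4'], 'removed': ['3'], 'changed': ['2']}
```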

Compare Parquet Files

dift examples/old.parquet examples/new.parquet \
  --key customer_id

Compare Excel Files

dift examples/old.xlsx examples/new.xlsx \
  --key customer_id

Compare JSON Files

dift examples/old.json examples/new.json \
  --key customer_id

Numeric Drift Detection

Detect numeric drift using a threshold:

dift examples/old_drift.csv examples/new_drift.csv \
  --key id \
  --threshold 0.1
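The console output later on this page reports drift as a percentage mean shift next to the threshold (for example, "mean shift 900.00% (high, threshold 0.1)"). A plausible reading is that the threshold is a relative change in the column mean, so 0.1 means 10%. The sketch below assumes that definition. Dift's exact formula is not documented here, and the helper names mean_shift and numeric_drift are hypothetical.

```python
from statistics import mean


def mean_shift(old_values, new_values):
    """Relative change in the column mean: |new_mean - old_mean| / |old_mean|."""
    old_mean = mean(old_values)
    if old_mean == 0:
        raise ValueError("relative shift is undefined when the old mean is 0")
    return abs(mean(new_values) - old_mean) / abs(old_mean)


def numeric_drift(old_values, new_values, threshold=0.1):
    """Return the shift and whether it exceeds the (assumed relative) threshold."""
    shift = mean_shift(old_values, new_values)
    return shift, shift > threshold


shift, drifted = numeric_drift([100, 100, 100], [1000, 1000, 1000], threshold=0.1)
print(f"mean shift {shift:.2%}, drift={drifted}")  # mean shift 900.00%, drift=True
```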

Generate JSON Report

dift examples/old.csv examples/new.csv \
  --key customer_id \
  --report json \
  --output report.json

Generate CSV Report

dift examples/old.csv examples/new.csv \
  --key customer_id \
  --report csv \
  --output report.csv

Generate Excel Report

dift examples/old.csv examples/new.csv \
  --key customer_id \
  --report excel \
  --output report.xlsx

Generate HTML Report

dift examples/old.csv examples/new.csv \
  --key customer_id \
  --report html \
  --output report.html

Use HTML Templates

Generate an HTML report using a template:

dift examples/old.csv examples/new.csv \
  --key customer_id \
  --report html \
  --template dark \
  --output report.html

Available HTML Templates

Supported templates:

default
clean
compact
enterprise
dark

Save Reports to Output Directory

dift examples/old.csv examples/new.csv \
  --report json \
  --output-dir reports/

The generated file name depends on the --report format:

dift_report.json
dift_report.csv
dift_report.xlsx
dift_report.html

Use YAML Config File

dift --config examples/config_sample.yaml

Example YAML Config

old_dataset: examples/old.csv
new_dataset: examples/new.csv
key: customer_id
threshold: 0.1
report: html

Use TOML Config

dift --config examples/config_sample.toml

Example TOML Config

old_dataset = "examples/old.csv"
new_dataset = "examples/new.csv"
key = "customer_id"
report = "json"

Use JSON Config

dift --config examples/config_sample.json

Example JSON Config

{
  "old_dataset": "examples/old.csv",
  "new_dataset": "examples/new.csv",
  "key": "customer_id",
  "report": "csv"
}
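All three config formats carry the same keys, so a loader only needs to pick a parser by file extension. The sketch below shows how such dispatch could work. It is an assumption, not Dift's loader:

- JSON uses the standard json module.
- TOML uses tomllib, which is in the standard library from Python 3.11.
- YAML needs the third-party PyYAML package.

load_config is a hypothetical name.

```python
import json
import tempfile
from pathlib import Path


def load_config(path):
    """Load a config file into a dict, picking the parser from the extension."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".json":
        return json.loads(path.read_text(encoding="utf-8"))
    if suffix == ".toml":
        import tomllib  # standard library in Python 3.11+

        with path.open("rb") as fh:  # tomllib requires a binary file handle
            return tomllib.load(fh)
    if suffix in (".yaml", ".yml"):
        import yaml  # third-party PyYAML package

        return yaml.safe_load(path.read_text(encoding="utf-8"))
    raise ValueError(f"unsupported config format: {suffix!r}")


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "config_sample.json"
    sample.write_text(
        json.dumps({
            "old_dataset": "examples/old.csv",
            "new_dataset": "examples/new.csv",
            "key": "customer_id",
            "report": "csv",
        }),
        encoding="utf-8",
    )
    config = load_config(sample)

print(config["key"], config["report"])  # customer_id csv
```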

Dataset Paths Inside Configs

dift --config examples/config_with_datasets.yaml

CLI Override Example

dift examples/override_old.csv examples/override_new.csv \
  --config examples/config_sample.yaml \
  --report json

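Command-line arguments take precedence over config values. Here, the two dataset paths and --report json replace the corresponding values in config_sample.yaml.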
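The override rule can be sketched as a dictionary merge:

- Any option passed on the command line replaces the config value.
- Options left unset fall back to the config.

Representing "not passed" as None is an assumption of this sketch, and merge_settings is a hypothetical helper.

```python
def merge_settings(config, cli_args):
    """Overlay CLI values on config values; None means 'not given on the CLI'."""
    merged = dict(config)  # copy so the loaded config is left untouched
    merged.update({k: v for k, v in cli_args.items() if v is not None})
    return merged


config = {
    "old_dataset": "examples/old.csv",
    "new_dataset": "examples/new.csv",
    "key": "customer_id",
    "threshold": 0.1,
    "report": "html",
}
cli_args = {
    "old_dataset": "examples/override_old.csv",
    "new_dataset": "examples/override_new.csv",
    "report": "json",
    "threshold": None,  # flag not passed, so the config value is kept
}
settings = merge_settings(config, cli_args)
print(settings["report"], settings["threshold"])  # json 0.1
```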

Threshold Config Example

thresholds:
  numeric: 0.1
  categorical: 0.2
  outlier: 0.15

Column-Level Threshold Override

thresholds:
  columns:
    revenue:
      numeric: 0.05
      outlier: 0.1
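Assume the two threshold snippets above live under one thresholds key. A column-level value should then win over the global one for that column only. The hypothetical resolve_threshold helper below sketches that lookup order. The fallback behavior is an assumption, not documented Dift semantics.

```python
def resolve_threshold(thresholds, column, kind):
    """Return the column-specific threshold if one is set, else the global one."""
    column_overrides = thresholds.get("columns", {}).get(column, {})
    if kind in column_overrides:
        return column_overrides[kind]
    return thresholds[kind]


thresholds = {
    "numeric": 0.1,
    "categorical": 0.2,
    "outlier": 0.15,
    "columns": {"revenue": {"numeric": 0.05, "outlier": 0.1}},
}
print(resolve_threshold(thresholds, "revenue", "numeric"))  # 0.05 (column override)
print(resolve_threshold(thresholds, "revenue", "categorical"))  # 0.2 (global)
print(resolve_threshold(thresholds, "quantity", "outlier"))  # 0.15 (global)
```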

Use Threshold Config

dift --config examples/config_thresholds.yaml

Environment-Based Configs

Run an environment-specific workflow:

dift --config examples/config_env.yaml \
  --env production

Example Environment Config

environments:
  development:
    threshold: 0.2

  production:
    threshold: 0.05
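One way to read the environments block is as an overlay: the section named by --env is merged on top of the shared top-level settings. The sketch below assumes that behavior. The top-level key and threshold values and the apply_environment helper are hypothetical.

```python
def apply_environment(config, env):
    """Overlay the settings under environments.<env> on the top-level config."""
    environments = config.get("environments", {})
    if env not in environments:
        raise KeyError(f"unknown environment: {env!r}")
    base = {k: v for k, v in config.items() if k != "environments"}
    base.update(environments[env])
    return base


config = {
    "key": "customer_id",  # hypothetical shared settings
    "threshold": 0.1,
    "environments": {
        "development": {"threshold": 0.2},
        "production": {"threshold": 0.05},
    },
}
print(apply_environment(config, "production"))
# {'key': 'customer_id', 'threshold': 0.05}
```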

Environment Variable Interpolation

Example config:

old_dataset: ${OLD_DATASET}
new_dataset: ${NEW_DATASET}

Set the variables in the shell before running Dift:

export OLD_DATASET=examples/old.csv
export NEW_DATASET=examples/new.csv
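The ${NAME} placeholders are resolved from the process environment. The sketch below shows one strict way to do that with a regular expression: it raises an error for unset variables. Python's os.path.expandvars would instead leave unknown placeholders untouched. This page does not say whether Dift fails on unset variables or passes them through, so treat the error handling as an assumption. The interpolate helper is hypothetical.

```python
import os
import re

_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def interpolate(value, env=None):
    """Replace each ${NAME} with the value of environment variable NAME."""
    env = os.environ if env is None else env

    def lookup(match):
        name = match.group(1)
        if name not in env:
            raise KeyError(f"environment variable {name} is not set")
        return env[name]

    return _VAR.sub(lookup, value)


# Equivalent of the two export commands above, done in-process for the demo
os.environ["OLD_DATASET"] = "examples/old.csv"
os.environ["NEW_DATASET"] = "examples/new.csv"

config = {"old_dataset": "${OLD_DATASET}", "new_dataset": "${NEW_DATASET}"}
resolved = {k: interpolate(v) for k, v in config.items()}
print(resolved)
```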

Create Saved Profile

dift profile create nightly-check \
  --old examples/old.csv \
  --new examples/new.csv \
  --key customer_id \
  --report html

Run Saved Profile

dift profile run nightly-check

List Profiles

dift profile list

Show Profile Details

dift profile show nightly-check

Delete Profile

dift profile delete nightly-check

Generate Cron Schedule

dift schedule cron nightly-check

Example output:

0 2 * * * dift profile run nightly-check --history --strict-exit-codes
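The generated line is a standard five-field crontab entry followed by the command to run. 0 2 * * * means minute 0 of hour 2, every day. This small sketch splits the example line into its named schedule fields and the command. describe_cron is a hypothetical helper, not part of Dift.

```python
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def describe_cron(expr):
    """Map a five-field cron expression onto named fields."""
    parts = expr.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(f"expected 5 fields, got {len(parts)}: {expr!r}")
    return dict(zip(CRON_FIELDS, parts))


line = "0 2 * * * dift profile run nightly-check --history --strict-exit-codes"
tokens = line.split()
schedule = describe_cron(" ".join(tokens[:5]))  # first five tokens: when to run
command = " ".join(tokens[5:])  # remaining tokens: what to run
print(schedule)
print(command)
```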

Create Saved Schedule

dift schedule create daily-check \
  --profile nightly-check \
  --cron "0 2 * * *"

List Schedules

dift schedule list

Run Schedule

dift schedule run daily-check

Delete Schedule

dift schedule delete daily-check

Batch Dataset Comparison

dift batch \
  --old-dir data/old \
  --new-dir data/new \
  --key id
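The batch examples do not say how files in --old-dir and --new-dir are matched. A common convention, assumed in this sketch, is to pair files with the same name and report files that appear on only one side. pair_datasets is a hypothetical helper.

```python
import tempfile
from pathlib import Path


def pair_datasets(old_dir, new_dir):
    """Pair files that share a name; list files found on only one side."""
    old = {p.name: p for p in Path(old_dir).iterdir() if p.is_file()}
    new = {p.name: p for p in Path(new_dir).iterdir() if p.is_file()}
    pairs = [(old[name], new[name]) for name in sorted(old.keys() & new.keys())]
    only_old = sorted(old.keys() - new.keys())
    only_new = sorted(new.keys() - old.keys())
    return pairs, only_old, only_new


with tempfile.TemporaryDirectory() as tmp:
    old_dir, new_dir = Path(tmp, "old"), Path(tmp, "new")
    old_dir.mkdir()
    new_dir.mkdir()
    for name in ("customers.csv", "orders.csv"):
        (old_dir / name).write_text("id\n1\n")
    for name in ("customers.csv", "products.csv"):
        (new_dir / name).write_text("id\n1\n")
    pairs, only_old, only_new = pair_datasets(old_dir, new_dir)
    pair_names = [(old_path.name, new_path.name) for old_path, new_path in pairs]

print(pair_names, only_old, only_new)
# [('customers.csv', 'customers.csv')] ['orders.csv'] ['products.csv']
```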

Batch HTML Reports

dift batch \
  --old-dir data/old \
  --new-dir data/new \
  --report html \
  --output-dir reports/batch

Continue On Error

dift batch \
  --old-dir data/old \
  --new-dir data/new \
  --continue-on-error

Stop On Error

dift batch \
  --old-dir data/old \
  --new-dir data/new \
  --stop-on-error
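The difference between the two flags can be sketched as a loop that either records a failed pair and moves on, or re-raises immediately. This is an illustration under assumptions: run_batch and the stand-in fake_compare are hypothetical. This page also does not state Dift's default when neither flag is given.

```python
def run_batch(pairs, compare, stop_on_error=False):
    """Run `compare` on each pair; collect failures unless stop_on_error is set."""
    results, errors = {}, {}
    for name, (old, new) in pairs.items():
        try:
            results[name] = compare(old, new)
        except Exception as exc:  # a failure in one dataset pair
            if stop_on_error:
                raise  # abort the whole batch on the first failure
            errors[name] = str(exc)  # record it and continue with the next pair
    return results, errors


def fake_compare(old, new):
    # Stand-in for a real comparison; fails for one hypothetical pair
    if old.endswith("broken.csv"):
        raise ValueError("could not parse file")
    return "ok"


pairs = {
    "customers": ("data/old/customers.csv", "data/new/customers.csv"),
    "broken": ("data/old/broken.csv", "data/new/broken.csv"),
    "orders": ("data/old/orders.csv", "data/new/orders.csv"),
}
results, errors = run_batch(pairs, fake_compare)
print(results, errors)
# {'customers': 'ok', 'orders': 'ok'} {'broken': 'could not parse file'}
```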

Save Comparison History

dift examples/old.csv examples/new.csv \
  --history

Custom History Directory

dift examples/old.csv examples/new.csv \
  --history \
  --history-dir reports/history

List Comparison History

dift history list

Show History Record

dift history show 1

Clear History

dift history clear

Strict Exit Codes

dift prod.csv staging.csv \
  --strict-exit-codes

Quiet Mode

dift old.csv new.csv \
  --quiet

Disable Colored Output

dift old.csv new.csv \
  --no-color

Full Automation Workflow

dift prod.csv staging.csv \
  --key customer_id \
  --strict-exit-codes \
  --quiet \
  --no-color
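In automation, the exit status is what gates a pipeline. This sketch wraps a command with subprocess.run and treats any non-zero status as a failed check. This page does not list the specific codes Dift returns under --strict-exit-codes, so the sketch checks only zero versus non-zero. To stay runnable without Dift installed, it calls the Python interpreter as a stand-in. run_gate is a hypothetical helper.

```python
import subprocess
import sys


def run_gate(cmd):
    """Run a validation command; return True only if it exits with status 0."""
    completed = subprocess.run(cmd, capture_output=True, text=True)
    if completed.returncode != 0:
        print(f"validation failed (exit {completed.returncode})", file=sys.stderr)
    return completed.returncode == 0


# Real usage would look like:
# run_gate(["dift", "prod.csv", "staging.csv", "--key", "customer_id",
#           "--strict-exit-codes", "--quiet", "--no-color"])
# Stand-in commands that succeed and fail, so the sketch runs anywhere:
ok = run_gate([sys.executable, "-c", "raise SystemExit(0)"])
failed = run_gate([sys.executable, "-c", "raise SystemExit(1)"])
print(ok, failed)  # True False
```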

DuckDB Comparison

dift duckdb:///examples/warehouse.duckdb:customers_old \
     duckdb:///examples/warehouse.duckdb:customers_new \
     --key customer_id

DuckDB URI Format

duckdb:///path/to/database.duckdb:table_name
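In the URI examples on this page, the table name follows the last colon, after the database location. The sketch below splits such a URI on that last colon. BigQuery URIs, which use dotted names, have no colon suffix and come back with a table of None. It illustrates the format and assumes every SQL URI carries a :table suffix. It is not Dift's parser, and split_table_uri is a hypothetical name.

```python
def split_table_uri(uri):
    """Split 'scheme://location:table' into (scheme, location, table)."""
    scheme, sep, rest = uri.partition("://")
    if not sep:
        raise ValueError(f"not a URI: {uri!r}")
    # Assumes the table name follows the LAST colon; location is returned as written
    location, colon, table = rest.rpartition(":")
    if not colon:
        # No ':table' suffix, e.g. bigquery://analytics.sales.orders_old
        return scheme, rest, None
    return scheme, location, table


for uri in (
    "duckdb:///examples/warehouse.duckdb:customers_old",
    "postgresql://user:password@localhost:5432/sales_db:customers_old",
    "bigquery://analytics.sales.orders_old",
):
    print(split_table_uri(uri))
```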

SQLite Comparison

dift sqlite:///examples/data.db:customers_old \
     sqlite:///examples/data.db:customers_new \
     --key customer_id
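At its core, a table comparison is a pair of set differences. The sketch below uses Python's built-in sqlite3 module and an in-memory database, with tables named after the example above. The sample rows are made up, and this is not how Dift queries databases. SQL EXCEPT returns rows present in one table but not the other. Matching keys across the two results then separates changed rows from added and removed ones.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers_old (customer_id INTEGER PRIMARY KEY, segment TEXT);
    CREATE TABLE customers_new (customer_id INTEGER PRIMARY KEY, segment TEXT);
    INSERT INTO customers_old VALUES (1, 'retail'), (2, 'enterprise'), (3, 'retail');
    INSERT INTO customers_new VALUES (1, 'retail'), (2, 'retail'), (4, 'retail');
    """
)

# Old rows with no identical row in the new table (removed or changed)
gone = conn.execute(
    "SELECT * FROM customers_old EXCEPT SELECT * FROM customers_new ORDER BY 1"
).fetchall()
# New rows with no identical row in the old table (added or changed)
appeared = conn.execute(
    "SELECT * FROM customers_new EXCEPT SELECT * FROM customers_old ORDER BY 1"
).fetchall()
conn.close()

gone_keys = {row[0] for row in gone}
appeared_keys = {row[0] for row in appeared}
changed = sorted(gone_keys & appeared_keys)  # key on both sides, values differ
removed = sorted(gone_keys - appeared_keys)
added = sorted(appeared_keys - gone_keys)
print(changed, removed, added)  # [2] [3] [4]
```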

PostgreSQL Comparison

dift postgresql://user:password@localhost:5432/sales_db:customers_old \
     postgresql://user:password@localhost:5432/sales_db:customers_new \
     --key customer_id

PostgreSQL Psycopg Example

dift postgresql+psycopg://user:password@localhost:5432/sales_db:customers_old \
     postgresql+psycopg://user:password@localhost:5432/sales_db:customers_new \
     --key customer_id

MySQL Comparison

dift mysql+pymysql://user:password@localhost:3306/sales_db:customers_old \
     mysql+pymysql://user:password@localhost:3306/sales_db:customers_new \
     --key customer_id

Redshift Comparison

dift redshift+redshift_connector://user:password@cluster.region.redshift.amazonaws.com:5439/dev:orders_old \
     redshift+redshift_connector://user:password@cluster.region.redshift.amazonaws.com:5439/dev:orders_new \
     --key order_id

Snowflake Comparison

dift snowflake://user:password@account/db/schema?warehouse=compute_wh:orders_old \
     snowflake://user:password@account/db/schema?warehouse=compute_wh:orders_new \
     --key order_id

BigQuery Comparison

dift bigquery://analytics.sales.orders_old \
     bigquery://analytics.sales.orders_new \
     --key order_id

Install SQL Support

pip install sqlalchemy

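Install PostgreSQL Driver

pip install psycopg2-binary

psycopg2 serves plain postgresql:// URIs. The postgresql+psycopg:// form shown above uses psycopg 3 instead:

pip install "psycopg[binary]"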

Install MySQL Driver

pip install pymysql

Install Redshift Dependencies

pip install sqlalchemy-redshift redshift-connector

Install Snowflake Support

pip install snowflake-sqlalchemy

Install BigQuery Support

pip install google-cloud-bigquery db-dtypes

Install DuckDB Support

pip install duckdb

GitHub Actions Example

name: Dift Validation

on:
  push:

jobs:
  validate:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-python@v5
        with:
          python-version: "3.x"

      - name: Install Dift
        run: pip install dift-cli

      - name: Run Validation
        run: |
          dift old.csv new.csv \
            --strict-exit-codes \
            --quiet \
            --no-color

Airflow Example

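from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="dift_validation",  # example DAG name
    start_date=datetime(2024, 1, 1),
    schedule=None,  # manual trigger; set a cron string to run on a schedule
    catchup=False,
):
    # BashOperator fails the task when the command exits non-zero
    validate = BashOperator(
        task_id="validate_data",
        bash_command="dift old.csv new.csv --strict-exit-codes",
    )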

ETL Validation Example

dift before.csv after.csv

ML Drift Monitoring Example

dift train_v1.csv train_v2.csv \
  --threshold 0.1

Production vs Staging Example

dift prod.csv staging.csv \
  --key id

Multi-Table Validation Example

dift batch \
  --old-dir warehouse_snapshot_1 \
  --new-dir warehouse_snapshot_2 \
  --report html

Historical Drift Monitoring Example

dift prod.csv staging.csv \
  --history

Example Console Output

╭─────────────────────────╮
│ Dift Dataset Comparison │
│ Risk Level: MEDIUM      │
╰─────────────────────────╯

Warnings

Numeric drift:
'revenue'
mean shift 900.00%
(high, threshold 0.1)

Outlier spike:
'revenue' increased by 100.00%
(high)

Categorical shift:
'segment' max frequency shift 60.00%
(high)
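The categorical warning reports a "max frequency shift". One plausible definition, assumed here, is the largest absolute change in any category's share of rows between the two datasets. With the made-up data below, retail drops from 80% to 20% of rows, which reproduces the 60.00% figure in the sample output. Dift's exact calculation may differ, and max_frequency_shift is a hypothetical helper.

```python
from collections import Counter


def max_frequency_shift(old_values, new_values):
    """Largest absolute change in any category's share of rows."""
    old_counts, new_counts = Counter(old_values), Counter(new_values)
    old_total, new_total = len(old_values), len(new_values)
    categories = old_counts.keys() | new_counts.keys()
    # Counter returns 0 for categories missing on one side
    return max(
        abs(new_counts[c] / new_total - old_counts[c] / old_total) for c in categories
    )


old_segment = ["retail"] * 8 + ["enterprise"] * 2
new_segment = ["retail"] * 2 + ["enterprise"] * 8
shift = max_frequency_shift(old_segment, new_segment)
print(f"'segment' max frequency shift {shift:.2%}")  # 'segment' max frequency shift 60.00%
```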

Example Directory Structure

Most examples in this documentation use datasets and configuration files located in the project's examples/ directory.

examples/
├── old.csv
├── new.csv
├── old.parquet
├── new.parquet
├── old.xlsx
├── new.xlsx
├── old.json
├── new.json
├── old_drift.csv
├── new_drift.csv
├── config_sample.yaml
├── config_sample.toml
├── config_sample.json
├── config_thresholds.yaml
├── config_env.yaml
└── warehouse.duckdb