Examples

This document contains practical examples for using Dift across local datasets, databases, warehouses, automation workflows, reporting, scheduling, and validation pipelines.

Examples are organized by workflow category.


Basic Dataset Comparison

Compare two CSV datasets:

dift examples/old.csv examples/new.csv \
  --key customer_id

Compare Parquet Files

dift examples/old.parquet examples/new.parquet \
  --key customer_id

Compare Excel Files

dift examples/old.xlsx examples/new.xlsx \
  --key customer_id

Compare JSON Files

dift examples/old.json examples/new.json \
  --key customer_id

Numeric Drift Detection

Detect numeric drift using thresholds:

dift examples/old_drift.csv examples/new_drift.csv \
  --key id \
  --threshold 0.1
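
As a rough illustration of what a numeric threshold check can look like, the sketch below computes a relative mean shift and compares it with the threshold. The formula, the sample values, and the function names relative_mean_shift and exceeds_threshold are assumptions made for this example; they are not taken from Dift's implementation.

```python
from statistics import mean


def relative_mean_shift(old_values, new_values):
    """Return |mean(new) - mean(old)| / |mean(old)| (assumed drift measure)."""
    old_mean = mean(old_values)
    new_mean = mean(new_values)
    if old_mean == 0:
        raise ValueError("relative shift is undefined when the old mean is zero")
    return abs(new_mean - old_mean) / abs(old_mean)


def exceeds_threshold(old_values, new_values, threshold=0.1):
    """True when the relative mean shift is larger than the threshold."""
    return relative_mean_shift(old_values, new_values) > threshold


# Illustrative data only: the mean moves from 100 to 1000.
old_revenue = [100.0, 100.0, 100.0]
new_revenue = [1000.0, 1000.0, 1000.0]

shift = relative_mean_shift(old_revenue, new_revenue)
print(f"mean shift {shift:.2%}")                    # mean shift 900.00%
print(exceeds_threshold(old_revenue, new_revenue))  # True
```

Under this reading, a threshold of 0.1 flags any column whose mean moves by more than 10% relative to the old dataset.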

Generate JSON Report

dift examples/old.csv examples/new.csv \
  --key customer_id \
  --report json \
  --output report.json

Generate CSV Report

dift examples/old.csv examples/new.csv \
  --key customer_id \
  --report csv \
  --output report.csv

Generate Excel Report

dift examples/old.csv examples/new.csv \
  --key customer_id \
  --report excel \
  --output report.xlsx

Generate HTML Report

dift examples/old.csv examples/new.csv \
  --key customer_id \
  --report html \
  --output report.html

Use HTML Templates

Generate a report using a template:

dift examples/old.csv examples/new.csv \
  --key customer_id \
  --report html \
  --template dark \
  --output report.html

Available HTML Templates

Supported templates:

  • default
  • clean
  • compact
  • enterprise
  • dark

Save Reports to Output Directory

dift examples/old.csv examples/new.csv \
  --report json \
  --output-dir reports/

The generated filename depends on the report format:

dift_report.json
dift_report.csv
dift_report.xlsx
dift_report.html

Use YAML Config File

dift --config examples/config_sample.yaml

Example YAML Config

old_dataset: examples/old.csv
new_dataset: examples/new.csv
key: customer_id
threshold: 0.1
report: html

Use TOML Config

dift --config examples/config_sample.toml

Example TOML Config

old_dataset = "examples/old.csv"
new_dataset = "examples/new.csv"
key = "customer_id"
report = "json"

Use JSON Config

dift --config examples/config_sample.json

Example JSON Config

{
  "old_dataset": "examples/old.csv",
  "new_dataset": "examples/new.csv",
  "key": "customer_id",
  "report": "csv"
}

Dataset Paths Inside Configs

dift --config examples/config_with_datasets.yaml

CLI Override Example

dift examples/override_old.csv examples/override_new.csv \
  --config examples/config_sample.yaml \
  --report json

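CLI arguments override config values: here the two positional dataset paths and --report json take precedence over any matching entries in config_sample.yaml.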
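
The precedence rule can be pictured as a dictionary merge in which only the options actually passed on the command line replace config entries. This is a minimal sketch under that assumption; merge_settings is a hypothetical helper, not part of Dift.

```python
def merge_settings(config_values, cli_values):
    """Overlay CLI options on config values; None means "not passed"."""
    merged = dict(config_values)
    for name, value in cli_values.items():
        if value is not None:
            merged[name] = value
    return merged


# Values mirror the YAML config example and the override command.
config_values = {
    "old_dataset": "examples/old.csv",
    "new_dataset": "examples/new.csv",
    "key": "customer_id",
    "threshold": 0.1,
    "report": "html",
}
cli_values = {
    "old_dataset": "examples/override_old.csv",
    "new_dataset": "examples/override_new.csv",
    "key": None,  # not given on the command line, so the config value stays
    "report": "json",
}

merged = merge_settings(config_values, cli_values)
print(merged["report"])  # json
print(merged["key"])     # customer_id
```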


Threshold Config Example

thresholds:
  numeric: 0.1
  categorical: 0.2
  outlier: 0.15

Column-Level Threshold Override

thresholds:
  columns:
    revenue:
      numeric: 0.05
      outlier: 0.1
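
One way to read the two threshold fragments together is that a column-level value wins when present and the global value applies otherwise. The sketch below encodes that assumed lookup order; resolve_threshold is a hypothetical helper, and combining both fragments in one mapping is also an assumption.

```python
def resolve_threshold(thresholds, column, kind):
    """Return the column-level threshold if set, else the global one."""
    column_overrides = thresholds.get("columns", {}).get(column, {})
    if kind in column_overrides:
        return column_overrides[kind]
    return thresholds[kind]


# The two YAML fragments above, combined into a single mapping.
thresholds = {
    "numeric": 0.1,
    "categorical": 0.2,
    "outlier": 0.15,
    "columns": {
        "revenue": {"numeric": 0.05, "outlier": 0.1},
    },
}

print(resolve_threshold(thresholds, "revenue", "numeric"))      # 0.05
print(resolve_threshold(thresholds, "revenue", "categorical"))  # 0.2
print(resolve_threshold(thresholds, "segment", "numeric"))      # 0.1
```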

Use Threshold Config

dift --config examples/config_thresholds.yaml

Environment-Based Configs

Run an environment-specific workflow:

dift --config examples/config_env.yaml \
  --env production

Example Environment Config

environments:
  development:
    threshold: 0.2

  production:
    threshold: 0.05
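
Conceptually, selecting an environment amounts to overlaying that environment's block on the top-level settings. The sketch below shows this under stated assumptions: the top-level threshold and key entries are added for illustration, and settings_for_environment is a hypothetical helper rather than Dift's own code.

```python
def settings_for_environment(config, env_name):
    """Overlay the chosen environment block on the top-level settings."""
    environments = config.get("environments", {})
    if env_name not in environments:
        raise KeyError(f"unknown environment: {env_name}")
    base = {name: value for name, value in config.items() if name != "environments"}
    return {**base, **environments[env_name]}


config = {
    "key": "customer_id",  # assumed top-level setting, for illustration
    "threshold": 0.1,      # assumed default, replaced per environment
    "environments": {
        "development": {"threshold": 0.2},
        "production": {"threshold": 0.05},
    },
}

print(settings_for_environment(config, "production"))
# {'key': 'customer_id', 'threshold': 0.05}
```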

Environment Variable Interpolation

Example config:

old_dataset: ${OLD_DATASET}
new_dataset: ${NEW_DATASET}

Set variables:

export OLD_DATASET=examples/old.csv
export NEW_DATASET=examples/new.csv
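
The ${NAME} placeholders can be thought of as a plain text substitution performed when the config is loaded. The following sketch shows one way to do that and to fail loudly on an unset variable; the interpolate function and its error behaviour are assumptions for illustration, not a description of how Dift handles missing variables.

```python
import re

# Matches ${NAME} where NAME is a conventional environment variable name.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def interpolate(value, environ):
    """Replace each ${NAME} in value with environ[NAME]."""
    def substitute(match):
        name = match.group(1)
        if name not in environ:
            raise KeyError(f"environment variable {name} is not set")
        return environ[name]

    return _PLACEHOLDER.sub(substitute, value)


# A plain dict stands in for os.environ so the example is reproducible.
environ = {
    "OLD_DATASET": "examples/old.csv",
    "NEW_DATASET": "examples/new.csv",
}

print(interpolate("${OLD_DATASET}", environ))  # examples/old.csv
print(interpolate("${NEW_DATASET}", environ))  # examples/new.csv
```

Passing os.environ instead of the dict resolves the placeholders against the variables exported in the shell.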

Create Saved Profile

dift profile create nightly-check \
  --old examples/old.csv \
  --new examples/new.csv \
  --key customer_id \
  --report html

Run Saved Profile

dift profile run nightly-check

List Profiles

dift profile list

Show Profile Details

dift profile show nightly-check

Delete Profile

dift profile delete nightly-check

Generate Cron Schedule

dift schedule cron nightly-check

Example output:

0 2 * * * dift profile run nightly-check --history --strict-exit-codes
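
For readers less familiar with crontab syntax, the short sketch below splits the example line into its five schedule fields and the command. The field order (minute, hour, day of month, month, day of week) is standard cron; split_crontab_line is a hypothetical helper written for this example.

```python
FIELD_NAMES = ("minute", "hour", "day_of_month", "month", "day_of_week")


def split_crontab_line(line):
    """Split a crontab line into a dict of schedule fields and the command."""
    parts = line.split(None, 5)
    if len(parts) != 6:
        raise ValueError("expected five schedule fields followed by a command")
    schedule = dict(zip(FIELD_NAMES, parts[:5]))
    return schedule, parts[5]


line = "0 2 * * * dift profile run nightly-check --history --strict-exit-codes"
schedule, command = split_crontab_line(line)

print(schedule["hour"], schedule["minute"])  # 2 0
print(command)  # dift profile run nightly-check --history --strict-exit-codes
```

With minute 0, hour 2, and wildcards in the remaining fields, the profile runs every day at 02:00.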

Create Saved Schedule

dift schedule create daily-check \
  --profile nightly-check \
  --cron "0 2 * * *"

List Schedules

dift schedule list

Run Schedule

dift schedule run daily-check

Delete Schedule

dift schedule delete daily-check

Batch Dataset Comparison

dift batch \
  --old-dir data/old \
  --new-dir data/new \
  --key id
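
This page does not state how batch mode matches files between the two directories. One plausible reading is that files are paired by name; the sketch below illustrates that assumption only. pair_datasets is a hypothetical helper and may not reflect Dift's actual matching rule.

```python
from pathlib import Path


def pair_datasets(old_dir, new_dir):
    """Pair files that share a file name in both directories (assumed rule)."""
    old_files = {path.name: path for path in Path(old_dir).iterdir() if path.is_file()}
    new_files = {path.name: path for path in Path(new_dir).iterdir() if path.is_file()}
    shared_names = sorted(old_files.keys() & new_files.keys())
    return [(old_files[name], new_files[name]) for name in shared_names]


def unmatched(old_dir, new_dir):
    """Return file names present in only one of the two directories."""
    old_names = {path.name for path in Path(old_dir).iterdir() if path.is_file()}
    new_names = {path.name for path in Path(new_dir).iterdir() if path.is_file()}
    return sorted(old_names ^ new_names)


# Usage, with the directories from the command above:
# for old_path, new_path in pair_datasets("data/old", "data/new"):
#     print(old_path, "<->", new_path)
```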

Batch HTML Reports

dift batch \
  --old-dir data/old \
  --new-dir data/new \
  --report html \
  --output-dir reports/batch

Continue On Error

dift batch \
  --old-dir data/old \
  --new-dir data/new \
  --continue-on-error

Stop On Error

dift batch \
  --old-dir data/old \
  --new-dir data/new \
  --stop-on-error

Save Comparison History

dift examples/old.csv examples/new.csv \
  --history

Custom History Directory

dift examples/old.csv examples/new.csv \
  --history \
  --history-dir reports/history

List Comparison History

dift history list

Show History Record

dift history show 1

Clear History

dift history clear

Strict Exit Codes

dift prod.csv staging.csv \
  --strict-exit-codes

Quiet Mode

dift old.csv new.csv \
  --quiet

Disable Colored Output

dift old.csv new.csv \
  --no-color

Full Automation Workflow

dift prod.csv staging.csv \
  --key customer_id \
  --strict-exit-codes \
  --quiet \
  --no-color
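
In a pipeline, the flags above are typically paired with a check on the process exit code. The sketch below wraps a command and reports on its exit status. It assumes only that zero means success and any nonzero code means the check did not pass; the specific codes Dift returns are not listed on this page, and run_check and describe are hypothetical helpers.

```python
import subprocess
import sys


def run_check(command):
    """Run a command and return its exit code without raising on failure."""
    return subprocess.run(command, check=False).returncode


def describe(exit_code):
    """Summarize an exit code for a pipeline log."""
    if exit_code == 0:
        return "passed"
    return f"failed with exit code {exit_code}"


# Usage (requires dift on PATH):
# code = run_check([
#     "dift", "prod.csv", "staging.csv",
#     "--key", "customer_id",
#     "--strict-exit-codes", "--quiet", "--no-color",
# ])
# print(describe(code))
# sys.exit(code)  # propagate the result to the calling pipeline
```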

DuckDB Comparison

dift duckdb:///examples/warehouse.duckdb:customers_old \
     duckdb:///examples/warehouse.duckdb:customers_new \
     --key customer_id

DuckDB URI Format

duckdb:///path/to/database.duckdb:table_name
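
To make the URI layout concrete, the sketch below splits a URI of this shape into a database path and a table name by cutting at the last colon. It is an illustration of the format shown above, not Dift's parser; parse_duckdb_uri is a hypothetical helper.

```python
def parse_duckdb_uri(uri):
    """Split duckdb:///<path>:<table> into (path, table)."""
    prefix = "duckdb:///"
    if not uri.startswith(prefix):
        raise ValueError(f"not a duckdb URI: {uri}")
    location = uri[len(prefix):]
    # Split on the last colon so colons inside the path are preserved.
    path, separator, table = location.rpartition(":")
    if not separator or not path or not table:
        raise ValueError(f"expected duckdb:///path:table, got: {uri}")
    return path, table


path, table = parse_duckdb_uri("duckdb:///examples/warehouse.duckdb:customers_old")
print(path)   # examples/warehouse.duckdb
print(table)  # customers_old
```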

SQLite Comparison

dift sqlite:///examples/data.db:customers_old \
     sqlite:///examples/data.db:customers_new \
     --key customer_id

PostgreSQL Comparison

dift postgresql://user:password@localhost:5432/sales_db:customers_old \
     postgresql://user:password@localhost:5432/sales_db:customers_new \
     --key customer_id

PostgreSQL Psycopg Example

dift postgresql+psycopg://user:password@localhost:5432/sales_db:customers_old \
     postgresql+psycopg://user:password@localhost:5432/sales_db:customers_new \
     --key customer_id

MySQL Comparison

dift mysql+pymysql://user:password@localhost:3306/sales_db:customers_old \
     mysql+pymysql://user:password@localhost:3306/sales_db:customers_new \
     --key customer_id

Redshift Comparison

dift redshift+redshift_connector://user:password@cluster.region.redshift.amazonaws.com:5439/dev:orders_old \
     redshift+redshift_connector://user:password@cluster.region.redshift.amazonaws.com:5439/dev:orders_new \
     --key order_id

Snowflake Comparison

Quote the URIs so the shell does not treat the ? as a glob character:

dift "snowflake://user:password@account/db/schema?warehouse=compute_wh:orders_old" \
     "snowflake://user:password@account/db/schema?warehouse=compute_wh:orders_new" \
     --key order_id

BigQuery Comparison

dift bigquery://analytics.sales.orders_old \
     bigquery://analytics.sales.orders_new \
     --key order_id

Install SQL Support

pip install sqlalchemy

Install PostgreSQL Driver

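pip install psycopg2-binary

In SQLAlchemy, plain postgresql:// URLs default to the psycopg2 driver installed above. The postgresql+psycopg:// scheme shown earlier selects Psycopg 3, which is distributed as the separate psycopg package.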

Install MySQL Driver

pip install pymysql

Install Redshift Dependencies

pip install sqlalchemy-redshift redshift-connector

Install Snowflake Support

pip install snowflake-sqlalchemy

Install BigQuery Support

pip install google-cloud-bigquery db-dtypes

Install DuckDB Support

pip install duckdb

GitHub Actions Example

name: Dift Validation

on:
  push:

jobs:
  validate:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4

      - name: Install Dift
        run: pip install dift-cli

      - name: Run Validation
        run: |
          dift old.csv new.csv \
            --strict-exit-codes \
            --quiet \
            --no-color

Airflow Example

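from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# dag_id and start_date are placeholders; schedule= requires Airflow 2.4+.
with DAG(dag_id="dift_validation", start_date=datetime(2024, 1, 1), schedule=None):
    # A nonzero exit code from dift fails the task.
    validate = BashOperator(
        task_id="validate_data",
        bash_command="""
        dift old.csv new.csv \
          --strict-exit-codes
        """
    )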

ETL Validation Example

dift before.csv after.csv

ML Drift Monitoring Example

dift train_v1.csv train_v2.csv \
  --threshold 0.1

Production vs Staging Example

dift prod.csv staging.csv \
  --key id

Multi-Table Validation Example

dift batch \
  --old-dir warehouse_snapshot_1 \
  --new-dir warehouse_snapshot_2 \
  --report html

Historical Drift Monitoring Example

dift prod.csv staging.csv \
  --history

Example Console Output

╭─────────────────────────╮
│ Dift Dataset Comparison │
│ Risk Level: MEDIUM      │
╰─────────────────────────╯

Warnings

Numeric drift:
'revenue'
mean shift 900.00%
(high, threshold 0.1)

Outlier spike:
'revenue' increased by 100.00%
(high)

Categorical shift:
'segment' max frequency shift 60.00%
(high)
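
The "max frequency shift" figure in the output above can be read as the largest change in any category's share between the two datasets. The sketch below computes that quantity under this assumed definition; the sample data and the helper names frequencies and max_frequency_shift are illustrative and not taken from Dift.

```python
from collections import Counter


def frequencies(values):
    """Return each category's share of the total, as a dict."""
    if not values:
        raise ValueError("cannot compute frequencies of an empty column")
    counts = Counter(values)
    total = len(values)
    return {category: count / total for category, count in counts.items()}


def max_frequency_shift(old_values, new_values):
    """Largest absolute change in share across all categories."""
    old_freq = frequencies(old_values)
    new_freq = frequencies(new_values)
    categories = old_freq.keys() | new_freq.keys()
    return max(
        abs(new_freq.get(category, 0.0) - old_freq.get(category, 0.0))
        for category in categories
    )


# Illustrative data: "retail" falls from 80% to 20% of rows.
old_segment = ["retail"] * 8 + ["enterprise"] * 2
new_segment = ["retail"] * 2 + ["enterprise"] * 8

shift = max_frequency_shift(old_segment, new_segment)
print(f"max frequency shift {shift:.2%}")  # max frequency shift 60.00%
```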

Example Directory Structure

Most examples in this documentation use datasets and configuration files located in the project's examples/ directory.

examples/
├── old.csv
├── new.csv
├── old.parquet
├── new.parquet
├── old.xlsx
├── new.xlsx
├── old.json
├── new.json
├── old_drift.csv
├── new_drift.csv
├── config_sample.yaml
├── config_sample.toml
├── config_sample.json
├── config_thresholds.yaml
├── config_env.yaml
└── warehouse.duckdb

Related Documentation

See also:

  • configuration.md
  • automation.md
  • profiles.md
  • history.md
  • connectors/sql.md
  • reports.md