Skip to content

Reports

Dift provides multiple report formats for dataset comparison, drift analysis, validation workflows, and automation pipelines.

Reports help teams:

  • understand dataset changes
  • identify drift risks
  • validate ETL workflows
  • monitor production datasets
  • share validation results
  • automate quality checks

Supported Report Formats

Dift supports:

  • Console
  • JSON
  • CSV
  • Excel
  • HTML

Console Reports

Console reports are the default output format.

Example:

dift old.csv new.csv --key customer_id

Console reports display:

  • schema changes
  • row changes
  • drift warnings
  • outlier spikes
  • quality issues
  • overall risk level

Example Console Output

╭─────────────────────────╮
│ Dift Dataset Comparison │
│ Risk Level: MEDIUM      │
╰─────────────────────────╯

Warnings

Numeric drift:
'revenue'
mean shift 900.00%
(high, threshold 0.1)

Outlier spike:
'revenue' increased by 100.00%
(high)

Categorical shift:
'segment' max frequency shift 60.00%
(high)

JSON Reports

JSON reports are useful for:

  • APIs
  • CI/CD pipelines
  • machine-readable workflows
  • downstream automation
  • audit systems

Generate JSON Report

dift old.csv new.csv \
  --key customer_id \
  --report json \
  --output report.json

Example Structure

{
  "metadata": {},
  "summary": {},
  "schema": {},
  "rows": {},
  "quality": {},
  "numeric": {},
  "categorical": {}
}

CSV Reports

CSV reports provide lightweight summaries for:

  • spreadsheets
  • dashboards
  • quick audits
  • ETL monitoring

Generate CSV Report

dift old.csv new.csv \
  --report csv \
  --output report.csv

Example CSV Output

metric,value
old_rows,1000
new_rows,1100
row_delta,100
risk_level,medium

Excel Reports

Excel reports provide structured workbook-based analysis.

Useful for:

  • analysts
  • audits
  • management reviews
  • QA workflows

Generate Excel Report

dift old.csv new.csv \
  --report excel \
  --output report.xlsx

Excel Workbook Structure

Typical sheets include:

  • Summary
  • Schema
  • Rows
  • Quality
  • Numeric Drift
  • Categorical Drift

Excel Features

Excel reports support:

  • severity color coding
  • formatted headers
  • readable layouts
  • autosized columns
  • worksheet separation

HTML Reports

HTML reports provide dashboard-style visualization.

Useful for:

  • stakeholders
  • audits
  • monitoring workflows
  • production validation
  • presentation-ready reporting

Generate HTML Report

dift old.csv new.csv \
  --report html \
  --output report.html

HTML Templates

Customize HTML appearance using templates.


Example

dift old.csv new.csv \
  --report html \
  --template dark \
  --output report.html

Available Templates

Template Description
default Standard layout
clean Minimal clean layout
compact Dense information layout
enterprise Executive dashboard styling
dark Dark mode report

HTML Features

HTML reports support:

  • responsive layouts
  • drift highlighting
  • severity badges
  • visual summaries
  • section grouping
  • dashboard-style presentation

Output Directory Support

Automatically generate filenames using --output-dir.


Example

dift old.csv new.csv \
  --report html \
  --output-dir reports/

Generated Filenames

Report Type Filename
JSON dift_report.json
CSV dift_report.csv
Excel dift_report.xlsx
HTML dift_report.html

Batch Report Generation

Generate reports for multiple dataset comparisons.


Example

dift batch \
  --old-dir data/old \
  --new-dir data/new \
  --report html \
  --output-dir reports/

Example Batch Structure

reports/
├── customers/
│   └── dift_report.html
├── orders/
│   └── dift_report.html
└── products/
    └── dift_report.html

Progress Indicators

Large report generation workflows display progress indicators.

Examples:

  • JSON generation
  • Excel export
  • HTML rendering
  • warehouse extraction
  • SQL loading

This improves visibility during long-running workflows.


Report Metadata

Reports include metadata such as:

  • execution timestamps
  • report format
  • Dift version
  • dataset source metadata
  • thresholds used
  • runtime information

Risk Levels

Dift classifies comparisons into:

Risk Level Meaning
low Safe dataset changes
medium Moderate drift detected
high Significant risk detected

Drift Reporting

Reports may include:

  • numeric drift
  • categorical drift
  • outlier spikes
  • frequency shifts
  • null spikes
  • duplicate spikes
  • schema changes
  • row additions/removals

Numeric Drift Reporting

Examples include:

  • mean shift
  • standard deviation changes
  • range drift
  • distribution changes

Categorical Drift Reporting

Examples include:

  • new values
  • removed values
  • frequency shifts
  • category instability

Outlier Reporting

Outlier analysis includes:

  • IQR-based detection
  • outlier spike tracking
  • severity classification
  • risk integration

Automation Workflows

Reports integrate well with:

  • GitHub Actions
  • Airflow
  • Prefect
  • Dagster
  • Jenkins
  • cron jobs
  • CI/CD systems

Example CI Workflow

dift prod.csv staging.csv \
  --report json \
  --output report.json \
  --strict-exit-codes \
  --quiet

Recommended Report Formats

Workflow Recommended Format
CI/CD JSON
Analysts Excel
Dashboards HTML
Lightweight summaries CSV
Interactive review Console

Common Use Cases


ETL Validation

dift before.csv after.csv \
  --report html

ML Drift Monitoring

dift train_v1.csv train_v2.csv \
  --report json

Executive Reporting

dift prod.csv staging.csv \
  --report enterprise

Historical Monitoring

dift prod.csv staging.csv \
  --history \
  --report html

Next Steps

Continue with:

  • Configuration
  • Thresholds
  • Profiles
  • Batch Comparisons
  • Scheduling
  • Automation
  • Connectors