
Dift v0.1.0 Release Notes

Release Date: Apr 26, 2026


Dift v0.1.0

Initial public release of Dift, an open-source CLI platform for dataset comparison, drift detection, and data trust validation.

This release establishes the foundation of the Dift comparison engine, including schema comparison, row-level validation, quality analysis, and risk scoring workflows.


Highlights

Dift v0.1.0 introduces:

  • dataset comparison workflows
  • schema change detection
  • row-level comparison
  • null and duplicate analysis
  • risk scoring
  • console reporting
  • JSON reporting
  • CSV reporting
  • Excel reporting
  • HTML reporting
  • multi-format dataset support
  • CLI-based comparison workflows

Initial Features


Dataset Comparison Engine

Core comparison capabilities include:

  • row count comparison
  • added row detection
  • removed row detection
  • key-based matching
  • schema validation
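
As an illustration of key-based matching, the sketch below indexes each dataset by a key column and derives added and removed rows from the difference of the key sets. It is not Dift's actual implementation; the `compare_rows` function and the sample rows are hypothetical and use only the Python standard library.

```python
# Illustrative sketch only; compare_rows is a hypothetical helper, not Dift's API.
def compare_rows(old_rows, new_rows, key):
    """Match rows on `key` and report row counts plus added/removed keys."""
    old_keys = {row[key] for row in old_rows}
    new_keys = {row[key] for row in new_rows}
    return {
        "old_row_count": len(old_rows),
        "new_row_count": len(new_rows),
        "added": sorted(new_keys - old_keys),      # keys present only in new
        "removed": sorted(old_keys - new_keys),    # keys present only in old
    }


old = [{"customer_id": 1, "plan": "free"}, {"customer_id": 2, "plan": "pro"}]
new = [{"customer_id": 2, "plan": "pro"}, {"customer_id": 3, "plan": "free"}]
result = compare_rows(old, new, key="customer_id")
print(result)
```

Note that equal row counts can hide churn: in this sample both datasets have two rows, yet one row was added and one removed.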

Schema Comparison

Dift can detect:

  • added columns
  • removed columns
  • datatype mismatches
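
The three schema checks above can be sketched as set operations over column-to-datatype mappings. This is a minimal example under stated assumptions: the `compare_schemas` function, the dict-based schema representation, and the dtype strings are hypothetical and do not describe Dift's internals.

```python
# Illustrative sketch only; schemas are modeled as {column_name: dtype_string}.
def compare_schemas(old_schema, new_schema):
    """Return added columns, removed columns, and datatype mismatches."""
    old_cols, new_cols = set(old_schema), set(new_schema)
    mismatches = {
        col: (old_schema[col], new_schema[col])
        for col in sorted(old_cols & new_cols)
        if old_schema[col] != new_schema[col]
    }
    return {
        "added": sorted(new_cols - old_cols),
        "removed": sorted(old_cols - new_cols),
        "dtype_mismatches": mismatches,
    }


old_schema = {"customer_id": "int64", "signup_date": "object", "plan": "object"}
new_schema = {"customer_id": "int64", "signup_date": "datetime64[ns]", "region": "object"}
schema_diff = compare_schemas(old_schema, new_schema)
print(schema_diff)
```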

Quality Validation

Initial quality analysis includes:

  • null spike detection
  • duplicate spike detection
  • quality degradation warnings
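
One way to think about a null spike is as an increase in a column's null rate between the old and new dataset that exceeds a threshold. The sketch below follows that definition. The function names and the 10 percentage point default threshold are assumptions chosen for illustration, not documented Dift behavior. A duplicate spike could be checked the same way using a duplicate rate instead of a null rate.

```python
# Illustrative sketch only; the threshold and helper names are hypothetical.
def null_rate(rows, column):
    """Fraction of rows where `column` is missing or None."""
    if not rows:
        return 0.0
    missing = sum(1 for row in rows if row.get(column) is None)
    return missing / len(rows)


def detect_null_spikes(old_rows, new_rows, columns, threshold=0.10):
    """Return {column: increase} for columns whose null rate rose above threshold."""
    spikes = {}
    for col in columns:
        increase = null_rate(new_rows, col) - null_rate(old_rows, col)
        if increase > threshold:
            spikes[col] = round(increase, 4)
    return spikes


old_rows = [{"email": "a@x.io"}, {"email": "b@x.io"}, {"email": "c@x.io"}, {"email": None}]
new_rows = [{"email": "a@x.io"}, {"email": None}, {"email": None}, {"email": None}]
spikes = detect_null_spikes(old_rows, new_rows, ["email"])
print(spikes)
```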

Risk Scoring

Dift introduces a built-in risk scoring engine.

Initial risk levels:

  • low
  • medium
  • high

Risk scoring is based on:

  • schema changes
  • row differences
  • quality degradation
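
To make the idea concrete, the sketch below combines the three inputs listed above into one of the three risk levels. The point weights and cutoffs are invented for illustration; these notes do not document how Dift actually weighs each signal, so `score_risk` should be read as a hypothetical example only.

```python
# Illustrative sketch only; weights and cutoffs are assumptions, not Dift's rules.
def score_risk(schema_changes, row_diff_ratio, quality_warnings):
    """Map schema changes, row differences, and quality warnings to a risk level."""
    points = 0
    if schema_changes > 0:
        points += 2                      # any schema change is weighted heavily
    if row_diff_ratio > 0.10:
        points += 2                      # more than 10% of rows added or removed
    elif row_diff_ratio > 0:
        points += 1
    if quality_warnings > 0:
        points += 1                      # e.g. a null or duplicate spike
    if points >= 4:
        return "high"
    if points >= 2:
        return "medium"
    return "low"


print(score_risk(schema_changes=1, row_diff_ratio=0.0, quality_warnings=0))
```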

Supported Dataset Formats

Initial supported formats:

  • CSV
  • Parquet
  • Excel (.xlsx, .xls)
  • JSON

Report Formats

Supported report outputs:

  • console report
  • JSON report
  • CSV summary report
  • Excel workbook report
  • HTML report

HTML Reports

Initial HTML reporting support includes:

  • summary dashboards
  • warning sections
  • risk visibility
  • dataset metrics

Excel Reports

Excel reports include:

  • summary sheets
  • schema comparison sheets
  • quality summaries

JSON Reports

JSON reports support:

  • machine-readable workflows
  • automation pipelines
  • downstream integrations
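
As an example of an automation pipeline, a CI step could parse the JSON report and fail the build when the risk level is too high. The report schema is not described in these notes, so the `risk_level` key used below is an assumption; check the key names in a real report before relying on this sketch.

```python
import json

# Illustrative sketch only; the "risk_level" key is an assumed field name.
RISK_ORDER = ["low", "medium", "high"]


def exceeds_risk(report, max_allowed="medium"):
    """Return True when the report's risk level is above the allowed level."""
    level = str(report["risk_level"]).lower()
    return RISK_ORDER.index(level) > RISK_ORDER.index(max_allowed)


# In a pipeline this would be loaded from the report file, e.g. report.json.
sample_report = json.loads('{"risk_level": "HIGH"}')
if exceeds_risk(sample_report):
    print("risk level too high; a CI job could exit non-zero here")
```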

CLI Workflows

Compare two datasets, matching rows on a key column:

dift old.csv new.csv --key customer_id

Generate JSON report:

dift old.csv new.csv \
  --key customer_id \
  --report json \
  --output report.json

Example Output

╭─────────────────────────╮
│ Dift Dataset Comparison │
│ Risk Level: MEDIUM      │
╰─────────────────────────╯

Initial Project Structure

dift/
├── cli.py
├── core/
├── io/
├── reports/
└── utils/

Technical Foundations

Dift v0.1.0 establishes the initial architecture for:

  • modular comparison workflows
  • report rendering
  • reusable risk scoring
  • extensible dataset readers
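
One common way to make dataset readers extensible is a registry that maps file extensions to reader functions, so a new format only needs one new registered function. The sketch below shows that pattern with the standard library. All names in it (`READERS`, `register_reader`, `read_dataset`) are hypothetical and are not taken from the Dift codebase.

```python
import csv
import json
from pathlib import Path

# Illustrative sketch only; this registry is a generic pattern, not Dift's code.
READERS = {}


def register_reader(*extensions):
    """Decorator that registers a reader function for one or more extensions."""
    def decorator(func):
        for ext in extensions:
            READERS[ext.lower()] = func
        return func
    return decorator


@register_reader(".csv")
def read_csv(path):
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


@register_reader(".json")
def read_json(path):
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def read_dataset(path):
    """Dispatch to the reader registered for the file's extension."""
    ext = Path(path).suffix.lower()
    try:
        reader = READERS[ext]
    except KeyError:
        raise ValueError(f"unsupported dataset format: {ext!r}") from None
    return reader(path)
```

With this layout, supporting another format would mean adding one more decorated function without touching `read_dataset`.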

Supported Python Versions

Python 3.10+

Installation

Install from PyPI:

pip install dift-cli

Verify installation:

dift --help

Development Tooling

Development workflows use pytest for testing and Ruff for linting:

pytest
ruff check .

Known Limitations

Initial release limitations:

  • local datasets only
  • no SQL database connectors
  • no warehouse integrations
  • no batch workflows
  • no saved profiles
  • no scheduling workflows

These capabilities are planned for future releases.


Vision

Dift aims to become the open-source standard for:

  • dataset regression testing
  • data drift monitoring
  • ML dataset validation
  • warehouse trust checks
  • automated data quality validation

Thank You

Thank you to everyone supporting the early development of Dift.

This release marks the beginning of the Dift ecosystem and lays the groundwork for future platform expansion.