Dift v0.1.0 Release Notes¶
Release Date: Apr 26, 2026
Dift v0.1.0¶
Initial public release of Dift — an open-source CLI platform for dataset comparison, drift detection, and data trust validation.
This release establishes the foundation of the Dift comparison engine, including schema comparison, row-level validation, quality analysis, and risk scoring workflows.
Highlights¶
Dift v0.1.0 introduces:
- dataset comparison workflows
- schema change detection
- row-level comparison
- null and duplicate analysis
- risk scoring
- console reporting
- JSON reporting
- CSV reporting
- Excel reporting
- HTML reporting
- multi-format dataset support
- CLI-based comparison workflows
Initial Features¶
Dataset Comparison Engine¶
Core comparison capabilities include:
- row count comparison
- added row detection
- removed row detection
- key-based matching
- schema validation
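As an illustration of key-based matching, the following minimal sketch indexes two small CSV datasets by a key column and classifies rows as added, removed, or changed. It uses only the Python standard library; the helper names `index_by_key` and `compare_rows` are hypothetical and are not part of Dift's API.

```python
import csv
import io


def index_by_key(csv_text: str, key: str) -> dict[str, dict[str, str]]:
    """Parse CSV text and index each row by the value of its key column.

    If a key value repeats, the last row wins; this sketch assumes unique keys.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[key]: row for row in reader}


def compare_rows(old: dict, new: dict) -> dict:
    """Classify keys as added, removed, or changed between two indexed datasets."""
    old_keys, new_keys = set(old), set(new)
    shared = old_keys & new_keys
    return {
        "old_row_count": len(old),
        "new_row_count": len(new),
        "added": sorted(new_keys - old_keys),
        "removed": sorted(old_keys - new_keys),
        "changed": sorted(k for k in shared if old[k] != new[k]),
    }


# Illustrative data only.
OLD_CSV = "customer_id,plan\n1,free\n2,pro\n3,pro\n"
NEW_CSV = "customer_id,plan\n2,pro\n3,team\n4,free\n"

result = compare_rows(
    index_by_key(OLD_CSV, "customer_id"),
    index_by_key(NEW_CSV, "customer_id"),
)
print(result)
```

Here customer 4 is reported as added, customer 1 as removed, and customer 3 as changed, while the row counts stay equal. This shows why a row count comparison alone can miss differences that key-based matching catches.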
Schema Comparison¶
Dift can detect:
- added columns
- removed columns
- datatype mismatches
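The three kinds of schema change listed above can be sketched with plain set operations. This is an illustrative example, not Dift's implementation: the function name `compare_schemas` and the representation of a schema as a column-to-type-name mapping are assumptions.

```python
def compare_schemas(old: dict[str, str], new: dict[str, str]) -> dict:
    """Compare two schemas given as {column name: type name} mappings."""
    old_cols, new_cols = set(old), set(new)
    return {
        "added_columns": sorted(new_cols - old_cols),
        "removed_columns": sorted(old_cols - new_cols),
        # For columns present in both, record (old type, new type) when they differ.
        "datatype_mismatches": {
            col: (old[col], new[col])
            for col in sorted(old_cols & new_cols)
            if old[col] != new[col]
        },
    }


# Illustrative schemas only.
old_schema = {"customer_id": "int64", "plan": "string", "signup_date": "string"}
new_schema = {"customer_id": "string", "plan": "string", "region": "string"}

schema_diff = compare_schemas(old_schema, new_schema)
print(schema_diff)
```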
Quality Validation¶
Initial quality analysis includes:
- null spike detection
- duplicate spike detection
- quality degradation warnings
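One way to read "spike" is as an increase in a rate between the old and new dataset that exceeds a tolerance. The sketch below follows that reading. The helper names and the 0.05 threshold are assumptions made for illustration; these notes do not specify how Dift defines a spike.

```python
def null_rate(rows: list[dict], column: str) -> float:
    """Fraction of rows where the column is missing or empty."""
    if not rows:
        return 0.0
    missing = sum(1 for row in rows if row.get(column) in (None, ""))
    return missing / len(rows)


def duplicate_rate(rows: list[dict], key: str) -> float:
    """Fraction of rows that repeat an already-seen key value."""
    if not rows:
        return 0.0
    distinct = len({row.get(key) for row in rows})
    return (len(rows) - distinct) / len(rows)


def is_spike(old_rate: float, new_rate: float, threshold: float = 0.05) -> bool:
    """Flag an increase larger than the threshold (assumed value, not Dift's)."""
    return (new_rate - old_rate) > threshold


# Illustrative data only.
old_rows = [
    {"customer_id": "1", "email": "a@example.com"},
    {"customer_id": "2", "email": "b@example.com"},
    {"customer_id": "3", "email": "c@example.com"},
    {"customer_id": "4", "email": "d@example.com"},
]
new_rows = [
    {"customer_id": "1", "email": "a@example.com"},
    {"customer_id": "2", "email": ""},
    {"customer_id": "2", "email": ""},
    {"customer_id": "3", "email": "c@example.com"},
]

null_spike = is_spike(null_rate(old_rows, "email"), null_rate(new_rows, "email"))
duplicate_spike = is_spike(
    duplicate_rate(old_rows, "customer_id"),
    duplicate_rate(new_rows, "customer_id"),
)
print("null spike:", null_spike, "duplicate spike:", duplicate_spike)
```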
Risk Scoring¶
Dift introduces a built-in risk scoring engine.
Initial risk levels:
- low
- medium
- high
Risk scoring is based on:
- schema changes
- row differences
- quality degradation
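To make the idea concrete, here is a minimal sketch of how the three inputs could be combined into a low, medium, or high level. The point values, the 10% row-difference cutoff, and the function name `risk_level` are all assumptions chosen for illustration; the actual scoring rules in Dift may differ.

```python
def risk_level(schema_changes: int, row_diff_ratio: float, quality_warnings: int) -> str:
    """Map comparison findings to a risk level using assumed, illustrative weights."""
    score = 0
    # Any schema change counts as a significant signal.
    if schema_changes > 0:
        score += 2
    # Row differences: a large share of changed rows weighs more than a small one.
    if row_diff_ratio > 0.10:
        score += 2
    elif row_diff_ratio > 0:
        score += 1
    # Any quality degradation warning counts as a significant signal.
    if quality_warnings > 0:
        score += 2

    if score >= 4:
        return "high"
    if score >= 2:
        return "medium"
    return "low"


print(risk_level(schema_changes=0, row_diff_ratio=0.0, quality_warnings=0))
print(risk_level(schema_changes=1, row_diff_ratio=0.0, quality_warnings=0))
print(risk_level(schema_changes=1, row_diff_ratio=0.5, quality_warnings=1))
```

Keeping the scoring in one pure function makes the thresholds easy to test and tune independently of how the comparison itself is run.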
Supported Dataset Formats¶
Initial supported formats:
- CSV
- Parquet
- Excel (.xlsx, .xls)
- JSON
Report Formats¶
Supported report outputs:
- console report
- JSON report
- CSV summary report
- Excel workbook report
- HTML report
HTML Reports¶
Initial HTML reporting support includes:
- summary dashboards
- warning sections
- risk visibility
- dataset metrics
Excel Reports¶
Excel reports include:
- summary sheets
- schema comparison sheets
- quality summaries
JSON Reports¶
JSON reports support:
- machine-readable workflows
- automation pipelines
- downstream integrations
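As a sketch of an automation use case, a pipeline step could load the JSON report and stop when the risk level is above an allowed maximum. The report schema is not documented in these notes, so the `risk_level` field name and the sample content below are hypothetical; check a generated report for the real structure.

```python
import json

# Ordering of the three risk levels named in these notes.
RISK_ORDER = {"low": 0, "medium": 1, "high": 2}


def exceeds_threshold(report: dict, max_allowed: str = "medium") -> bool:
    """Return True when the report's risk level is above the allowed level."""
    level = str(report["risk_level"]).lower()  # hypothetical field name
    return RISK_ORDER[level] > RISK_ORDER[max_allowed]


# Hypothetical report content; a real report.json may use a different schema.
sample_report = json.loads('{"risk_level": "MEDIUM"}')
blocked = exceeds_threshold(sample_report, max_allowed="low")
print("block pipeline:", blocked)
```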
CLI Workflows¶
Example usage:
dift old.csv new.csv --key customer_id
Generate JSON report:
dift old.csv new.csv \
    --key customer_id \
    --report json \
    --output report.json
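For scripted use, the same command can be assembled and run from Python. The sketch below uses only the flags shown above (`--key`, `--report`, `--output`); the wrapper functions `build_dift_command` and `run_dift` are hypothetical helpers, and no particular exit code behavior is assumed.

```python
import shlex
import shutil
import subprocess


def build_dift_command(old, new, key, report=None, output=None) -> list[str]:
    """Build the argument list for a dift comparison."""
    command = ["dift", old, new, "--key", key]
    if report is not None:
        command += ["--report", report]
    if output is not None:
        command += ["--output", output]
    return command


def run_dift(command: list[str]) -> subprocess.CompletedProcess:
    """Run the command if the dift executable is available on PATH."""
    if shutil.which(command[0]) is None:
        raise FileNotFoundError("dift is not on PATH; install it with pip install dift-cli")
    # check=False: the caller inspects returncode, stdout, and stderr.
    return subprocess.run(command, capture_output=True, text=True, check=False)


command = build_dift_command(
    "old.csv", "new.csv", "customer_id", report="json", output="report.json"
)
print(shlex.join(command))
```

Passing the arguments as a list avoids shell quoting issues with file names that contain spaces.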
Example Output¶
╭─────────────────────────╮
│ Dift Dataset Comparison │
│ Risk Level: MEDIUM │
╰─────────────────────────╯
Initial Project Structure¶
dift/
├── cli.py
├── core/
├── io/
├── reports/
└── utils/
Technical Foundations¶
Dift v0.1.0 establishes the initial architecture for:
- modular comparison workflows
- report rendering
- reusable risk scoring
- extensible dataset readers
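One common way to keep dataset readers extensible is a registry that maps file extensions to reader functions, so a new format only needs one new function. The sketch below illustrates that pattern with the standard library; it is an assumption about a possible design, not a description of Dift's internal modules, and all names in it are hypothetical.

```python
import csv
import json
import tempfile
from pathlib import Path
from typing import Callable

Rows = list[dict]
READERS: dict[str, Callable[[Path], Rows]] = {}


def register_reader(*extensions: str):
    """Decorator that registers a reader function for one or more extensions."""
    def decorator(func: Callable[[Path], Rows]) -> Callable[[Path], Rows]:
        for ext in extensions:
            READERS[ext.lower()] = func
        return func
    return decorator


@register_reader(".csv")
def read_csv(path: Path) -> Rows:
    with path.open(newline="") as handle:
        return list(csv.DictReader(handle))


@register_reader(".json")
def read_json(path: Path) -> Rows:
    # Assumes the file holds a JSON array of objects.
    with path.open() as handle:
        return json.load(handle)


def read_dataset(path: Path) -> Rows:
    """Dispatch to the reader registered for the file's extension."""
    try:
        reader = READERS[path.suffix.lower()]
    except KeyError:
        raise ValueError(f"unsupported dataset format: {path.suffix}") from None
    return reader(path)


# Demonstration with temporary files and illustrative data.
with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "customers.csv"
    csv_path.write_text("customer_id,plan\n1,free\n2,pro\n")
    json_path = Path(tmp) / "customers.json"
    json_path.write_text(json.dumps([{"customer_id": 1, "plan": "free"}]))
    csv_rows = read_dataset(csv_path)
    json_rows = read_dataset(json_path)

print(csv_rows, json_rows)
```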
Supported Python Versions¶
Python 3.10+
Installation¶
Install from PyPI:
pip install dift-cli
Verify installation:
dift --help
Development Tooling¶
Run the test suite and the linter during development:
pytest
ruff check .
Known Limitations¶
Initial release limitations:
- local datasets only
- no SQL database connectors
- no warehouse integrations
- no batch workflows
- no saved profiles
- no scheduling workflows
These capabilities are planned for future releases.
Vision¶
Dift aims to become the open-source standard for:
- dataset regression testing
- data drift monitoring
- ML dataset validation
- warehouse trust checks
- automated data quality validation
Thank You¶
Thank you to everyone supporting the early development of Dift.
This release is the starting point for the Dift ecosystem and for the platform features planned in later releases.