Dift v0.3.0 Release Notes

Release Date: May 3, 2026


Dift v0.3.0

Dift v0.3.0 focuses on reporting workflows, configuration-driven execution, reusable validation settings, and developer experience.

This release expands Dift beyond simple dataset comparison into a more reusable and automation-friendly validation platform.


Highlights

Dift v0.3.0 introduces:

  • reusable configuration files
  • JSON configuration support
  • TOML configuration support
  • YAML configuration support
  • improved JSON reports
  • CSV summary reporting
  • Excel report improvements
  • HTML report improvements
  • reusable threshold configurations
  • column-level threshold overrides
  • environment-based configurations
  • environment variable support
  • output directory support
  • improved validation errors
  • stronger CLI workflows

Major New Features


Configuration File Support

Dift now supports reusable configuration files.

Supported formats:

  • YAML
  • TOML
  • JSON

This allows teams to standardize and reuse comparison workflows.


Example YAML Config

old_dataset: examples/old.csv
new_dataset: examples/new.csv
key: customer_id
threshold: 0.1
report: html

Run using:

dift --config config.yaml

TOML Configuration Support

Example:

old_dataset = "examples/old.csv"
new_dataset = "examples/new.csv"
key = "customer_id"
report = "json"

JSON Configuration Support

Example:

{
  "old_dataset": "examples/old.csv",
  "new_dataset": "examples/new.csv",
  "key": "customer_id",
  "report": "csv"
}
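
The three examples above carry the same keys, so a loader can pick a parser from the file extension. The following is a minimal sketch of that idea, not Dift's actual loader: `load_config` is a hypothetical name, TOML parsing assumes Python 3.11 or later (`tomllib`), and YAML parsing assumes the third-party PyYAML package is installed.

```python
import json
from pathlib import Path


def load_config(path):
    """Parse a config file into a dict, choosing the parser by file suffix."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".json":
        return json.loads(path.read_text(encoding="utf-8"))
    if suffix == ".toml":
        import tomllib  # standard library from Python 3.11 onward
        return tomllib.loads(path.read_text(encoding="utf-8"))
    if suffix in (".yaml", ".yml"):
        import yaml  # third-party PyYAML, imported only when needed
        return yaml.safe_load(path.read_text(encoding="utf-8"))
    raise ValueError(f"Unsupported config format '{suffix}'")
```

The parser imports sit inside the branches so that a JSON-only setup does not need the TOML or YAML dependencies.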

Dataset Paths Inside Configs

Dataset paths can now be fully defined inside config files.

This enables cleaner automation workflows:

dift --config config_with_datasets.yaml

CLI Override Support

CLI arguments now override config values.

Priority order:

CLI arguments > Config values > Defaults

A shared config file can hold a team's standard settings while individual runs adjust single values on the command line.
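
The priority order can be illustrated with a layered lookup. This is a sketch under stated assumptions rather than Dift's implementation: `resolve_settings` and the values in `DEFAULTS` are hypothetical, and unset CLI options are assumed to arrive as `None`.

```python
from collections import ChainMap

# Hypothetical defaults, for illustration only; Dift's real defaults may differ.
DEFAULTS = {"report": "console", "threshold": 0.1}


def resolve_settings(cli_args, config):
    """Merge settings so that CLI arguments > config values > defaults."""
    # Drop CLI options the user did not pass (None) so they cannot mask config values.
    cli_set = {k: v for k, v in cli_args.items() if v is not None}
    # ChainMap looks keys up left to right, which gives the precedence order.
    return dict(ChainMap(cli_set, config, DEFAULTS))


settings = resolve_settings(
    cli_args={"report": "json", "threshold": None},
    config={"report": "html", "threshold": 0.05, "key": "customer_id"},
)
print(settings)
```

In this run the report format comes from the command line, while the threshold and key come from the config because no CLI value was given for them.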


Reusable Threshold Configurations

Dift now supports reusable threshold policies.

Thresholds can be configured globally or per-column.


Global Threshold Example

thresholds:
  numeric: 0.1
  categorical: 0.2
  outlier: 0.15

Column-Level Threshold Overrides

Example:

columns:
  revenue:
    numeric: 0.05

  segment:
    categorical: 0.3

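In this example, revenue uses a numeric threshold of 0.05 and segment uses a categorical threshold of 0.3; columns without an override keep the global thresholds.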
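
One way to picture how the global and column-level examples combine is a lookup that prefers a column override and otherwise falls back to the global value. This is a sketch, not Dift's code: `threshold_for` is a hypothetical helper, and it assumes both sections live in the same parsed config.

```python
def threshold_for(column, kind, config):
    """Return the threshold for one column and check kind, preferring a column override."""
    override = config.get("columns", {}).get(column, {})
    if kind in override:
        return override[kind]
    # No override for this column and kind: use the global threshold.
    return config["thresholds"][kind]


config = {
    "thresholds": {"numeric": 0.1, "categorical": 0.2, "outlier": 0.15},
    "columns": {
        "revenue": {"numeric": 0.05},
        "segment": {"categorical": 0.3},
    },
}

print(threshold_for("revenue", "numeric", config))  # column override
print(threshold_for("revenue", "outlier", config))  # global fallback
```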


Environment-Based Configurations

Dift now supports named environments inside a config file, each with its own settings.

Example:

environments:
  development:
    threshold: 0.2

  production:
    threshold: 0.05

Run using:

dift --config config_env.yaml --env production
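
Conceptually, selecting an environment layers that environment's values over the rest of the config. The sketch below assumes exactly that merge behavior, which these notes do not spell out; `apply_environment` is a hypothetical name.

```python
def apply_environment(config, env=None):
    """Return a copy of config with the chosen environment's values layered on top."""
    base = {k: v for k, v in config.items() if k != "environments"}
    if env is None:
        return base
    environments = config.get("environments", {})
    if env not in environments:
        raise KeyError(f"Unknown environment '{env}'")
    # Later keys win, so environment values replace top-level ones.
    return {**base, **environments[env]}


config = {
    "key": "customer_id",
    "threshold": 0.1,
    "environments": {
        "development": {"threshold": 0.2},
        "production": {"threshold": 0.05},
    },
}

print(apply_environment(config, "production"))
```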

Environment Variable Support

Dift now supports environment variable interpolation inside config files.

Example:

old_dataset: ${OLD_DATASET}
new_dataset: ${NEW_DATASET}

This keeps machine-specific values out of the config file, which helps in CI/CD pipelines and prepares the ground for secret management.
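
The `${NAME}` syntax can be expanded with a small recursive substitution over the parsed config. This is an illustrative sketch only: `interpolate` is a hypothetical function, and raising an error for an unset variable is an assumption about behavior, not something these notes confirm.

```python
import os
import re

# Matches ${NAME} where NAME is a conventional environment variable name.
_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def interpolate(value, env=os.environ):
    """Replace ${NAME} placeholders in strings, recursing into dicts and lists."""
    if isinstance(value, str):
        def lookup(match):
            name = match.group(1)
            if name not in env:
                raise KeyError(f"Environment variable '{name}' is not set")
            return env[name]
        return _VAR.sub(lookup, value)
    if isinstance(value, dict):
        return {k: interpolate(v, env) for k, v in value.items()}
    if isinstance(value, list):
        return [interpolate(v, env) for v in value]
    return value  # numbers, booleans, and None pass through unchanged


raw = {"old_dataset": "${OLD_DATASET}", "new_dataset": "${NEW_DATASET}", "threshold": 0.1}
fake_env = {"OLD_DATASET": "data/old.csv", "NEW_DATASET": "data/new.csv"}
print(interpolate(raw, fake_env))
```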


Output Directory Support

Reports can now be written directly to output directories.

Example:

dift old.csv new.csv \
  --report json \
  --output-dir reports/

Dift automatically generates report filenames.


Auto-Generated Report Names

Examples:

dift_report.json
dift_report.csv
dift_report.xlsx
dift_report.html
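
The naming scheme above amounts to a fixed base name plus an extension chosen by report type. A minimal sketch, assuming the report type strings shown here (in particular `"excel"`, which these notes do not confirm as a `--report` value); `report_path` is a hypothetical helper.

```python
from pathlib import Path

# Report type -> file extension, matching the file names listed above.
EXTENSIONS = {"json": "json", "csv": "csv", "excel": "xlsx", "html": "html"}


def report_path(output_dir, report_type):
    """Build the path of the auto-named report inside output_dir."""
    if report_type not in EXTENSIONS:
        raise ValueError(f"Unsupported report type '{report_type}'")
    return Path(output_dir) / f"dift_report.{EXTENSIONS[report_type]}"


print(report_path("reports/", "json"))
```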

Improved JSON Report Structure

JSON reports were redesigned for:

  • cleaner automation support
  • improved consistency
  • future extensibility

New report sections include:

  • metadata
  • summary
  • schema
  • rows
  • quality
  • numeric
  • categorical

Metadata Support

JSON reports now include metadata such as:

  • tool name
  • version
  • report type

Example:

"metadata": {
  "tool": "dift",
  "version": "0.3.0"
}
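
A consumer such as a CI job could use the metadata and section names to sanity-check a report before reading further. The sketch below assumes the listed sections are top-level keys of the JSON report, which these notes imply but do not state; `missing_sections` is a hypothetical helper and the sample report is made up for illustration.

```python
import json

EXPECTED_SECTIONS = [
    "metadata", "summary", "schema", "rows", "quality", "numeric", "categorical",
]


def missing_sections(report_text):
    """Return the expected top-level sections that are absent from a JSON report."""
    report = json.loads(report_text)
    return [name for name in EXPECTED_SECTIONS if name not in report]


# Illustrative sample only; a real report contains every section.
sample = '{"metadata": {"tool": "dift", "version": "0.3.0"}, "summary": {}}'
print(missing_sections(sample))
print(json.loads(sample)["metadata"]["version"])
```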

Better CSV Summary Reports

CSV summary reports are now more consistent, which makes them easier to use in automation workflows and lightweight monitoring.


Improved Excel Reports

Excel reports received multiple improvements:

  • better formatting
  • improved worksheet organization
  • cleaner readability
  • improved summaries

Improved HTML Reports

HTML reports now include:

  • better layouts
  • improved summaries
  • cleaner warning visibility
  • improved readability

Better Validation Errors

Validation errors are now clearer and more actionable.

Examples include:

  • clearer missing dataset guidance
  • improved unsupported file type errors
  • actionable configuration validation

Example Unsupported File Error

Unsupported dataset type '.txt'.

Supported local file types:
.csv, .json, .parquet, .xlsx
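
An error of this shape can be produced by checking the file suffix against an allow-list before any data is read. This is a sketch of the pattern, not Dift's validation code: `check_dataset_type` is a hypothetical name, and the suffix list simply mirrors the message above.

```python
from pathlib import Path

# Mirrors the file types named in the example error message.
SUPPORTED_SUFFIXES = (".csv", ".json", ".parquet", ".xlsx")


def check_dataset_type(path):
    """Return the file suffix, or raise ValueError with an actionable message."""
    suffix = Path(path).suffix.lower()
    if suffix not in SUPPORTED_SUFFIXES:
        raise ValueError(
            f"Unsupported dataset type '{suffix}'.\n\n"
            "Supported local file types:\n" + ", ".join(SUPPORTED_SUFFIXES)
        )
    return suffix


try:
    check_dataset_type("notes.txt")
except ValueError as err:
    print(err)
```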

Improved CLI UX

CLI workflows were improved through:

  • clearer help output
  • better command guidance
  • improved validation behavior
  • cleaner execution workflows

Testing Improvements

Test coverage expanded significantly.

New focus areas include:

  • config loading
  • output directory workflows
  • JSON schema consistency
  • validation stability
  • CLI regression protection

Example Usage

Run using config:

dift --config config_sample.yaml

Generate HTML report:

dift old.csv new.csv \
  --report html \
  --template clean

Supported Dataset Formats

Supported formats remain:

  • CSV
  • Parquet
  • Excel (.xlsx, .xls)
  • JSON

Report Formats

Supported outputs:

  • console report
  • JSON report
  • CSV report
  • Excel report
  • HTML report

Internal Improvements

Internal improvements include:

  • cleaner report architecture
  • improved config loading
  • reusable threshold handling
  • better validation organization

Developer Experience Improvements

Developer workflows were improved through:

  • expanded test coverage
  • clearer validation behavior
  • improved report consistency

Installation

Install:

pip install dift-cli

Upgrade:

pip install --upgrade dift-cli

Looking Ahead

Future releases will focus on:

  • SQL database support
  • warehouse integrations
  • automation workflows
  • saved profiles
  • scheduling systems
  • historical drift tracking
  • batch comparisons

Known Limitations

Current limitations:

  • no SQL connectors yet
  • no warehouse integrations yet
  • no scheduling system yet
  • no batch comparison workflows yet

These are planned for future releases.


Vision

Dift continues to evolve toward becoming the open-source standard for:

  • dataset regression testing
  • data trust validation
  • ML dataset drift monitoring
  • warehouse comparison workflows
  • automated data quality enforcement

Thank You

Thank you to everyone contributing feedback, ideas, testing, documentation improvements, and early feature discussions during Dift’s rapid growth.

Dift v0.3.0 represents a major step toward scalable and reusable data trust workflows.