
Plugin Preparation Architecture

This document explains how Dift's internals are being prepared for future plugin-based connectors and ecosystem expansion.

Although Dift does not yet support external plugins, the internal architecture has been intentionally designed to make future plugin support technically feasible without major rewrites.


Why Plugin Preparation Matters

As Dift grows, users and organizations will want support for:

  • cloud platforms
  • enterprise warehouses
  • streaming systems
  • storage platforms
  • proprietary connectors
  • organization-specific integrations

Embedding every connector directly into the core codebase would eventually create:

  • tight coupling
  • dependency bloat
  • slower releases
  • maintenance complexity
  • unstable scaling

Plugin preparation helps avoid this.


Long-Term Vision

Dift aims to support an ecosystem where connectors can evolve independently from the core engine.

Potential future ecosystem:

dift/plugins/
├── snowflake/
├── databricks/
├── kafka/
├── s3/
├── spark/
├── clickhouse/
└── custom/

Current State

Today, all connectors still live inside the core repository.

Examples:

dift/io/
├── sql_reader.py
├── duckdb_reader.py
└── bigquery_reader.py

However, the architecture has already been refactored to reduce coupling and prepare for future separation.


Architectural Goals

The plugin preparation work focuses on:

  • modular connector loading
  • connector isolation
  • dynamic registration
  • reusable interfaces
  • optional dependencies
  • future external packages
  • enterprise extensibility

Core Design Principles

The plugin architecture preparation follows these principles:

  • connectors should be isolated
  • connectors should be optional
  • the comparison engine should remain connector-agnostic
  • readers should follow a shared contract
  • connector routing should be centralized
  • new connectors should require minimal core changes

Current Foundations Already Implemented

Several important architectural changes have already been completed.


Reader Abstraction

Dift now uses a shared reader interface.

Example:

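class BaseReader:
    """Shared contract that every connector reader follows."""

    def can_handle(self, source: str) -> bool:
        # Return True if this reader recognizes the source string.
        ...

    def read(self, source: str):
        # Load the source and return it as a polars.DataFrame.
        ...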

Benefits:

  • standardized connector behavior
  • reusable routing
  • simplified plugin contracts
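
As a rough illustration of the contract, a reader for local CSV files might look like the sketch below. `CsvReader` is a hypothetical name and is not one of Dift's existing connectors; the only details taken from this document are the two method names and the `polars.DataFrame` return type.

```python
class BaseReader:
    """Shared reader contract, mirroring the interface shown above."""

    def can_handle(self, source: str) -> bool:
        raise NotImplementedError

    def read(self, source: str):
        raise NotImplementedError


class CsvReader(BaseReader):
    """Hypothetical reader for local CSV files (illustration only)."""

    def can_handle(self, source: str) -> bool:
        # Claim only sources that look like CSV paths.
        return source.lower().endswith(".csv")

    def read(self, source: str):
        # Imported lazily so that defining the reader needs no dependency.
        import polars as pl

        return pl.read_csv(source)


reader = CsvReader()
print(reader.can_handle("orders.csv"))        # True
print(reader.can_handle("warehouse.duckdb"))  # False
```

Because `can_handle` only inspects the source string, routing decisions stay cheap and never trigger the connector's heavier imports.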

Centralized Registry System

Dift now uses a centralized reader registry.

Example:

registry.register(MyReader())
reader = registry.get_reader(source)

Benefits:

  • dynamic registration
  • centralized routing
  • future plugin discovery
  • connector prioritization
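
The sketch below shows one way a registry with these properties could work. It is not Dift's actual implementation: the class name `ReaderRegistry`, the `priority` parameter, the error raised for unknown sources, and both example readers are assumptions made for illustration.

```python
from typing import List, Tuple


class ReaderRegistry:
    """Hypothetical registry sketch; Dift's real registry may differ."""

    def __init__(self) -> None:
        self._readers: List[Tuple[int, object]] = []

    def register(self, reader, priority: int = 0) -> None:
        self._readers.append((priority, reader))
        # Stable sort: higher priority first, registration order breaks ties.
        self._readers.sort(key=lambda item: -item[0])

    def get_reader(self, source: str):
        for _, reader in self._readers:
            if reader.can_handle(source):
                return reader
        raise ValueError(f"No registered reader can handle source: {source!r}")


class SqlUrlReader:
    """Illustrative reader that accepts any URL-style source."""

    def can_handle(self, source: str) -> bool:
        return "://" in source

    def read(self, source: str):
        raise NotImplementedError("omitted in this sketch")


class BigQueryUrlReader(SqlUrlReader):
    """Illustrative reader that accepts only bigquery:// sources."""

    def can_handle(self, source: str) -> bool:
        return source.startswith("bigquery://")


registry = ReaderRegistry()
registry.register(SqlUrlReader())
registry.register(BigQueryUrlReader(), priority=10)

print(type(registry.get_reader("bigquery://project/dataset")).__name__)  # BigQueryUrlReader
print(type(registry.get_reader("postgresql://localhost/db")).__name__)   # SqlUrlReader
```

Giving the more specific reader a higher priority lets it win over the generic one, even though both would accept a `bigquery://` source.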

Connector Isolation

Each connector now lives in its own module.

Examples:

sql_reader.py
duckdb_reader.py
bigquery_reader.py

Benefits:

  • isolated dependencies
  • independent testing
  • future package extraction

Unified Dataset Contract

All readers return:

polars.DataFrame

This contract is what keeps the comparison engine connector-agnostic.

The comparison engine never needs to understand:

  • SQLAlchemy
  • DuckDB
  • BigQuery
  • storage APIs
  • warehouse authentication

It only receives standardized DataFrames.


Why This Matters

This allows connectors to evolve independently while keeping the comparison engine stable.


Dependency Isolation

Connectors use optional imports.

Example:

try:
    import duckdb
except ImportError:
    duckdb = None

Benefits:

  • lightweight installs
  • optional features
  • smaller dependency footprint
  • future plugin extraction

Plugin-Safe Error Handling

When an optional dependency is missing, connectors now raise errors that tell the user what to install.

Example:

Snowflake support requires:
  pip install snowflake-sqlalchemy

Benefits:

  • cleaner plugin UX
  • dependency guidance
  • safer optional loading
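
A minimal sketch of how an optional import can be turned into an actionable error is shown below. The helper `require` and the exception `MissingDependencyError` are hypothetical names, not part of Dift; the message format follows the Snowflake example above.

```python
import importlib


class MissingDependencyError(ImportError):
    """Hypothetical error for a connector whose optional dependency is absent."""


def require(module_name: str, feature: str, pip_name: str):
    """Import an optional module, or fail with an install hint."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise MissingDependencyError(
            f"{feature} support requires:\n  pip install {pip_name}"
        ) from exc


try:
    require("snowflake.sqlalchemy", "Snowflake", "snowflake-sqlalchemy")
except MissingDependencyError as err:
    # Printed only when the package is not installed.
    print(err)
```

Subclassing `ImportError` keeps existing `except ImportError` handlers working while still letting callers catch the more specific type.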

Future Plugin Loading

Future architecture may support:

registry.load_plugins()

Potential behaviors:

  • auto-discovery
  • entry-point loading
  • optional registration
  • lazy imports
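
Entry-point loading could be built on the standard library's `importlib.metadata`. The sketch below is an assumption about how such a loader might look, not an existing Dift API: the group name `dift.readers`, the `load_plugins` function, and the stand-in `ListRegistry` are all hypothetical.

```python
import warnings
from importlib.metadata import entry_points

# Hypothetical entry-point group name; Dift does not define one today.
PLUGIN_GROUP = "dift.readers"


class ListRegistry:
    """Stand-in registry so the sketch is self-contained."""

    def __init__(self) -> None:
        self.readers = []

    def register(self, reader) -> None:
        self.readers.append(reader)


def load_plugins(registry, group: str = PLUGIN_GROUP) -> list:
    """Register every reader advertised under `group`; return the names loaded."""
    eps = entry_points()
    # Newer Python versions expose .select(); older ones return a dict of groups.
    found = eps.select(group=group) if hasattr(eps, "select") else eps.get(group, [])
    loaded = []
    for ep in found:
        try:
            reader_cls = ep.load()  # the plugin module is imported only here
            registry.register(reader_cls())
            loaded.append(ep.name)
        except Exception as exc:  # a broken plugin must not stop the rest
            warnings.warn(f"Skipping plugin {ep.name!r}: {exc}")
    return loaded


# The result is empty when no installed package declares the group.
print(load_plugins(ListRegistry()))
```

A plugin package would opt in by declaring an entry point in that group in its packaging metadata, so installing the package would be enough to make the reader discoverable.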

Possible Future Plugin Structure

Potential future package layout:

dift-snowflake/
├── plugin.py
├── reader.py
├── auth.py
└── requirements.txt

Possible Registration Workflow

Potential future behavior:

from dift_snowflake import SnowflakeReader

registry.register(SnowflakeReader())

Why Dynamic Registration Matters

Dynamic registration allows:

  • third-party connectors
  • enterprise-only integrations
  • community-maintained plugins
  • experimentation without core changes

Optional Connector Loading

Future connectors may become separately installable.

Examples:

pip install dift-snowflake
pip install dift-kafka
pip install dift-databricks

Benefits:

  • smaller core package
  • faster installs
  • reduced dependency conflicts
  • modular ecosystems

Enterprise Connector Possibilities

Plugin preparation also supports future enterprise integrations.

Examples:

  • Snowflake
  • Databricks
  • S3
  • Kafka
  • Delta Lake
  • Iceberg
  • proprietary warehouses

Connector Lifecycle Preparation

Future plugins may support:

  • initialization hooks
  • teardown hooks
  • authentication providers
  • configuration validation
  • capability inspection

Potential future interface:

class Plugin:
    def initialize(self):
        ...

    def shutdown(self):
        ...
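
One way to make sure `shutdown` always follows `initialize` is to wrap the pair in a context manager. The helper `plugin_session` and the `LoggingPlugin` class below are hypothetical and exist only to illustrate the lifecycle.

```python
from contextlib import contextmanager


class Plugin:
    """Lifecycle interface from the sketch above."""

    def initialize(self) -> None: ...

    def shutdown(self) -> None: ...


@contextmanager
def plugin_session(plugin: Plugin):
    """Hypothetical helper: guarantee shutdown() runs after initialize()."""
    plugin.initialize()
    try:
        yield plugin
    finally:
        plugin.shutdown()


class LoggingPlugin(Plugin):
    """Records lifecycle calls so the ordering is visible."""

    def __init__(self) -> None:
        self.events = []

    def initialize(self) -> None:
        self.events.append("initialize")

    def shutdown(self) -> None:
        self.events.append("shutdown")


plugin = LoggingPlugin()
with plugin_session(plugin):
    plugin.events.append("read")

print(plugin.events)  # ['initialize', 'read', 'shutdown']
```

The `finally` clause means teardown still runs if reading fails, which fits the goal of keeping connector failures recoverable.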

Plugin Metadata (Future)

Future plugin metadata may include:

class PluginMetadata:
    name: str
    version: str
    supported_sources: list[str]
    dependencies: list[str]

This would support:

  • plugin discovery
  • compatibility checks
  • UI integrations
  • debugging workflows
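
As a sketch of how this metadata could drive a compatibility check, the example below models it as a dataclass and reports which declared dependencies are not installed. The function `missing_dependencies` is hypothetical, the field values are example data, and the sketch assumes `dependencies` holds plain distribution names without version specifiers.

```python
from dataclasses import dataclass, field
from importlib.metadata import PackageNotFoundError, version
from typing import List


@dataclass(frozen=True)
class PluginMetadata:
    name: str
    version: str
    supported_sources: List[str] = field(default_factory=list)
    dependencies: List[str] = field(default_factory=list)


def missing_dependencies(meta: PluginMetadata) -> List[str]:
    """Return the declared distributions that are not installed."""
    missing = []
    for dist_name in meta.dependencies:
        try:
            version(dist_name)  # raises if the distribution is absent
        except PackageNotFoundError:
            missing.append(dist_name)
    return missing


# Example values only; no such plugin package exists yet.
meta = PluginMetadata(
    name="dift-snowflake",
    version="0.1.0",
    supported_sources=["snowflake://"],
    dependencies=["snowflake-sqlalchemy"],
)
print(missing_dependencies(meta))
```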

Plugin Capability Discovery

Potential future capability inspection:

registry.list_capabilities()

Example output:

- SQL databases
- BigQuery
- Snowflake
- Kafka
- S3
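
A possible shape for this inspection is sketched below. It assumes each reader carries a human-readable `capability` attribute, which is not part of the current reader contract; `CapabilityRegistry` and both reader classes are hypothetical.

```python
class CapabilityRegistry:
    """Hypothetical registry that reports what its readers can load."""

    def __init__(self) -> None:
        self._readers = []

    def register(self, reader) -> None:
        self._readers.append(reader)

    def list_capabilities(self) -> list:
        seen = []
        for reader in self._readers:
            # Readers without a label are skipped rather than treated as errors.
            label = getattr(reader, "capability", None)
            if label and label not in seen:
                seen.append(label)
        return seen


class SqlReader:
    capability = "SQL databases"


class BigQueryReader:
    capability = "BigQuery"


registry = CapabilityRegistry()
registry.register(SqlReader())
registry.register(BigQueryReader())
registry.register(SqlReader())  # duplicates are reported once

for label in registry.list_capabilities():
    print(f"- {label}")
```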

Why Plugin Isolation Is Important

Isolation reduces risk.

A broken connector should not:

  • crash the core engine
  • break unrelated connectors
  • block local dataset comparisons

Connector Failure Philosophy

Connector failures should be:

  • isolated
  • actionable
  • recoverable
  • dependency-aware
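
The sketch below illustrates the isolation idea at registration time: each connector is constructed inside its own `try` block, so one failure is recorded and the rest still load. `register_all`, `SimpleRegistry`, and the example factories are hypothetical names used only for this illustration.

```python
class SimpleRegistry:
    """Stand-in registry so the sketch is self-contained."""

    def __init__(self) -> None:
        self.readers = []

    def register(self, reader) -> None:
        self.readers.append(reader)


def register_all(registry, factories: dict) -> dict:
    """Try every connector factory; return {name: error message} for failures."""
    failures = {}
    for name, factory in factories.items():
        try:
            registry.register(factory())
        except Exception as exc:  # isolate the failure to this connector
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures


class LocalFileReader:
    """Placeholder for a connector with no optional dependencies."""


def broken_snowflake_factory():
    # Simulates a connector whose optional dependency is missing.
    raise ImportError("Snowflake support requires:\n  pip install snowflake-sqlalchemy")


registry = SimpleRegistry()
failures = register_all(
    registry,
    {"snowflake": broken_snowflake_factory, "local": LocalFileReader},
)

print(len(registry.readers))  # 1
print(sorted(failures))       # ['snowflake']
```

Returning the failures instead of raising lets the caller decide whether to warn, retry, or surface the install hint, while local comparisons keep working.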

Current Limitations

Dift does NOT yet support:

  • external plugins
  • auto-discovery
  • plugin installation APIs
  • entry-point registration
  • runtime plugin loading

However, the internal architecture is now being prepared for these features.


Why Preparation Happens Early

Architectural preparation is easier while the connector ecosystem is still small.

Retrofitting a plugin system later, once many connectors depend on core internals, is significantly more difficult.


Current Connector Flow

Current behavior:

CLI
  ↓
Registry
  ↓
Reader
  ↓
Polars DataFrame
  ↓
Comparison Engine

This already resembles a plugin-friendly architecture.


Future Plugin Flow

Potential future architecture:

CLI
  ↓
Plugin Loader
  ↓
Registry
  ↓
External Readers
  ↓
Polars DataFrame
  ↓
Comparison Engine

Stability Goals

Plugin preparation should NOT compromise:

  • comparison stability
  • report consistency
  • risk scoring
  • existing workflows

Core functionality remains stable and connector-agnostic.


Benefits of Plugin Preparation

This architecture enables future:

  • enterprise ecosystems
  • community integrations
  • connector marketplaces
  • warehouse ecosystems
  • streaming integrations
  • cloud-native workflows

Design Philosophy

The plugin preparation architecture prioritizes:

  • long-term scalability
  • loose coupling
  • modularity
  • maintainability
  • ecosystem growth

Future Areas of Expansion

Potential future integrations:

Area                  Examples
--------------------  --------------------------
Warehouses            Snowflake, Databricks
Streaming             Kafka, Pulsar
Storage               S3, GCS, Azure Blob
Lakehouses            Delta Lake, Iceberg
APIs                  REST, GraphQL
Distributed Engines   Spark, Ray

Related Developer Docs

See also:

  • architecture.md
  • reader-registry.md
  • testing.md
  • report-system.md
  • codebase-overview.md