Plugin Preparation Architecture
This document explains how Dift's internals are being prepared for plugin-based connectors and a broader connector ecosystem.
Although Dift does not yet support external plugins, the internal architecture has been intentionally designed to make future plugin support technically feasible without major rewrites.
Why Plugin Preparation Matters
As Dift grows, users and organizations will want support for:
- cloud platforms
- enterprise warehouses
- streaming systems
- storage platforms
- proprietary connectors
- organization-specific integrations
Embedding every connector directly into the core codebase would eventually create:
- tight coupling
- dependency bloat
- slower releases
- maintenance complexity
- unstable scaling
Plugin preparation helps avoid this.
Long-Term Vision
Dift aims to support an ecosystem where connectors can evolve independently from the core engine.
Potential future ecosystem:
dift/plugins/
├── snowflake/
├── databricks/
├── kafka/
├── s3/
├── spark/
├── clickhouse/
└── custom/
Current State
Today, all connectors still live inside the core repository.
Examples:
dift/io/
├── sql_reader.py
├── duckdb_reader.py
└── bigquery_reader.py
However, the architecture has already been refactored to reduce coupling and prepare for future separation.
Architectural Goals
The plugin preparation work focuses on:
- modular connector loading
- connector isolation
- dynamic registration
- reusable interfaces
- optional dependencies
- future external packages
- enterprise extensibility
Core Design Principles
The plugin architecture preparation follows these principles:
- connectors should be isolated
- connectors should be optional
- the comparison engine should remain connector-agnostic
- readers should follow a shared contract
- connector routing should be centralized
- new connectors should require minimal core changes
Current Foundations Already Implemented
Several important architectural changes have already been completed.
Reader Abstraction
Dift now uses a shared reader interface.
Example:
class BaseReader:
    def can_handle(self, source: str) -> bool:
        ...

    def read(self, source: str):
        ...
Benefits:
- standardized connector behavior
- reusable routing
- simplified plugin contracts
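As a rough illustration of the contract, a connector might subclass `BaseReader` along the following lines. `CsvReader` and its file-extension check are hypothetical and not part of Dift. `polars` is imported lazily inside `read`, so defining the reader does not require the dependency to be installed.

```python
class BaseReader:
    """Shared reader contract, mirroring the interface shown above."""

    def can_handle(self, source: str) -> bool:
        raise NotImplementedError

    def read(self, source: str):
        raise NotImplementedError


class CsvReader(BaseReader):
    """Hypothetical connector, used only to illustrate the contract."""

    def can_handle(self, source: str) -> bool:
        # Claim only sources that look like CSV files.
        return source.lower().endswith(".csv")

    def read(self, source: str):
        # Lazy import: polars is only needed when data is actually read.
        import polars as pl

        return pl.read_csv(source)
```

Under this sketch, routing only ever calls `can_handle`, so a reader can be registered and queried without loading any data.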
Centralized Registry System
Dift now uses a centralized reader registry.
Example:
registry.register(MyReader())
reader = registry.get_reader(source)
Benefits:
- dynamic registration
- centralized routing
- future plugin discovery
- connector prioritization
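The registry's internals are not shown in this document. The following is a minimal sketch of how such a registry could route sources, assuming first-match routing over readers ordered by priority. `ReaderRegistry`, the `priority` argument, and the toy `PrefixReader` are assumptions made for illustration.

```python
class ReaderRegistry:
    """Illustrative registry: routes a source to the first reader that accepts it."""

    def __init__(self):
        self._readers = []  # list of (priority, reader) pairs

    def register(self, reader, priority: int = 0):
        # Higher-priority readers are consulted first. The sort is stable,
        # so readers with equal priority keep their registration order.
        self._readers.append((priority, reader))
        self._readers.sort(key=lambda pair: pair[0], reverse=True)

    def get_reader(self, source: str):
        for _, reader in self._readers:
            if reader.can_handle(source):
                return reader
        raise ValueError(f"No registered reader can handle source: {source!r}")


class PrefixReader:
    """Toy reader that claims any source starting with a given prefix."""

    def __init__(self, prefix: str):
        self.prefix = prefix

    def can_handle(self, source: str) -> bool:
        return source.startswith(self.prefix)


registry = ReaderRegistry()
generic = PrefixReader("")               # accepts anything; acts as a fallback
duckdb_like = PrefixReader("duckdb://")  # accepts only duckdb:// sources
registry.register(generic, priority=-1)
registry.register(duckdb_like)
```

Giving the catch-all reader a lower priority keeps it from shadowing more specific connectors, which is one way the "connector prioritization" benefit could be realized.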
Connector Isolation
Each connector now lives in its own module.
Examples:
sql_reader.py
duckdb_reader.py
bigquery_reader.py
Benefits:
- isolated dependencies
- independent testing
- future package extraction
Unified Dataset Contract
All readers return:
polars.DataFrame
This contract is what keeps the comparison engine connector-agnostic.
The engine never needs to understand:
- SQLAlchemy
- DuckDB
- BigQuery
- storage APIs
- warehouse authentication
It only receives standardized DataFrames.
Why This Matters
This allows connectors to evolve independently while keeping the comparison engine stable.
Dependency Isolation
Connectors use optional imports.
Example:
try:
    import duckdb
except ImportError:
    duckdb = None
Benefits:
- lightweight installs
- optional features
- smaller dependency footprint
- future plugin extraction
Plugin-Safe Error Handling
Connectors now report actionable errors, for example naming the package to install when an optional dependency is missing.
Example:
Snowflake support requires:
pip install snowflake-sqlalchemy
Benefits:
- cleaner plugin UX
- dependency guidance
- safer optional loading
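One way to combine optional imports with actionable errors is a small helper that attempts the import at the point of use and raises a message in the format shown above. This is a sketch under stated assumptions: `require` and `MissingDependencyError` are hypothetical names, not Dift APIs.

```python
import importlib


class MissingDependencyError(ImportError):
    """Hypothetical error type raised when an optional dependency is absent."""


def require(module_name: str, feature: str, pip_name: str):
    """Import `module_name`, or fail with installation guidance."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Chain the original error so the root cause stays visible in tracebacks.
        raise MissingDependencyError(
            f"{feature} support requires:\n    pip install {pip_name}"
        ) from exc
```

A connector could call such a helper inside its `read` method, so the error only surfaces when the user actually requests that source.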
Future Plugin Loading
A future architecture may support:
registry.load_plugins()
Potential behaviors:
- auto-discovery
- entry-point loading
- optional registration
- lazy imports
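Since Dift does not yet load plugins, the following is only a sketch of what entry-point loading could look like, using the standard library's `importlib.metadata`. The entry-point group name `dift.readers`, the function signature, and the `candidates` parameter (which allows injecting entry points for testing) are assumptions. Keyword selection with `entry_points(group=...)` requires Python 3.10 or newer.

```python
from importlib.metadata import entry_points


def load_plugins(registry, group: str = "dift.readers", candidates=None):
    """Register every reader advertised under `group`.

    A failing plugin is recorded instead of raised, so one broken
    connector cannot block the others.
    Returns (loaded, failed): registered names, and a name -> error map.
    """
    if candidates is None:
        # Discover installed packages that advertise readers (Python 3.10+).
        candidates = entry_points(group=group)

    loaded, failed = [], {}
    for ep in candidates:
        try:
            reader_cls = ep.load()  # the plugin module is imported here, lazily
            registry.register(reader_cls())
            loaded.append(ep.name)
        except Exception as exc:  # isolate the failure to this plugin
            failed[ep.name] = exc
    return loaded, failed
```

Returning the failures rather than raising lets the caller decide how to report them, for instance as a warning that includes installation guidance.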
Possible Future Plugin Structure
Potential future package layout:
dift-snowflake/
├── plugin.py
├── reader.py
├── auth.py
└── requirements.txt
Possible Registration Workflow
Potential future behavior:
from dift_snowflake import SnowflakeReader
registry.register(SnowflakeReader())
Why Dynamic Registration Matters
Dynamic registration allows:
- third-party connectors
- enterprise-only integrations
- community-maintained plugins
- experimentation without core changes
Optional Connector Loading
Future connectors may become separately installable.
Examples:
pip install dift-snowflake
pip install dift-kafka
pip install dift-databricks
Benefits:
- smaller core package
- faster installs
- reduced dependency conflicts
- modular ecosystems
Enterprise Connector Possibilities
Plugin preparation also supports future enterprise integrations.
Examples:
- Snowflake
- Databricks
- S3
- Kafka
- Delta Lake
- Iceberg
- proprietary warehouses
Connector Lifecycle Preparation
Future plugins may support:
- initialization hooks
- teardown hooks
- authentication providers
- configuration validation
- capability inspection
Potential future interface:
class Plugin:
    def initialize(self):
        ...

    def shutdown(self):
        ...
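If lifecycle hooks like these are adopted, the caller needs to guarantee that `shutdown` runs even when a comparison fails. A minimal sketch of one way to do that with a context manager follows; `plugin_session` is a hypothetical helper, not an existing Dift function.

```python
from contextlib import contextmanager


class Plugin:
    """Lifecycle interface from the sketch above; both hooks default to no-ops."""

    def initialize(self):
        ...

    def shutdown(self):
        ...


@contextmanager
def plugin_session(plugin: Plugin):
    """Call initialize() on entry and guarantee shutdown() on exit."""
    plugin.initialize()
    try:
        yield plugin
    finally:
        # Runs even if the body raises, so resources are not leaked.
        plugin.shutdown()
```

Usage would look like `with plugin_session(my_plugin): ...`, which keeps the teardown logic out of every call site.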
Plugin Metadata (Future)
Future plugin metadata may include:
class PluginMetadata:
    name: str
    version: str
    supported_sources: list[str]
    dependencies: list[str]
This would support:
- plugin discovery
- compatibility checks
- UI integrations
- debugging workflows
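To make the compatibility-check idea concrete, the metadata could be expressed as a dataclass and inspected before a plugin is loaded. This sketch assumes that `dependencies` holds top-level importable module names (not pip package names); `missing_dependencies` is a hypothetical helper.

```python
from dataclasses import dataclass, field
from importlib.util import find_spec


@dataclass(frozen=True)
class PluginMetadata:
    name: str
    version: str
    supported_sources: list[str] = field(default_factory=list)
    dependencies: list[str] = field(default_factory=list)


def missing_dependencies(meta: PluginMetadata) -> list[str]:
    """Return the modules in `meta.dependencies` that cannot be imported."""
    # find_spec returns None for a top-level module that is not installed,
    # and it does not import the module when one is found.
    return [mod for mod in meta.dependencies if find_spec(mod) is None]
```

A loader could skip any plugin for which this list is non-empty and report the missing modules instead of failing at import time.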
Plugin Capability Discovery
Potential future capability inspection:
registry.list_capabilities()
Example output:
- SQL databases
- BigQuery
- Snowflake
- Kafka
- S3
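How capabilities would be declared is not yet defined. As one possible shape, each reader could carry a `capabilities` attribute that the registry aggregates. Both the attribute and the standalone `list_capabilities` function below are assumptions for illustration.

```python
def list_capabilities(readers) -> list[str]:
    """Collect a sorted, de-duplicated list of capability labels."""
    labels = set()
    for reader in readers:
        # `capabilities` is a hypothetical attribute; readers without it add nothing.
        labels.update(getattr(reader, "capabilities", ()))
    return sorted(labels)


class SqlReader:
    capabilities = ("SQL databases",)


class BigQueryReader:
    capabilities = ("BigQuery",)


class LegacyReader:
    """Reader that declares no capabilities."""
```

Treating the attribute as optional means existing readers would keep working without modification.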
Why Plugin Isolation Is Important
Isolation reduces risk.
A broken connector should not:
- crash the core engine
- break unrelated connectors
- block local dataset comparisons
Connector Failure Philosophy
Connector failures should be:
- isolated
- actionable
- recoverable
- dependency-aware
Current Limitations
Dift does NOT yet support:
- external plugins
- auto-discovery
- plugin installation APIs
- entry-point registration
- runtime plugin loading
However, the internal architecture is now being prepared for these features.
Why Preparation Happens Early
Architectural preparation is easier while the connector ecosystem is still small.
Retrofitting a plugin system later, once many connectors depend on core internals, is significantly more difficult.
Current Connector Flow
Current behavior:
CLI
↓
Registry
↓
Reader
↓
Polars DataFrame
↓
Comparison Engine
This already resembles a plugin-friendly architecture.
Future Plugin Flow
Potential future architecture:
CLI
↓
Plugin Loader
↓
Registry
↓
External Readers
↓
Polars DataFrame
↓
Comparison Engine
Stability Goals
Plugin preparation should NOT compromise:
- comparison stability
- report consistency
- risk scoring
- existing workflows
Core functionality remains stable and connector-agnostic.
Benefits of Plugin Preparation
This architecture lays the groundwork for:
- enterprise ecosystems
- community integrations
- connector marketplaces
- warehouse ecosystems
- streaming integrations
- cloud-native workflows
Design Philosophy
The plugin preparation architecture prioritizes:
- long-term scalability
- loose coupling
- modularity
- maintainability
- ecosystem growth
Future Areas of Expansion
Potential future integrations:
| Area | Examples |
|---|---|
| Warehouses | Snowflake, Databricks |
| Streaming | Kafka, Pulsar |
| Storage | S3, GCS, Azure Blob |
| Lakehouses | Delta Lake, Iceberg |
| APIs | REST, GraphQL |
| Distributed Engines | Spark, Ray |
Related Developer Docs
See also:
- architecture.md
- reader-registry.md
- testing.md
- report-system.md
- codebase-overview.md