Architecture¶
This document explains the architecture of Aniwa, including:
- system structure
- execution flow
- core modules
- profiling pipeline
- reporting systems
- scalability strategy
- future architectural direction
Aniwa is designed as:
not merely:
Purpose of the Architecture¶
The architecture exists to ensure:
- scalability
- maintainability
- extensibility
- performance
- contributor friendliness
Architectural Philosophy¶
Aniwa's architecture is built around several principles:
| Principle |
|---|
| modular |
| scalable |
| developer-first |
| extensible |
| automation-friendly |
| maintainable |
Core Philosophy¶
Aniwa is designed around:
This means each system should own:
High-Level Architecture¶
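Aniwa currently consists of several major layers, each described in its own section of this document:
- CLI layer
- configuration layer
- reader layer
- core profiling engine
- models layer
- reporting layer
- utility layer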
High-Level Execution Flow¶
The current execution pipeline:
User Command
→ CLI Parsing
→ Config Resolution
→ Dataset Reading
→ Profiling Engine
→ Insight Generation
→ Report Rendering
→ Output Export
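The flow above can be read as a chain of stages that each receive and return a shared context. The following is a minimal sketch of that idea, not Aniwa's actual implementation: `PipelineContext`, `make_stage`, and `run_pipeline` are hypothetical names, and each stage here is a placeholder that only records that it ran.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class PipelineContext:
    """State handed from one stage to the next (hypothetical structure)."""
    command: list[str]
    data: dict[str, Any] = field(default_factory=dict)
    trace: list[str] = field(default_factory=list)


Stage = Callable[[PipelineContext], PipelineContext]


def make_stage(name: str) -> Stage:
    """Build a placeholder stage that only records its own name."""
    def stage(ctx: PipelineContext) -> PipelineContext:
        ctx.trace.append(name)
        return ctx
    return stage


# Stage names mirror the documented flow; real stages would do actual work.
STAGE_NAMES = [
    "cli_parsing",
    "config_resolution",
    "dataset_reading",
    "profiling_engine",
    "insight_generation",
    "report_rendering",
    "output_export",
]
PIPELINE = [make_stage(name) for name in STAGE_NAMES]


def run_pipeline(command: list[str], stages: list[Stage] = PIPELINE) -> PipelineContext:
    """Run every stage in order, threading the context through."""
    ctx = PipelineContext(command=command)
    for stage in stages:
        ctx = stage(ctx)
    return ctx


if __name__ == "__main__":
    # The command below is a placeholder, not a documented invocation.
    result = run_pipeline(["profile", "data.csv"])
    print(" -> ".join(result.trace))
```

Keeping each stage behind the same small signature is one way to keep the execution order explicit and each step testable in isolation.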
Current Project Structure¶
Aniwa/
│
├── aniwa/
│ ├── cli.py
│ │
│ ├── config_loader.py
│ │
│ ├── core/
│ │
│ ├── io/
│ │
│ ├── models/
│ │
│ ├── utils/
│ │
│ └── reports/
│
├── docs/
├── tests/
├── examples/
│
├── README.md
├── pyproject.toml
└── requirements.txt
Why the Structure Matters¶
This structure enables:
- isolated systems
- easier testing
- contributor scalability
- architectural clarity
CLI Layer¶
Primary file: aniwa/cli.py
Responsibilities of the CLI¶
The CLI layer handles:
- argument parsing
- command orchestration
- config loading
- report routing
- execution flow
Why the CLI Exists¶
The CLI acts as:
for the entire system.
Current CLI Responsibilities¶
The CLI currently controls:
| Responsibility |
|---|
| input parsing |
| config resolution |
| report selection |
| profiling mode |
| section filtering |
| metadata generation |
Configuration Layer¶
Primary area: aniwa/config_loader.py
Responsibilities of the Config System¶
The configuration system handles:
- YAML configs
- TOML configs
- JSON configs
- config flattening
- validation
- precedence resolution
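Config flattening and precedence resolution can be illustrated with a short sketch. It assumes nested configs are flattened to dotted keys and that layers are merged so later ones override earlier ones. The function names, the example keys (`mode`, `report.format`), and the layer order shown are hypothetical and do not document Aniwa's actual priority order.

```python
from typing import Any, Mapping


def flatten(config: Mapping[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten nested mappings into a single level of dotted keys."""
    flat: dict[str, Any] = {}
    for key, value in config.items():
        full_key = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, Mapping):
            flat.update(flatten(value, full_key))
        else:
            flat[full_key] = value
    return flat


def resolve(*layers: Mapping[str, Any]) -> dict[str, Any]:
    """Merge config layers; a later layer overrides an earlier one."""
    resolved: dict[str, Any] = {}
    for layer in layers:
        resolved.update(flatten(layer))
    return resolved


if __name__ == "__main__":
    # Hypothetical layers, lowest priority first.
    defaults = {"mode": "fast", "report": {"format": "console"}}
    file_config = {"report": {"format": "html"}}
    overrides = {"mode": "deep"}
    print(resolve(defaults, file_config, overrides))
```

Flattening before merging means an override of one nested key does not discard its sibling keys from a lower-priority layer.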
Config Priority Order¶
Aniwa follows:
Why This Matters¶
This provides:
- reproducibility
- flexibility
- automation compatibility
Automatic Config Discovery¶
Aniwa automatically searches for:
Reader Layer¶
Primary location: aniwa/io/
Responsibilities of Readers¶
Readers handle:
- dataset ingestion
- format detection
- conversion to Polars DataFrames
Supported Formats¶
Current formats:
| Format |
|---|
| CSV |
| Excel |
| JSON |
| Parquet |
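As a rough illustration of format detection followed by conversion to a Polars DataFrame, the sketch below dispatches on the file extension. The extension mapping and the names `detect_format` and `read_dataset` are assumptions made for this example. The Polars readers it references (`pl.read_csv`, `pl.read_excel`, `pl.read_json`, `pl.read_parquet`) are real Polars functions, and `pl.read_excel` needs an optional Excel engine installed.

```python
from pathlib import Path

# Extension-to-format mapping; the extensions chosen here are an assumption.
EXTENSION_FORMATS = {
    ".csv": "csv",
    ".xlsx": "excel",
    ".xls": "excel",
    ".json": "json",
    ".parquet": "parquet",
}


def detect_format(path: str) -> str:
    """Return the format name for a path, based only on its extension."""
    suffix = Path(path).suffix.lower()
    try:
        return EXTENSION_FORMATS[suffix]
    except KeyError:
        raise ValueError(
            f"Unsupported file extension: {suffix or '(none)'}"
        ) from None


def read_dataset(path: str):
    """Read a file into a Polars DataFrame using the detected format."""
    import polars as pl  # imported lazily so detection works without Polars

    readers = {
        "csv": pl.read_csv,
        "excel": pl.read_excel,
        "json": pl.read_json,
        "parquet": pl.read_parquet,
    }
    return readers[detect_format(path)](path)
```

Under this layout, supporting another format would mean adding one mapping entry and one reader, without touching the rest of the pipeline.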
Why Readers Are Isolated¶
Reader isolation enables:
- easier extensibility
- cleaner debugging
- modular format support
Reader Execution Flow¶
Typical reader flow:
Why Polars Was Chosen¶
Aniwa uses Polars because it provides:
- high performance
- vectorized computation
- modern dataframe architecture
Core Profiling Engine¶
Primary location: aniwa/core/
Responsibilities of the Profiling Engine¶
The profiling engine handles:
- schema analysis
- quality analysis
- statistics
- insights
- metadata generation
Why the Profiling Engine Matters¶
The profiling engine is:
Profiling Pipeline¶
Current profiling flow:
Profiling Modes¶
Current modes:
| Mode |
|---|
| fast |
| deep |
Fast Mode¶
Fast mode prioritizes:
Deep Mode¶
Deep mode prioritizes:
Why Multiple Modes Matter¶
Different workflows require different tradeoffs between:
- speed
- depth
- resource usage
Insight System¶
The insight layer detects:
- suspicious patterns
- quality issues
- sparse columns
- possible PII
- duplicate rows
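Three of the checks listed above can be sketched on plain Python rows. This is an illustrative approximation only: the 50% sparsity threshold, the email-only PII heuristic, and all function names are assumptions, and Aniwa's engine would operate on Polars DataFrames rather than lists of dictionaries.

```python
import re
from collections import Counter

# Very rough email matcher used as a stand-in for PII detection.
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def sparse_columns(rows: list[dict], threshold: float = 0.5) -> list[str]:
    """Return columns whose share of missing (None) values is >= threshold."""
    if not rows:
        return []
    total = len(rows)
    return [
        column
        for column in rows[0].keys()
        if sum(1 for row in rows if row.get(column) is None) / total >= threshold
    ]


def duplicate_row_count(rows: list[dict]) -> int:
    """Count rows that exactly repeat an earlier row."""
    counts = Counter(
        tuple(sorted(row.items(), key=lambda item: item[0])) for row in rows
    )
    return sum(n - 1 for n in counts.values())


def possible_pii_columns(rows: list[dict]) -> list[str]:
    """Flag columns where any string value looks like an email address."""
    if not rows:
        return []
    return [
        column
        for column in rows[0].keys()
        if any(
            isinstance(row.get(column), str)
            and EMAIL_PATTERN.fullmatch(row[column])
            for row in rows
        )
    ]


if __name__ == "__main__":
    rows = [
        {"id": 1, "email": "a@example.com", "notes": None},
        {"id": 2, "email": "b@example.com", "notes": None},
        {"id": 1, "email": "a@example.com", "notes": None},
        {"id": 3, "email": None, "notes": "ok"},
    ]
    print(sparse_columns(rows), duplicate_row_count(rows), possible_pii_columns(rows))
```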
Why Insights Matter¶
Users want:
not merely:
Models Layer¶
Primary location: aniwa/models/
Responsibilities of Models¶
Models provide:
- structured data representation
- type-safe architecture
- report consistency
Why Models Matter¶
Models create:
- predictable data structures
- cleaner serialization
- safer extensibility
Current Model Categories¶
Current models include:
| Model Type |
|---|
| metadata |
| schema |
| statistics |
| insights |
| reports |
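One way to get predictable structures and clean serialization is to define models as frozen dataclasses. The sketch below is a hypothetical illustration of that approach. The class and field names are invented for this example and are not taken from Aniwa's models package.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ColumnSchema:
    """Schema entry for one column (hypothetical fields)."""
    name: str
    dtype: str
    nullable: bool


@dataclass(frozen=True)
class DatasetMetadata:
    """Basic facts about the profiled dataset (hypothetical fields)."""
    file_type: str
    row_count: int
    column_count: int


@dataclass(frozen=True)
class ProfileReport:
    """Top-level report model that groups the other models."""
    metadata: DatasetMetadata
    schema: tuple[ColumnSchema, ...] = ()
    insights: tuple[str, ...] = ()

    def to_json(self) -> str:
        # asdict recurses into nested dataclasses, including those in tuples.
        return json.dumps(asdict(self), sort_keys=True)


if __name__ == "__main__":
    report = ProfileReport(
        metadata=DatasetMetadata(file_type="csv", row_count=2, column_count=1),
        schema=(ColumnSchema(name="id", dtype="Int64", nullable=False),),
        insights=("no duplicate rows",),
    )
    print(report.to_json())
```

Because every report format would consume the same model objects, the renderers stay consistent with one another.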
Reporting Layer¶
Primary location: aniwa/reports/
Responsibilities of Reports¶
The reporting system handles:
- rendering
- serialization
- export generation
Current Report Formats¶
Supported formats:
| Format |
|---|
| console |
| JSON |
| HTML |
| Markdown |
| Excel |
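Modular report formats are often implemented as a registry that maps a format name to a renderer. The sketch below shows that pattern for two of the formats listed above. The registry, decorator, and renderer names are hypothetical, and the output layouts are simplified placeholders rather than Aniwa's real report output.

```python
import json
from typing import Callable

Renderer = Callable[[dict], str]

# Registry of renderers keyed by format name.
RENDERERS: dict[str, Renderer] = {}


def register(fmt: str) -> Callable[[Renderer], Renderer]:
    """Decorator that adds a renderer to the registry under `fmt`."""
    def decorator(func: Renderer) -> Renderer:
        RENDERERS[fmt] = func
        return func
    return decorator


@register("json")
def render_json(profile: dict) -> str:
    """Serialize the profile for machine consumption."""
    return json.dumps(profile, indent=2, sort_keys=True)


@register("markdown")
def render_markdown(profile: dict) -> str:
    """Render the profile as a two-column Markdown table."""
    lines = ["| Key | Value |", "|---|---|"]
    lines += [f"| {key} | {value} |" for key, value in profile.items()]
    return "\n".join(lines)


def render(profile: dict, fmt: str) -> str:
    """Look up the renderer for `fmt` and apply it."""
    try:
        renderer = RENDERERS[fmt]
    except KeyError:
        known = ", ".join(sorted(RENDERERS))
        raise ValueError(
            f"Unknown report format {fmt!r}; expected one of: {known}"
        ) from None
    return renderer(profile)


if __name__ == "__main__":
    print(render({"rows": 3, "columns": 2}, "markdown"))
```

With a registry like this, a new format is added by registering one function, and the dispatch code does not change.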
Why Reports Are Modular¶
Modular reports enable:
- independent extensions
- cleaner maintenance
- easier customization
Report Rendering Flow¶
Current rendering flow:
Console Reports¶
Console reports prioritize:
- readability
- developer UX
- terminal clarity
HTML Reports¶
HTML reports prioritize:
- shareability
- visualization
- presentation
JSON Reports¶
JSON reports prioritize:
- machine-readability
- automation
- integrations
Markdown Reports¶
Markdown reports prioritize:
- documentation workflows
- GitHub compatibility
- portability
Template System¶
Primary location:
Responsibilities of Templates¶
Templates control:
- layout
- styling
- presentation structure
Why Templates Matter¶
Templates separate:
Template Philosophy¶
Templates should remain:
- reusable
- isolated
- customizable
Utility Layer¶
Primary location: aniwa/utils/
Responsibilities of Utilities¶
Utilities contain:
- helper functions
- reusable shared logic
- formatting systems
Why Utilities Exist¶
Utilities reduce:
- duplication
- tightly coupled code
- architectural clutter
Metadata System¶
Aniwa automatically generates metadata including:
| Metadata |
|---|
| runtime |
| dataset size |
| file type |
| version |
| execution command |
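The metadata fields in the table above could be collected roughly as follows. This is a sketch under stated assumptions: the key names are invented, "dataset size" is interpreted as file size in bytes, and `TOOL_VERSION` is a placeholder rather than Aniwa's real version lookup.

```python
import shlex
import sys
import time
from pathlib import Path
from typing import Optional

TOOL_VERSION = "0.0.0"  # placeholder; a real tool would read its package version


def build_metadata(
    path: str, started_at: float, argv: Optional[list[str]] = None
) -> dict:
    """Collect run metadata for a profiled file.

    `started_at` is a time.perf_counter() reading taken when the run began.
    """
    file = Path(path)
    command = sys.argv if argv is None else argv
    return {
        "runtime_seconds": round(time.perf_counter() - started_at, 6),
        "dataset_size_bytes": file.stat().st_size if file.exists() else None,
        "file_type": file.suffix.lstrip(".").lower() or None,
        "version": TOOL_VERSION,
        "execution_command": shlex.join(command),
    }


if __name__ == "__main__":
    start = time.perf_counter()
    print(build_metadata("data.csv", start))
```

Recording the exact command alongside the version is what makes a report reproducible later.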
Why Metadata Matters¶
Metadata improves:
- reproducibility
- debugging
- governance
- auditing
Section-Based Architecture¶
Aniwa supports selective report sections.
Current Sections¶
Current sections include:
| Section |
|---|
| summary |
| schema |
| quality |
| statistics |
| insights |
| charts |
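Selective sections can be sketched as a filter over a report dictionary keyed by section name. The section names come from the table above, while the function name, the validation behavior, and the choice to keep a canonical output order are assumptions made for illustration.

```python
from typing import Iterable, Optional

# Section names as listed in the table above, in canonical order.
SECTIONS = ("summary", "schema", "quality", "statistics", "insights", "charts")


def select_sections(report: dict, requested: Optional[Iterable[str]] = None) -> dict:
    """Return only the requested sections, kept in canonical order.

    With no request, every available section is returned.
    """
    names = list(SECTIONS if requested is None else requested)
    unknown = [name for name in names if name not in SECTIONS]
    if unknown:
        raise ValueError(f"Unknown section(s): {', '.join(unknown)}")
    wanted = set(names)
    return {
        name: report[name]
        for name in SECTIONS
        if name in wanted and name in report
    }


if __name__ == "__main__":
    full_report = {name: {} for name in SECTIONS}
    print(list(select_sections(full_report, ["insights", "summary"])))
```

Rejecting unknown names early gives automation scripts a clear failure instead of a silently empty report.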
Why Section Modularity Matters¶
This enables:
- smaller reports
- customizable workflows
- automation flexibility
Error Handling Philosophy¶
Aniwa prioritizes:
Good Error Example¶
Example:
Bad Error Example¶
Avoid:
Scalability Philosophy¶
Aniwa is designed with:
Why Scalability Matters¶
The architecture should eventually support:
- massive datasets
- distributed systems
- cloud infrastructure
- enterprise workflows
Current Scalability Strategies¶
Current scalability considerations:
| Strategy |
|---|
| modular systems |
| isolated readers |
| layered architecture |
| reusable models |
Future Scalability Directions¶
Potential future systems:
| Future System |
|---|
| chunked processing |
| streaming profiling |
| distributed execution |
| async pipelines |
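To make the idea of chunked processing concrete, the sketch below computes column statistics one chunk at a time, so the full dataset never has to be held in memory. It is a generic illustration of the technique, not a planned Aniwa API. `RunningStats`, `chunked`, and `profile_in_chunks` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class RunningStats:
    """Statistics that can be updated incrementally, one chunk at a time."""
    count: int = 0
    total: float = 0.0
    minimum: float = float("inf")
    maximum: float = float("-inf")

    def update(self, chunk: Iterable[float]) -> None:
        for value in chunk:
            self.count += 1
            self.total += value
            self.minimum = min(self.minimum, value)
            self.maximum = max(self.maximum, value)

    @property
    def mean(self) -> float:
        return self.total / self.count if self.count else float("nan")


def chunked(values: Iterable[float], size: int) -> Iterator[list[float]]:
    """Yield successive lists of at most `size` values."""
    chunk: list[float] = []
    for value in values:
        chunk.append(value)
        if len(chunk) == size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def profile_in_chunks(values: Iterable[float], chunk_size: int = 1000) -> RunningStats:
    """Profile a stream of values without materializing it all at once."""
    stats = RunningStats()
    for chunk in chunked(values, chunk_size):
        stats.update(chunk)
    return stats


if __name__ == "__main__":
    stats = profile_in_chunks(range(1, 11), chunk_size=3)
    print(stats.count, stats.mean, stats.minimum, stats.maximum)
```

Statistics that depend on the whole distribution, such as exact medians, do not update this simply and would need approximate or multi-pass methods.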
Future Connectivity Vision¶
Aniwa aims to support:
| System |
|---|
| PostgreSQL |
| MySQL |
| DuckDB |
| BigQuery |
| Snowflake |
Why Universal Connectivity Matters¶
Modern data ecosystems are:
Future Plugin Architecture¶
Potential future plugin systems:
| Plugin |
|---|
| readers |
| reports |
| governance |
| AI modules |
Why Plugins Matter¶
Plugins allow:
Future AI Architecture¶
Future AI systems may support:
- semantic understanding
- anomaly explanation
- dataset summarization
- intelligent recommendations
Future Governance Systems¶
Potential governance layers:
- PII detection
- trust scoring
- lineage tracking
- compliance validation
Cloud Architecture Vision¶
Future cloud systems may include:
Distributed Systems Vision¶
Long-term systems may evolve toward:
Why Distributed Systems Matter¶
Distributed systems enable:
- massive scale
- enterprise workloads
- parallel computation
Long-Term Architecture Philosophy¶
Aniwa architecture is designed with:
Why Long-Term Thinking Matters¶
Strong foundations reduce:
- rewrites
- technical debt
- architectural fragmentation
Architectural Anti-Patterns¶
Avoid:
| Anti-Pattern |
|---|
| tightly coupled systems |
| duplicated logic |
| monolithic reports |
| hidden side effects |
| unclear execution flow |
Contributor Architecture Philosophy¶
Contributors should prioritize:
- modularity
- maintainability
- readability
- scalability
Why Contributor Alignment Matters¶
Consistent architecture improves:
- ecosystem health
- contributor onboarding
- long-term maintainability
Strategic Importance¶
Aniwa architecture exists to ensure that:
Final Philosophy¶
Aniwa architecture is designed to remain:
Related Documentation¶
Continue with:
- developer-guide/profiling-engine.md
- developer-guide/reporting-system.md
- roadmap.md
- philosophy.md