Adding Insights¶

This guide explains how to create, extend, and maintain Aniwa’s insight generation system.

Insights are one of the most important parts of Aniwa because they transform:

raw statistics

into:

human-readable intelligence

Purpose of Insights¶

Insights help users quickly identify:

suspicious patterns
quality problems
unusual structures
possible risks
meaningful observations

without manually inspecting every metric.

Why Insights Matter¶

Most users do not want to manually analyze:

null percentages
uniqueness counts
cardinality distributions
statistics tables

Insights simplify interpretation.

Insight Philosophy¶

Aniwa’s insight system is designed to be:

Principle	Purpose
modular	isolated insight rules
explainable	understandable logic
scalable	future expansion
lightweight	fast execution
useful	actionable output

Insight Architecture Overview¶

High-level flow:

DataFrame
→ profiling metrics
→ insight rules
→ Insight objects
→ reports

Insight System Location¶

Insight code typically lives in:

aniwa/core/

or:

aniwa/insights/

depending on how the architecture evolves.

Current Insight Flow¶

Conceptually:

profile_dataframe()
↓
generate metrics
↓
evaluate rules
↓
create insights
↓
attach to DatasetProfile
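The flow above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Aniwa's actual implementation: the shapes of `Insight` and `DatasetProfile`, the `compute_metrics` helper, the `check_empty` rule, and the name `profile_dataframe_sketch` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable

import pandas as pd


@dataclass
class Insight:  # assumed shape: the two fields described in this guide
    level: str
    message: str


@dataclass
class DatasetProfile:  # hypothetical minimal container
    metrics: dict
    insights: list[Insight] = field(default_factory=list)


def compute_metrics(df: pd.DataFrame) -> dict:
    """Generate metrics once so that every rule can reuse them."""
    return {"rows": len(df), "columns": len(df.columns)}


def check_empty(metrics: dict) -> list[Insight]:
    """Hypothetical rule: it reads metrics only, never the DataFrame."""
    if metrics["rows"] == 0:
        return [Insight("critical", "Dataset contains no rows.")]
    return []


RULES: list[Callable[[dict], list[Insight]]] = [check_empty]


def profile_dataframe_sketch(df: pd.DataFrame) -> DatasetProfile:
    metrics = compute_metrics(df)                # generate metrics
    profile = DatasetProfile(metrics=metrics)
    for rule in RULES:                           # evaluate rules
        profile.insights.extend(rule(metrics))   # create and attach insights
    return profile


profile = profile_dataframe_sketch(pd.DataFrame({"a": []}))
print(profile.insights)
```

Keeping rules as plain functions over a metrics dictionary means each one can be tested without building a DataFrame.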

What an Insight Contains¶

Each insight typically contains:

Field	Purpose
level	severity
message	human-readable explanation

Example Insight¶

Example:

Insight(
    level="warning",
    message="Column 'email' may contain sensitive information."
)
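For readers who want to experiment outside Aniwa, a stand-in for the model could look like the sketch below. It assumes only the two fields listed above; the level validation and the `ALLOWED_LEVELS` name are additions for illustration and may not match the real class.

```python
from dataclasses import dataclass

ALLOWED_LEVELS = ("info", "warning", "critical")  # levels named in this guide


@dataclass(frozen=True)
class Insight:
    """Stand-in for Aniwa's Insight model (assumed shape)."""

    level: str
    message: str

    def __post_init__(self) -> None:
        # Reject typos such as "warn" early instead of at report time.
        if self.level not in ALLOWED_LEVELS:
            raise ValueError(f"unknown insight level: {self.level!r}")


insight = Insight(
    level="warning",
    message="Column 'email' may contain sensitive information.",
)
print(insight.level, "-", insight.message)
```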

Insight Severity Levels¶

Current levels may include:

Level
info
warning
critical

Severity Philosophy¶

Severity levels should communicate:

importance and urgency

INFO Insights¶

INFO insights describe:

informational observations
low-risk findings
helpful context

WARNING Insights¶

WARNING insights describe:

potential issues
suspicious patterns
data quality concerns

CRITICAL Insights¶

CRITICAL insights describe:

severe quality issues
dangerous inconsistencies
major dataset problems
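Because the three levels express increasing urgency, a report can list the most urgent findings first. The sketch below shows one way to do that; the `SEVERITY_RANK` mapping and the `sort_by_severity` helper are hypothetical and not part of Aniwa.

```python
from dataclasses import dataclass


@dataclass
class Insight:  # assumed shape, as described in this guide
    level: str
    message: str


# Hypothetical ranking: a higher number means more urgent.
SEVERITY_RANK = {"info": 0, "warning": 1, "critical": 2}


def sort_by_severity(insights: list[Insight]) -> list[Insight]:
    """Most urgent first. sorted() is stable, so ties keep their order."""
    return sorted(insights, key=lambda i: SEVERITY_RANK[i.level], reverse=True)


found = [
    Insight("info", "Column 'customer_id' may contain unique identifiers."),
    Insight("critical", "Dataset contains no rows."),
    Insight("warning", "Column 'email' may contain sensitive information."),
]
print([i.level for i in sort_by_severity(found)])
# ['critical', 'warning', 'info']
```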

Current Insight Categories¶

Aniwa currently supports insights such as:

Insight
duplicate rows
sparse columns
possible PII
high-cardinality columns

Insight Design Principles¶

Good insights should be:

Principle	Meaning
actionable	users know what it means and what to do next
concise	short and clear
relevant	avoids noise
explainable	predictable logic

Bad Insight Example¶

Avoid vague insights like:

Something looks suspicious.

Good Insight Example¶

Prefer:

Column 'customer_id' contains 100% unique values and may represent an identifier.

Insight Generation Strategy¶

Insights are typically rule-based.

Why Rule-Based Systems Matter¶

Rule-based systems are:

explainable
deterministic
testable
maintainable

Example Insight Rule¶

Example:

if duplicate_percent > 10:
    add_warning(...)
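A runnable version of this rule might look as follows. It is a sketch: the function name, the message wording, and the way `duplicate_percent` is derived with pandas are assumptions, and the 10% threshold is simply taken from the example above.

```python
from dataclasses import dataclass

import pandas as pd


@dataclass
class Insight:  # assumed shape, as described in this guide
    level: str
    message: str


DUPLICATE_WARNING_THRESHOLD = 10.0  # percent, mirrors the example rule


def check_duplicate_rows(df: pd.DataFrame) -> list[Insight]:
    """Warn when too many rows are exact copies of an earlier row."""
    if df.empty:
        return []
    # duplicated() flags every repeat of an earlier row; the mean of the
    # boolean flags is the duplicate fraction.
    duplicate_percent = float(df.duplicated().mean() * 100)
    if duplicate_percent > DUPLICATE_WARNING_THRESHOLD:
        return [
            Insight(
                level="warning",
                message=f"{duplicate_percent:.1f}% of rows are exact duplicates.",
            )
        ]
    return []


df = pd.DataFrame({"id": [1, 1, 2, 3], "city": ["Oslo", "Oslo", "Rome", "Lima"]})
print(check_duplicate_rows(df))
```

In this sample one of the four rows repeats an earlier row, so the rule reports 25.0% duplicates.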

Step-by-Step Guide¶

Step 1 — Identify a Useful Pattern¶

Choose a dataset characteristic worth highlighting.

Examples:

Pattern
sparse columns
suspicious null rates
likely IDs
possible PII
outliers

Step 2 — Define the Detection Logic¶

Define how the insight is triggered.

Example Logic¶

Example:

if null_percent > 50:
    ...  # create the insight here
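To make the condition concrete, the sketch below computes a null percentage for every column in one vectorized call and applies the same 50% threshold. The function name `check_sparse_columns` and the message text are illustrative assumptions, not Aniwa's actual rule.

```python
from dataclasses import dataclass

import pandas as pd


@dataclass
class Insight:  # assumed shape, as described in this guide
    level: str
    message: str


def check_sparse_columns(df: pd.DataFrame, threshold: float = 50.0) -> list[Insight]:
    """Flag columns whose share of null values exceeds the threshold."""
    if len(df) == 0:
        return []
    # isna() gives a boolean frame; the column-wise mean is the null fraction.
    null_percent = df.isna().mean() * 100
    return [
        Insight("warning", f"Column '{column}' is {pct:.0f}% null.")
        for column, pct in null_percent.items()
        if pct > threshold
    ]


df = pd.DataFrame(
    {
        "email": [None, None, None, "a@b.co"],  # 75% null
        "age": [31, 42, None, 27],              # 25% null
    }
)
print(check_sparse_columns(df))
```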

Step 3 — Choose Severity Level¶

Select:

info
warning
critical

based on impact.

Step 4 — Create a Human-Readable Message¶

Messages should explain:

what happened
why it matters

Example Message¶

Example:

"Column 'email' contains 20% null values, which may limit its usefulness for analysis."

Step 5 — Append the Insight¶

Example:

insights.append(
    Insight(...)
)

Step 6 — Add Tests¶

Every insight must include tests.

Required Insight Tests¶

Tests should validate:

Test
insight triggers correctly
false positives avoided
severity correctness
edge cases
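The four kinds of test listed above can be sketched against a small stand-in rule. Everything here is hypothetical: `check_null_rate`, its thresholds, and its messages exist only so that the tests have something to exercise. The functions follow the pytest naming convention but use plain assertions.

```python
from dataclasses import dataclass


@dataclass
class Insight:  # assumed shape, as described in this guide
    level: str
    message: str


def check_null_rate(column: str, null_percent: float) -> list[Insight]:
    """Hypothetical rule: warning above 50% nulls, critical at 100%."""
    if null_percent >= 100:
        return [Insight("critical", f"Column '{column}' is entirely null.")]
    if null_percent > 50:
        return [Insight("warning", f"Column '{column}' is {null_percent:.0f}% null.")]
    return []


def test_insight_triggers():
    assert len(check_null_rate("email", 80.0)) == 1


def test_no_false_positive():
    assert check_null_rate("email", 5.0) == []


def test_severity():
    assert check_null_rate("email", 80.0)[0].level == "warning"
    assert check_null_rate("email", 100.0)[0].level == "critical"


def test_edge_case_at_threshold():
    # The rule uses a strict ">" comparison, so exactly 50% stays quiet.
    assert check_null_rate("email", 50.0) == []
```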

Step 7 — Update Documentation¶

Update:

docs/
examples
insight references

Example Insight Template¶

Basic example:

if unique_percent > 95:
    insights.append(
        Insight(
            level="info",
            message=f"Column '{column}' may contain unique identifiers."
        )
    )
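Wrapped in a function, the template above could look like the following sketch. The layout of `column_metrics` (one dictionary of precomputed metrics per column) and the function name are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Insight:  # assumed shape, as described in this guide
    level: str
    message: str


def check_unique_identifiers(
    column_metrics: dict[str, dict], threshold: float = 95.0
) -> list[Insight]:
    """Apply the template to every column's precomputed unique_percent."""
    insights = []
    for column, metrics in column_metrics.items():
        if metrics["unique_percent"] > threshold:
            insights.append(
                Insight(
                    level="info",
                    message=f"Column '{column}' may contain unique identifiers.",
                )
            )
    return insights


column_metrics = {
    "customer_id": {"unique_percent": 100.0},
    "country": {"unique_percent": 2.5},
}
print(check_unique_identifiers(column_metrics))
```

Only `customer_id` crosses the threshold here, so a single info-level insight is returned.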

Common Insight Categories¶

Quality Insights¶

Quality insights analyze:

duplicates
missing values
sparsity

Schema Insights¶

Schema insights analyze:

unusual datatypes
mixed structures
inconsistent schemas

Statistical Insights¶

Statistical insights analyze:

outliers
extreme distributions
anomalies

Semantic Insights¶

Semantic insights analyze:

emails
phone numbers
IDs
addresses
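As an illustration of a semantic check, the sketch below guesses whether a column holds email addresses by matching a small sample of its values against a deliberately loose pattern. The pattern, the 80% match ratio, and both function names are assumptions. A loose regular expression like this will not validate real-world addresses, so the result is only a hint.

```python
import re
from dataclasses import dataclass


@dataclass
class Insight:  # assumed shape, as described in this guide
    level: str
    message: str


# Loose shape check: something@something.something, with no whitespace.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def looks_like_email_column(sample: list, min_match_ratio: float = 0.8) -> bool:
    """Return True when most non-null sampled values look like emails."""
    non_null = [value for value in sample if value is not None]
    if not non_null:
        return False
    matches = sum(
        1 for value in non_null
        if isinstance(value, str) and EMAIL_PATTERN.match(value)
    )
    return matches / len(non_null) >= min_match_ratio


def check_possible_pii(column: str, sample: list) -> list[Insight]:
    if looks_like_email_column(sample):
        return [
            Insight("warning", f"Column '{column}' may contain sensitive information.")
        ]
    return []


sample = ["ana@example.com", "bo@example.org", None, "cy@example.net"]
print(check_possible_pii("email", sample))
```

Passing a bounded sample rather than the full column keeps the check cheap, in line with the performance guidance in this guide.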

Governance Insights¶

Future governance insights may analyze:

compliance risks
PII exposure
policy violations

AI Insights¶

Future AI systems may generate:

natural-language explanations
recommendations
summaries

Performance Philosophy¶

Insights should remain:

lightweight and scalable

Why Performance Matters¶

Insight systems execute frequently.

Poor insight logic can slow profiling dramatically.

Performance Best Practices¶

Prefer:

vectorized operations
lightweight rules
reusable metrics

Avoid:

repeated dataframe scans
row-by-row loops
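One way to follow these practices is to compute all column metrics up front with vectorized pandas calls and pass the result to every rule, so that no rule scans the DataFrame again. The sketch below assumes a hypothetical `compute_column_metrics` helper and metric names that match the examples in this guide.

```python
import pandas as pd


def compute_column_metrics(df: pd.DataFrame) -> dict[str, dict]:
    """Compute each metric with one vectorized call and share the result."""
    rows = len(df)
    if rows == 0:
        return {col: {"null_percent": 0.0, "unique_percent": 0.0} for col in df.columns}

    null_percent = df.isna().mean() * 100   # one call covers all columns
    unique_counts = df.nunique()            # NaN is excluded by default

    # Avoid the slow alternative: looping with df.iterrows() and counting
    # nulls or distinct values by hand inside each rule.
    return {
        col: {
            "null_percent": float(null_percent[col]),
            "unique_percent": float(unique_counts[col] / rows * 100),
        }
        for col in df.columns
    }


df = pd.DataFrame(
    {"id": [1, 2, 3, 4], "email": [None, None, None, "a@b.co"]}
)
metrics = compute_column_metrics(df)
print(metrics)
```

Rules then read `metrics["email"]["null_percent"]` and similar entries instead of touching `df`.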

False Positive Philosophy¶

Insight systems should avoid:

excessive noise

Why Noise Is Dangerous¶

Too many unnecessary warnings reduce:

trust
usefulness
readability

Insight Prioritization Philosophy¶

Only generate insights that are:

meaningful and actionable

Insight Scalability Philosophy¶

The insight engine should scale toward:

enterprise-grade data intelligence

Future Insight Categories¶

Potential future systems:

Future Insight
drift detection
semantic classification
AI recommendations
governance scoring
anomaly explanations

Future Semantic Intelligence¶

Potential semantic systems:

email detection
credit card detection
phone number recognition
address inference

Future AI Insight Systems¶

Long-term AI systems may generate:

natural-language dataset explanations

Example Future AI Insight¶

Example:

This dataset appears to represent customer transaction history with moderate data quality issues.

Future Recommendation Engine¶

Future systems may recommend:

cleaning strategies
schema fixes
validation rules

Future Trust Scoring¶

Potential systems may compute:

dataset trust scores

Why Trust Systems Matter¶

Trust scoring helps users quickly evaluate:

reliability
risk
quality

Future Governance Intelligence¶

Potential future systems:

Governance Feature
PII scoring
compliance checks
lineage awareness
policy validation

Insight Modularity¶

Insights should remain isolated from:

report rendering
reader systems
CLI logic

Why Isolation Matters¶

Isolation improves:

maintainability
extensibility
testing

Insight Registry Vision¶

Future systems may introduce:

insight registries

Example Future Registry¶

Example:

register_insight(check_sparse_columns)

Plugin Insight Vision¶

Future plugins may add:

industry-specific rules
governance checks
AI models

Industry-Specific Insight Examples¶

Potential verticals:

Industry
finance
healthcare
retail
logistics

Logging Philosophy¶

Future insight systems may support:

verbose tracing
rule diagnostics
execution metrics
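Rule diagnostics of this kind could be built on Python's standard logging module. In the sketch below, the logger name "aniwa.insights", the wrapper function, and the sample rule are assumptions. Only the `logging` and `time` calls are standard-library APIs.

```python
import logging
import time
from dataclasses import dataclass

logger = logging.getLogger("aniwa.insights")  # logger name is an assumption


@dataclass
class Insight:  # assumed shape, as described in this guide
    level: str
    message: str


def check_empty(metrics: dict) -> list[Insight]:
    """Hypothetical rule used only to demonstrate the wrapper."""
    if metrics["rows"] == 0:
        return [Insight("critical", "Dataset contains no rows.")]
    return []


def run_rule_with_diagnostics(rule, metrics: dict) -> list[Insight]:
    """Run one rule and log how long it took and how much it produced."""
    start = time.perf_counter()
    insights = rule(metrics)
    elapsed_ms = (time.perf_counter() - start) * 1000
    logger.debug(
        "rule=%s insights=%d elapsed_ms=%.2f",
        rule.__name__, len(insights), elapsed_ms,
    )
    return insights


logging.basicConfig(level=logging.DEBUG)  # enable verbose tracing
print(run_rule_with_diagnostics(check_empty, {"rows": 0}))
```

With the log level left at its default, the wrapper adds almost no overhead because the debug record is never formatted.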

Testing Philosophy¶

Every insight rule should be:

predictable and testable

Example Insight Test¶

Example:

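import pandas as pd

# profile_dataframe comes from Aniwa; import it from wherever your version exposes it.

def test_duplicate_insight():
    # Two of the three rows are identical, so a duplicate-row warning is expected.
    df = pd.DataFrame({"id": [1, 1, 2], "name": ["a", "a", "b"]})
    profile = profile_dataframe(df)

    assert any(
        insight.level == "warning"
        for insight in profile.insights
    )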

Insight Anti-Patterns¶

Avoid:

Anti-Pattern
vague messages
noisy warnings
repeated scans
duplicated rules
hardcoded assumptions

Contributor Best Practices¶

Contributors should prioritize:

clarity
usefulness
maintainability
performance

Pull Request Checklist¶

Before submitting:

insight logic works correctly
tests pass
documentation updated
false positives minimized

Long-Term Vision¶

Aniwa’s insight engine is evolving toward:

a universal data intelligence system

capable of helping users:

understand data
trust data
improve data quality
reason about datasets intelligently

Final Philosophy¶

The insight system exists to transform:

dataset metrics

into:

clear, actionable, intelligent understanding

Continue with:

architecture/profiling-engine.md
architecture/models.md
api/profiler.md
developer-guide/testing.md
