AIECS Knowledge Graph

Status: Enhanced Capabilities Complete ✅

AIECS Knowledge Graph provides advanced graph-based knowledge storage, retrieval, reasoning, and fusion capabilities for AI applications.

Features

Core Capabilities (✅ Complete)

Domain Models: Entity, Relation, Path, Query, Result models with Pydantic validation
Schema Management: Type-safe schema definitions with caching for improved performance
Storage Backends: In-Memory, SQLite, and PostgreSQL support
Structured Data Import: Import CSV and JSON data with schema mapping (100-300 rows/second)
Text Similarity Utilities: BM25, Jaccard, Cosine similarity, Levenshtein distance, fuzzy matching
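
To make two of the similarity measures listed above concrete, here is a minimal pure-Python sketch of word-level Jaccard similarity and Levenshtein distance. The function names `jaccard_similarity` and `levenshtein_distance` are illustrative assumptions and are not necessarily the names exposed by the AIECS utilities.

```python
def jaccard_similarity(a: str, b: str) -> float:
    """Jaccard similarity over lowercase word sets: |A & B| / |A | B|."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    if not set_a and not set_b:
        return 1.0  # two empty strings are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


def levenshtein_distance(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (char_a != char_b)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]
```

Jaccard compares token sets and ignores word order, while Levenshtein works on characters and is better suited to catching typos in entity names.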

New Enhanced Features (✅ Complete)

Runnable Pattern: Composable graph operations with async/sync compatibility
Knowledge Fusion: Cross-document entity merging with conflict resolution (5 strategies)
Result Reranking: Improve search relevance with 4 reranking strategies (text, semantic, structural, hybrid)
Logical Query Parsing: Convert natural language to structured logical queries
Schema Caching: 3-5x performance improvement with 70-95% hit rate
Query Optimization: 40-70% faster query execution with cost-based optimization
Performance Benchmarks: Comprehensive benchmarks and optimization guides
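
The hit-rate figure quoted for schema caching can be illustrated with a small sketch built on the standard library's `functools.lru_cache`. This is only an assumption about how such a cache might be measured; `_SCHEMAS`, `load_entity_schema`, and `cache_hit_rate` are hypothetical names and do not describe the AIECS SchemaManager internals.

```python
from functools import lru_cache

# Hypothetical stand-in for an expensive schema lookup (not the AIECS SchemaManager).
_SCHEMAS = {"Person": {"name": "string", "age": "integer"}}


@lru_cache(maxsize=128)
def load_entity_schema(type_name: str) -> tuple:
    """Return the schema as an immutable tuple of (property, type) pairs."""
    return tuple(sorted(_SCHEMAS[type_name].items()))


def cache_hit_rate() -> float:
    """Fraction of lookups served from the cache since the last cache_clear()."""
    info = load_entity_schema.cache_info()
    total = info.hits + info.misses
    return info.hits / total if total else 0.0
```

Four lookups of the same type produce one miss and three hits, so a workload dominated by a few entity types quickly reaches a high hit rate.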

Storage Backends

In-Memory: Fast, networkx-based storage for development (100K+ nodes)
SQLite: File-based persistence for single-user applications (1M+ nodes)
PostgreSQL: Production-grade storage with pgvector support (10M+ nodes)

Quick Start

Basic Usage

import asyncio

from aiecs.domain.knowledge_graph.models.entity import Entity
from aiecs.domain.knowledge_graph.models.relation import Relation
from aiecs.infrastructure.graph_storage.in_memory import InMemoryGraphStore


async def main():
    # Initialize store
    store = InMemoryGraphStore()
    await store.initialize()

    # Add entities
    alice = Entity(
        id="alice",
        entity_type="Person",
        properties={"name": "Alice", "age": 30}
    )
    await store.add_entity(alice)

    bob = Entity(id="bob", entity_type="Person", properties={"name": "Bob"})
    await store.add_entity(bob)

    # Add relation
    knows = Relation(
        id="r1",
        relation_type="KNOWS",
        source_id="alice",
        target_id="bob"
    )
    await store.add_relation(knows)

    # Query neighbors (Tier 1 method)
    neighbors = await store.get_neighbors("alice", direction="outgoing")
    print(f"Alice knows: {[n.properties['name'] for n in neighbors]}")

    # Graph traversal (Tier 2 method - works automatically!)
    paths = await store.traverse("alice", max_depth=3)
    print(f"Found {len(paths)} paths")

    # Cleanup
    await store.close()


# The snippets in this README use await, so run them inside a coroutine like main()
asyncio.run(main())

Schema Management

from aiecs.domain.knowledge_graph.schema import (
    SchemaManager,
    EntityType,
    RelationType,
    PropertySchema,
    PropertyType
)

# Create schema manager
manager = SchemaManager()

# Define entity type
person_type = EntityType(
    name="Person",
    description="A person entity",
    properties={
        "name": PropertySchema(
            name="name",
            property_type=PropertyType.STRING,
            required=True
        ),
        "age": PropertySchema(
            name="age",
            property_type=PropertyType.INTEGER,
            min_value=0,
            max_value=150
        )
    }
)
manager.create_entity_type(person_type)

# Define relation type
knows_type = RelationType(
    name="KNOWS",
    description="Person knows another person",
    source_entity_types=["Person"],
    target_entity_types=["Person"]
)
manager.create_relation_type(knows_type)

# Validate entity
is_valid = manager.validate_entity("Person", {"name": "Alice", "age": 30})
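
To show the kind of checks that a required flag and a min/max range imply, here is a standalone sketch in plain Python. It is an illustration under assumptions, not the SchemaManager implementation: `PropertyRule`, `PERSON_RULES`, and `validate_properties` are hypothetical names that only mirror the "Person" definition above.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class PropertyRule:
    """Hypothetical stand-in for a property schema; not the AIECS PropertySchema."""
    python_type: type
    required: bool = False
    min_value: Optional[float] = None
    max_value: Optional[float] = None


PERSON_RULES = {
    "name": PropertyRule(str, required=True),
    "age": PropertyRule(int, min_value=0, max_value=150),
}


def validate_properties(rules: dict[str, PropertyRule], props: dict[str, Any]) -> list[str]:
    """Return human-readable errors; an empty list means the properties are valid."""
    errors = []
    for name, rule in rules.items():
        if name not in props:
            if rule.required:
                errors.append(f"missing required property: {name}")
            continue
        value = props[name]
        if not isinstance(value, rule.python_type):
            errors.append(f"{name}: expected {rule.python_type.__name__}")
            continue
        if rule.min_value is not None and value < rule.min_value:
            errors.append(f"{name}: below minimum {rule.min_value}")
        if rule.max_value is not None and value > rule.max_value:
            errors.append(f"{name}: above maximum {rule.max_value}")
    return errors
```

Returning a list of errors rather than a single boolean makes it easier to report every problem in a record at once; a boolean result is simply `not errors`.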

Architecture

Two-Tier Storage Interface

The AIECS knowledge graph exposes a two-tier storage interface:

Tier 1 - Basic Interface (MUST IMPLEMENT):

add_entity(), get_entity()
add_relation(), get_relation()
get_neighbors()
initialize(), close()

Tier 2 - Advanced Interface (HAS DEFAULTS):

traverse() - Multi-hop graph traversal
find_paths() - Path finding between entities
subgraph_query() - Extract subgraphs
vector_search() - Semantic search
execute_query() - Generic query execution
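
The idea behind Tier 2 defaults can be sketched with a toy base class: a default `traverse()` is written purely in terms of the Tier 1 `get_neighbors()`, so any backend that implements the latter gets the former. This is a simplified illustration, not the AIECS GraphStore; `TinyGraphStore` and `DictGraphStore` are hypothetical, and entities are reduced to plain string IDs.

```python
import asyncio
from abc import ABC, abstractmethod
from collections import deque


class TinyGraphStore(ABC):
    """Toy two-tier interface; names are illustrative, not the AIECS GraphStore."""

    @abstractmethod
    async def get_neighbors(self, entity_id: str) -> list[str]:
        """Tier 1: every backend must implement this."""

    async def traverse(self, start_id: str, max_depth: int = 3) -> list[list[str]]:
        """Tier 2 default: breadth-first search built only on get_neighbors()."""
        paths = []
        queue = deque([[start_id]])
        while queue:
            path = queue.popleft()
            if len(path) - 1 >= max_depth:  # path already uses max_depth hops
                continue
            for neighbor in await self.get_neighbors(path[-1]):
                if neighbor in path:  # skip cycles
                    continue
                new_path = path + [neighbor]
                paths.append(new_path)
                queue.append(new_path)
        return paths


class DictGraphStore(TinyGraphStore):
    """Backend that implements only the Tier 1 method."""

    def __init__(self, edges: dict[str, list[str]]):
        self._edges = edges

    async def get_neighbors(self, entity_id: str) -> list[str]:
        return self._edges.get(entity_id, [])
```

For example, `asyncio.run(DictGraphStore({"alice": ["bob"], "bob": ["carol"]}).traverse("alice", max_depth=2))` yields the one-hop path to bob and the two-hop path to carol, even though `DictGraphStore` never defines `traverse()` itself.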

Why This Matters

Minimal Implementation: Implement just 7 Tier 1 methods, get all Tier 2 methods for free
Gradual Optimization: Start with defaults, optimize later for your specific backend
Storage Agnostic: Application code works with any storage backend
Custom Adapters: Easy to integrate Neo4j, ArangoDB, or any graph database

Example:

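# GraphStore is defined in base.py (see Project Structure)
from aiecs.infrastructure.graph_storage.base import GraphStore

class CustomGraphStore(GraphStore):
    # Implement only the Tier 1 methods
    async def add_entity(self, entity): ...
    async def get_entity(self, entity_id): ...
    async def add_relation(self, relation): ...
    async def get_relation(self, relation_id): ...
    async def get_neighbors(self, entity_id, **kwargs): ...  # remaining parameters omitted
    async def initialize(self): ...
    async def close(self): ...

    # Tier 2 methods work automatically!
    # traverse(), find_paths(), etc. all work via defaults

# Later, optimize for your backend:
class OptimizedGraphStore(CustomGraphStore):
    async def traverse(self, start_id, **kwargs):  # remaining parameters omitted
        # Use a database-specific optimization. _use_recursive_cte() is a
        # placeholder for your own helper, not a method provided by GraphStore.
        return await self._use_recursive_cte(start_id, **kwargs)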
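
As one possible shape for such a backend-specific optimization, the sketch below uses a recursive common table expression in SQLite (via the standard `sqlite3` module) to find everything reachable within a hop limit in a single query. The `relations(source_id, target_id)` table and the helper names are assumptions made for the example and do not reflect the actual AIECS SQLite schema.

```python
import sqlite3


def reachable_within(conn: sqlite3.Connection, start_id: str, max_depth: int) -> list[tuple[str, int]]:
    """Return (entity_id, hops) pairs reachable from start_id in at most max_depth hops."""
    return conn.execute(
        """
        WITH RECURSIVE walk(entity_id, depth) AS (
            SELECT ?, 0
            UNION
            SELECT r.target_id, walk.depth + 1
            FROM relations AS r
            JOIN walk ON r.source_id = walk.entity_id
            WHERE walk.depth < ?
        )
        SELECT entity_id, MIN(depth) AS hops
        FROM walk
        WHERE entity_id != ?
        GROUP BY entity_id
        ORDER BY hops, entity_id
        """,
        (start_id, max_depth, start_id),
    ).fetchall()


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical relations table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE relations (source_id TEXT, target_id TEXT)")
    conn.executemany(
        "INSERT INTO relations VALUES (?, ?)",
        [("alice", "bob"), ("bob", "carol"), ("carol", "dave")],
    )
    return conn
```

The depth bound together with `UNION` (rather than `UNION ALL`) keeps the recursion finite even when the graph contains cycles, and pushing the whole walk into the database avoids one round trip per hop.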

API Reference

See the API Reference for details.

Testing

All Phase 1 components have comprehensive test coverage:

# Run all knowledge graph tests
poetry run pytest test/unit_tests/knowledge_graph/ -v
poetry run pytest test/integration_tests/knowledge_graph/ -v

# Run with coverage
poetry run pytest test/unit_tests/knowledge_graph/ test/integration_tests/knowledge_graph/ --cov=aiecs.domain.knowledge_graph --cov=aiecs.infrastructure.graph_storage --cov-report=html

Test Results (Phase 1):

✅ 13 domain model tests
✅ 16 schema management tests
✅ 15 InMemoryGraphStore integration tests
Total: 44 tests, all passing

Documentation

Getting Started

User Guide - Comprehensive user guide with examples
Quick Start - Get started in 5 minutes
Migration Guide - Integrate knowledge graphs into existing apps

Reference Documentation

API Reference - Complete API documentation
Configuration Guide - Configuration options and examples
Performance Guide - Performance optimization tips
Troubleshooting - Common issues and solutions

Developer Resources

Developer Guide - Extend knowledge graph components
Backend Development - Custom backend development
- Custom Backend Guide
- SQLite Backend
Reasoning Guides - Advanced reasoning features

Tutorials

End-to-End Tutorial - Complete workflow
Domain-Specific Tutorial - Build a medical knowledge graph
Multi-Hop Reasoning Tutorial - Complex question answering
CSV-to-Graph Tutorial - Import structured data

Deployment

Production Deployment - Production best practices
Security Guide - Security considerations

Examples

Tutorials

CSV-to-Graph Tutorial - Step-by-step CSV import guide
JSON-to-Graph Tutorial - Step-by-step JSON import guide

Example Scripts

Complete working examples demonstrating new features:

18_spss_import_with_inference.py - Import SPSS files with automatic schema inference
19_wide_format_normalization.py - Reshape wide format data to normalized graph structure
20_statistical_aggregation.py - Compute statistics during import
21_data_quality_validation.py - Validate data quality during import

See all examples in the examples directory for complete working code.

Structured Data Import

Import CSV, JSON, SPSS, and Excel data into knowledge graphs:

Schema Mapping Guide: Complete guide to configuring schema mappings
StructuredDataPipeline Guide: Usage guide for importing structured data
CSV-to-Graph Tutorial: Step-by-step CSV import tutorial
JSON-to-Graph Tutorial: Step-by-step JSON import tutorial

New Features:

SPSS/Excel Support: Direct import from .sav, .por, .xlsx, .xls files
Automatic Schema Inference: Automatically generate schema mappings from data
Data Reshaping: Convert wide format to normalized graph structures
Statistical Aggregation: Compute mean, std dev, min, max during import
Data Quality Validation: Range checks, outlier detection, completeness validation
Performance Optimization: Parallel processing, bulk writes, streaming import
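
The data reshaping and statistical aggregation features can be pictured with a small standard-library sketch: each wide-format column becomes its own observation record, and the observations are then aggregated per entity. The column names, `WIDE_CSV`, and both helper functions are hypothetical and only illustrate the idea; they are not the StructuredDataPipeline API.

```python
import csv
import io
from statistics import mean

# Hypothetical wide-format input: one column per year, one row per person.
WIDE_CSV = """person_id,score_2021,score_2022,score_2023
p1,70,80,90
p2,60,,75
"""


def wide_to_observations(text: str, id_column: str, prefix: str) -> list[dict]:
    """Turn each non-empty `prefix*` cell into one normalized observation record."""
    observations = []
    for row in csv.DictReader(io.StringIO(text)):
        for column, cell in row.items():
            if column.startswith(prefix) and cell != "":
                observations.append({
                    "entity_id": row[id_column],
                    "period": column[len(prefix):],
                    "value": float(cell),
                })
    return observations


def mean_by_entity(observations: list[dict]) -> dict[str, float]:
    """Aggregate observation values into a per-entity mean."""
    grouped: dict[str, list[float]] = {}
    for obs in observations:
        grouped.setdefault(obs["entity_id"], []).append(obs["value"])
    return {entity_id: mean(values) for entity_id, values in grouped.items()}
```

In a graph, each observation record would typically become its own node linked to the person, while the aggregated mean can be stored as a property on the person entity. Skipping empty cells is also a simple form of completeness handling.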

Quick Example

from aiecs.application.knowledge_graph.builder.schema_mapping import (
    SchemaMapping,
    EntityMapping
)
from aiecs.application.knowledge_graph.builder.structured_pipeline import StructuredDataPipeline
from aiecs.infrastructure.graph_storage.in_memory import InMemoryGraphStore

# Define mapping
mapping = SchemaMapping(
    entity_mappings=[
        EntityMapping(
            source_columns=["id", "name"],
            entity_type="Person",
            property_mapping={"id": "id", "name": "name"},
            id_column="id"
        )
    ]
)

# Import CSV
store = InMemoryGraphStore()
await store.initialize()
pipeline = StructuredDataPipeline(mapping=mapping, graph_store=store)
result = await pipeline.import_from_csv("data.csv")
print(f"Added {result.entities_added} entities")

Development

Project Structure

aiecs/
├── domain/knowledge_graph/          # Domain models and business logic
│   ├── models/                      # Entity, Relation, Path, Query models
│   └── schema/                      # Schema management
├── infrastructure/graph_storage/    # Storage implementations
│   ├── base.py                      # Two-tier GraphStore interface
│   └── in_memory.py                 # InMemory implementation
├── application/knowledge_graph/     # Application services (Phase 2+)
└── tools/knowledge_graph/           # Agent tools (Phase 2+)

test/
├── unit_tests/knowledge_graph/      # Unit tests
└── integration_tests/knowledge_graph/ # Integration tests

Contributing

When implementing new storage backends:

Inherit from GraphStore
Implement all Tier 1 methods (7 methods)
Test that Tier 2 methods work automatically
Optionally optimize Tier 2 methods for your backend
Add integration tests

New Features Quick Start

Structured Data Import

from aiecs.tools.knowledge_graph import KnowledgeGraphBuilderTool

builder = KnowledgeGraphBuilderTool()
await builder._initialize()

# Import CSV with schema mapping
result = await builder.run(
    op="kg_builder",
    action="build_from_structured_data",
    data_path="employees.csv",
    schema_mapping={
        "entity_mappings": [{
            "entity_type": "Person",
            "id_column": "person_id",
            "property_mappings": {"name": "full_name"}
        }]
    }
)

Search with Reranking

from aiecs.tools.knowledge_graph import GraphSearchTool

tool = GraphSearchTool()
result = await tool.run(
    op="graph_search",
    mode="hybrid",
    query="machine learning experts",
    enable_reranking=True,
    rerank_strategy="hybrid"
)
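
A hybrid strategy is commonly understood as a weighted combination of the other signals. The sketch below assumes that reading: each candidate carries pre-computed text, semantic, and structural scores, and the final order comes from their weighted sum. The `Candidate` class, `hybrid_rerank`, and the default weights are hypothetical and are not taken from the GraphSearchTool implementation.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical search hit carrying pre-computed scores in [0, 1]."""
    entity_id: str
    text_score: float
    semantic_score: float
    structural_score: float


def hybrid_rerank(candidates, weights=(0.4, 0.4, 0.2)):
    """Sort candidates by a weighted sum of their three scores, best first."""
    w_text, w_semantic, w_structural = weights  # assumed weights, tune per use case

    def combined(c: Candidate) -> float:
        return (w_text * c.text_score
                + w_semantic * c.semantic_score
                + w_structural * c.structural_score)

    return sorted(candidates, key=combined, reverse=True)
```

Setting the weights to `(1, 0, 0)` reduces this to pure text reranking, which makes it easy to compare strategies on the same candidate list.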

Knowledge Fusion

from aiecs.application.knowledge_graph.fusion import KnowledgeFusion

# `store` is an initialized graph store, e.g. the InMemoryGraphStore from Basic Usage
fusion = KnowledgeFusion(store, similarity_threshold=0.85)
stats = await fusion.fuse_cross_document_entities()
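
To give a feel for what a similarity threshold does during fusion, here is a simplified sketch that merges records whose names are similar enough, using `difflib.SequenceMatcher` from the standard library. It is not the KnowledgeFusion algorithm: the greedy matching, the name-only comparison, and the "first value wins" conflict rule are all assumptions chosen to keep the example short.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Character-level similarity ratio in [0, 1], case-insensitive."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def fuse_entities(entities: list[dict], similarity_threshold: float = 0.85) -> list[dict]:
    """Greedily merge entities whose names are similar enough.

    Conflict resolution here is "first value wins": a property already present
    on the merged entity is kept, and only missing properties are filled in.
    """
    merged: list[dict] = []
    for entity in entities:
        for existing in merged:
            if name_similarity(existing["name"], entity["name"]) >= similarity_threshold:
                for key, value in entity.items():
                    existing.setdefault(key, value)
                break
        else:
            # No sufficiently similar entity found, so keep this one as new.
            merged.append(dict(entity))
    return merged
```

A higher threshold merges fewer records and lowers the risk of conflating distinct entities; a lower one catches more spelling variants at the cost of more false merges.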

Performance

CSV Import: 100-300 rows/second
Reranking: 50-300ms latency
Schema Cache: 70-95% hit rate, 3-5x speedup
Query Optimization: 40-70% faster

See Performance Guide for details.

Roadmap

Phase 1: Foundation (Domain models, Schema, Two-tier interface, InMemory store)
Phase 2: Knowledge Graph Builder (Extract entities/relations from documents)
Phase 3: Storage Backends (SQLite, PostgreSQL with pgvector)
Phase 4: Enhanced Capabilities (Runnable pattern, Knowledge fusion, Reranking, Query optimization)
Phase 5: Testing & Documentation (111+ unit tests, 67 integration tests, comprehensive docs)
Phase 6: Advanced Features (Visualization, Advanced reasoning, Real-time updates)

License

MIT License - See the project root LICENSE file for details.