# AIECS Knowledge Graph

**Status**: Enhanced Capabilities Complete ✅

AIECS Knowledge Graph provides advanced graph-based knowledge storage, retrieval, reasoning, and fusion capabilities for AI applications.

## Features

### Core Capabilities (✅ Complete)

- **Domain Models**: Entity, Relation, Path, Query, Result models with Pydantic validation
- **Schema Management**: Type-safe schema definitions with caching for improved performance
- **Storage Backends**: In-Memory, SQLite, and PostgreSQL support
- **Structured Data Import**: Import CSV and JSON data with schema mapping (100-300 rows/second)
- **Text Similarity Utilities**: BM25, Jaccard, Cosine similarity, Levenshtein distance, fuzzy matching
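
For reference, the sketch below gives textbook definitions of three of the listed measures, plus a Levenshtein-based fuzzy ratio, in plain Python. It is not the AIECS utility API: the function names are illustrative, whitespace tokenization is a simplifying assumption, and BM25 is left out for brevity.

```python
from collections import Counter
import math


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase + whitespace split; real tokenizers are richer.
    return text.lower().split()


def jaccard(a: str, b: str) -> float:
    # |A ∩ B| / |A ∪ B| over the two token sets.
    sa, sb = set(tokenize(a)), set(tokenize(b))
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def cosine(a: str, b: str) -> float:
    # Cosine of the angle between the two token-count vectors.
    ca, cb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(count * cb[token] for token, count in ca.items())
    norm_a = math.sqrt(sum(c * c for c in ca.values()))
    norm_b = math.sqrt(sum(c * c for c in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def levenshtein(a: str, b: str) -> int:
    # Minimum number of single-character insertions, deletions, or substitutions,
    # computed row by row with dynamic programming.
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]
        for j, char_b in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                       # delete char_a
                curr[j - 1] + 1,                   # insert char_b
                prev[j - 1] + (char_a != char_b),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def fuzzy_ratio(a: str, b: str) -> float:
    # Edit distance rescaled to a 0..1 similarity (1.0 = identical).
    longest = max(len(a), len(b))
    return 1.0 - levenshtein(a, b) / longest if longest else 1.0


print(levenshtein("kitten", "sitting"))  # 3
```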

### New Enhanced Features (✅ Complete)

- **Runnable Pattern**: Composable graph operations with async/sync compatibility
- **Knowledge Fusion**: Cross-document entity merging with conflict resolution (5 strategies)
- **Result Reranking**: Improve search relevance with 4 reranking strategies (text, semantic, structural, hybrid)
- **Logical Query Parsing**: Convert natural language to structured logical queries
- **Schema Caching**: 3-5x performance improvement with 70-95% hit rate
- **Query Optimization**: 40-70% faster query execution with cost-based optimization
- **Performance Benchmarks**: Comprehensive benchmarks and optimization guides
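
The Runnable pattern composes graph operations that may be synchronous or asynchronous. The sketch below illustrates that idea with a hypothetical `Step` class that chains steps with `|`; it is not the AIECS Runnable API (see [Runnable Pattern](./RUNNABLE_PATTERN.md) for that), and `fetch_neighbors` is a stand-in for a real storage call.

```python
import asyncio
import inspect
from typing import Any, Callable


class Step:
    """Hypothetical composable operation; not the AIECS Runnable API."""

    def __init__(self, fn: Callable[[Any], Any]):
        self.fn = fn

    def __or__(self, other: "Step") -> "Step":
        # Compose: the output of self becomes the input of other.
        async def chained(value: Any) -> Any:
            return await other.ainvoke(await self.ainvoke(value))
        return Step(chained)

    async def ainvoke(self, value: Any) -> Any:
        result = self.fn(value)
        # Accept both plain functions and coroutine functions.
        if inspect.isawaitable(result):
            result = await result
        return result

    def invoke(self, value: Any) -> Any:
        # Synchronous entry point for callers without a running event loop.
        return asyncio.run(self.ainvoke(value))


async def fetch_neighbors(entity_id: str) -> list[str]:
    await asyncio.sleep(0)  # stand-in for an async storage call
    return [f"{entity_id}-n1", f"{entity_id}-n2"]


# Mix sync (str.strip, len) and async (fetch_neighbors) steps in one chain.
pipeline = Step(str.strip) | Step(fetch_neighbors) | Step(len)
print(pipeline.invoke("  alice "))  # 2
```

Because `invoke()` calls `asyncio.run()`, it cannot be used from inside a running event loop; async callers should `await pipeline.ainvoke(...)` instead.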

### Storage Backends

- **In-Memory**: Fast, networkx-based storage for development (100K+ nodes)
- **SQLite**: File-based persistence for single-user applications (1M+ nodes)
- **PostgreSQL**: Production-grade storage with pgvector support (10M+ nodes)

## Quick Start

### Basic Usage

```python
import asyncio

from aiecs.domain.knowledge_graph.models.entity import Entity
from aiecs.domain.knowledge_graph.models.relation import Relation
from aiecs.infrastructure.graph_storage.in_memory import InMemoryGraphStore


async def main():
    # Initialize store
    store = InMemoryGraphStore()
    await store.initialize()

    # Add entities
    alice = Entity(
        id="alice",
        entity_type="Person",
        properties={"name": "Alice", "age": 30}
    )
    await store.add_entity(alice)

    bob = Entity(id="bob", entity_type="Person", properties={"name": "Bob"})
    await store.add_entity(bob)

    # Add relation
    knows = Relation(
        id="r1",
        relation_type="KNOWS",
        source_id="alice",
        target_id="bob"
    )
    await store.add_relation(knows)

    # Query neighbors (Tier 1 method)
    neighbors = await store.get_neighbors("alice", direction="outgoing")
    print(f"Alice knows: {[n.properties['name'] for n in neighbors]}")

    # Graph traversal (Tier 2 method, served by the default implementation)
    paths = await store.traverse("alice", max_depth=3)
    print(f"Found {len(paths)} paths")

    # Cleanup
    await store.close()


asyncio.run(main())
```

### Schema Management

```python
from aiecs.domain.knowledge_graph.schema import (
    SchemaManager,
    EntityType,
    RelationType,
    PropertySchema,
    PropertyType
)

# Create schema manager
manager = SchemaManager()

# Define entity type
person_type = EntityType(
    name="Person",
    description="A person entity",
    properties={
        "name": PropertySchema(
            name="name",
            property_type=PropertyType.STRING,
            required=True
        ),
        "age": PropertySchema(
            name="age",
            property_type=PropertyType.INTEGER,
            min_value=0,
            max_value=150
        )
    }
)
manager.create_entity_type(person_type)

# Define relation type
knows_type = RelationType(
    name="KNOWS",
    description="Person knows another person",
    source_entity_types=["Person"],
    target_entity_types=["Person"]
)
manager.create_relation_type(knows_type)

# Validate entity
is_valid = manager.validate_entity("Person", {"name": "Alice", "age": 30})
```
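
The schema caching listed under Features can be pictured with Python's built-in `functools.lru_cache`, which memoizes lookups and reports hits and misses through `cache_info()`. This is an illustrative sketch, not how `SchemaManager` caches internally; `SCHEMA_STORE` and `get_entity_type` are hypothetical.

```python
from functools import lru_cache

# Hypothetical stand-in for an expensive schema lookup (e.g. a storage read).
SCHEMA_STORE = {
    "Person": {"required": ["name"]},
    "Company": {"required": ["name"]},
}


@lru_cache(maxsize=128)
def get_entity_type(name: str) -> tuple:
    # Return an immutable value so callers cannot mutate a cached entry.
    return tuple(SCHEMA_STORE[name]["required"])


def hit_rate() -> float:
    info = get_entity_type.cache_info()
    total = info.hits + info.misses
    return info.hits / total if total else 0.0


for name in ["Person", "Person", "Company", "Person", "Person"]:
    get_entity_type(name)

print(f"hit rate: {hit_rate():.0%}")  # hit rate: 60%
```

Any schema change must invalidate cached entries, for example with `get_entity_type.cache_clear()`; otherwise lookups keep returning stale definitions.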

## Architecture

### Two-Tier Storage Interface

The AIECS knowledge graph storage layer uses a two-tier interface design:

**Tier 1 - Basic Interface (MUST IMPLEMENT)**:
- `add_entity()`, `get_entity()`
- `add_relation()`, `get_relation()`
- `get_neighbors()`
- `initialize()`, `close()`

**Tier 2 - Advanced Interface (HAS DEFAULTS)**:
- `traverse()` - Multi-hop graph traversal
- `find_paths()` - Path finding between entities
- `subgraph_query()` - Extract subgraphs
- `vector_search()` - Semantic search
- `execute_query()` - Generic query execution
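
To make the "defaults" idea concrete, here is a minimal sketch of how a Tier 2 method such as `traverse()` can be derived from the Tier 1 `get_neighbors()` alone, using breadth-first search. `ToyStore` and `default_traverse` are hypothetical: the real `GraphStore` defaults operate on `Entity` and `Path` models, and their exact signatures may differ.

```python
import asyncio
from collections import deque


class ToyStore:
    """Hypothetical store exposing only a Tier 1-style get_neighbors()."""

    def __init__(self, edges: dict[str, list[str]]):
        self._edges = edges

    async def get_neighbors(self, entity_id: str) -> list[str]:
        return self._edges.get(entity_id, [])


async def default_traverse(store: ToyStore, start_id: str, max_depth: int = 3) -> list[list[str]]:
    """Breadth-first traversal built only on get_neighbors().

    Returns every simple path (no repeated nodes) of 1..max_depth hops from start_id.
    """
    paths: list[list[str]] = []
    queue = deque([[start_id]])
    while queue:
        path = queue.popleft()
        if len(path) - 1 >= max_depth:  # hop budget exhausted
            continue
        for neighbor in await store.get_neighbors(path[-1]):
            if neighbor in path:  # skip cycles
                continue
            new_path = path + [neighbor]
            paths.append(new_path)
            queue.append(new_path)
    return paths


store = ToyStore({"alice": ["bob"], "bob": ["carol", "alice"], "carol": []})
print(asyncio.run(default_traverse(store, "alice", max_depth=3)))
# [['alice', 'bob'], ['alice', 'bob', 'carol']]
```

A backend-specific override (for example a recursive SQL query) can replace this loop without changing callers, which is the point of the two-tier split.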

### Why This Matters

1. **Minimal Implementation**: Implement just 7 Tier 1 methods, get all Tier 2 methods for free
2. **Gradual Optimization**: Start with defaults, optimize later for your specific backend
3. **Storage Agnostic**: Application code works with any storage backend
4. **Custom Adapters**: Easy to integrate Neo4j, ArangoDB, or any graph database

Example:

```python
from aiecs.infrastructure.graph_storage.base import GraphStore


class CustomGraphStore(GraphStore):
    # Implement only the seven Tier 1 methods
    async def add_entity(self, entity): ...
    async def get_entity(self, entity_id): ...
    async def add_relation(self, relation): ...
    async def get_relation(self, relation_id): ...
    async def get_neighbors(self, entity_id, **kwargs): ...
    async def initialize(self): ...
    async def close(self): ...

    # Tier 2 methods work automatically:
    # traverse(), find_paths(), etc. are served by the default implementations


# Later, optimize for your backend:
class OptimizedGraphStore(CustomGraphStore):
    async def traverse(self, *args, **kwargs):
        # Replace the default with a database-specific query;
        # _use_recursive_cte() is a placeholder for your own helper
        return await self._use_recursive_cte(*args, **kwargs)
```

## API Reference

See the [API Reference](./API_REFERENCE.md) for complete class and method documentation.

## Testing

All Phase 1 components are covered by unit and integration tests:

```bash
# Run all knowledge graph tests
poetry run pytest test/unit_tests/knowledge_graph/ -v
poetry run pytest test/integration_tests/knowledge_graph/ -v

# Run with coverage
poetry run pytest test/unit_tests/knowledge_graph/ test/integration_tests/knowledge_graph/ --cov=aiecs.domain.knowledge_graph --cov=aiecs.infrastructure.graph_storage --cov-report=html
```

Test Results (Phase 1):
- ✅ 13 domain model tests
- ✅ 16 schema management tests
- ✅ 15 InMemoryGraphStore integration tests
- **Total: 44 tests, all passing**

## Documentation

### Getting Started
- **[User Guide](./USER_GUIDE.md)** - Comprehensive user guide with examples
- **[Quick Start](#quick-start)** - Get started in 5 minutes
- **[Migration Guide](../../developer/knowledge_graph/MIGRATION_GUIDE.md)** - Integrate knowledge graphs into existing apps

### Reference Documentation
- **[API Reference](./API_REFERENCE.md)** - Complete API documentation
- **[Configuration Guide](./CONFIGURATION_GUIDE.md)** - Configuration options and examples
- **[Performance Guide](./PERFORMANCE_GUIDE.md)** - Performance optimization tips
- **[Troubleshooting](./TROUBLESHOOTING.md)** - Common issues and solutions

### Developer Resources
- **[Developer Guide](../../developer/knowledge_graph/DEVELOPER_GUIDE.md)** - Extend knowledge graph components
- **Backend Development** - Guides for building custom storage backends
  - [Custom Backend Guide](../../developer/knowledge_graph/backend/CUSTOM_BACKEND_GUIDE.md)
  - [SQLite Backend](../../developer/knowledge_graph/storage/SQLITE_BACKEND.md)
- **[Reasoning Guides](./reasoning/REASONING_ENGINE.md)** - Advanced reasoning features
  - [Reasoning Engine](./reasoning/REASONING_ENGINE.md)
  - [Logic Query Parser](./reasoning/logic_query_parser.md)
  - [Reranking Strategies Guide](./reasoning/reranking-strategies-guide.md)
  - [Schema Caching Guide](./reasoning/schema-caching-guide.md)

### Tutorials
- **[End-to-End Tutorial](./tutorials/END_TO_END_TUTORIAL.md)** - Complete workflow
- **[Domain-Specific Tutorial](./tutorials/DOMAIN_SPECIFIC_TUTORIAL.md)** - Build a medical knowledge graph
- **[Multi-Hop Reasoning Tutorial](./tutorials/MULTI_HOP_REASONING_TUTORIAL.md)** - Complex question answering
- **[CSV-to-Graph Tutorial](./examples/csv_to_graph_tutorial.md)** - Import structured data

### Deployment
- **[Production Deployment](./deployment/PRODUCTION_DEPLOYMENT.md)** - Production best practices
- **[Security Guide](./deployment/SECURITY.md)** - Security considerations

## Examples

### Tutorials
- [CSV-to-Graph Tutorial](./examples/csv_to_graph_tutorial.md) - Step-by-step CSV import guide
- [JSON-to-Graph Tutorial](./examples/json_to_graph_tutorial.md) - Step-by-step JSON import guide

### Example Scripts
Complete working examples demonstrating new features:

- **[18_spss_import_with_inference.py](./examples/18_spss_import_with_inference.py)** - Import SPSS files with automatic schema inference
- **[19_wide_format_normalization.py](./examples/19_wide_format_normalization.py)** - Reshape wide format data to normalized graph structure
- **[20_statistical_aggregation.py](./examples/20_statistical_aggregation.py)** - Compute statistics during import
- **[21_data_quality_validation.py](./examples/21_data_quality_validation.py)** - Validate data quality during import

See all examples in the [examples directory](./examples/) for complete working code.

## Structured Data Import

Import CSV, JSON, SPSS, and Excel data into knowledge graphs:

- **[Schema Mapping Guide](./SCHEMA_MAPPING_GUIDE.md)**: Complete guide to configuring schema mappings
- **[StructuredDataPipeline Guide](./STRUCTURED_DATA_PIPELINE.md)**: Usage guide for importing structured data
- **[CSV-to-Graph Tutorial](./examples/csv_to_graph_tutorial.md)**: Step-by-step CSV import tutorial
- **[JSON-to-Graph Tutorial](./examples/json_to_graph_tutorial.md)**: Step-by-step JSON import tutorial

**New Features:**
- **SPSS/Excel Support**: Direct import from `.sav`, `.por`, `.xlsx`, `.xls` files
- **Automatic Schema Inference**: Automatically generate schema mappings from data
- **Data Reshaping**: Convert wide format to normalized graph structures
- **Statistical Aggregation**: Compute mean, std dev, min, max during import
- **Data Quality Validation**: Range checks, outlier detection, completeness validation
- **Performance Optimization**: Parallel processing, bulk writes, streaming import
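
As an illustration of the reshaping and aggregation features above, the following plain-Python sketch melts hypothetical wide-format survey rows (one column per wave) into long `(id, variable, value)` records and computes per-variable statistics. It does not use the `StructuredDataPipeline` API; the column names and record layout are assumptions.

```python
from statistics import mean, stdev

# Hypothetical wide-format rows: one row per respondent, one column per wave.
wide_rows = [
    {"respondent_id": "r1", "score_2021": 3.0, "score_2022": 4.0},
    {"respondent_id": "r2", "score_2021": 5.0, "score_2022": None},
]


def wide_to_long(rows, id_column, value_columns):
    """Melt wide rows into (id, variable, value) records, dropping missing values."""
    records = []
    for row in rows:
        for column in value_columns:
            value = row.get(column)
            if value is None:
                continue
            records.append({"id": row[id_column], "variable": column, "value": value})
    return records


def summarize(records, variable):
    """Mean, sample standard deviation, min and max for one variable."""
    values = [r["value"] for r in records if r["variable"] == variable]
    return {
        "mean": mean(values),
        "std": stdev(values) if len(values) > 1 else 0.0,
        "min": min(values),
        "max": max(values),
    }


long_rows = wide_to_long(wide_rows, "respondent_id", ["score_2021", "score_2022"])
# Each long record could then map to a graph element, e.g. a respondent
# entity linked to a per-wave observation (an assumed modeling choice).
print(len(long_rows))  # 3
print(summarize(long_rows, "score_2021"))
```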

### Quick Example

```python
import asyncio

from aiecs.application.knowledge_graph.builder.schema_mapping import (
    SchemaMapping,
    EntityMapping
)
from aiecs.application.knowledge_graph.builder.structured_pipeline import StructuredDataPipeline
from aiecs.infrastructure.graph_storage.in_memory import InMemoryGraphStore

# Define mapping
mapping = SchemaMapping(
    entity_mappings=[
        EntityMapping(
            source_columns=["id", "name"],
            entity_type="Person",
            property_mapping={"id": "id", "name": "name"},
            id_column="id"
        )
    ]
)


async def main():
    # Import CSV
    store = InMemoryGraphStore()
    await store.initialize()
    pipeline = StructuredDataPipeline(mapping=mapping, graph_store=store)
    result = await pipeline.import_from_csv("data.csv")
    print(f"Added {result.entities_added} entities")
    await store.close()


asyncio.run(main())
```

## Development

### Project Structure

```
aiecs/
├── domain/knowledge_graph/          # Domain models and business logic
│   ├── models/                      # Entity, Relation, Path, Query models
│   └── schema/                      # Schema management
├── infrastructure/graph_storage/    # Storage implementations
│   ├── base.py                      # Two-tier GraphStore interface
│   └── in_memory.py                 # InMemory implementation
├── application/knowledge_graph/     # Application services (Phase 2+)
└── tools/knowledge_graph/           # Agent tools (Phase 2+)

test/
├── unit_tests/knowledge_graph/      # Unit tests
└── integration_tests/knowledge_graph/ # Integration tests
```

### Contributing

When implementing new storage backends:

1. Inherit from `GraphStore`
2. Implement all Tier 1 methods (7 methods)
3. Test that Tier 2 methods work automatically
4. Optionally optimize Tier 2 methods for your backend
5. Add integration tests

## New Features Quick Start

### Structured Data Import

```python
import asyncio

from aiecs.tools.knowledge_graph import KnowledgeGraphBuilderTool


async def main():
    builder = KnowledgeGraphBuilderTool()
    await builder._initialize()

    # Import CSV with schema mapping
    result = await builder.run(
        op="kg_builder",
        action="build_from_structured_data",
        data_path="employees.csv",
        schema_mapping={
            "entity_mappings": [{
                "entity_type": "Person",
                "id_column": "person_id",
                "property_mappings": {"name": "full_name"}
            }]
        }
    )
    print(result)


asyncio.run(main())
```

### Search with Reranking

```python
import asyncio

from aiecs.tools.knowledge_graph import GraphSearchTool


async def main():
    tool = GraphSearchTool()
    result = await tool.run(
        op="graph_search",
        mode="hybrid",
        query="machine learning experts",
        enable_reranking=True,
        rerank_strategy="hybrid"
    )
    print(result)


asyncio.run(main())
```
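
A common way to build a hybrid reranker is weighted score fusion: normalize each strategy's scores to a common range, then take a weighted sum. The sketch below assumes min-max normalization and example weights of 0.4/0.4/0.2; the signal names mirror the strategies listed under Features, but the normalization and weights that `rerank_strategy="hybrid"` actually uses are not specified here.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1]; if all scores are equal, each maps to 1.0."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 1.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}


def hybrid_rerank(signals: dict[str, dict[str, float]],
                  weights: dict[str, float]) -> list[tuple[str, float]]:
    """Weighted sum of normalized per-strategy scores, highest first.

    An entity missing from a signal simply gets no contribution from it.
    """
    combined: dict[str, float] = {}
    for name, scores in signals.items():
        for entity_id, score in min_max(scores).items():
            combined[entity_id] = combined.get(entity_id, 0.0) + weights[name] * score
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)


# Example scores on very different scales (made-up values).
signals = {
    "text": {"e1": 12.0, "e2": 8.0, "e3": 2.0},        # e.g. BM25
    "semantic": {"e1": 0.70, "e2": 0.95, "e3": 0.40},  # e.g. embedding cosine
    "structural": {"e1": 3.0, "e2": 5.0, "e3": 1.0},   # e.g. node degree
}
weights = {"text": 0.4, "semantic": 0.4, "structural": 0.2}

ranking = hybrid_rerank(signals, weights)
print([entity_id for entity_id, _ in ranking])  # ['e2', 'e1', 'e3']
```

Normalizing first matters because raw BM25 scores and cosine similarities live on different scales; without it, the largest-magnitude signal would dominate regardless of its weight.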

### Knowledge Fusion

```python
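from aiecs.application.knowledge_graph.fusion import KnowledgeFusion


async def fuse_entities(store):
    # `store` is an initialized GraphStore that already contains extracted entities
    fusion = KnowledgeFusion(store, similarity_threshold=0.85)
    return await fusion.fuse_cross_document_entities()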
```
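
Conceptually, cross-document fusion groups entity mentions that likely refer to the same thing, then resolves conflicting property values. The sketch below uses greedy clustering on entity type plus `difflib.SequenceMatcher` name similarity against a 0.85 threshold, with two hypothetical conflict strategies (`keep_first`, `most_frequent`). It is not the `KnowledgeFusion` implementation, and its strategies are not the five AIECS strategies.

```python
from collections import Counter
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    # Case-insensitive similarity ratio in [0, 1].
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def resolve(values: list, strategy: str):
    """Pick one value for a conflicting property (hypothetical strategies)."""
    if strategy == "keep_first":
        return values[0]
    if strategy == "most_frequent":
        return Counter(values).most_common(1)[0][0]
    raise ValueError(f"unknown strategy: {strategy}")


def fuse(entities: list[dict], threshold: float = 0.85,
         strategy: str = "most_frequent") -> list[dict]:
    """Greedy clustering by (type, name similarity), then property merging."""
    clusters: list[list[dict]] = []
    for entity in entities:
        for cluster in clusters:
            head = cluster[0]
            if (head["type"] == entity["type"]
                    and name_similarity(head["name"], entity["name"]) >= threshold):
                cluster.append(entity)
                break
        else:  # no matching cluster: start a new one
            clusters.append([entity])

    fused = []
    for cluster in clusters:
        keys = {k for e in cluster for k in e["properties"]}
        properties = {
            k: resolve([e["properties"][k] for e in cluster if k in e["properties"]], strategy)
            for k in sorted(keys)
        }
        fused.append({
            "type": cluster[0]["type"],
            "name": cluster[0]["name"],
            "properties": properties,
            "sources": [e["source"] for e in cluster],  # keep provenance
        })
    return fused


docs = [
    {"type": "Person", "name": "Alice Smith", "source": "doc1", "properties": {"age": 31, "city": "Paris"}},
    {"type": "Person", "name": "Alice Smith.", "source": "doc2", "properties": {"age": 30}},
    {"type": "Person", "name": "alice smith", "source": "doc3", "properties": {"age": 30}},
    {"type": "Person", "name": "Bob Jones", "source": "doc2", "properties": {"age": 40}},
]
fused = fuse(docs)
print(len(fused))  # 2
```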

## Feature Documentation

### Getting Started
- [Configuration Guide](./CONFIGURATION_GUIDE.md)
- [Performance Guide](./PERFORMANCE_GUIDE.md)
- [Runnable Pattern](./RUNNABLE_PATTERN.md)

### Data Import
- [Structured Data Pipeline](./STRUCTURED_DATA_PIPELINE.md)
- [Schema Mapping Guide](./SCHEMA_MAPPING_GUIDE.md)
- [CSV Tutorial](./examples/csv_to_graph_tutorial.md)
- [JSON Tutorial](./examples/json_to_graph_tutorial.md)

### Search and Reasoning
- [Result Reranker API](./reasoning/result-reranker-api.md)
- [Schema Caching Guide](./reasoning/schema-caching-guide.md)
- [Logic Query Parser](./reasoning/logic_query_parser.md)

### Tools
- [Graph Builder Tool](./tools/GRAPH_BUILDER_TOOL.md)
- [Graph Search Tool](./tools/GRAPH_SEARCH_TOOL.md)
- [Graph Reasoning Tool](./tools/GRAPH_REASONING_TOOL.md)

## Performance

- **CSV Import**: 100-300 rows/second
- **Reranking**: 50-300ms latency
- **Schema Cache**: 70-95% hit rate, 3-5x speedup
- **Query Optimization**: 40-70% faster query execution

See [Performance Guide](./PERFORMANCE_GUIDE.md) for details.
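
The optimizer's internals are not covered here. As one illustration of what cost-based optimization means, the sketch below applies a standard rule for conjunctive filters with independent selectivities: evaluate predicates in ascending order of `cost / (1 - selectivity)`, which minimizes expected evaluation cost. The predicates, costs, and selectivities are made-up examples, not AIECS statistics.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Predicate:
    name: str
    test: Callable[[dict], bool]
    selectivity: float  # estimated fraction of rows that pass
    cost: float         # estimated cost per evaluation


def order_predicates(predicates: list[Predicate]) -> list[Predicate]:
    # Rank rule: cheap, highly selective filters first; a filter that
    # passes everything (selectivity 1.0) can never prune, so it goes last.
    return sorted(
        predicates,
        key=lambda p: p.cost / (1 - p.selectivity) if p.selectivity < 1 else float("inf"),
    )


def expected_cost(predicates: list[Predicate]) -> float:
    # Each predicate only runs on rows that survived all earlier ones.
    total, pass_fraction = 0.0, 1.0
    for p in predicates:
        total += pass_fraction * p.cost
        pass_fraction *= p.selectivity
    return total


def run_filters(rows: list[dict], predicates: list[Predicate]) -> list[dict]:
    ordered = order_predicates(predicates)
    # all() short-circuits, so later (expensive) predicates are often skipped.
    return [row for row in rows if all(p.test(row) for p in ordered)]


preds = [
    Predicate("adult", lambda r: r["age"] >= 18, selectivity=0.9, cost=1.0),
    Predicate("in_paris", lambda r: r["city"] == "Paris", selectivity=0.05, cost=1.0),
    # Stand-in for an expensive check such as fuzzy name matching.
    Predicate("fuzzy_name", lambda r: r["name"].lower().startswith("ali"), selectivity=0.2, cost=20.0),
]
rows = [
    {"age": 30, "city": "Paris", "name": "Alice"},
    {"age": 12, "city": "Paris", "name": "Alicia"},
    {"age": 40, "city": "Lyon", "name": "Alice"},
]

print([p.name for p in order_predicates(preds)])  # ['in_paris', 'adult', 'fuzzy_name']
print(expected_cost(preds), expected_cost(order_predicates(preds)))
```

With these numbers the naive order costs about 2.8 units per row and the reordered plan about 1.95, because the expensive fuzzy check now runs on roughly 4.5% of rows instead of 4.5% after a costlier prefix.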

## Roadmap

- [x] **Phase 1**: Foundation (Domain models, Schema, Two-tier interface, InMemory store)
- [x] **Phase 2**: Knowledge Graph Builder (Extract entities/relations from documents)
- [x] **Phase 3**: Storage Backends (SQLite, PostgreSQL with pgvector)
- [x] **Phase 4**: Enhanced Capabilities (Runnable pattern, Knowledge fusion, Reranking, Query optimization)
- [x] **Phase 5**: Testing & Documentation (111+ unit tests, 67 integration tests, comprehensive docs)
- [ ] **Phase 6**: Advanced Features (Visualization, Advanced reasoning, Real-time updates)

## License

MIT License - See the project root LICENSE file for details.