# AIECS Knowledge Graph User Guide

## Introduction

Welcome to the AIECS Knowledge Graph User Guide! This guide will help you get started with building, querying, and reasoning over knowledge graphs in your AI applications.

### What is a Knowledge Graph?

A knowledge graph is a structured representation of knowledge that captures:
- **Entities**: Things in your domain (people, companies, products, concepts)
- **Relations**: Connections between entities (works_for, located_in, knows)
- **Properties**: Attributes of entities and relations (name, age, start_date)

Knowledge graphs enable:
- **Structured Knowledge Storage**: Organize information in a queryable format
- **Multi-Hop Reasoning**: Answer complex questions by traversing relationships
- **Knowledge Fusion**: Merge information from multiple sources
- **Semantic Search**: Find relevant information using meaning, not just keywords

### Why Use AIECS Knowledge Graph?

- **Self-Contained**: No external graph database required
- **Multiple Backends**: InMemory, SQLite, PostgreSQL - choose what fits your needs
- **Easy to Use**: Simple API for common operations
- **Powerful**: Advanced features like reasoning, fusion, and optimization
- **Extensible**: Add custom storage backends easily

## Quick Start

### Installation

AIECS Knowledge Graph is included with AIECS. Install the optional dependencies for specific backends:

```bash
# For SQLite support (included by default)
pip install aiecs

# For PostgreSQL support
pip install aiecs[postgres]

# For all features
pip install aiecs[all]
```

### Your First Knowledge Graph

Let's create a simple knowledge graph about people and companies:

```python
import asyncio
from aiecs.domain.knowledge_graph.models.entity import Entity
from aiecs.domain.knowledge_graph.models.relation import Relation
from aiecs.infrastructure.graph_storage.in_memory import InMemoryGraphStore

async def main():
    # 1. Initialize storage
    store = InMemoryGraphStore()
    await store.initialize()

    # 2. Create entities
    alice = Entity(
        id="alice",
        entity_type="Person",
        properties={"name": "Alice Smith", "age": 30, "role": "Engineer"}
    )

    bob = Entity(
        id="bob",
        entity_type="Person",
        properties={"name": "Bob Jones", "age": 25, "role": "Designer"}
    )

    tech_corp = Entity(
        id="tech_corp",
        entity_type="Company",
        properties={"name": "Tech Corp", "industry": "Technology"}
    )

    # 3. Add entities to graph
    await store.add_entity(alice)
    await store.add_entity(bob)
    await store.add_entity(tech_corp)

    # 4. Create relations
    alice_works = Relation(
        id="rel_1",
        relation_type="WORKS_FOR",
        source_id="alice",
        target_id="tech_corp",
        properties={"start_date": "2020-01-01"}
    )

    bob_works = Relation(
        id="rel_2",
        relation_type="WORKS_FOR",
        source_id="bob",
        target_id="tech_corp",
        properties={"start_date": "2021-06-01"}
    )

    alice_knows_bob = Relation(
        id="rel_3",
        relation_type="KNOWS",
        source_id="alice",
        target_id="bob"
    )

    # 5. Add relations to graph
    await store.add_relation(alice_works)
    await store.add_relation(bob_works)
    await store.add_relation(alice_knows_bob)

    # 6. Query the graph
    # Get Alice's neighbors
    neighbors = await store.get_neighbors("alice", direction="outgoing")
    print(f"Alice is connected to: {[n.properties['name'] for n in neighbors]}")

    # Find paths from Alice
    paths = await store.traverse("alice", max_depth=2)
    print(f"Found {len(paths)} paths from Alice")

    # 7. Cleanup
    await store.close()

# Run
asyncio.run(main())
```

**Output**:
```
Alice is connected to: ['Bob Jones', 'Tech Corp']
Found 3 paths from Alice
```

Congratulations! You've created your first knowledge graph.

## Core Concepts

### Entities

Entities represent nodes in your knowledge graph. Each entity has:
- **ID**: Unique identifier
- **Type**: Category (Person, Company, Product, etc.)
- **Properties**: Key-value attributes
- **Metadata**: Optional metadata (source, confidence, timestamps)

```python
entity = Entity(
    id="unique_id",
    entity_type="Person",
    properties={
        "name": "Alice",
        "age": 30,
        "email": "alice@example.com"
    },
    metadata={
        "source": "document_1",
        "confidence": 0.95
    }
)
```

### Relations

Relations represent edges connecting entities. Each relation has:
- **ID**: Unique identifier
- **Type**: Relationship type (WORKS_FOR, KNOWS, LOCATED_IN, etc.)
- **Source**: Starting entity ID
- **Target**: Ending entity ID
- **Properties**: Relationship attributes
- **Metadata**: Optional metadata

```python
relation = Relation(
    id="rel_id",
    relation_type="WORKS_FOR",
    source_id="person_1",
    target_id="company_1",
    properties={
        "role": "Engineer",
        "start_date": "2020-01-01"
    }
)
```

### Paths

Paths represent sequences of entities connected by relations:

```python
from aiecs.domain.knowledge_graph.models.path import Path

path = Path(
    entities=[alice, tech_corp, project],
    relations=[works_for_relation, manages_relation],
    score=0.85
)

print(f"Path length: {path.length()} hops")
print(f"Entities: {path.get_entity_ids()}")
```

### Storage Backends

AIECS provides three built-in storage backends:

#### InMemoryGraphStore

**Best for**: Development, testing, small graphs (< 100K nodes)

```python
from aiecs.infrastructure.graph_storage.in_memory import InMemoryGraphStore

store = InMemoryGraphStore()
await store.initialize()
```

**Pros**: Very fast, no setup required
**Cons**: Data lost when process ends, limited by RAM

#### SQLiteGraphStore

**Best for**: Production apps, persistent storage, medium graphs (< 1M nodes)

```python
from aiecs.infrastructure.graph_storage.sqlite import SQLiteGraphStore

store = SQLiteGraphStore(db_path="knowledge.db")
await store.initialize()
```

**Pros**: Persistent, no server required, optimized queries
**Cons**: Single-process only, slower than in-memory

#### PostgreSQLGraphStore

**Best for**: Large-scale production, multi-user apps, huge graphs (10M+ nodes)

```python
from aiecs.infrastructure.graph_storage.postgresql import PostgreSQLGraphStore

store = PostgreSQLGraphStore(
    connection_string="postgresql://user:pass@localhost/db"
)
await store.initialize()
```

**Pros**: Scalable, concurrent access, pgvector support
**Cons**: Requires PostgreSQL server

## Common Tasks

### Task 1: Building a Graph from Text

Extract entities and relations from unstructured text:

```python
from aiecs.tools.knowledge_graph import KnowledgeGraphBuilderTool

# Initialize tool
builder = KnowledgeGraphBuilderTool()
await builder._initialize()

# Extract from text
text = """
Alice Smith is a software engineer at Tech Corp in San Francisco.
She has been working there since 2020 and leads the AI team.
Bob Jones, a designer, also works at Tech Corp.
"""

result = await builder.run(
    op="kg_builder",
    action="build_from_text",
    text=text,
    entity_types=["Person", "Company", "Location"]
)

print(f"Extracted {result['entities_added']} entities")
print(f"Extracted {result['relations_added']} relations")
```

### Task 2: Importing CSV Data

Import structured data from CSV files:

```python
from aiecs.application.knowledge_graph.builder.schema_mapping import (
    SchemaMapping,
    EntityMapping,
    RelationMapping
)
from aiecs.application.knowledge_graph.builder.structured_pipeline import (
    StructuredDataPipeline
)

# Define schema mapping
mapping = SchemaMapping(
    entity_mappings=[
        EntityMapping(
            source_columns=["person_id", "name", "age"],
            entity_type="Person",
            property_mapping={
                "id": "person_id",
                "name": "name",
                "age": "age"
            },
            id_column="person_id"
        ),
        EntityMapping(
            source_columns=["company_id", "company_name"],
            entity_type="Company",
            property_mapping={
                "id": "company_id",
                "name": "company_name"
            },
            id_column="company_id"
        )
    ],
    relation_mappings=[
        RelationMapping(
            source_id_column="person_id",
            target_id_column="company_id",
            relation_type="WORKS_FOR",
            property_mapping={
                "role": "role",
                "start_date": "start_date"
            }
        )
    ]
)

# Import CSV
pipeline = StructuredDataPipeline(mapping=mapping, graph_store=store)
result = await pipeline.import_from_csv("employees.csv")

print(f"Imported {result.entities_added} entities")
print(f"Imported {result.relations_added} relations")
```

### Task 3: Searching the Graph

Perform different types of searches:

```python
from aiecs.tools.knowledge_graph import GraphSearchTool

search_tool = GraphSearchTool()
await search_tool._initialize()

# Vector search (semantic similarity)
result = await search_tool.run(
    op="graph_search",
    mode="vector",
    query="machine learning experts",
    top_k=10
)

# Graph traversal search
result = await search_tool.run(
    op="graph_search",
    mode="graph",
    start_entity_id="alice",
    max_depth=3,
    relation_types=["WORKS_FOR", "KNOWS"]
)

# Hybrid search (combines vector + graph)
result = await search_tool.run(
    op="graph_search",
    mode="hybrid",
    query="senior engineers in San Francisco",
    top_k=10,
    enable_reranking=True,
    rerank_strategy="hybrid"
)
```

### Task 4: Multi-Hop Reasoning

Answer complex questions by traversing the graph:

```python
from aiecs.tools.knowledge_graph import GraphReasoningTool

reasoning_tool = GraphReasoningTool()
await reasoning_tool._initialize()

# Multi-hop question answering
result = await reasoning_tool.run(
    op="graph_reasoning",
    mode="multi_hop",
    query="How is Alice connected to Project X?",
    start_entity_id="alice",
    end_entity_id="project_x",
    max_hops=5
)

print(f"Answer: {result['answer']}")
print(f"Reasoning steps: {result['reasoning_steps']}")
print(f"Evidence paths: {len(result['paths'])}")
```

### Task 5: Knowledge Fusion

Merge duplicate entities from multiple sources:

```python
from aiecs.application.knowledge_graph.fusion import KnowledgeFusion

# After importing data from multiple sources
fusion = KnowledgeFusion(
    graph_store=store,
    similarity_threshold=0.85,
    conflict_resolution_strategy="most_complete"
)

# Fuse entities
stats = await fusion.fuse_cross_document_entities(
    entity_types=["Person", "Company"]
)

print(f"Analyzed {stats['entities_analyzed']} entities")
print(f"Merged {stats['entities_merged']} duplicates")
print(f"Resolved {stats['conflicts_resolved']} conflicts")
```

## Schema Management

Define and validate your knowledge graph schema:

```python
from aiecs.domain.knowledge_graph.schema import (
    SchemaManager,
    EntityType,
    RelationType,
    PropertySchema,
    PropertyType
)

# Create schema manager
manager = SchemaManager()

# Define entity type
person_type = EntityType(
    name="Person",
    description="A person entity",
    properties={
        "name": PropertySchema(
            name="name",
            property_type=PropertyType.STRING,
            required=True
        ),
        "age": PropertySchema(
            name="age",
            property_type=PropertyType.INTEGER,
            min_value=0,
            max_value=150
        ),
        "email": PropertySchema(
            name="email",
            property_type=PropertyType.STRING,
            required=False
        )
    }
)
manager.create_entity_type(person_type)

# Define relation type
works_for_type = RelationType(
    name="WORKS_FOR",
    description="Employment relationship",
    source_entity_types=["Person"],
    target_entity_types=["Company"],
    properties={
        "role": PropertySchema(
            name="role",
            property_type=PropertyType.STRING
        ),
        "start_date": PropertySchema(
            name="start_date",
            property_type=PropertyType.DATE
        )
    }
)
manager.create_relation_type(works_for_type)

# Validate entities
is_valid = manager.validate_entity("Person", {
    "name": "Alice",
    "age": 30,
    "email": "alice@example.com"
})
print(f"Entity valid: {is_valid}")
```

## Best Practices

### 1. Choose the Right Storage Backend

- **Development/Testing**: Use `InMemoryGraphStore` for fast iteration
- **Small Production Apps**: Use `SQLiteGraphStore` for persistence without server
- **Large Production Apps**: Use `PostgreSQLGraphStore` for scale and concurrency

### 2. Define Your Schema

Always define entity and relation types before building your graph:

```python
# Define schema first
manager.create_entity_type(person_type)
manager.create_relation_type(works_for_type)

# Then build graph
await store.add_entity(entity)
```

### 3. Use Meaningful IDs

Use descriptive, stable IDs for entities:

```python
# Good: Stable, meaningful IDs
entity = Entity(id="person_alice_smith", ...)
entity = Entity(id="company_tech_corp", ...)

# Avoid: Random UUIDs unless necessary
entity = Entity(id="a1b2c3d4-...", ...)
```

### 4. Add Metadata

Include source and confidence information:

```python
entity = Entity(
    id="person_1",
    entity_type="Person",
    properties={"name": "Alice"},
    metadata={
        "source": "document_1",
        "confidence": 0.95,
        "extracted_at": "2025-11-15T10:00:00Z"
    }
)
```

### 5. Use Knowledge Fusion

When importing from multiple sources, use fusion to merge duplicates:

```python
# Import from multiple sources
await pipeline.import_from_csv("source1.csv")
await pipeline.import_from_csv("source2.csv")

# Fuse duplicates
fusion = KnowledgeFusion(store, similarity_threshold=0.85)
await fusion.fuse_cross_document_entities()
```

### 6. Enable Reranking for Better Results

Use reranking to improve search quality:

```python
result = await search_tool.run(
    op="graph_search",
    mode="hybrid",
    query="machine learning experts",
    enable_reranking=True,
    rerank_strategy="hybrid"  # Combines multiple signals
)
```

### 7. Optimize Performance

- Use schema caching for repeated queries
- Batch operations when possible
- Choose appropriate max_depth for traversals
- Use filters to reduce result sets

## Next Steps

- **Tutorials**: See [End-to-End Tutorial](./tutorials/END_TO_END_TUTORIAL.md) and [Multi-Hop Reasoning Tutorial](./tutorials/MULTI_HOP_REASONING_TUTORIAL.md) for step-by-step guides
- **Examples**: Check [CSV-to-Graph Tutorial](./examples/csv_to_graph_tutorial.md) and [JSON-to-Graph Tutorial](./examples/json_to_graph_tutorial.md) for working code
- **API Reference**: Read [API_REFERENCE.md](./API_REFERENCE.md) for detailed API docs
- **Performance**: See [PERFORMANCE_GUIDE.md](./PERFORMANCE_GUIDE.md) for optimization tips
- **Troubleshooting**: Check [TROUBLESHOOTING.md](./TROUBLESHOOTING.md) for common issues

## Getting Help

- **Documentation**: Browse the [docs/knowledge_graph/](.) directory
- **Examples**: See [CSV-to-Graph Tutorial](./examples/csv_to_graph_tutorial.md) and [JSON-to-Graph Tutorial](./examples/json_to_graph_tutorial.md) for examples
- **Issues**: Report bugs or request features on GitHub

Happy knowledge graphing! 🚀