AIECS Knowledge Graph User Guide
Introduction
Welcome to the AIECS Knowledge Graph User Guide! This guide will help you get started with building, querying, and reasoning over knowledge graphs in your AI applications.
What is a Knowledge Graph?
A knowledge graph is a structured representation of knowledge that captures:
Entities: Things in your domain (people, companies, products, concepts)
Relations: Connections between entities (works_for, located_in, knows)
Properties: Attributes of entities and relations (name, age, start_date)
Knowledge graphs enable:
Structured Knowledge Storage: Organize information in a queryable format
Multi-Hop Reasoning: Answer complex questions by traversing relationships
Knowledge Fusion: Merge information from multiple sources
Semantic Search: Find relevant information using meaning, not just keywords
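To make the three building blocks concrete, here is a minimal, library-independent sketch that models entities, relations, and properties with plain dataclasses and answers one two-hop question. The names `Node`, `Edge`, and `follow` are hypothetical and are not part of the AIECS API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for an entity: a typed thing with properties."""
    id: str
    type: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    """Hypothetical stand-in for a relation: a typed, directed connection."""
    type: str
    source_id: str
    target_id: str
    properties: dict = field(default_factory=dict)


nodes = {
    "alice": Node("alice", "Person", {"name": "Alice", "age": 30}),
    "tech_corp": Node("tech_corp", "Company", {"name": "Tech Corp"}),
    "sf": Node("sf", "Location", {"name": "San Francisco"}),
}
edges = [
    Edge("works_for", "alice", "tech_corp", {"start_date": "2020-01-01"}),
    Edge("located_in", "tech_corp", "sf"),
]


def follow(start_id: str, relation_types: List[str]) -> Optional[str]:
    """Follow one edge of each listed type in order; return the final node ID."""
    current = start_id
    for rel_type in relation_types:
        match = next(
            (e for e in edges if e.source_id == current and e.type == rel_type),
            None,
        )
        if match is None:
            return None
        current = match.target_id
    return current


# Two-hop question: "Where is Alice's employer located?"
city_id = follow("alice", ["works_for", "located_in"])
print(nodes[city_id].properties["name"])
```

Chaining `works_for` and `located_in` is the multi-hop idea in miniature: neither edge alone answers the question, but the traversal does.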
Why Use AIECS Knowledge Graph?
Self-Contained: No external graph database required
Multiple Backends: InMemory, SQLite, PostgreSQL - choose what fits your needs
Easy to Use: Simple API for common operations
Powerful: Advanced features like reasoning, fusion, and optimization
Extensible: Add custom storage backends easily
Quick Start
Installation
AIECS Knowledge Graph is included with AIECS. Install the optional dependencies for specific backends:
# For SQLite support (included by default)
pip install aiecs
# For PostgreSQL support (quotes keep shells such as zsh from expanding the brackets)
pip install "aiecs[postgres]"
# For all features
pip install "aiecs[all]"
Your First Knowledge Graph
Let’s create a simple knowledge graph about people and companies:
import asyncio
from aiecs.domain.knowledge_graph.models.entity import Entity
from aiecs.domain.knowledge_graph.models.relation import Relation
from aiecs.infrastructure.graph_storage.in_memory import InMemoryGraphStore
async def main():
    # 1. Initialize storage
    store = InMemoryGraphStore()
    await store.initialize()
    # 2. Create entities
    alice = Entity(
        id="alice",
        entity_type="Person",
        properties={"name": "Alice Smith", "age": 30, "role": "Engineer"}
    )
    bob = Entity(
        id="bob",
        entity_type="Person",
        properties={"name": "Bob Jones", "age": 25, "role": "Designer"}
    )
    tech_corp = Entity(
        id="tech_corp",
        entity_type="Company",
        properties={"name": "Tech Corp", "industry": "Technology"}
    )
    # 3. Add entities to graph
    await store.add_entity(alice)
    await store.add_entity(bob)
    await store.add_entity(tech_corp)
    # 4. Create relations
    alice_works = Relation(
        id="rel_1",
        relation_type="WORKS_FOR",
        source_id="alice",
        target_id="tech_corp",
        properties={"start_date": "2020-01-01"}
    )
    bob_works = Relation(
        id="rel_2",
        relation_type="WORKS_FOR",
        source_id="bob",
        target_id="tech_corp",
        properties={"start_date": "2021-06-01"}
    )
    alice_knows_bob = Relation(
        id="rel_3",
        relation_type="KNOWS",
        source_id="alice",
        target_id="bob"
    )
    # 5. Add relations to graph
    await store.add_relation(alice_works)
    await store.add_relation(bob_works)
    await store.add_relation(alice_knows_bob)
    # 6. Query the graph
    # Get Alice's neighbors
    neighbors = await store.get_neighbors("alice", direction="outgoing")
    print(f"Alice is connected to: {[n.properties['name'] for n in neighbors]}")
    # Find paths from Alice
    paths = await store.traverse("alice", max_depth=2)
    print(f"Found {len(paths)} paths from Alice")
    # 7. Cleanup
    await store.close()

# Run
asyncio.run(main())
Output:
Alice is connected to: ['Bob Jones', 'Tech Corp']
Found 3 paths from Alice
Congratulations! You’ve created your first knowledge graph.
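For intuition about what the two queries in step 6 compute, the following library-independent sketch rebuilds the same three relations as a plain adjacency list. The helpers `outgoing_neighbors` and `enumerate_paths` are hypothetical and only illustrate the idea; the actual `get_neighbors` and `traverse` implementations may order or count results differently.

```python
from collections import defaultdict

# Same three relations as the example: (source, relation type, target)
triples = [
    ("alice", "WORKS_FOR", "tech_corp"),
    ("bob", "WORKS_FOR", "tech_corp"),
    ("alice", "KNOWS", "bob"),
]

adjacency = defaultdict(list)
for source, _rel_type, target in triples:
    adjacency[source].append(target)


def outgoing_neighbors(entity_id):
    """Return the IDs reachable over one outgoing relation."""
    return list(adjacency.get(entity_id, []))


def enumerate_paths(start_id, max_depth):
    """Enumerate all cycle-free paths of 1..max_depth hops from start_id."""
    paths = []
    stack = [[start_id]]
    while stack:
        path = stack.pop()
        if len(path) - 1 >= max_depth:
            continue  # this path already uses the allowed number of hops
        for neighbor in adjacency.get(path[-1], []):
            if neighbor in path:  # skip cycles
                continue
            extended = path + [neighbor]
            paths.append(extended)
            stack.append(extended)
    return paths


print(sorted(outgoing_neighbors("alice")))
print(len(enumerate_paths("alice", max_depth=2)))
```

Within two hops the sketch finds three paths (alice to tech_corp, alice to bob, and alice to bob to tech_corp), which is one plausible reading of the count printed above.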
Core Concepts
Entities
Entities represent nodes in your knowledge graph. Each entity has:
ID: Unique identifier
Type: Category (Person, Company, Product, etc.)
Properties: Key-value attributes
Metadata: Optional metadata (source, confidence, timestamps)
entity = Entity(
    id="unique_id",
    entity_type="Person",
    properties={
        "name": "Alice",
        "age": 30,
        "email": "alice@example.com"
    },
    metadata={
        "source": "document_1",
        "confidence": 0.95
    }
)
Relations
Relations represent edges connecting entities. Each relation has:
ID: Unique identifier
Type: Relationship type (WORKS_FOR, KNOWS, LOCATED_IN, etc.)
Source: Starting entity ID
Target: Ending entity ID
Properties: Relationship attributes
Metadata: Optional metadata
relation = Relation(
    id="rel_id",
    relation_type="WORKS_FOR",
    source_id="person_1",
    target_id="company_1",
    properties={
        "role": "Engineer",
        "start_date": "2020-01-01"
    }
)
Paths
Paths represent sequences of entities connected by relations:
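from aiecs.domain.knowledge_graph.models.path import Path
# Assumes the entities (alice, tech_corp, project) and the relations
# (works_for_relation, manages_relation) have already been created
path = Path(
    entities=[alice, tech_corp, project],
    relations=[works_for_relation, manages_relation],
    score=0.85
)
print(f"Path length: {path.length()} hops")
print(f"Entities: {path.get_entity_ids()}")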
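The structural rule behind a path is that n entities are joined by exactly n - 1 relations, each linking two consecutive entities. The sketch below checks that invariant with a hypothetical `SimplePath` class. It is not the library's `Path`, and the assumption that length is counted in relations (hops) is taken from the "hops" wording above.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SimplePath:
    """Hypothetical path: entity IDs joined by (source, type, target) triples."""
    entity_ids: List[str]
    relations: List[Tuple[str, str, str]]

    def __post_init__(self):
        # A path over n entities needs exactly n - 1 relations.
        if len(self.relations) != len(self.entity_ids) - 1:
            raise ValueError("expected one relation per consecutive entity pair")
        # Each relation must connect the two entities it sits between.
        pairs = zip(self.entity_ids, self.entity_ids[1:])
        for (source, _type, target), (left, right) in zip(self.relations, pairs):
            if (source, target) != (left, right):
                raise ValueError(
                    f"relation {source}->{target} does not link {left}->{right}"
                )

    def length(self) -> int:
        """Number of hops, i.e. number of relations."""
        return len(self.relations)


path = SimplePath(
    entity_ids=["alice", "tech_corp", "project"],
    relations=[
        ("alice", "WORKS_FOR", "tech_corp"),
        ("tech_corp", "MANAGES", "project"),
    ],
)
print(path.length())
```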
Storage Backends
AIECS provides three built-in storage backends:
InMemoryGraphStore
Best for: Development, testing, small graphs (< 100K nodes)
from aiecs.infrastructure.graph_storage.in_memory import InMemoryGraphStore
store = InMemoryGraphStore()
await store.initialize()
Pros: Very fast, no setup required
Cons: Data lost when process ends, limited by RAM
SQLiteGraphStore
Best for: Production apps, persistent storage, medium graphs (< 1M nodes)
from aiecs.infrastructure.graph_storage.sqlite import SQLiteGraphStore
store = SQLiteGraphStore(db_path="knowledge.db")
await store.initialize()
Pros: Persistent, no server required, optimized queries
Cons: Single-process only, slower than in-memory
PostgreSQLGraphStore
Best for: Large-scale production, multi-user apps, huge graphs (10M+ nodes)
from aiecs.infrastructure.graph_storage.postgresql import PostgreSQLGraphStore
store = PostgreSQLGraphStore(
    connection_string="postgresql://user:pass@localhost/db"
)
await store.initialize()
Pros: Scalable, concurrent access, pgvector support
Cons: Requires PostgreSQL server
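The size guidance above can be folded into a small decision helper. This is a sketch only: `suggest_backend` is a hypothetical function that returns a class name as a string, and because the guide leaves the range between 1M and 10M nodes unspecified, the sketch assumes PostgreSQL from 1M nodes upward.

```python
def suggest_backend(expected_nodes: int,
                    needs_persistence: bool = True,
                    concurrent_access: bool = False) -> str:
    """Suggest a backend class name from the rough guidance in this guide."""
    # Assumption: anything at or above 1M nodes, or needing concurrent
    # access, goes to PostgreSQL.
    if concurrent_access or expected_nodes >= 1_000_000:
        return "PostgreSQLGraphStore"
    # SQLite covers persistent storage and graphs too large for comfort in RAM.
    if needs_persistence or expected_nodes >= 100_000:
        return "SQLiteGraphStore"
    return "InMemoryGraphStore"


print(suggest_backend(5_000, needs_persistence=False))
print(suggest_backend(500_000))
print(suggest_backend(20_000_000))
```

The thresholds are starting points rather than hard limits; actual capacity depends on property sizes, hardware, and query patterns.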
Common Tasks
Task 1: Building a Graph from Text
Extract entities and relations from unstructured text:
from aiecs.tools.knowledge_graph import KnowledgeGraphBuilderTool
# Initialize tool
builder = KnowledgeGraphBuilderTool()
await builder._initialize()
# Extract from text
text = """
Alice Smith is a software engineer at Tech Corp in San Francisco.
She has been working there since 2020 and leads the AI team.
Bob Jones, a designer, also works at Tech Corp.
"""
result = await builder.run(
    op="kg_builder",
    action="build_from_text",
    text=text,
    entity_types=["Person", "Company", "Location"]
)
print(f"Extracted {result['entities_added']} entities")
print(f"Extracted {result['relations_added']} relations")
Task 2: Importing CSV Data
Import structured data from CSV files:
from aiecs.application.knowledge_graph.builder.schema_mapping import (
    SchemaMapping,
    EntityMapping,
    RelationMapping
)
from aiecs.application.knowledge_graph.builder.structured_pipeline import (
    StructuredDataPipeline
)
# Define schema mapping
mapping = SchemaMapping(
    entity_mappings=[
        EntityMapping(
            source_columns=["person_id", "name", "age"],
            entity_type="Person",
            property_mapping={
                "id": "person_id",
                "name": "name",
                "age": "age"
            },
            id_column="person_id"
        ),
        EntityMapping(
            source_columns=["company_id", "company_name"],
            entity_type="Company",
            property_mapping={
                "id": "company_id",
                "name": "company_name"
            },
            id_column="company_id"
        )
    ],
    relation_mappings=[
        RelationMapping(
            source_id_column="person_id",
            target_id_column="company_id",
            relation_type="WORKS_FOR",
            property_mapping={
                "role": "role",
                "start_date": "start_date"
            }
        )
    ]
)
# Import CSV
pipeline = StructuredDataPipeline(mapping=mapping, graph_store=store)
result = await pipeline.import_from_csv("employees.csv")
print(f"Imported {result.entities_added} entities")
print(f"Imported {result.relations_added} relations")
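To show what such a mapping does to each row, here is a library-independent sketch using only the standard `csv` module. The contents of `EMPLOYEES_CSV` are hypothetical; the column names are inferred from the mapping above, and the real pipeline may handle types, IDs, and duplicates differently.

```python
import csv
import io

# Hypothetical file contents; column names are inferred from the mapping above.
EMPLOYEES_CSV = """person_id,name,age,company_id,company_name,role,start_date
p1,Alice Smith,30,c1,Tech Corp,Engineer,2020-01-01
p2,Bob Jones,25,c1,Tech Corp,Designer,2021-06-01
"""

entities = {}   # entity ID -> (entity type, properties)
relations = []  # (source ID, relation type, target ID, properties)

for row in csv.DictReader(io.StringIO(EMPLOYEES_CSV)):
    # One Person and one Company per row; repeated companies collapse by ID.
    entities[row["person_id"]] = (
        "Person",
        {"name": row["name"], "age": int(row["age"])},
    )
    entities[row["company_id"]] = ("Company", {"name": row["company_name"]})
    relations.append((
        row["person_id"],
        "WORKS_FOR",
        row["company_id"],
        {"role": row["role"], "start_date": row["start_date"]},
    ))

print(f"{len(entities)} entities, {len(relations)} relations")
```

Two rows produce three entities, not four, because both rows reference the same company ID. This is why the choice of ID column matters in the mapping.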
Task 3: Searching the Graph
Perform different types of searches:
from aiecs.tools.knowledge_graph import GraphSearchTool
search_tool = GraphSearchTool()
await search_tool._initialize()
# Vector search (semantic similarity)
result = await search_tool.run(
    op="graph_search",
    mode="vector",
    query="machine learning experts",
    top_k=10
)
# Graph traversal search
result = await search_tool.run(
    op="graph_search",
    mode="graph",
    start_entity_id="alice",
    max_depth=3,
    relation_types=["WORKS_FOR", "KNOWS"]
)
# Hybrid search (combines vector + graph)
result = await search_tool.run(
    op="graph_search",
    mode="hybrid",
    query="senior engineers in San Francisco",
    top_k=10,
    enable_reranking=True,
    rerank_strategy="hybrid"
)
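One way to picture how a hybrid mode can combine the two signals is a weighted blend of vector similarity and graph proximity. The sketch below is illustrative only: the scoring formula, the `alpha` weight, and the toy embeddings are assumptions, not the scoring that `GraphSearchTool` actually uses.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_score(similarity, hops, alpha=0.7):
    """Blend vector similarity with graph proximity (fewer hops = higher)."""
    proximity = 1.0 / (1 + hops)
    return alpha * similarity + (1 - alpha) * proximity


query_vec = [1.0, 0.0]
# Hypothetical candidates: ID -> (toy embedding, hops from the start entity)
candidates = {
    "carol": ([0.9, 0.1], 3),
    "dave": ([0.8, 0.2], 1),
    "erin": ([0.0, 1.0], 1),
}

ranked = sorted(
    candidates,
    key=lambda cid: hybrid_score(
        cosine_similarity(query_vec, candidates[cid][0]), candidates[cid][1]
    ),
    reverse=True,
)
print(ranked)
```

In this toy data `carol` has the closest embedding, but `dave` ranks first because he is only one hop away. That is the kind of trade-off a hybrid ranking makes.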
Task 4: Multi-Hop Reasoning
Answer complex questions by traversing the graph:
from aiecs.tools.knowledge_graph import GraphReasoningTool
reasoning_tool = GraphReasoningTool()
await reasoning_tool._initialize()
# Multi-hop question answering
result = await reasoning_tool.run(
    op="graph_reasoning",
    mode="multi_hop",
    query="How is Alice connected to Project X?",
    start_entity_id="alice",
    end_entity_id="project_x",
    max_hops=5
)
print(f"Answer: {result['answer']}")
print(f"Reasoning steps: {result['reasoning_steps']}")
print(f"Evidence paths: {len(result['paths'])}")
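At its core, a "how is A connected to B" question is a bounded path search. The following sketch uses a breadth-first search over a hypothetical toy graph to find a shortest chain of relations within `max_hops`. It illustrates the traversal idea only and is not the reasoning tool's implementation, which also produces a natural-language answer.

```python
from collections import deque

# Hypothetical toy graph: entity ID -> list of (relation type, neighbor ID)
graph = {
    "alice": [("WORKS_FOR", "tech_corp"), ("KNOWS", "bob")],
    "bob": [("LEADS", "ai_team")],
    "tech_corp": [("HAS_TEAM", "ai_team")],
    "ai_team": [("OWNS", "project_x")],
}


def find_connection(start_id, end_id, max_hops):
    """Breadth-first search for a shortest path of at most max_hops relations."""
    queue = deque([(start_id, [])])
    visited = {start_id}
    while queue:
        node, steps = queue.popleft()
        if node == end_id:
            return steps
        if len(steps) >= max_hops:
            continue  # hop budget exhausted on this branch
        for rel_type, neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, steps + [(node, rel_type, neighbor)]))
    return None


steps = find_connection("alice", "project_x", max_hops=5)
for source, rel_type, target in steps:
    print(f"{source} -[{rel_type}]-> {target}")
```

The sketch follows relations in their stored direction only. A smaller `max_hops` bounds the search cost but can miss longer connections, so `find_connection("alice", "project_x", max_hops=2)` returns `None` here.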
Task 5: Knowledge Fusion
Merge duplicate entities from multiple sources:
from aiecs.application.knowledge_graph.fusion import KnowledgeFusion
# After importing data from multiple sources
fusion = KnowledgeFusion(
    graph_store=store,
    similarity_threshold=0.85,
    conflict_resolution_strategy="most_complete"
)
# Fuse entities
stats = await fusion.fuse_cross_document_entities(
    entity_types=["Person", "Company"]
)
print(f"Analyzed {stats['entities_analyzed']} entities")
print(f"Merged {stats['entities_merged']} duplicates")
print(f"Resolved {stats['conflicts_resolved']} conflicts")
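To illustrate the two parameters, the sketch below merges records whose names are at least `threshold` similar and resolves conflicts in favour of the record with more properties. Both the string-similarity measure (`difflib.SequenceMatcher`) and this reading of "most_complete" are assumptions made for illustration; `KnowledgeFusion` may match and merge entities differently.

```python
from difflib import SequenceMatcher


def name_similarity(a, b):
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def merge_most_complete(primary, duplicate):
    """Keep the record with more properties and fill gaps from the other."""
    base, other = sorted([primary, duplicate], key=len, reverse=True)
    merged = dict(other)
    merged.update(base)  # values from the more complete record win conflicts
    return merged


def fuse(records, threshold=0.85):
    """Greedily merge records whose names are at least `threshold` similar."""
    fused = []
    for record in records:
        for i, existing in enumerate(fused):
            if name_similarity(record["name"], existing["name"]) >= threshold:
                fused[i] = merge_most_complete(existing, record)
                break
        else:
            # No sufficiently similar record found: keep as a new entity.
            fused.append(dict(record))
    return fused


records = [
    {"name": "Alice Smith", "role": "Engineer"},
    {"name": "alice smith", "role": "Software Engineer", "email": "alice@example.com"},
    {"name": "Bob Jones"},
]
result = fuse(records)
print(len(result))
```

A higher threshold merges fewer records and risks leaving duplicates; a lower one risks merging distinct entities with similar names.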
Schema Management
Define and validate your knowledge graph schema:
from aiecs.domain.knowledge_graph.schema import (
    SchemaManager,
    EntityType,
    RelationType,
    PropertySchema,
    PropertyType
)
# Create schema manager
manager = SchemaManager()
# Define entity type
person_type = EntityType(
    name="Person",
    description="A person entity",
    properties={
        "name": PropertySchema(
            name="name",
            property_type=PropertyType.STRING,
            required=True
        ),
        "age": PropertySchema(
            name="age",
            property_type=PropertyType.INTEGER,
            min_value=0,
            max_value=150
        ),
        "email": PropertySchema(
            name="email",
            property_type=PropertyType.STRING,
            required=False
        )
    }
)
manager.create_entity_type(person_type)
# Define relation type
works_for_type = RelationType(
    name="WORKS_FOR",
    description="Employment relationship",
    source_entity_types=["Person"],
    target_entity_types=["Company"],
    properties={
        "role": PropertySchema(
            name="role",
            property_type=PropertyType.STRING
        ),
        "start_date": PropertySchema(
            name="start_date",
            property_type=PropertyType.DATE
        )
    }
)
manager.create_relation_type(works_for_type)
# Validate entities
is_valid = manager.validate_entity("Person", {
    "name": "Alice",
    "age": 30,
    "email": "alice@example.com"
})
print(f"Entity valid: {is_valid}")
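The checks that such a schema implies (required properties, types, numeric ranges) can be sketched in a few lines of plain Python. The dict-based `person_schema` and the `validate_properties` helper are hypothetical mirrors of the Person type above, not the `SchemaManager` implementation.

```python
def validate_properties(schema, properties):
    """Return a list of human-readable validation errors (empty if valid)."""
    errors = []
    for name, rule in schema.items():
        if name not in properties:
            if rule.get("required", False):
                errors.append(f"missing required property '{name}'")
            continue
        value = properties[name]
        expected = rule["type"]
        # bool is a subclass of int in Python, so reject it explicitly for integers
        wrong_type = not isinstance(value, expected) or (
            expected is int and isinstance(value, bool)
        )
        if wrong_type:
            errors.append(f"'{name}' should be {expected.__name__}")
            continue
        if "min" in rule and value < rule["min"]:
            errors.append(f"'{name}' is below {rule['min']}")
        if "max" in rule and value > rule["max"]:
            errors.append(f"'{name}' is above {rule['max']}")
    return errors


# Hypothetical dict-based mirror of the Person type defined above
person_schema = {
    "name": {"type": str, "required": True},
    "age": {"type": int, "min": 0, "max": 150},
    "email": {"type": str},
}

print(validate_properties(person_schema, {"name": "Alice", "age": 30}))
print(validate_properties(person_schema, {"age": 200}))
```

Returning a list of messages rather than a single boolean makes it easier to report every problem with a record at once.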
Best Practices
1. Choose the Right Storage Backend
Development/Testing: Use InMemoryGraphStore for fast iteration
Small Production Apps: Use SQLiteGraphStore for persistence without a server
Large Production Apps: Use PostgreSQLGraphStore for scale and concurrency
2. Define Your Schema
Always define entity and relation types before building your graph:
# Define schema first
manager.create_entity_type(person_type)
manager.create_relation_type(works_for_type)
# Then build graph
await store.add_entity(entity)
3. Use Meaningful IDs
Use descriptive, stable IDs for entities:
# Good: Stable, meaningful IDs
entity = Entity(id="person_alice_smith", ...)
entity = Entity(id="company_tech_corp", ...)
# Avoid: Random UUIDs unless necessary
entity = Entity(id="a1b2c3d4-...", ...)
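One simple way to get IDs in this style is to derive them deterministically from the entity type and name. The helper `make_entity_id` below is a hypothetical sketch; it assumes that type plus name is unique enough in your data, which will not hold if two people share a name.

```python
import re
import unicodedata


def make_entity_id(entity_type, name):
    """Build a lowercase, underscore-separated ID such as 'person_alice_smith'."""
    # Decompose accented characters, then drop anything outside ASCII.
    text = unicodedata.normalize("NFKD", f"{entity_type} {name}")
    text = text.encode("ascii", "ignore").decode("ascii").lower()
    # Collapse every run of non-alphanumeric characters into one underscore.
    return re.sub(r"[^a-z0-9]+", "_", text).strip("_")


print(make_entity_id("Person", "Alice Smith"))
print(make_entity_id("Company", "Tech Corp."))
```

Because the same input always yields the same ID, re-importing a source maps records back onto existing entities instead of creating new ones.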
4. Add Metadata
Include source and confidence information:
entity = Entity(
    id="person_1",
    entity_type="Person",
    properties={"name": "Alice"},
    metadata={
        "source": "document_1",
        "confidence": 0.95,
        "extracted_at": "2025-11-15T10:00:00Z"
    }
)
5. Use Knowledge Fusion
When importing from multiple sources, use fusion to merge duplicates:
# Import from multiple sources
await pipeline.import_from_csv("source1.csv")
await pipeline.import_from_csv("source2.csv")
# Fuse duplicates
fusion = KnowledgeFusion(graph_store=store, similarity_threshold=0.85)
await fusion.fuse_cross_document_entities()
6. Enable Reranking for Better Results
Use reranking to improve search quality:
result = await search_tool.run(
    op="graph_search",
    mode="hybrid",
    query="machine learning experts",
    enable_reranking=True,
    rerank_strategy="hybrid"  # Combines multiple signals
)
7. Optimize Performance
Use schema caching for repeated queries
Batch operations when possible
Choose appropriate max_depth for traversals
Use filters to reduce result sets
Next Steps
Tutorials: See End-to-End Tutorial and Multi-Hop Reasoning Tutorial for step-by-step guides
Examples: Check CSV-to-Graph Tutorial and JSON-to-Graph Tutorial for working code
API Reference: Read API_REFERENCE.md for detailed API docs
Performance: See PERFORMANCE_GUIDE.md for optimization tips
Troubleshooting: Check TROUBLESHOOTING.md for common issues
Getting Help
Documentation: Browse the docs/knowledge_graph/ directory
Examples: See the CSV-to-Graph and JSON-to-Graph tutorials for working code
Issues: Report bugs or request features on GitHub
Happy knowledge graphing! 🚀