Knowledge Graph Configuration Guide
This guide explains how to configure the knowledge graph capabilities in AIECS.
Quick Start
Minimal Configuration (In-Memory)
No configuration needed! The default settings use in-memory storage:
from aiecs.config import get_settings
settings = get_settings()
# Uses inmemory backend by default
Development Configuration (SQLite)
Add to your .env file:
KG_STORAGE_BACKEND=sqlite
KG_SQLITE_DB_PATH=./storage/knowledge_graph.db
Production Configuration (PostgreSQL)
Add to your .env file:
KG_STORAGE_BACKEND=postgresql
# Use main database (default)
# OR use a separate database:
KG_DB_HOST=localhost
KG_DB_PORT=5432
KG_DB_USER=kg_user
KG_DB_PASSWORD=your_password
KG_DB_NAME=aiecs_knowledge_graph
Storage Backends
AIECS supports three storage backends for knowledge graphs:
1. In-Memory (Default)
Use Case: Development, testing, small graphs
Pros: Fast, no setup required
Cons: Data lost on restart, limited by RAM
Max Nodes: 100,000 (configurable)
KG_STORAGE_BACKEND=inmemory
2. SQLite
Use Case: Development, embedded applications, file-based persistence
Pros: Simple, portable, ACID transactions
Cons: Single-writer, limited concurrency
Best For: Single-user applications, up to ~1M nodes
KG_STORAGE_BACKEND=sqlite
KG_SQLITE_DB_PATH=./storage/knowledge_graph.db
3. PostgreSQL (Recommended for Production)
Use Case: Production, multi-user, large-scale graphs
Pros: Scalable, concurrent, ACID transactions, connection pooling
Cons: Requires database setup
Best For: Production applications, millions of nodes
KG_STORAGE_BACKEND=postgresql
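The trade-offs above can be condensed into a small decision helper. This is only an illustrative sketch: `suggest_backend` and its parameters are hypothetical names that are not part of AIECS, and the thresholds simply restate the node counts quoted in this section.

```python
def suggest_backend(
    expected_nodes: int,
    concurrent_writers: int = 1,
    needs_persistence: bool = False,
) -> str:
    """Suggest a KG_STORAGE_BACKEND value from the guidance in this section.

    Hypothetical helper for illustration only; AIECS does not ship it.
    """
    if expected_nodes < 0 or concurrent_writers < 1:
        raise ValueError("expected_nodes must be >= 0 and concurrent_writers >= 1")
    # SQLite is single-writer, so multi-writer workloads go to PostgreSQL.
    if concurrent_writers > 1:
        return "postgresql"
    # In-memory storage loses data on restart and defaults to a 100,000-node cap.
    if not needs_persistence and expected_nodes <= 100_000:
        return "inmemory"
    # SQLite is described as suitable up to roughly 1M nodes.
    if expected_nodes <= 1_000_000:
        return "sqlite"
    return "postgresql"
```

For example, `suggest_backend(250_000)` points at SQLite, while any workload with more than one concurrent writer points at PostgreSQL regardless of size.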
Environment Variables
Core Configuration
| Variable | Type | Default | Description |
|---|---|---|---|
| KG_STORAGE_BACKEND | string | inmemory | Storage backend: inmemory, sqlite, or postgresql |
SQLite Configuration
| Variable | Type | Default | Description |
|---|---|---|---|
| KG_SQLITE_DB_PATH | string | | Path to SQLite database file |
PostgreSQL Configuration
| Variable | Type | Default | Description |
|---|---|---|---|
| KG_POSTGRES_URL | string | | PostgreSQL connection string (DSN) |
| KG_DB_HOST | string | | Database host (falls back to main DB_HOST) |
| KG_DB_PORT | int | | Database port |
| KG_DB_USER | string | | Database user (falls back to main DB_USER) |
| KG_DB_PASSWORD | string | | Database password (falls back to main DB_PASSWORD) |
| KG_DB_NAME | string | | Database name |
| KG_MIN_POOL_SIZE | int | | Minimum connection pool size |
| KG_MAX_POOL_SIZE | int | | Maximum connection pool size |
| KG_ENABLE_PGVECTOR | bool | | Enable pgvector extension for optimized vector search |
In-Memory Configuration
| Variable | Type | Default | Description |
|---|---|---|---|
| KG_INMEMORY_MAX_NODES | int | 100000 | Maximum number of nodes for in-memory storage |
Query Configuration
| Variable | Type | Default | Description |
|---|---|---|---|
| KG_DEFAULT_SEARCH_LIMIT | int | 10 | Default number of results to return in searches |
| KG_MAX_TRAVERSAL_DEPTH | int | 5 | Maximum depth for graph traversal queries (1-10) |
| KG_VECTOR_DIMENSION | int | 1536 | Dimension of embedding vectors (OpenAI ada-002 default) |
Cache Configuration
| Variable | Type | Default | Description |
|---|---|---|---|
| KG_ENABLE_QUERY_CACHE | bool | true | Enable caching of query results |
| KG_CACHE_TTL_SECONDS | int | 300 | Time-to-live for cached query results (seconds) |
Configuration Properties
Access configuration programmatically:
from aiecs.config import get_settings
settings = get_settings()
# Get database configuration for current backend
db_config = settings.kg_database_config
# Returns different config based on backend:
# - PostgreSQL: {"host": ..., "port": ..., "user": ..., etc.}
# - SQLite: {"db_path": ...}
# - In-memory: {"max_nodes": ...}
# Get query configuration
query_config = settings.kg_query_config
# Returns: {
# "default_search_limit": 10,
# "max_traversal_depth": 5,
# "vector_dimension": 1536
# }
# Get cache configuration
cache_config = settings.kg_cache_config
# Returns: {
# "enable_query_cache": True,
# "cache_ttl_seconds": 300
# }
Backend-Specific Configuration
PostgreSQL: Using Main Database
By default, if you don’t set KG-specific database parameters, the knowledge graph uses your main AIECS database:
# Main database config
DB_HOST=localhost
DB_PORT=5432
DB_USER=postgres
DB_PASSWORD=your_password
DB_NAME=aiecs
# Knowledge graph uses main DB
KG_STORAGE_BACKEND=postgresql
The knowledge graph creates its own tables (graph_entities, graph_relations) within the main database.
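The fallback from KG-specific variables to the main database variables can be pictured as a two-step lookup. The sketch below is an assumption about the behavior described here, not the actual `aiecs.config` implementation; `resolve_kg_db_setting` is a hypothetical name, and it covers only the host, user, and password settings that this guide describes as falling back.

```python
import os
from typing import Mapping, Optional


def resolve_kg_db_setting(
    name: str, env: Optional[Mapping[str, str]] = None
) -> Optional[str]:
    """Return KG_DB_<name> if set, otherwise the main DB_<name> value.

    Hypothetical helper: the real lookup lives inside aiecs.config.
    """
    env = os.environ if env is None else env
    # An empty KG_DB_* value is treated as "not set" and triggers the fallback.
    return env.get(f"KG_DB_{name}") or env.get(f"DB_{name}")


example_env = {"DB_HOST": "localhost", "DB_USER": "postgres", "KG_DB_USER": "kg_user"}
host = resolve_kg_db_setting("HOST", example_env)  # falls back to DB_HOST
user = resolve_kg_db_setting("USER", example_env)  # KG_DB_USER takes precedence
```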
PostgreSQL: Separate Database
For better isolation, use a separate database:
# Main database config
DB_HOST=localhost
DB_USER=postgres
DB_PASSWORD=your_password
DB_NAME=aiecs
# Separate knowledge graph database
KG_STORAGE_BACKEND=postgresql
KG_DB_HOST=localhost
KG_DB_USER=kg_user
KG_DB_PASSWORD=kg_password
KG_DB_NAME=aiecs_knowledge_graph
PostgreSQL: Cloud Database (Connection String)
For cloud databases (e.g., Google Cloud SQL, AWS RDS):
KG_STORAGE_BACKEND=postgresql
KG_POSTGRES_URL=postgresql://user:password@host:5432/dbname?sslmode=require
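Before deploying, it can help to check that a connection string splits into the parts you expect. The following sketch uses only the Python standard library; `describe_dsn` is a hypothetical helper, not an AIECS function, and it masks the password so the result is safe to log.

```python
from urllib.parse import parse_qs, urlsplit


def describe_dsn(dsn: str) -> dict:
    """Split a PostgreSQL DSN into its components, masking the password."""
    parts = urlsplit(dsn)
    if parts.scheme not in ("postgresql", "postgres"):
        raise ValueError(f"unexpected scheme: {parts.scheme!r}")
    # Keep the last value if an option such as sslmode is repeated.
    options = {key: values[-1] for key, values in parse_qs(parts.query).items()}
    return {
        "host": parts.hostname,
        "port": parts.port,
        "user": parts.username,
        "password": "***" if parts.password else None,
        "database": parts.path.lstrip("/"),
        "options": options,
    }


info = describe_dsn("postgresql://user:password@host:5432/dbname?sslmode=require")
```

A missing `sslmode` entry in `info["options"]` is an easy thing to spot this way before a cloud database rejects the connection.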
PostgreSQL: Connection Pooling
Optimize for your workload:
# For high-concurrency applications
KG_MIN_POOL_SIZE=10
KG_MAX_POOL_SIZE=50
# For low-concurrency applications
KG_MIN_POOL_SIZE=2
KG_MAX_POOL_SIZE=10
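Whatever sizes you choose, the minimum must not exceed the maximum. A minimal sanity check, assuming both variables are set explicitly, might look like this; `read_pool_sizes` is a hypothetical helper and is not part of AIECS.

```python
from typing import Mapping, Tuple


def read_pool_sizes(env: Mapping[str, str]) -> Tuple[int, int]:
    """Return (min, max) pool sizes, rejecting inconsistent values."""
    min_size = int(env["KG_MIN_POOL_SIZE"])
    max_size = int(env["KG_MAX_POOL_SIZE"])
    if min_size < 0 or max_size < 1:
        raise ValueError("pool sizes must be min >= 0 and max >= 1")
    if min_size > max_size:
        raise ValueError(
            f"KG_MIN_POOL_SIZE ({min_size}) exceeds KG_MAX_POOL_SIZE ({max_size})"
        )
    return min_size, max_size
```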
PostgreSQL: pgvector Extension
Enable optimized vector search (requires pgvector installed):
KG_ENABLE_PGVECTOR=true
Prerequisites:
Install pgvector extension in your PostgreSQL database
The extension will be automatically used for vector similarity search
SQLite: Memory vs. File
# File-based persistence (recommended)
KG_SQLITE_DB_PATH=./storage/knowledge_graph.db
# In-memory SQLite (no persistence)
KG_SQLITE_DB_PATH=:memory:
Query Configuration
Search Limits
Control the number of results returned:
# Return more results (e.g., for comprehensive search)
KG_DEFAULT_SEARCH_LIMIT=50
# Return fewer results (e.g., for quick queries)
KG_DEFAULT_SEARCH_LIMIT=5
Traversal Depth
Control how deep graph traversals can go:
# Shallow traversals (faster, less comprehensive)
KG_MAX_TRAVERSAL_DEPTH=3
# Deep traversals (slower, more comprehensive)
KG_MAX_TRAVERSAL_DEPTH=7
Warning: Values > 10 may cause performance issues.
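To see why depth matters, consider a generic breadth-first traversal that stops expanding nodes once the depth limit is reached. This is a textbook sketch over a plain adjacency dictionary, not the traversal code AIECS uses; the function name and graph shape are illustrative assumptions.

```python
from collections import deque
from typing import Dict, List


def traverse(graph: Dict[str, List[str]], start: str, max_depth: int) -> Dict[str, int]:
    """Breadth-first traversal returning each reachable node and its depth."""
    if max_depth < 1:
        raise ValueError("max_depth must be >= 1")
    depths = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if depths[node] == max_depth:
            continue  # do not expand nodes that already sit at the depth limit
        for neighbor in graph.get(node, []):
            if neighbor not in depths:
                depths[neighbor] = depths[node] + 1
                queue.append(neighbor)
    return depths


chain = {"a": ["b"], "b": ["c"], "c": ["d"], "d": ["e"]}
reached = traverse(chain, "a", max_depth=3)
```

On this chain, a limit of 3 reaches `d` but never visits `e`. In a densely connected graph, each extra level can multiply the number of visited nodes, which is why large limits get expensive.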
Vector Dimensions
Match your embedding model:
# OpenAI ada-002 (default)
KG_VECTOR_DIMENSION=1536
# OpenAI text-embedding-3-small
KG_VECTOR_DIMENSION=1536
# OpenAI text-embedding-3-large
KG_VECTOR_DIMENSION=3072
# Sentence Transformers (various)
KG_VECTOR_DIMENSION=384 # all-MiniLM-L6-v2
KG_VECTOR_DIMENSION=768 # all-mpnet-base-v2
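A cheap guard against mismatches is to compare the length of an embedding with the configured dimension before storing it. The helper below is a hypothetical sketch and not an AIECS API.

```python
from typing import Sequence


def check_embedding(embedding: Sequence[float], configured_dimension: int) -> None:
    """Raise ValueError if the embedding length differs from the configuration."""
    if len(embedding) != configured_dimension:
        raise ValueError(
            f"embedding has {len(embedding)} values but "
            f"KG_VECTOR_DIMENSION is {configured_dimension}"
        )
```

Calling `check_embedding(vector, 1536)` with a 384-value vector from all-MiniLM-L6-v2 raises immediately, instead of surfacing later as an empty search result.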
Cache Configuration
Enable/Disable Caching
# Enable caching (recommended for production)
KG_ENABLE_QUERY_CACHE=true
# Disable caching (for development/debugging)
KG_ENABLE_QUERY_CACHE=false
Cache TTL
Control how long cached results remain valid:
# Short TTL (frequently changing data)
KG_CACHE_TTL_SECONDS=60
# Long TTL (stable data)
KG_CACHE_TTL_SECONDS=3600
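The effect of the TTL can be illustrated with a minimal time-based cache. This is a generic sketch of the technique and not the cache AIECS uses internally; the class name and the injectable `clock` parameter are assumptions made so that the behavior is easy to test.

```python
import time
from typing import Any, Callable, Dict, Hashable, Optional, Tuple


class TTLCache:
    """Tiny cache whose entries expire ttl_seconds after being stored."""

    def __init__(
        self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def set(self, key: Hashable, value: Any) -> None:
        # Store the value together with its absolute expiry time.
        self._entries[key] = (self._clock() + self._ttl, value)

    def get(self, key: Hashable) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: drop it and report a miss
            return None
        return value
```

With a 60-second TTL, a result stored at time 0 is still served at 59 seconds and is treated as a miss from 60 seconds onward, at which point the query runs again.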
Validation
Automatic Validation
Configuration is automatically validated when settings are loaded:
from aiecs.config import get_settings
try:
    settings = get_settings()
except ValueError as e:
    print(f"Configuration error: {e}")
Manual Validation
Validate configuration for specific operations:
from aiecs.config import validate_required_settings
# Validate knowledge graph configuration
try:
    validate_required_settings("knowledge_graph")
    print("Knowledge graph configuration is valid")
except ValueError as e:
    print(f"Missing configuration: {e}")
Validation Rules
KG_STORAGE_BACKEND: Must be inmemory, sqlite, or postgresql
KG_SQLITE_DB_PATH: Parent directory is automatically created
KG_MAX_TRAVERSAL_DEPTH: Must be ≥ 1; warning if > 10
KG_VECTOR_DIMENSION: Must be ≥ 1; warning if not a common dimension
PostgreSQL: At least one of KG_POSTGRES_URL, KG_DB_HOST, or main DB_PASSWORD must be set
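The rules above can be approximated in a few lines of standalone Python. This is a sketch under stated assumptions, not the validator in `aiecs.config`: `validate_kg_env` is a hypothetical name, the fallback values 5 and 1536 are taken from the query configuration shown earlier in this guide, the set of "common" dimensions is assumed to be the sizes this guide lists, and directory creation for the SQLite path is omitted to keep the function free of side effects.

```python
from typing import List, Mapping

VALID_BACKENDS = {"inmemory", "sqlite", "postgresql"}
# Assumption: "common" dimensions are the sizes listed in this guide.
COMMON_DIMENSIONS = {384, 768, 1536, 3072}


def validate_kg_env(env: Mapping[str, str]) -> List[str]:
    """Raise ValueError on hard failures; return a list of warning messages."""
    notes: List[str] = []

    backend = env.get("KG_STORAGE_BACKEND", "inmemory")
    if backend not in VALID_BACKENDS:
        raise ValueError(f"invalid KG_STORAGE_BACKEND: {backend!r}")

    depth = int(env.get("KG_MAX_TRAVERSAL_DEPTH", "5"))
    if depth < 1:
        raise ValueError("KG_MAX_TRAVERSAL_DEPTH must be >= 1")
    if depth > 10:
        notes.append("KG_MAX_TRAVERSAL_DEPTH > 10 may cause performance issues")

    dimension = int(env.get("KG_VECTOR_DIMENSION", "1536"))
    if dimension < 1:
        raise ValueError("KG_VECTOR_DIMENSION must be >= 1")
    if dimension not in COMMON_DIMENSIONS:
        notes.append(f"KG_VECTOR_DIMENSION={dimension} is not a common dimension")

    # PostgreSQL needs at least one usable source of connection details.
    connection_keys = ("KG_POSTGRES_URL", "KG_DB_HOST", "DB_PASSWORD")
    if backend == "postgresql" and not any(env.get(key) for key in connection_keys):
        raise ValueError("set KG_POSTGRES_URL, KG_DB_HOST, or DB_PASSWORD")

    return notes
```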
Examples
Example 1: Development Setup (SQLite)
.env:
# SQLite for development
KG_STORAGE_BACKEND=sqlite
KG_SQLITE_DB_PATH=./dev_knowledge_graph.db
# Disable caching for development
KG_ENABLE_QUERY_CACHE=false
# Return more results per search
KG_DEFAULT_SEARCH_LIMIT=20
Example 2: Production Setup (PostgreSQL)
.env:
# PostgreSQL for production
KG_STORAGE_BACKEND=postgresql
KG_POSTGRES_URL=postgresql://kg_user:password@db.example.com:5432/aiecs_kg?sslmode=require
# Optimize connection pooling
KG_MIN_POOL_SIZE=10
KG_MAX_POOL_SIZE=50
# Enable pgvector
KG_ENABLE_PGVECTOR=true
# Production query settings
KG_DEFAULT_SEARCH_LIMIT=10
KG_MAX_TRAVERSAL_DEPTH=5
# Enable caching
KG_ENABLE_QUERY_CACHE=true
KG_CACHE_TTL_SECONDS=600
Example 3: Testing Setup (In-Memory)
.env.test:
# In-memory for fast tests
KG_STORAGE_BACKEND=inmemory
KG_INMEMORY_MAX_NODES=10000
# Disable caching for predictable tests
KG_ENABLE_QUERY_CACHE=false
Example 4: Programmatic Configuration
import asyncio

from aiecs.config import get_settings
from aiecs.infrastructure.graph_storage import (
    InMemoryGraphStore,
    PostgresGraphStore,
    SQLiteGraphStore,
)


async def main() -> None:
    settings = get_settings()
    backend = settings.kg_storage_backend

    # Create the store that matches the configured backend
    if backend == "inmemory":
        store = InMemoryGraphStore()
    elif backend == "sqlite":
        config = settings.kg_database_config
        store = SQLiteGraphStore(db_path=config["db_path"])
    elif backend == "postgresql":
        config = settings.kg_database_config
        store = PostgresGraphStore(**config)
    else:
        raise ValueError(f"Unsupported KG_STORAGE_BACKEND: {backend!r}")

    await store.initialize()
    try:
        # Use the store
        ...
    finally:
        # Close the store even if an operation above fails
        await store.close()


asyncio.run(main())
Example 5: Multi-Environment Setup
Use different .env files for different environments:
.env.development:
KG_STORAGE_BACKEND=sqlite
KG_SQLITE_DB_PATH=./dev_kg.db
.env.staging:
KG_STORAGE_BACKEND=postgresql
KG_POSTGRES_URL=postgresql://user:pass@staging-db:5432/aiecs_kg
.env.production:
KG_STORAGE_BACKEND=postgresql
KG_POSTGRES_URL=postgresql://user:pass@prod-db:5432/aiecs_kg
KG_MIN_POOL_SIZE=20
KG_MAX_POOL_SIZE=100
KG_ENABLE_PGVECTOR=true
Load the appropriate file:
# Development
export ENV_FILE=.env.development
python -m aiecs
# Staging
export ENV_FILE=.env.staging
python -m aiecs
# Production
export ENV_FILE=.env.production
python -m aiecs
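How AIECS consumes `ENV_FILE` is not shown in this guide. As a rough illustration of what selecting a file by environment variable involves, the sketch below parses simple `KEY=VALUE` lines with the standard library. Both function names are hypothetical, and the parser deliberately ignores quoting and inline comments.

```python
import os
from pathlib import Path
from typing import Dict, Iterable


def parse_env_lines(lines: Iterable[str]) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks and full-line # comments."""
    values: Dict[str, str] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values such as DSNs keep theirs.
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def load_selected_env(default: str = ".env") -> Dict[str, str]:
    """Read the file named by ENV_FILE (or the default) if it exists."""
    path = Path(os.environ.get("ENV_FILE", default))
    if not path.is_file():
        return {}
    return parse_env_lines(path.read_text(encoding="utf-8").splitlines())
```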
Troubleshooting
Issue: PostgreSQL connection fails
Solution: Check your connection parameters:
from aiecs.config import get_settings
settings = get_settings()
print(settings.kg_database_config)
# Verify host, port, user, password, database are correct
Issue: SQLite file not found
Solution: The parent directory is automatically created, but ensure the path is writable:
mkdir -p ./storage
chmod 755 ./storage
Issue: Vector search returns no results
Solution: Check vector dimensions match your embeddings:
# If using OpenAI ada-002
KG_VECTOR_DIMENSION=1536
# If using different model, adjust accordingly
Issue: Queries are slow
Solution: Optimize configuration:
# Reduce traversal depth
KG_MAX_TRAVERSAL_DEPTH=3
# Enable caching
KG_ENABLE_QUERY_CACHE=true
# For PostgreSQL: enable pgvector
KG_ENABLE_PGVECTOR=true
Best Practices
Use PostgreSQL for production: Scalable, concurrent, reliable
Use SQLite for development: Simple, portable, fast iteration
Use in-memory for testing: Fast, isolated, reproducible
Enable caching in production: Improves performance
Match vector dimensions to your embedding model: Prevents dimension mismatches
Set reasonable traversal depth: Balance comprehensiveness vs. performance
Use separate database for KG in production: Better isolation and resource management
Monitor connection pool usage: Adjust min/max based on workload
Enable pgvector for large-scale vector search: Significantly faster than brute-force
Migration
When changing backends, use the migration tools:
import asyncio

from aiecs.infrastructure.graph_storage.migration import migrate_sqlite_to_postgres


async def main() -> None:
    # Migrate from SQLite to PostgreSQL
    await migrate_sqlite_to_postgres(
        sqlite_path="./dev_kg.db",
        postgres_config=None,  # Uses config from settings
        batch_size=1000,
        show_progress=True,
    )

asyncio.run(main())
See the Knowledge Graph README for more details.