Knowledge Graph Configuration Guide

This guide explains how to configure the knowledge graph capabilities in AIECS.

Table of Contents

Quick Start
Storage Backends
Environment Variables
Configuration Properties
Backend-Specific Configuration
Query Configuration
Cache Configuration
Validation
Examples
Troubleshooting
Best Practices
Migration

Quick Start

Minimal Configuration (In-Memory)

No configuration needed! The default settings use in-memory storage:

from aiecs.config import get_settings

settings = get_settings()
# Uses inmemory backend by default

Development Configuration (SQLite)

Add to your .env file:

KG_STORAGE_BACKEND=sqlite
KG_SQLITE_DB_PATH=./storage/knowledge_graph.db

Production Configuration (PostgreSQL)

Add to your .env file:

KG_STORAGE_BACKEND=postgresql
# Omit the KG_DB_* variables below to reuse the main database (default),
# or set them to use a separate database:
KG_DB_HOST=localhost
KG_DB_PORT=5432
KG_DB_USER=kg_user
KG_DB_PASSWORD=your_password
KG_DB_NAME=aiecs_knowledge_graph

Storage Backends

AIECS supports three storage backends for knowledge graphs:

1. In-Memory (Default)

Use Case: Development, testing, small graphs
Pros: Fast, no setup required
Cons: Data lost on restart, limited by RAM
Max Nodes: 100,000 by default (configurable via KG_INMEMORY_MAX_NODES)

KG_STORAGE_BACKEND=inmemory

2. SQLite

Use Case: Development, embedded applications, file-based persistence
Pros: Simple, portable, ACID transactions
Cons: Single-writer, limited concurrency
Best For: Single-user applications, up to ~1M nodes

KG_STORAGE_BACKEND=sqlite
KG_SQLITE_DB_PATH=./storage/knowledge_graph.db

3. PostgreSQL (Recommended for Production)

Use Case: Production, multi-user, large-scale graphs
Pros: Scalable, concurrent, ACID transactions, connection pooling
Cons: Requires database setup
Best For: Production applications, millions of nodes

KG_STORAGE_BACKEND=postgresql

Environment Variables

Core Configuration

Variable	Type	Default	Description
`KG_STORAGE_BACKEND`	string	`inmemory`	Storage backend: `inmemory`, `sqlite`, or `postgresql`

SQLite Configuration

Variable	Type	Default	Description
`KG_SQLITE_DB_PATH`	string	`./storage/knowledge_graph.db`	Path to SQLite database file

PostgreSQL Configuration

Variable	Type	Default	Description
`KG_POSTGRES_URL`	string	`""`	PostgreSQL connection string (DSN)
`KG_DB_HOST`	string	`""`	Database host (falls back to main `DB_HOST`)
`KG_DB_PORT`	int	`5432`	Database port
`KG_DB_USER`	string	`""`	Database user (falls back to main `DB_USER`)
`KG_DB_PASSWORD`	string	`""`	Database password (falls back to main `DB_PASSWORD`)
`KG_DB_NAME`	string	`""`	Database name (default: `aiecs_knowledge_graph`)
`KG_MIN_POOL_SIZE`	int	`5`	Minimum connection pool size
`KG_MAX_POOL_SIZE`	int	`20`	Maximum connection pool size
`KG_ENABLE_PGVECTOR`	bool	`false`	Enable pgvector extension for optimized vector search

In-Memory Configuration

Variable	Type	Default	Description
`KG_INMEMORY_MAX_NODES`	int	`100000`	Maximum number of nodes for in-memory storage

Query Configuration

Variable	Type	Default	Description
`KG_DEFAULT_SEARCH_LIMIT`	int	`10`	Default number of results to return in searches
`KG_MAX_TRAVERSAL_DEPTH`	int	`5`	Maximum depth for graph traversal queries (1-10)
`KG_VECTOR_DIMENSION`	int	`1536`	Dimension of embedding vectors (default matches OpenAI text-embedding-ada-002)

Cache Configuration

Variable	Type	Default	Description
`KG_ENABLE_QUERY_CACHE`	bool	`true`	Enable caching of query results
`KG_CACHE_TTL_SECONDS`	int	`300`	Time-to-live for cached query results (seconds)
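To show how the variables above map to typed values, here is a minimal stdlib-only sketch that applies the documented types and defaults to a mapping such as `os.environ`. The `KGSettings` dataclass and `load_kg_settings` function are hypothetical, and so is the set of accepted boolean spellings. In practice AIECS loads these values for you through `get_settings()`.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


def _parse_bool(value: str) -> bool:
    # Assumption: these spellings count as true; anything else is false.
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class KGSettings:
    # Hypothetical container mirroring the documented KG_* variables and defaults.
    storage_backend: str = "inmemory"
    sqlite_db_path: str = "./storage/knowledge_graph.db"
    inmemory_max_nodes: int = 100_000
    min_pool_size: int = 5
    max_pool_size: int = 20
    enable_pgvector: bool = False
    default_search_limit: int = 10
    max_traversal_depth: int = 5
    vector_dimension: int = 1536
    enable_query_cache: bool = True
    cache_ttl_seconds: int = 300


def load_kg_settings(env: Optional[Mapping[str, str]] = None) -> KGSettings:
    """Read KG_* variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    d = KGSettings()  # an instance holding only the defaults

    def get_int(name: str, default: int) -> int:
        raw = env.get(name)
        return default if raw is None else int(raw)

    def get_bool(name: str, default: bool) -> bool:
        raw = env.get(name)
        return default if raw is None else _parse_bool(raw)

    return KGSettings(
        storage_backend=env.get("KG_STORAGE_BACKEND", d.storage_backend),
        sqlite_db_path=env.get("KG_SQLITE_DB_PATH", d.sqlite_db_path),
        inmemory_max_nodes=get_int("KG_INMEMORY_MAX_NODES", d.inmemory_max_nodes),
        min_pool_size=get_int("KG_MIN_POOL_SIZE", d.min_pool_size),
        max_pool_size=get_int("KG_MAX_POOL_SIZE", d.max_pool_size),
        enable_pgvector=get_bool("KG_ENABLE_PGVECTOR", d.enable_pgvector),
        default_search_limit=get_int("KG_DEFAULT_SEARCH_LIMIT", d.default_search_limit),
        max_traversal_depth=get_int("KG_MAX_TRAVERSAL_DEPTH", d.max_traversal_depth),
        vector_dimension=get_int("KG_VECTOR_DIMENSION", d.vector_dimension),
        enable_query_cache=get_bool("KG_ENABLE_QUERY_CACHE", d.enable_query_cache),
        cache_ttl_seconds=get_int("KG_CACHE_TTL_SECONDS", d.cache_ttl_seconds),
    )
```

For example, `load_kg_settings({"KG_STORAGE_BACKEND": "sqlite", "KG_ENABLE_QUERY_CACHE": "false"})` switches the backend and disables caching. Every other field keeps its documented default.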

Configuration Properties

Access configuration programmatically:

from aiecs.config import get_settings

settings = get_settings()

# Get database configuration for current backend
db_config = settings.kg_database_config
# Returns different config based on backend:
# - PostgreSQL: {"host": ..., "port": ..., "user": ..., etc.}
# - SQLite: {"db_path": ...}
# - In-memory: {"max_nodes": ...}

# Get query configuration
query_config = settings.kg_query_config
# Returns: {
#   "default_search_limit": 10,
#   "max_traversal_depth": 5,
#   "vector_dimension": 1536
# }

# Get cache configuration
cache_config = settings.kg_cache_config
# Returns: {
#   "enable_query_cache": True,
#   "cache_ttl_seconds": 300
# }

Backend-Specific Configuration

PostgreSQL: Using Main Database

By default, if you don’t set KG-specific database parameters, the knowledge graph uses your main AIECS database:

# Main database config
DB_HOST=localhost
DB_PORT=5432
DB_USER=postgres
DB_PASSWORD=your_password
DB_NAME=aiecs

# Knowledge graph uses main DB
KG_STORAGE_BACKEND=postgresql

The knowledge graph creates its own tables (graph_entities, graph_relations) within the main database.
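The fallback from the KG_DB_* variables to the main DB_* variables can be pictured as follows. This is an illustration, not the AIECS source. `resolve_kg_postgres_config` is hypothetical, and the `database` key name is an assumption. Falling back to the main DB_NAME when KG_DB_NAME is unset is also an assumption, based on the statement above that the tables live in the main database.

```python
from typing import Dict, Mapping, Union


def resolve_kg_postgres_config(env: Mapping[str, str]) -> Dict[str, Union[str, int]]:
    """Hypothetical sketch: prefer KG_DB_* values, fall back to the main DB_* values."""

    def pick(kg_name: str, main_name: str) -> str:
        # An unset or empty KG_* value means "use the main database setting".
        return env.get(kg_name) or env.get(main_name, "")

    return {
        "host": pick("KG_DB_HOST", "DB_HOST"),
        "port": int(env.get("KG_DB_PORT") or 5432),  # documented default port
        "user": pick("KG_DB_USER", "DB_USER"),
        "password": pick("KG_DB_PASSWORD", "DB_PASSWORD"),
        # Assumption: without KG_DB_NAME, the KG tables go into the main database.
        "database": pick("KG_DB_NAME", "DB_NAME"),
    }


main_only = {"DB_HOST": "localhost", "DB_USER": "postgres",
             "DB_PASSWORD": "your_password", "DB_NAME": "aiecs"}
print(resolve_kg_postgres_config(main_only))
# {'host': 'localhost', 'port': 5432, 'user': 'postgres', 'password': 'your_password', 'database': 'aiecs'}
```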

PostgreSQL: Separate Database

For better isolation, use a separate database:

# Main database config
DB_HOST=localhost
DB_USER=postgres
DB_PASSWORD=your_password
DB_NAME=aiecs

# Separate knowledge graph database
KG_STORAGE_BACKEND=postgresql
KG_DB_HOST=localhost
KG_DB_USER=kg_user
KG_DB_PASSWORD=kg_password
KG_DB_NAME=aiecs_knowledge_graph

PostgreSQL: Cloud Database (Connection String)

For cloud databases (e.g., Google Cloud SQL, AWS RDS):

KG_STORAGE_BACKEND=postgresql
KG_POSTGRES_URL=postgresql://user:password@host:5432/dbname?sslmode=require
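A password that contains characters with special meaning in URLs (such as `@`, `:`, `/` or `#`) should be percent-encoded in KG_POSTGRES_URL. Otherwise parsers can split the URL in the wrong place. The sketch below uses only Python's `urllib.parse` to build such a URL and read it back, so you can check the host, port, database name and `sslmode`. `build_postgres_url` and `describe_postgres_url` are hypothetical helpers, not part of AIECS.

```python
from urllib.parse import parse_qs, quote, unquote, urlsplit


def build_postgres_url(user, password, host, port, dbname, sslmode="require"):
    """Assemble a postgresql:// URL, percent-encoding the credentials."""
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{dbname}?sslmode={sslmode}"
    )


def describe_postgres_url(url):
    """Split a postgresql:// URL into its parts (the password is left out on purpose)."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    return {
        "user": unquote(parts.username or ""),
        "host": parts.hostname,
        "port": parts.port or 5432,
        "dbname": parts.path.lstrip("/"),
        "sslmode": query.get("sslmode", [None])[0],
    }


url = build_postgres_url("kg_user", "p@ss:word/1", "db.example.com", 5432, "aiecs_kg")
print(url)
# postgresql://kg_user:p%40ss%3Aword%2F1@db.example.com:5432/aiecs_kg?sslmode=require
print(describe_postgres_url(url))
```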

PostgreSQL: Connection Pooling

Optimize for your workload:

# For high-concurrency applications
KG_MIN_POOL_SIZE=10
KG_MAX_POOL_SIZE=50

# For low-concurrency applications
KG_MIN_POOL_SIZE=2
KG_MAX_POOL_SIZE=10

PostgreSQL: pgvector Extension

Enable optimized vector search (requires the pgvector extension):

KG_ENABLE_PGVECTOR=true

Prerequisites:

Install the pgvector extension on your PostgreSQL server and enable it in the target database (CREATE EXTENSION vector;)
Once enabled, the extension is used automatically for vector similarity search

SQLite: Memory vs. File

# File-based persistence (recommended)
KG_SQLITE_DB_PATH=./storage/knowledge_graph.db

# In-memory SQLite (no persistence)
KG_SQLITE_DB_PATH=:memory:
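You can see the difference between the two settings with Python's built-in `sqlite3` module, independent of AIECS. Data written to a file-backed database is visible to a later connection. Each `:memory:` connection, by contrast, starts with a fresh, empty database. The `demo_nodes` table is made up for this demonstration.

```python
import os
import sqlite3
import tempfile


def roundtrip(db_path: str) -> int:
    """Write one row through one connection, then count rows through a second one."""
    first = sqlite3.connect(db_path)
    first.execute("CREATE TABLE IF NOT EXISTS demo_nodes (id TEXT PRIMARY KEY)")
    first.execute("INSERT OR IGNORE INTO demo_nodes VALUES ('n1')")
    first.commit()
    first.close()

    second = sqlite3.connect(db_path)
    try:
        return second.execute("SELECT COUNT(*) FROM demo_nodes").fetchone()[0]
    except sqlite3.OperationalError:
        # "no such table": this connection got a brand-new, empty database.
        return 0
    finally:
        second.close()


results = {}
with tempfile.TemporaryDirectory() as tmp:
    file_path = os.path.join(tmp, "storage", "knowledge_graph.db")
    # sqlite3 does not create missing directories, so create the parent first
    # (the guide notes that AIECS performs this step automatically).
    os.makedirs(os.path.dirname(file_path), exist_ok=True)
    results["file"] = roundtrip(file_path)
results[":memory:"] = roundtrip(":memory:")
print(results)  # {'file': 1, ':memory:': 0}
```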

Query Configuration

Search Limits

Control the default number of results returned by searches:

# Return more results (e.g., for comprehensive search)
KG_DEFAULT_SEARCH_LIMIT=50

# Return fewer results (e.g., for quick queries)
KG_DEFAULT_SEARCH_LIMIT=5
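Conceptually, the search limit caps how many of the best-scoring matches come back. The sketch below illustrates that idea under two assumptions: results are ranked by a similarity score, and an explicit per-call limit overrides the configured default. Neither is documented AIECS behavior, and `top_results` is a hypothetical helper.

```python
import heapq
from typing import Dict, List, Optional, Tuple

DEFAULT_SEARCH_LIMIT = 10  # stands in for KG_DEFAULT_SEARCH_LIMIT


def top_results(
    scores: Dict[str, float], limit: Optional[int] = None
) -> List[Tuple[str, float]]:
    """Return the highest-scoring (entity_id, score) pairs, best first."""
    k = DEFAULT_SEARCH_LIMIT if limit is None else limit
    # heapq.nlargest avoids sorting the whole candidate set when k is small.
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])


scores = {f"entity_{i}": i / 20 for i in range(20)}
print(len(top_results(scores)))      # 10
print(top_results(scores, limit=2))  # [('entity_19', 0.95), ('entity_18', 0.9)]
```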

Traversal Depth

Control how deep graph traversals can go:

# Shallow traversals (faster, less comprehensive)
KG_MAX_TRAVERSAL_DEPTH=3

# Deep traversals (slower, more comprehensive)
KG_MAX_TRAVERSAL_DEPTH=7

Warning: Values greater than 10 trigger a validation warning and may cause performance issues.
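Deeper traversals cost more because the set of reachable nodes can grow quickly with each extra hop. The breadth-first sketch below shows a depth cap at work on a made-up graph. It assumes traversal behaves like a depth-limited BFS, which illustrates the trade-off and is not necessarily the algorithm AIECS uses.

```python
from collections import deque
from typing import Dict, List, Set


def reachable_within(graph: Dict[str, List[str]], start: str, max_depth: int) -> Set[str]:
    """Collect nodes reachable from `start` in at most `max_depth` hops (BFS)."""
    if max_depth < 1:
        raise ValueError("max_depth must be >= 1")
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the configured depth
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return seen - {start}


# A small binary tree: every node links to two children.
graph = {f"n{i}": [f"n{2 * i + 1}", f"n{2 * i + 2}"] for i in range(63)}
for depth in (1, 3, 5):
    print(depth, len(reachable_within(graph, "n0", depth)))
# 1 2
# 3 14
# 5 62
```

Each additional level roughly doubles the number of nodes visited in this example, which is why shallow limits are faster.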

Vector Dimensions

Match your embedding model:

# OpenAI ada-002 (default)
KG_VECTOR_DIMENSION=1536

# OpenAI text-embedding-3-small
KG_VECTOR_DIMENSION=1536

# OpenAI text-embedding-3-large
KG_VECTOR_DIMENSION=3072

# Sentence Transformers all-MiniLM-L6-v2
KG_VECTOR_DIMENSION=384

# Sentence Transformers all-mpnet-base-v2
KG_VECTOR_DIMENSION=768
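A configured dimension that differs from the model's output size is a common cause of failed or empty vector searches. The stdlib-only sketch below shows such a check. It assumes vectors are plain lists of floats, and `check_dimension` and `cosine_similarity` are illustrative helpers, not AIECS functions.

```python
import math
from typing import Sequence

KG_VECTOR_DIMENSION = 1536  # must equal the embedding model's output size


def check_dimension(vector: Sequence[float], expected: int = KG_VECTOR_DIMENSION) -> None:
    """Raise early instead of storing or comparing a vector of the wrong size."""
    if len(vector) != expected:
        raise ValueError(f"expected a {expected}-dimensional vector, got {len(vector)}")


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; only defined for vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


check_dimension([0.0] * 1536)  # passes silently
try:
    check_dimension([0.0] * 384)  # e.g. an all-MiniLM-L6-v2 embedding
except ValueError as err:
    print(err)  # expected a 1536-dimensional vector, got 384
```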

Cache Configuration

Enable/Disable Caching

# Enable caching (recommended for production)
KG_ENABLE_QUERY_CACHE=true

# Disable caching (for development/debugging)
KG_ENABLE_QUERY_CACHE=false

Cache TTL

Control how long cached results remain valid:

# Short TTL (frequently changing data)
KG_CACHE_TTL_SECONDS=60

# Long TTL (stable data)
KG_CACHE_TTL_SECONDS=3600
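An entry is served from the cache until KG_CACHE_TTL_SECONDS have elapsed since it was stored. After that, the query runs again. The class below is a minimal sketch of that behavior, with an injectable clock so it can be tested without waiting. `TTLQueryCache` is hypothetical and is not the AIECS cache implementation.

```python
import time
from typing import Any, Callable, Dict, Hashable, Optional, Tuple


class TTLQueryCache:
    """Minimal time-based cache: entries expire ttl_seconds after being stored."""

    def __init__(self, ttl_seconds: float = 300, clock: Callable[[], float] = time.monotonic):
        self.ttl_seconds = ttl_seconds
        self._clock = clock
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get(self, key: Hashable) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self._clock() - stored_at >= self.ttl_seconds:
            del self._entries[key]  # expired: drop it so the caller re-runs the query
            return None
        return value

    def set(self, key: Hashable, value: Any) -> None:
        self._entries[key] = (self._clock(), value)


# Simulated clock so the example runs instantly.
now = [0.0]
cache = TTLQueryCache(ttl_seconds=60, clock=lambda: now[0])
cache.set(("search", "alice"), ["entity_1", "entity_2"])
now[0] = 59.0
print(cache.get(("search", "alice")))  # ['entity_1', 'entity_2']
now[0] = 60.0
print(cache.get(("search", "alice")))  # None
```

A short TTL keeps results fresher but re-runs queries more often. A long TTL does the opposite.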

Validation

Automatic Validation

Configuration is automatically validated when settings are loaded:

from aiecs.config import get_settings

try:
    settings = get_settings()
except ValueError as e:
    print(f"Configuration error: {e}")

Manual Validation

Validate configuration for specific operations:

from aiecs.config import validate_required_settings

# Validate knowledge graph configuration
try:
    validate_required_settings("knowledge_graph")
    print("Knowledge graph configuration is valid")
except ValueError as e:
    print(f"Missing configuration: {e}")

Validation Rules

KG_STORAGE_BACKEND: Must be inmemory, sqlite, or postgresql
KG_SQLITE_DB_PATH: Parent directory is automatically created
KG_MAX_TRAVERSAL_DEPTH: Must be ≥ 1; warning if > 10
KG_VECTOR_DIMENSION: Must be ≥ 1; warning if not a common dimension
PostgreSQL: At least one of KG_POSTGRES_URL, KG_DB_HOST, or main DB_PASSWORD must be set
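The rules above can also be run as a standalone check, for example in a CI step before deployment. This is a hypothetical re-implementation for illustration, and `check_kg_env` is not an AIECS function. It reports errors and warnings instead of raising, and it does not create the SQLite parent directory. Treating the dimensions named in this guide as the "common" ones is an assumption.

```python
from typing import List, Mapping, Tuple

VALID_BACKENDS = {"inmemory", "sqlite", "postgresql"}
# Assumption: "common" means the embedding dimensions listed in this guide.
COMMON_DIMENSIONS = {384, 768, 1536, 3072}


def check_kg_env(env: Mapping[str, str]) -> Tuple[List[str], List[str]]:
    """Return (errors, warnings) for the validation rules listed above."""
    errors: List[str] = []
    warnings: List[str] = []

    backend = env.get("KG_STORAGE_BACKEND", "inmemory")
    if backend not in VALID_BACKENDS:
        errors.append(f"KG_STORAGE_BACKEND must be one of {sorted(VALID_BACKENDS)}, got {backend!r}")

    depth = int(env.get("KG_MAX_TRAVERSAL_DEPTH", "5"))
    if depth < 1:
        errors.append("KG_MAX_TRAVERSAL_DEPTH must be >= 1")
    elif depth > 10:
        warnings.append("KG_MAX_TRAVERSAL_DEPTH > 10 may cause performance issues")

    dim = int(env.get("KG_VECTOR_DIMENSION", "1536"))
    if dim < 1:
        errors.append("KG_VECTOR_DIMENSION must be >= 1")
    elif dim not in COMMON_DIMENSIONS:
        warnings.append(f"KG_VECTOR_DIMENSION={dim} is not a common embedding dimension")

    if backend == "postgresql" and not any(
        env.get(name) for name in ("KG_POSTGRES_URL", "KG_DB_HOST", "DB_PASSWORD")
    ):
        errors.append("PostgreSQL needs KG_POSTGRES_URL, KG_DB_HOST, or DB_PASSWORD")

    return errors, warnings


print(check_kg_env({"KG_STORAGE_BACKEND": "postgresql", "KG_MAX_TRAVERSAL_DEPTH": "12"}))
```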

Examples

Example 1: Development Setup (SQLite)

.env:

# SQLite for development
KG_STORAGE_BACKEND=sqlite
KG_SQLITE_DB_PATH=./dev_knowledge_graph.db

# Disable caching for development
KG_ENABLE_QUERY_CACHE=false

# Return more results per search
KG_DEFAULT_SEARCH_LIMIT=20

Example 2: Production Setup (PostgreSQL)

.env:

# PostgreSQL for production
KG_STORAGE_BACKEND=postgresql
KG_POSTGRES_URL=postgresql://kg_user:password@db.example.com:5432/aiecs_kg?sslmode=require

# Optimize connection pooling
KG_MIN_POOL_SIZE=10
KG_MAX_POOL_SIZE=50

# Enable pgvector
KG_ENABLE_PGVECTOR=true

# Production query settings
KG_DEFAULT_SEARCH_LIMIT=10
KG_MAX_TRAVERSAL_DEPTH=5

# Enable caching
KG_ENABLE_QUERY_CACHE=true
KG_CACHE_TTL_SECONDS=600

Example 3: Testing Setup (In-Memory)

.env.test:

# In-memory for fast tests
KG_STORAGE_BACKEND=inmemory
KG_INMEMORY_MAX_NODES=10000

# Disable caching for predictable tests
KG_ENABLE_QUERY_CACHE=false

Example 4: Programmatic Configuration

import asyncio

from aiecs.config import get_settings
from aiecs.infrastructure.graph_storage import (
    InMemoryGraphStore,
    PostgresGraphStore,
    SQLiteGraphStore,
)

async def main():
    settings = get_settings()
    backend = settings.kg_storage_backend

    # Create the store that matches the configured backend
    if backend == "inmemory":
        store = InMemoryGraphStore()
    elif backend == "sqlite":
        config = settings.kg_database_config
        store = SQLiteGraphStore(db_path=config["db_path"])
    elif backend == "postgresql":
        config = settings.kg_database_config
        store = PostgresGraphStore(**config)
    else:
        raise ValueError(f"Unsupported KG_STORAGE_BACKEND: {backend}")

    await store.initialize()
    try:
        # Use the store
        ...
    finally:
        # Always release connections, even if an operation fails
        await store.close()

# await is only valid inside a coroutine, so run main() in an event loop
asyncio.run(main())

Example 5: Multi-Environment Setup

Use different .env files for different environments:

.env.development:

KG_STORAGE_BACKEND=sqlite
KG_SQLITE_DB_PATH=./dev_kg.db

.env.staging:

KG_STORAGE_BACKEND=postgresql
KG_POSTGRES_URL=postgresql://user:pass@staging-db:5432/aiecs_kg

.env.production:

KG_STORAGE_BACKEND=postgresql
KG_POSTGRES_URL=postgresql://user:pass@prod-db:5432/aiecs_kg
KG_MIN_POOL_SIZE=20
KG_MAX_POOL_SIZE=100
KG_ENABLE_PGVECTOR=true

Load the appropriate file:

# Development
export ENV_FILE=.env.development
python -m aiecs

# Staging
export ENV_FILE=.env.staging
python -m aiecs

# Production
export ENV_FILE=.env.production
python -m aiecs
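The selection step can be sketched as follows, assuming the file is chosen through ENV_FILE as in the commands above. `read_env_file` and `load_selected_env_file` are hypothetical. The parser is deliberately simplified, with no quoting, variable expansion or inline comments, and it is not what AIECS or python-dotenv actually do.

```python
import os
import tempfile
from typing import Dict


def read_env_file(path: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and '#' comments."""
    values: Dict[str, str] = {}
    with open(path, encoding="utf-8") as handle:
        for raw in handle:
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, value = line.split("=", 1)
            values[key.strip()] = value.strip()
    return values


def load_selected_env_file(default: str = ".env") -> Dict[str, str]:
    """Read whichever file ENV_FILE names (hypothetical selection logic)."""
    return read_env_file(os.environ.get("ENV_FILE", default))


with tempfile.NamedTemporaryFile("w", suffix=".env", delete=False) as tmp:
    tmp.write("# development\nKG_STORAGE_BACKEND=sqlite\nKG_SQLITE_DB_PATH=./dev_kg.db\n")
os.environ["ENV_FILE"] = tmp.name
selected = load_selected_env_file()
print(selected)
# {'KG_STORAGE_BACKEND': 'sqlite', 'KG_SQLITE_DB_PATH': './dev_kg.db'}
os.remove(tmp.name)
```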

Troubleshooting

Issue: PostgreSQL connection fails

Solution: Check your connection parameters:

from aiecs.config import get_settings

settings = get_settings()
print(settings.kg_database_config)
# Verify host, port, user, password, database are correct
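Before checking credentials, it can help to confirm that the database host and port are reachable at all. The sketch below uses only the standard library. `port_is_open` is a hypothetical helper that tests the TCP connection only, not authentication or SSL.

```python
import socket


def port_is_open(host: str, port: int = 5432, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, DNS failures and timeouts (all OSError subclasses).
        return False


print(port_is_open("localhost", 5432))
```

If this returns False, look at the network, firewall or host/port values before the user name and password.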

Issue: SQLite file not found

Solution: The parent directory is automatically created, but ensure the path is writable:

mkdir -p ./storage
chmod 755 ./storage

Issue: Vector search returns no results

Solution: Check that KG_VECTOR_DIMENSION matches the output dimension of your embedding model:

# If using OpenAI ada-002
KG_VECTOR_DIMENSION=1536

# If using a different model, set it to that model's output dimension

Issue: Queries are slow

Solution: Optimize configuration:

# Reduce traversal depth
KG_MAX_TRAVERSAL_DEPTH=3

# Enable caching
KG_ENABLE_QUERY_CACHE=true

# For PostgreSQL: enable pgvector
KG_ENABLE_PGVECTOR=true

Best Practices

Use PostgreSQL for production: Scalable, concurrent, reliable
Use SQLite for development: Simple, portable, fast iteration
Use in-memory for testing: Fast, isolated, reproducible
Enable caching in production: Improves performance
Match vector dimensions to your embedding model: Prevents dimension mismatches
Set reasonable traversal depth: Balance comprehensiveness vs. performance
Use separate database for KG in production: Better isolation and resource management
Monitor connection pool usage: Adjust min/max based on workload
Enable pgvector for large-scale vector search: Significantly faster than brute-force

Migration

When changing backends, use the migration tools:

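import asyncio

from aiecs.infrastructure.graph_storage.migration import migrate_sqlite_to_postgres

async def main():
    # Migrate from SQLite to PostgreSQL
    await migrate_sqlite_to_postgres(
        sqlite_path="./dev_kg.db",
        postgres_config=None,  # None: use the PostgreSQL config from settings
        batch_size=1000,
        show_progress=True,
    )

# await is only valid inside a coroutine, so run main() in an event loop
asyncio.run(main())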
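The `batch_size` argument sets how many records are moved per step. The generator below sketches that chunking idea in isolation. `batched` is a hypothetical helper, though Python 3.12's `itertools.batched` behaves similarly and yields tuples. Nothing here reflects how `migrate_sqlite_to_postgres` is actually implemented.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield lists of at most batch_size items, preserving order."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


entity_ids = [f"entity_{i}" for i in range(2500)]
print([len(batch) for batch in batched(entity_ids, 1000)])  # [1000, 1000, 500]
```

Larger batches mean fewer round trips but more memory per step.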

See the Knowledge Graph README for more details.
