Knowledge Graph Configuration Guide

This guide explains how to configure the knowledge graph capabilities in AIECS.

Table of Contents

  1. Quick Start

  2. Storage Backends

  3. Environment Variables

  4. Configuration Properties

  5. Backend-Specific Configuration

  6. Query Configuration

  7. Cache Configuration

  8. Validation

  9. Examples

Quick Start

Minimal Configuration (In-Memory)

No configuration needed! The default settings use in-memory storage:

from aiecs.config import get_settings

settings = get_settings()
# Uses inmemory backend by default

Development Configuration (SQLite)

Add to your .env file:

KG_STORAGE_BACKEND=sqlite
KG_SQLITE_DB_PATH=./storage/knowledge_graph.db

Production Configuration (PostgreSQL)

Add to your .env file:

KG_STORAGE_BACKEND=postgresql
# With no KG_DB_* values set, the knowledge graph uses the main database (default).
# To use a separate database instead, set:
KG_DB_HOST=localhost
KG_DB_PORT=5432
KG_DB_USER=kg_user
KG_DB_PASSWORD=your_password
KG_DB_NAME=aiecs_knowledge_graph

Storage Backends

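AIECS supports three storage backends for knowledge graphs: in-memory, SQLite, and PostgreSQL. The first two are summarized here; PostgreSQL options are covered under Backend-Specific Configuration.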

1. In-Memory (Default)

  • Use Case: Development, testing, small graphs

  • Pros: Fast, no setup required

  • Cons: Data lost on restart, limited by RAM

  • Max Nodes: 100,000 (configurable)

KG_STORAGE_BACKEND=inmemory

2. SQLite

  • Use Case: Development, embedded applications, file-based persistence

  • Pros: Simple, portable, ACID transactions

  • Cons: Single-writer, limited concurrency

  • Best For: Single-user applications, up to ~1M nodes

KG_STORAGE_BACKEND=sqlite
KG_SQLITE_DB_PATH=./storage/knowledge_graph.db

Environment Variables

Core Configuration

| Variable | Type | Default | Description |
|----------|------|---------|-------------|
| KG_STORAGE_BACKEND | string | inmemory | Storage backend: inmemory, sqlite, or postgresql |

SQLite Configuration

| Variable | Type | Default | Description |
|----------|------|---------|-------------|
| KG_SQLITE_DB_PATH | string | ./storage/knowledge_graph.db | Path to SQLite database file |

PostgreSQL Configuration

| Variable | Type | Default | Description |
|----------|------|---------|-------------|
| KG_POSTGRES_URL | string | "" | PostgreSQL connection string (DSN) |
| KG_DB_HOST | string | "" | Database host (falls back to main DB_HOST) |
| KG_DB_PORT | int | 5432 | Database port |
| KG_DB_USER | string | "" | Database user (falls back to main DB_USER) |
| KG_DB_PASSWORD | string | "" | Database password (falls back to main DB_PASSWORD) |
| KG_DB_NAME | string | "" | Database name (default: aiecs_knowledge_graph) |
| KG_MIN_POOL_SIZE | int | 5 | Minimum connection pool size |
| KG_MAX_POOL_SIZE | int | 20 | Maximum connection pool size |
| KG_ENABLE_PGVECTOR | bool | false | Enable pgvector extension for optimized vector search |
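
The fallback rules in the table above can be read as a small resolution step. The sketch below is illustrative only: `resolve_kg_postgres_config` is a hypothetical helper, not part of AIECS, and it assumes that a non-empty KG_POSTGRES_URL takes precedence over the individual fields, which the table does not state. The database name is left out because its fallback behaviour is not described here.

```python
from typing import Mapping


def resolve_kg_postgres_config(env: Mapping[str, str]) -> dict:
    """Hypothetical helper: resolve KG PostgreSQL settings from an env mapping."""
    # Assumption: a non-empty DSN wins over the individual fields.
    dsn = env.get("KG_POSTGRES_URL", "")
    if dsn:
        return {"dsn": dsn}

    def pick(kg_key: str, main_key: str) -> str:
        # An unset or empty KG-specific value falls back to the main DB value.
        return env.get(kg_key) or env.get(main_key) or ""

    return {
        "host": pick("KG_DB_HOST", "DB_HOST"),
        "port": int(env.get("KG_DB_PORT") or 5432),  # documented default
        "user": pick("KG_DB_USER", "DB_USER"),
        "password": pick("KG_DB_PASSWORD", "DB_PASSWORD"),
    }
```

With only `DB_HOST`, `DB_USER`, and `DB_PASSWORD` set, every field resolves to the main database values; setting `KG_DB_USER` alone overrides just the user.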

In-Memory Configuration

| Variable | Type | Default | Description |
|----------|------|---------|-------------|
| KG_INMEMORY_MAX_NODES | int | 100000 | Maximum number of nodes for in-memory storage |

Query Configuration

| Variable | Type | Default | Description |
|----------|------|---------|-------------|
| KG_DEFAULT_SEARCH_LIMIT | int | 10 | Default number of results to return in searches |
| KG_MAX_TRAVERSAL_DEPTH | int | 5 | Maximum depth for graph traversal queries (1-10) |
| KG_VECTOR_DIMENSION | int | 1536 | Dimension of embedding vectors (OpenAI ada-002 default) |

Cache Configuration

| Variable | Type | Default | Description |
|----------|------|---------|-------------|
| KG_ENABLE_QUERY_CACHE | bool | true | Enable caching of query results |
| KG_CACHE_TTL_SECONDS | int | 300 | Time-to-live for cached query results (seconds) |

Configuration Properties

Access configuration programmatically:

from aiecs.config import get_settings

settings = get_settings()

# Get database configuration for current backend
db_config = settings.kg_database_config
# Returns different config based on backend:
# - PostgreSQL: {"host": ..., "port": ..., "user": ..., etc.}
# - SQLite: {"db_path": ...}
# - In-memory: {"max_nodes": ...}

# Get query configuration
query_config = settings.kg_query_config
# Returns: {
#   "default_search_limit": 10,
#   "max_traversal_depth": 5,
#   "vector_dimension": 1536
# }

# Get cache configuration
cache_config = settings.kg_cache_config
# Returns: {
#   "enable_query_cache": True,
#   "cache_ttl_seconds": 300
# }
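
To make the shape of these dictionaries concrete, here is a self-contained stand-in that builds the same two mappings from environment-style values. `KGSettingsSketch` and `parse_bool` are hypothetical names used for illustration; the real settings object comes from `get_settings()` and may parse values differently.

```python
from dataclasses import dataclass
from typing import Mapping


def parse_bool(value: str) -> bool:
    """Treat common truthy spellings as True; anything else is False."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class KGSettingsSketch:
    """Hypothetical stand-in for the KG-related part of the settings object."""

    default_search_limit: int = 10
    max_traversal_depth: int = 5
    vector_dimension: int = 1536
    enable_query_cache: bool = True
    cache_ttl_seconds: int = 300

    @classmethod
    def from_env(cls, env: Mapping[str, str]) -> "KGSettingsSketch":
        d = cls()  # documented defaults
        flag = env.get("KG_ENABLE_QUERY_CACHE")
        return cls(
            default_search_limit=int(env.get("KG_DEFAULT_SEARCH_LIMIT", d.default_search_limit)),
            max_traversal_depth=int(env.get("KG_MAX_TRAVERSAL_DEPTH", d.max_traversal_depth)),
            vector_dimension=int(env.get("KG_VECTOR_DIMENSION", d.vector_dimension)),
            enable_query_cache=d.enable_query_cache if flag is None else parse_bool(flag),
            cache_ttl_seconds=int(env.get("KG_CACHE_TTL_SECONDS", d.cache_ttl_seconds)),
        )

    @property
    def kg_query_config(self) -> dict:
        return {
            "default_search_limit": self.default_search_limit,
            "max_traversal_depth": self.max_traversal_depth,
            "vector_dimension": self.vector_dimension,
        }

    @property
    def kg_cache_config(self) -> dict:
        return {
            "enable_query_cache": self.enable_query_cache,
            "cache_ttl_seconds": self.cache_ttl_seconds,
        }
```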

Backend-Specific Configuration

PostgreSQL: Using Main Database

By default, if you don’t set KG-specific database parameters, the knowledge graph uses your main AIECS database:

# Main database config
DB_HOST=localhost
DB_PORT=5432
DB_USER=postgres
DB_PASSWORD=your_password
DB_NAME=aiecs

# Knowledge graph uses main DB
KG_STORAGE_BACKEND=postgresql

The knowledge graph creates its own tables (graph_entities, graph_relations) within the main database.

PostgreSQL: Separate Database

For better isolation, use a separate database:

# Main database config
DB_HOST=localhost
DB_USER=postgres
DB_PASSWORD=your_password
DB_NAME=aiecs

# Separate knowledge graph database
KG_STORAGE_BACKEND=postgresql
KG_DB_HOST=localhost
KG_DB_USER=kg_user
KG_DB_PASSWORD=kg_password
KG_DB_NAME=aiecs_knowledge_graph

PostgreSQL: Cloud Database (Connection String)

For cloud databases (e.g., Google Cloud SQL, AWS RDS):

KG_STORAGE_BACKEND=postgresql
KG_POSTGRES_URL=postgresql://user:password@host:5432/dbname?sslmode=require
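
Before deploying, it can be useful to confirm that a connection string contains the parts you expect, for example that `sslmode=require` is actually present. The following sketch uses only the Python standard library; `describe_dsn` is a hypothetical helper and does not reflect how AIECS parses the value.

```python
from urllib.parse import parse_qs, urlsplit


def describe_dsn(dsn: str) -> dict:
    """Split a PostgreSQL URL into its components without exposing the password."""
    parts = urlsplit(dsn)
    if parts.scheme not in {"postgresql", "postgres"}:
        raise ValueError(f"unexpected scheme: {parts.scheme!r}")
    query = parse_qs(parts.query)
    return {
        "user": parts.username,
        "host": parts.hostname,
        "port": parts.port or 5432,  # PostgreSQL's standard port
        "database": parts.path.lstrip("/"),
        "sslmode": query.get("sslmode", [None])[0],
        "has_password": parts.password is not None,
    }


info = describe_dsn("postgresql://user:password@host:5432/dbname?sslmode=require")
print(info)
```

Reporting only `has_password` rather than the password itself keeps the output safe to paste into logs or tickets.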

PostgreSQL: Connection Pooling

Optimize for your workload:

# For high-concurrency applications
KG_MIN_POOL_SIZE=10
KG_MAX_POOL_SIZE=50

# For low-concurrency applications
KG_MIN_POOL_SIZE=2
KG_MAX_POOL_SIZE=10
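
When choosing pool sizes, it helps to check them against the server's connection limit. The sketch below assumes PostgreSQL's stock defaults (`max_connections = 100`, with 3 connections reserved for superusers) and a hypothetical `workers` count for the number of application processes that each open their own pool. Managed cloud databases often use different limits, so pass the real values for your server.

```python
def pool_fits(
    min_size: int,
    max_size: int,
    workers: int = 1,
    server_max_connections: int = 100,
    reserved: int = 3,
) -> bool:
    """Return True if all pools at full size fit within the server limit."""
    if min_size < 0 or max_size < 1:
        raise ValueError("pool sizes must be non-negative, with max_size >= 1")
    if min_size > max_size:
        raise ValueError("KG_MIN_POOL_SIZE must not exceed KG_MAX_POOL_SIZE")
    needed = max_size * workers
    available = server_max_connections - reserved
    return needed <= available
```

Under these assumed defaults, `pool_fits(10, 50)` holds for a single process, while two processes with the same settings would need 100 connections and exceed the 97 that are available.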

PostgreSQL: pgvector Extension

Enable optimized vector search (requires pgvector installed):

KG_ENABLE_PGVECTOR=true

Prerequisite: the pgvector extension must be installed in your PostgreSQL database.

Once KG_ENABLE_PGVECTOR=true is set, the extension is used automatically for vector similarity search.
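
For context on what "optimized" means here: without an index, a similarity search has to score the query against every stored vector. The sketch below shows that brute-force baseline in plain Python. It is only an illustration of the cost being avoided, not how AIECS or pgvector implement search, and `brute_force_top_k` is a hypothetical name.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def brute_force_top_k(
    query: Sequence[float], vectors: Dict[str, Sequence[float]], k: int
) -> List[Tuple[str, float]]:
    """Score every stored vector, then keep the k most similar."""
    scored = [(name, cosine(query, vec)) for name, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

Every query touches all stored vectors, so the cost grows linearly with the size of the graph.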

SQLite: Memory vs. File

# File-based persistence (recommended)
KG_SQLITE_DB_PATH=./storage/knowledge_graph.db

# In-memory SQLite (no persistence)
KG_SQLITE_DB_PATH=:memory:
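
The difference between the two paths can be demonstrated with Python's built-in `sqlite3` module. The `entities` table below is a made-up example, not the AIECS schema; the point is only that a `:memory:` database disappears when its connection closes, while a file-backed one keeps its rows.

```python
import os
import sqlite3
import tempfile


def count_rows_after_reopen(db_path: str) -> int:
    """Write one row, close the connection, reopen, and count surviving rows."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS entities (id TEXT PRIMARY KEY)")
    conn.execute("INSERT INTO entities (id) VALUES ('e1')")
    conn.commit()
    conn.close()

    # A second connection to ":memory:" gets a brand-new, empty database.
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS entities (id TEXT PRIMARY KEY)")
    survivors = conn.execute("SELECT COUNT(*) FROM entities").fetchone()[0]
    conn.close()
    return survivors


def demo() -> tuple:
    """Return (rows surviving in memory, rows surviving on disk)."""
    in_memory = count_rows_after_reopen(":memory:")
    with tempfile.TemporaryDirectory() as folder:
        on_disk = count_rows_after_reopen(os.path.join(folder, "knowledge_graph.db"))
    return in_memory, on_disk
```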

Query Configuration

Search Limits

Control the number of results returned:

# Return more results (e.g., for comprehensive search)
KG_DEFAULT_SEARCH_LIMIT=50

# Return fewer results (e.g., for quick queries)
KG_DEFAULT_SEARCH_LIMIT=5

Traversal Depth

Control how deep graph traversals can go:

# Shallow traversals (faster, less comprehensive)
KG_MAX_TRAVERSAL_DEPTH=3

# Deep traversals (slower, more comprehensive)
KG_MAX_TRAVERSAL_DEPTH=7

Warning: Values > 10 may cause performance issues.
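
To see why depth drives cost, consider a depth-limited breadth-first traversal. This is a generic sketch over an adjacency-list dictionary, assuming nothing about how AIECS stores or traverses its graph; `traverse` is a hypothetical function.

```python
from collections import deque
from typing import Dict, List, Set


def traverse(graph: Dict[str, List[str]], start: str, max_depth: int) -> Set[str]:
    """Return all nodes reachable from start within max_depth hops."""
    if max_depth < 1:
        raise ValueError("max_depth must be >= 1")
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # at the limit: do not expand this node's neighbours
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, depth + 1))
    return seen
```

Each extra level can multiply the number of visited nodes by the average number of neighbours, which is why a small increase in KG_MAX_TRAVERSAL_DEPTH can make queries much slower on densely connected graphs.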

Vector Dimensions

Match your embedding model:

# OpenAI ada-002 (default)
KG_VECTOR_DIMENSION=1536

# OpenAI text-embedding-3-small
KG_VECTOR_DIMENSION=1536

# OpenAI text-embedding-3-large
KG_VECTOR_DIMENSION=3072

# Sentence Transformers (set one value, depending on the model)
# all-MiniLM-L6-v2
KG_VECTOR_DIMENSION=384
# all-mpnet-base-v2
KG_VECTOR_DIMENSION=768
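
A cheap guard is to compare each embedding's length with the configured dimension before storing it. The sketch below collects the dimensions listed above into a lookup table; `KNOWN_DIMENSIONS` and `ensure_dimension` are hypothetical names, and the check is an application-side illustration rather than something AIECS is documented to do.

```python
from typing import Sequence

# Dimensions listed in this guide, keyed by the model labels used above.
KNOWN_DIMENSIONS = {
    "ada-002": 1536,
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
    "all-MiniLM-L6-v2": 384,
    "all-mpnet-base-v2": 768,
}


def ensure_dimension(vector: Sequence[float], configured: int) -> None:
    """Raise ValueError if the embedding length differs from the configured one."""
    if len(vector) != configured:
        raise ValueError(
            f"embedding has {len(vector)} dimensions, "
            f"but KG_VECTOR_DIMENSION is {configured}"
        )
```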

Cache Configuration

Enable/Disable Caching

# Enable caching (recommended for production)
KG_ENABLE_QUERY_CACHE=true

# Disable caching (for development/debugging)
KG_ENABLE_QUERY_CACHE=false

Cache TTL

Control how long cached results remain valid:

# Short TTL (frequently changing data)
KG_CACHE_TTL_SECONDS=60

# Long TTL (stable data)
KG_CACHE_TTL_SECONDS=3600
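
The effect of the TTL can be illustrated with a minimal time-based cache. This is a generic sketch, not the AIECS cache: `TTLCache` is a hypothetical class, and the injectable `clock` parameter exists only so expiry can be shown without waiting.

```python
import time
from typing import Any, Callable, Dict, Hashable, Optional, Tuple


class TTLCache:
    """Minimal cache whose entries expire ttl_seconds after being stored."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def put(self, key: Hashable, value: Any) -> None:
        # Store the value together with its absolute expiry time.
        self._entries[key] = (self._clock() + self._ttl, value)

    def get(self, key: Hashable) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # stale: drop it and report a miss
            return None
        return value
```

A short TTL means stale results are discarded quickly at the cost of more recomputation; a long TTL trades freshness for fewer queries against the backend.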

Validation

Automatic Validation

Configuration is automatically validated when settings are loaded:

from aiecs.config import get_settings

try:
    settings = get_settings()
except ValueError as e:
    print(f"Configuration error: {e}")

Manual Validation

Validate configuration for specific operations:

from aiecs.config import validate_required_settings

# Validate knowledge graph configuration
try:
    validate_required_settings("knowledge_graph")
    print("Knowledge graph configuration is valid")
except ValueError as e:
    print(f"Missing configuration: {e}")

Validation Rules

  1. KG_STORAGE_BACKEND: Must be inmemory, sqlite, or postgresql

  2. KG_SQLITE_DB_PATH: Parent directory is automatically created

  3. KG_MAX_TRAVERSAL_DEPTH: Must be ≥ 1; warning if > 10

  4. KG_VECTOR_DIMENSION: Must be ≥ 1; warning if not a common dimension

  5. PostgreSQL: At least one of KG_POSTGRES_URL, KG_DB_HOST, or main DB_PASSWORD must be set
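
As a rough illustration, rules 1, 3, 4, and 5 can be written as a single check over environment-style values. `validate_kg_env` is a hypothetical function, not the AIECS validator; the set of "common" dimensions is assumed to be the ones mentioned in this guide, and rule 2 is omitted because it creates directories rather than checking a value.

```python
from typing import List, Mapping

VALID_BACKENDS = {"inmemory", "sqlite", "postgresql"}
COMMON_DIMENSIONS = {384, 768, 1536, 3072}  # assumption: dimensions named in this guide


def validate_kg_env(env: Mapping[str, str]) -> List[str]:
    """Raise ValueError on hard failures; return a list of warnings otherwise."""
    warnings: List[str] = []

    backend = env.get("KG_STORAGE_BACKEND", "inmemory")
    if backend not in VALID_BACKENDS:
        raise ValueError(f"KG_STORAGE_BACKEND must be one of {sorted(VALID_BACKENDS)}")

    depth = int(env.get("KG_MAX_TRAVERSAL_DEPTH", "5"))
    if depth < 1:
        raise ValueError("KG_MAX_TRAVERSAL_DEPTH must be >= 1")
    if depth > 10:
        warnings.append("KG_MAX_TRAVERSAL_DEPTH > 10 may cause performance issues")

    dimension = int(env.get("KG_VECTOR_DIMENSION", "1536"))
    if dimension < 1:
        raise ValueError("KG_VECTOR_DIMENSION must be >= 1")
    if dimension not in COMMON_DIMENSIONS:
        warnings.append(f"KG_VECTOR_DIMENSION={dimension} is not a common dimension")

    if backend == "postgresql":
        keys = ("KG_POSTGRES_URL", "KG_DB_HOST", "DB_PASSWORD")
        if not any(env.get(key) for key in keys):
            raise ValueError("PostgreSQL backend requires one of: " + ", ".join(keys))

    return warnings
```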

Examples

Example 1: Development Setup (SQLite)

.env:

# SQLite for development
KG_STORAGE_BACKEND=sqlite
KG_SQLITE_DB_PATH=./dev_knowledge_graph.db

# Disable caching for development
KG_ENABLE_QUERY_CACHE=false

# Return more results per search
KG_DEFAULT_SEARCH_LIMIT=20

Example 2: Production Setup (PostgreSQL)

.env:

# PostgreSQL for production
KG_STORAGE_BACKEND=postgresql
KG_POSTGRES_URL=postgresql://kg_user:password@db.example.com:5432/aiecs_kg?sslmode=require

# Optimize connection pooling
KG_MIN_POOL_SIZE=10
KG_MAX_POOL_SIZE=50

# Enable pgvector
KG_ENABLE_PGVECTOR=true

# Production query settings
KG_DEFAULT_SEARCH_LIMIT=10
KG_MAX_TRAVERSAL_DEPTH=5

# Enable caching
KG_ENABLE_QUERY_CACHE=true
KG_CACHE_TTL_SECONDS=600

Example 3: Testing Setup (In-Memory)

.env.test:

# In-memory for fast tests
KG_STORAGE_BACKEND=inmemory
KG_INMEMORY_MAX_NODES=10000

# Disable caching for predictable tests
KG_ENABLE_QUERY_CACHE=false

Example 4: Programmatic Configuration

import asyncio

from aiecs.config import get_settings
from aiecs.infrastructure.graph_storage import (
    InMemoryGraphStore,
    PostgresGraphStore,
    SQLiteGraphStore,
)


def create_store(settings):
    """Create the graph store that matches the configured backend."""
    backend = settings.kg_storage_backend
    config = settings.kg_database_config
    if backend == "inmemory":
        return InMemoryGraphStore()
    if backend == "sqlite":
        return SQLiteGraphStore(db_path=config["db_path"])
    if backend == "postgresql":
        return PostgresGraphStore(**config)
    raise ValueError(f"Unsupported KG_STORAGE_BACKEND: {backend!r}")


async def main():
    store = create_store(get_settings())
    await store.initialize()
    try:
        # Use the store
        ...
    finally:
        # Close the store even if an operation fails
        await store.close()


# "await" is only valid inside a coroutine, so run everything via asyncio
asyncio.run(main())

Example 5: Multi-Environment Setup

Use different .env files for different environments:

.env.development:

KG_STORAGE_BACKEND=sqlite
KG_SQLITE_DB_PATH=./dev_kg.db

.env.staging:

KG_STORAGE_BACKEND=postgresql
KG_POSTGRES_URL=postgresql://user:pass@staging-db:5432/aiecs_kg

.env.production:

KG_STORAGE_BACKEND=postgresql
KG_POSTGRES_URL=postgresql://user:pass@prod-db:5432/aiecs_kg
KG_MIN_POOL_SIZE=20
KG_MAX_POOL_SIZE=100
KG_ENABLE_PGVECTOR=true

Load the appropriate file:

# Development
export ENV_FILE=.env.development
python -m aiecs

# Staging
export ENV_FILE=.env.staging
python -m aiecs

# Production
export ENV_FILE=.env.production
python -m aiecs
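
How AIECS reads ENV_FILE is not shown in this guide. As a rough mental model only, selecting and parsing an env file could look like the sketch below; `parse_env_lines` and `load_selected_env` are hypothetical helpers, and the parser handles only plain KEY=VALUE lines and full-line comments (no quoting or inline comments).

```python
import os
from typing import Dict, Iterable


def parse_env_lines(lines: Iterable[str]) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks and lines starting with '#'."""
    values: Dict[str, str] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values such as DSNs keep theirs.
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def load_selected_env(default: str = ".env") -> Dict[str, str]:
    """Read the file named by ENV_FILE, or the default file if it is unset."""
    path = os.environ.get("ENV_FILE", default)
    with open(path, encoding="utf-8") as handle:
        return parse_env_lines(handle)
```

Splitting on the first `=` matters for values like `postgresql://user:pass@staging-db:5432/aiecs_kg?sslmode=require`, which contain `=` themselves.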

Troubleshooting

Issue: PostgreSQL connection fails

Solution: Check your connection parameters:

from aiecs.config import get_settings

settings = get_settings()
print(settings.kg_database_config)
# Verify host, port, user, password, database are correct

Issue: SQLite file not found

Solution: The parent directory is automatically created, but ensure the path is writable:

mkdir -p ./storage
chmod 755 ./storage

Issue: Vector search returns no results

Solution: Check that KG_VECTOR_DIMENSION matches the dimension of your embeddings:

# If using OpenAI ada-002
KG_VECTOR_DIMENSION=1536

# If using different model, adjust accordingly

Issue: Queries are slow

Solution: Optimize configuration:

# Reduce traversal depth
KG_MAX_TRAVERSAL_DEPTH=3

# Enable caching
KG_ENABLE_QUERY_CACHE=true

# For PostgreSQL: enable pgvector
KG_ENABLE_PGVECTOR=true

Best Practices

  1. Use PostgreSQL for production: Scalable, concurrent, reliable

  2. Use SQLite for development: Simple, portable, fast iteration

  3. Use in-memory for testing: Fast, isolated, reproducible

  4. Enable caching in production: Improves performance

  5. Match vector dimensions to your embedding model: Prevents dimension mismatches

  6. Set reasonable traversal depth: Balance comprehensiveness vs. performance

  7. Use separate database for KG in production: Better isolation and resource management

  8. Monitor connection pool usage: Adjust min/max based on workload

  9. Enable pgvector for large-scale vector search: Significantly faster than brute-force

Migration

When changing backends, use the migration tools:

import asyncio

from aiecs.infrastructure.graph_storage.migration import migrate_sqlite_to_postgres


async def main():
    # Migrate from SQLite to PostgreSQL
    await migrate_sqlite_to_postgres(
        sqlite_path="./dev_kg.db",
        postgres_config=None,  # Uses config from settings
        batch_size=1000,
        show_progress=True,
    )


# "await" is only valid inside a coroutine, so run the migration via asyncio
asyncio.run(main())

See the Knowledge Graph README for more details.
