Knowledge Graph Configuration Guide
Overview
This guide covers all configuration options for the AIECS Knowledge Graph system, including storage backends, feature flags, and performance tuning.
Table of Contents
Storage Configuration
Backend Selection
Choose the appropriate storage backend based on your use case:
# In-memory (default) - Fast, no persistence
KG_STORAGE_BACKEND=inmemory
# SQLite - File-based persistence, single-user
KG_STORAGE_BACKEND=sqlite
# PostgreSQL - Production-ready, multi-user
KG_STORAGE_BACKEND=postgresql
In-Memory Configuration
# Maximum number of nodes (default: 100000)
KG_INMEMORY_MAX_NODES=100000
Use Cases:
Development and testing
Temporary graphs
Small to medium datasets (<100K nodes)
SQLite Configuration
# Database file path (default: ./storage/knowledge_graph.db)
KG_SQLITE_DB_PATH=./storage/knowledge_graph.db
Use Cases:
Single-user applications
File-based persistence
Medium datasets (<1M nodes)
PostgreSQL Configuration
# PostgreSQL connection settings
KG_POSTGRES_HOST=localhost
KG_POSTGRES_PORT=5432
KG_POSTGRES_USER=postgres
KG_POSTGRES_PASSWORD=your_password
KG_POSTGRES_DATABASE=knowledge_graph
# Connection pool settings
KG_MIN_POOL_SIZE=5
KG_MAX_POOL_SIZE=20
# Enable pgvector for optimized vector search (requires pgvector extension)
KG_ENABLE_PGVECTOR=false
Use Cases:
Production deployments
Multi-user applications
Large datasets (>1M nodes)
High concurrency
Feature Flags
Control which features are enabled in your deployment:
Runnable Pattern
# Enable Runnable pattern for composable operations (default: true)
KG_ENABLE_RUNNABLE_PATTERN=true
Benefits:
Composable graph operations
Pipeline chaining
Async/sync compatibility
When to disable:
Legacy code compatibility
Simplified debugging
Knowledge Fusion
# Enable cross-document entity merging (default: true)
KG_ENABLE_KNOWLEDGE_FUSION=true
Benefits:
Merge duplicate entities across documents
Resolve property conflicts
Track provenance
When to disable:
Single-document graphs
No duplicate entities expected
Result Reranking
# Enable search result reranking (default: true)
KG_ENABLE_RERANKING=true
Benefits:
Improved search relevance
Multiple ranking signals
Better precision
When to disable:
Performance-critical applications
Simple search requirements
Logical Queries
# Enable logical query parsing (default: true)
KG_ENABLE_LOGICAL_QUERIES=true
Benefits:
Natural language to structured queries
Query validation
Execution planning
When to disable:
Simple query patterns only
No NLP requirements
Structured Data Import
# Enable CSV/JSON import (default: true)
KG_ENABLE_STRUCTURED_IMPORT=true
Benefits:
Import from CSV/JSON files
Schema mapping
Bulk data loading
When to disable:
Text-only extraction
No structured data sources
Knowledge Fusion Configuration
Similarity Threshold
# Similarity threshold for entity fusion (0.0-1.0, default: 0.85)
KG_FUSION_SIMILARITY_THRESHOLD=0.85
Guidelines:
0.95-1.0: Very strict, only near-identical entities
0.85-0.95: Balanced (recommended)
0.70-0.85: Lenient, more merges
<0.70: Very lenient, risk of false positives
Conflict Resolution Strategy
# Strategy for resolving property conflicts (default: most_complete)
KG_FUSION_CONFLICT_RESOLUTION=most_complete
Available Strategies:
most_complete: Prefer non-empty, longer values (default)
Best for: General use, data enrichment
most_recent: Prefer values from most recent timestamp
Best for: Time-sensitive data, news articles
most_confident: Prefer values from most confident sources
Best for: Weighted sources, quality-ranked data
longest: Prefer longest string values
Best for: Descriptions, detailed text
keep_all: Keep all conflicting values as a list
Best for: Preserving all information, manual review
Reranking Configuration
Default Strategy
# Default reranking strategy (default: hybrid)
KG_RERANKING_DEFAULT_STRATEGY=hybrid
Available Strategies:
text: BM25-based text similarity
Fast, keyword-focused
semantic: Deep semantic similarity
Slower, meaning-focused
structural: Graph importance signals
Graph-aware, centrality-based
hybrid: Combines all signals (recommended)
Best results, slightly slower
Top-K Configuration
# Number of results to fetch before reranking (default: 100)
KG_RERANKING_TOP_K=100
Guidelines:
Higher values: Better recall, slower
Lower values: Faster, may miss relevant results
Recommended: 2-10x your final result count
Cache Configuration
Query Cache
# Enable query result caching (default: true)
KG_ENABLE_QUERY_CACHE=true
# Cache TTL in seconds (default: 300 = 5 minutes)
KG_CACHE_TTL_SECONDS=300
Benefits:
Faster repeated queries
Reduced database load
Better performance
When to disable:
Real-time data requirements
Frequently changing graphs
Schema Cache
# Enable schema caching (default: true)
KG_ENABLE_SCHEMA_CACHE=true
# Schema cache TTL in seconds (default: 3600 = 1 hour)
KG_SCHEMA_CACHE_TTL_SECONDS=3600
Benefits:
Faster schema operations
Reduced metadata queries
Better type inference
When to disable:
Frequently changing schemas
Development/testing
Query Optimization
Enable Optimization
# Enable query optimization (default: true)
KG_ENABLE_QUERY_OPTIMIZATION=true
Benefits:
Faster query execution
Better resource utilization
Automatic query planning
Optimization Strategy
# Optimization strategy (default: balanced)
KG_QUERY_OPTIMIZATION_STRATEGY=balanced
Available Strategies:
cost: Minimize computational cost
Best for: Resource-constrained environments
latency: Minimize query latency
Best for: Real-time applications
balanced: Balance cost and latency (recommended)
Best for: General use
Environment Variables
Complete Reference
# =====================================
# Storage Configuration
# =====================================
KG_STORAGE_BACKEND=inmemory
KG_SQLITE_DB_PATH=./storage/knowledge_graph.db
KG_POSTGRES_HOST=localhost
KG_POSTGRES_PORT=5432
KG_POSTGRES_USER=postgres
KG_POSTGRES_PASSWORD=your_password
KG_POSTGRES_DATABASE=knowledge_graph
KG_MIN_POOL_SIZE=5
KG_MAX_POOL_SIZE=20
KG_ENABLE_PGVECTOR=false
KG_INMEMORY_MAX_NODES=100000
# =====================================
# Vector and Query Configuration
# =====================================
KG_VECTOR_DIMENSION=1536
KG_DEFAULT_SEARCH_LIMIT=10
KG_MAX_TRAVERSAL_DEPTH=5
# =====================================
# Cache Configuration
# =====================================
KG_ENABLE_QUERY_CACHE=true
KG_CACHE_TTL_SECONDS=300
KG_ENABLE_SCHEMA_CACHE=true
KG_SCHEMA_CACHE_TTL_SECONDS=3600
# =====================================
# Feature Flags
# =====================================
KG_ENABLE_RUNNABLE_PATTERN=true
KG_ENABLE_KNOWLEDGE_FUSION=true
KG_ENABLE_RERANKING=true
KG_ENABLE_LOGICAL_QUERIES=true
KG_ENABLE_STRUCTURED_IMPORT=true
# =====================================
# Knowledge Fusion Configuration
# =====================================
KG_FUSION_SIMILARITY_THRESHOLD=0.85
KG_FUSION_CONFLICT_RESOLUTION=most_complete
# =====================================
# Reranking Configuration
# =====================================
KG_RERANKING_DEFAULT_STRATEGY=hybrid
KG_RERANKING_TOP_K=100
# =====================================
# Query Optimization
# =====================================
KG_ENABLE_QUERY_OPTIMIZATION=true
KG_QUERY_OPTIMIZATION_STRATEGY=balanced
Configuration Examples
Development Setup
Fast iteration with in-memory storage:
# .env.development
KG_STORAGE_BACKEND=inmemory
KG_INMEMORY_MAX_NODES=50000
KG_ENABLE_QUERY_CACHE=false
KG_ENABLE_SCHEMA_CACHE=false
KG_ENABLE_QUERY_OPTIMIZATION=false
Testing Setup
File-based persistence for reproducible tests:
# .env.test
KG_STORAGE_BACKEND=sqlite
KG_SQLITE_DB_PATH=./test_data/test_graph.db
KG_ENABLE_QUERY_CACHE=true
KG_CACHE_TTL_SECONDS=60
KG_ENABLE_QUERY_OPTIMIZATION=true
Production Setup
PostgreSQL with all optimizations:
# .env.production
KG_STORAGE_BACKEND=postgresql
KG_POSTGRES_HOST=db.example.com
KG_POSTGRES_PORT=5432
KG_POSTGRES_USER=kg_user
KG_POSTGRES_PASSWORD=secure_password
KG_POSTGRES_DATABASE=knowledge_graph
KG_MIN_POOL_SIZE=10
KG_MAX_POOL_SIZE=50
KG_ENABLE_PGVECTOR=true
# Enable all features
KG_ENABLE_RUNNABLE_PATTERN=true
KG_ENABLE_KNOWLEDGE_FUSION=true
KG_ENABLE_RERANKING=true
KG_ENABLE_LOGICAL_QUERIES=true
KG_ENABLE_STRUCTURED_IMPORT=true
# Optimize for production
KG_ENABLE_QUERY_CACHE=true
KG_CACHE_TTL_SECONDS=600
KG_ENABLE_SCHEMA_CACHE=true
KG_SCHEMA_CACHE_TTL_SECONDS=7200
KG_ENABLE_QUERY_OPTIMIZATION=true
KG_QUERY_OPTIMIZATION_STRATEGY=balanced
# Reranking for best results
KG_RERANKING_DEFAULT_STRATEGY=hybrid
KG_RERANKING_TOP_K=200
# Fusion for data quality
KG_FUSION_SIMILARITY_THRESHOLD=0.85
KG_FUSION_CONFLICT_RESOLUTION=most_complete
High-Performance Setup
Optimized for speed:
# .env.performance
KG_STORAGE_BACKEND=postgresql
KG_ENABLE_PGVECTOR=true
KG_MAX_POOL_SIZE=100
# Aggressive caching
KG_ENABLE_QUERY_CACHE=true
KG_CACHE_TTL_SECONDS=1800
KG_ENABLE_SCHEMA_CACHE=true
KG_SCHEMA_CACHE_TTL_SECONDS=14400
# Latency optimization
KG_ENABLE_QUERY_OPTIMIZATION=true
KG_QUERY_OPTIMIZATION_STRATEGY=latency
# Disable expensive features
KG_ENABLE_RERANKING=false
KG_ENABLE_KNOWLEDGE_FUSION=false
Data Quality Setup
Optimized for accuracy:
# .env.quality
KG_STORAGE_BACKEND=postgresql
# Enable all quality features
KG_ENABLE_KNOWLEDGE_FUSION=true
KG_ENABLE_RERANKING=true
KG_ENABLE_LOGICAL_QUERIES=true
# Strict fusion
KG_FUSION_SIMILARITY_THRESHOLD=0.90
KG_FUSION_CONFLICT_RESOLUTION=most_confident
# Best reranking
KG_RERANKING_DEFAULT_STRATEGY=hybrid
KG_RERANKING_TOP_K=500
# Balanced optimization
KG_QUERY_OPTIMIZATION_STRATEGY=balanced
Best Practices
1. Start Simple
Begin with default settings and adjust based on your needs:
# Minimal configuration
KG_STORAGE_BACKEND=inmemory
2. Monitor Performance
Track key metrics:
Query latency
Cache hit rate
Fusion merge rate
Reranking impact
3. Tune Gradually
Adjust one parameter at a time:
Choose storage backend
Enable/disable features
Tune cache settings
Optimize queries
4. Environment-Specific Configs
Use different configurations for different environments:
.env.development- Fast iteration.env.test- Reproducible tests.env.staging- Production-like.env.production- Optimized for scale
5. Security Considerations
Never commit
.envfiles to version controlUse strong passwords for PostgreSQL
Restrict database access
Enable SSL for production databases
Troubleshooting
Slow Queries
Problem: Queries are taking too long
Solutions:
Enable query optimization:
KG_ENABLE_QUERY_OPTIMIZATION=trueIncrease cache TTL:
KG_CACHE_TTL_SECONDS=600Use PostgreSQL with pgvector:
KG_ENABLE_PGVECTOR=trueReduce reranking top-K:
KG_RERANKING_TOP_K=50
High Memory Usage
Problem: Application using too much memory
Solutions:
Switch to SQLite or PostgreSQL:
KG_STORAGE_BACKEND=sqliteReduce in-memory max nodes:
KG_INMEMORY_MAX_NODES=50000Disable caching:
KG_ENABLE_QUERY_CACHE=falseReduce cache TTL:
KG_CACHE_TTL_SECONDS=60
Too Many Duplicate Entities
Problem: Fusion is merging too many entities
Solutions:
Increase similarity threshold:
KG_FUSION_SIMILARITY_THRESHOLD=0.90Change conflict resolution:
KG_FUSION_CONFLICT_RESOLUTION=keep_allReview entity extraction quality
Poor Search Results
Problem: Search results are not relevant
Solutions:
Enable reranking:
KG_ENABLE_RERANKING=trueUse hybrid strategy:
KG_RERANKING_DEFAULT_STRATEGY=hybridIncrease reranking top-K:
KG_RERANKING_TOP_K=200Adjust vector dimension:
KG_VECTOR_DIMENSION=1536