Production Deployment Guide for Knowledge Graph

Complete guide for deploying AIECS Knowledge Graph in production environments.


Table of Contents

  1. Prerequisites

  2. Backend Selection

  3. PostgreSQL Setup

  4. Configuration

  5. Performance Optimization

  6. Monitoring and Health Checks

  7. Security

  8. Backup and Recovery

  9. Scaling

  10. Troubleshooting


Prerequisites

System Requirements

Minimum:

  • CPU: 2 cores

  • RAM: 4GB

  • Disk: 20GB SSD

  • Network: 100Mbps

Recommended (for 1M+ entities):

  • CPU: 4+ cores

  • RAM: 16GB+

  • Disk: 100GB+ SSD

  • Network: 1Gbps

Software Requirements

  • Python 3.9+

  • PostgreSQL 12+ (for PostgreSQL backend)

  • Redis 6+ (optional, for caching)

  • Docker (optional, for containerized deployment)


Backend Selection

When to Use Each Backend

InMemoryGraphStore:

  • NOT for production

  • ✅ Development and testing only

  • ✅ Small datasets (< 10K entities)

  • ✅ Temporary data

SQLiteGraphStore:

  • ✅ Small to medium applications (< 1M entities)

  • ✅ Single-user or low-concurrency

  • ✅ Embedded systems

  • ✅ Simple deployment

  • ❌ Not for high-concurrency production

PostgresGraphStore:

  • Recommended for production

  • ✅ Large graphs (1M+ entities)

  • ✅ High concurrency

  • ✅ ACID transactions

  • ✅ Advanced features (JSONB, pgvector)


PostgreSQL Setup

1. Install PostgreSQL

Ubuntu/Debian:

sudo apt update
sudo apt install postgresql-14 postgresql-contrib-14

CentOS/RHEL:

sudo yum install postgresql14-server postgresql14

macOS:

brew install postgresql@14

2. Configure PostgreSQL

Create database:

sudo -u postgres psql
CREATE DATABASE aiecs_knowledge_graph;
CREATE USER graph_user WITH PASSWORD 'secure_password';
GRANT ALL PRIVILEGES ON DATABASE aiecs_knowledge_graph TO graph_user;
\q

Enable pgvector extension (optional, for vector search):

sudo -u postgres psql -d aiecs_knowledge_graph
CREATE EXTENSION IF NOT EXISTS vector;
\q

3. Configure Connection Pooling

PostgreSQL configuration (postgresql.conf):

# Connection settings
max_connections = 200
shared_buffers = 4GB
effective_cache_size = 12GB
maintenance_work_mem = 1GB
checkpoint_completion_target = 0.9
wal_buffers = 16MB
default_statistics_target = 100
random_page_cost = 1.1
effective_io_concurrency = 200
work_mem = 20MB
min_wal_size = 1GB
max_wal_size = 4GB

Restart PostgreSQL:

sudo systemctl restart postgresql

4. Configure Firewall

# Allow PostgreSQL connections
sudo ufw allow 5432/tcp

# Or restrict to specific IPs
sudo ufw allow from 10.0.0.0/8 to any port 5432

Configuration

Environment Variables

Create .env file:

For local PostgreSQL:

# PostgreSQL Configuration (Local Mode)
DB_CONNECTION_MODE=local
DB_HOST=localhost
DB_PORT=5432
DB_USER=graph_user
DB_PASSWORD=secure_password
DB_NAME=aiecs_knowledge_graph

For cloud PostgreSQL:

# PostgreSQL Configuration (Cloud Mode)
DB_CONNECTION_MODE=cloud
POSTGRES_URL=postgresql://user:password@host:port/database?sslmode=require

Common settings:

# Redis Configuration (optional)
REDIS_URL=redis://localhost:6379/0
REDIS_TTL=300

# Graph Store Configuration
GRAPH_STORE_BACKEND=postgres
GRAPH_STORE_POOL_MIN=5
GRAPH_STORE_POOL_MAX=20
GRAPH_STORE_ENABLE_CACHE=true
GRAPH_STORE_ENABLE_MONITORING=true

Application Configuration

from aiecs.infrastructure.graph_storage import PostgresGraphStore
from aiecs.infrastructure.graph_storage.cache import GraphStoreCache, GraphStoreCacheConfig
from aiecs.infrastructure.graph_storage.performance_monitoring import PerformanceMonitor

# Initialize store
store = PostgresGraphStore(
    host=os.getenv('DB_HOST'),
    port=int(os.getenv('DB_PORT', '5432')),
    user=os.getenv('DB_USER'),
    password=os.getenv('DB_PASSWORD'),
    database=os.getenv('DB_NAME'),
    min_pool_size=5,
    max_pool_size=20
)
await store.initialize()

# Enable caching
cache = GraphStoreCache(GraphStoreCacheConfig(
    redis_url=os.getenv('REDIS_URL'),
    ttl=int(os.getenv('REDIS_TTL', '300'))
))
await cache.initialize()

# Enable monitoring
monitor = PerformanceMonitor(
    enabled=True,
    slow_query_threshold_ms=100.0
)
await monitor.initialize()

Performance Optimization

1. Enable Caching

from aiecs.infrastructure.graph_storage.cache import GraphStoreCache, GraphStoreCacheConfig

cache = GraphStoreCache(GraphStoreCacheConfig(
    enabled=True,
    redis_url="redis://localhost:6379/0",
    ttl=300,  # 5 minutes
    max_cache_size_mb=1000  # 1GB in-memory fallback
))
await cache.initialize()

2. Optimize Indexes

from aiecs.infrastructure.graph_storage.index_optimization import IndexOptimizer

optimizer = IndexOptimizer(store.pool)

# Get recommendations
recommendations = await optimizer.get_missing_index_recommendations()

# Apply high-priority recommendations
high_priority = [r for r in recommendations if r.estimated_benefit == "high"]
results = await optimizer.apply_recommendations(high_priority)

3. Use Batch Operations

from aiecs.infrastructure.graph_storage.batch_operations import BatchOperationsMixin

class OptimizedStore(PostgresGraphStore, BatchOperationsMixin):
    pass

store = OptimizedStore(...)
await store.batch_add_entities(entities, batch_size=1000, use_copy=True)

4. Configure Connection Pooling

store = PostgresGraphStore(
    ...,
    min_pool_size=10,  # Minimum connections
    max_pool_size=50   # Maximum connections
)

Monitoring and Health Checks

1. Health Checks

from aiecs.infrastructure.graph_storage.health_checks import HealthChecker, HealthMonitor

checker = HealthChecker(store, timeout_seconds=5.0)
result = await checker.check_health()

if result.is_healthy():
    print("Store is healthy")
else:
    print(f"Store is {result.status}: {result.message}")

# Continuous monitoring
monitor = HealthMonitor(checker, interval_seconds=30)
await monitor.start()

2. Metrics Collection

from aiecs.infrastructure.graph_storage.metrics import MetricsCollector, MetricsExporter

collector = MetricsCollector()
collector.record_latency("get_entity", 12.5)
collector.record_cache_hit()

# Export to Prometheus
exporter = MetricsExporter(collector)
prometheus_metrics = exporter.to_prometheus()

3. Performance Monitoring

from aiecs.infrastructure.graph_storage.performance_monitoring import PerformanceMonitor

monitor = PerformanceMonitor(
    enabled=True,
    slow_query_threshold_ms=100.0,
    log_slow_queries=True
)

# Track queries
async with monitor.track_query("get_entity", query):
    result = await conn.fetch(query)

# Get report
report = monitor.get_performance_report()

Security

1. Database Security

Use strong passwords:

# Generate secure password
openssl rand -base64 32

Limit database access:

-- Revoke public access
REVOKE ALL ON DATABASE aiecs_knowledge_graph FROM PUBLIC;

-- Grant only to application user
GRANT CONNECT ON DATABASE aiecs_knowledge_graph TO graph_user;

Enable SSL:

store = PostgresGraphStore(
    ...,
    ssl=True,
    sslmode='require'
)

2. Connection Security

Use connection pooling:

  • Prevents connection exhaustion

  • Reduces authentication overhead

  • Improves security

Rotate credentials:

  • Change passwords regularly

  • Use secrets management (Vault, AWS Secrets Manager)

3. Input Validation

Validate entity IDs:

import re

def validate_entity_id(entity_id: str) -> bool:
    # Only allow alphanumeric, underscore, hyphen
    return bool(re.match(r'^[a-zA-Z0-9_-]+$', entity_id))

Sanitize queries:

  • Use parameterized queries (already implemented)


Backup and Recovery

1. PostgreSQL Backup

Automated backup script:

#!/bin/bash
# backup_graph.sh

DATE=$(date +%Y%m%d_%H%M%S)
BACKUP_DIR="/backups/graph"
DB_NAME="aiecs_knowledge_graph"

# Create backup
pg_dump -U graph_user -F c -f "$BACKUP_DIR/graph_$DATE.dump" $DB_NAME

# Compress
gzip "$BACKUP_DIR/graph_$DATE.dump"

# Keep only last 7 days
find $BACKUP_DIR -name "*.dump.gz" -mtime +7 -delete

Schedule with cron:

# Daily backup at 2 AM
0 2 * * * /path/to/backup_graph.sh

2. Restore from Backup

# Restore
gunzip graph_20231109_020000.dump.gz
pg_restore -U graph_user -d aiecs_knowledge_graph graph_20231109_020000.dump

3. Streaming Backup

from aiecs.infrastructure.graph_storage.streaming import GraphStreamExporter

exporter = GraphStreamExporter(store)
await exporter.export_to_file(
    "backup.jsonl.gz",
    format=StreamFormat.JSONL,
    compress=True
)

Scaling

Vertical Scaling

Increase resources:

  • More RAM: Better caching

  • More CPU: Faster queries

  • SSD storage: Faster I/O

Horizontal Scaling

Read Replicas:

# Primary for writes
primary_store = PostgresGraphStore(host="primary.db", ...)

# Replicas for reads
replica_stores = [
    PostgresGraphStore(host="replica1.db", ...),
    PostgresGraphStore(host="replica2.db", ...)
]

# Route reads to replicas
def get_read_store():
    return random.choice(replica_stores)

Connection Pooling:

  • Increase max_pool_size for more concurrent connections

  • Monitor pool usage


Troubleshooting

Common Issues

Connection Errors:

# Check connection
from aiecs.infrastructure.graph_storage.health_checks import HealthChecker

checker = HealthChecker(store)
result = await checker.check_health()
print(result.to_dict())

Slow Queries:

# Analyze query plans
from aiecs.infrastructure.graph_storage.performance_monitoring import PerformanceMonitor

monitor = PerformanceMonitor()
plan = await monitor.analyze_query_plan(conn, query)
print(plan.get_warnings())

Memory Issues:

  • Use streaming for large exports

  • Enable pagination

  • Use lazy loading


Production Checklist

  • PostgreSQL installed and configured

  • Database and user created

  • Indexes optimized

  • Caching enabled (Redis)

  • Health checks configured

  • Monitoring metrics exported

  • Backups scheduled

  • SSL enabled for connections

  • Firewall configured

  • Connection pooling tuned

  • Error handling enabled

  • Logging configured

  • Graceful degradation tested


Next Steps

  • See PERFORMANCE_GUIDE.md for performance tuning

  • See CUSTOM_BACKEND_GUIDE.md for custom backends

  • See PHASE6_TASK6.5_COMPLETE.md for complete implementation details