Knowledge Graph Troubleshooting Guide
Common Issues and Solutions
Import Issues
Problem: CSV Import Fails with “Missing Column” Error
Symptoms:
Error: Column 'name' not found in CSV file
Solution:
Check that column names in schema mapping match CSV headers exactly
Verify CSV file has a header row
Check for extra spaces in column names
Example:
# ❌ Wrong - column name doesn't match
property_mappings={"name": "full_name"} # CSV has "fullname" not "full_name"
# ✅ Correct
property_mappings={"name": "fullname"}
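The three checks above can be run against a file before starting an import. This is a minimal sketch using only the standard library. `find_missing_columns` is a hypothetical helper, not part of the pipeline API, and it assumes the mapping values are the CSV column names, as in the example above.

```python
import csv
import io


def find_missing_columns(csv_text, expected_columns):
    """Return expected columns absent from the CSV header, with near-miss hints.

    Hypothetical helper for illustration; not part of the import pipeline.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # empty list if the file has no rows at all
    # Map a normalized form (trimmed, lowercased) back to the raw header cell.
    normalized = {cell.strip().lower(): cell for cell in header}

    report = {}
    for column in expected_columns:
        if column in header:
            continue  # exact match, nothing to report
        # A near miss differs only by surrounding whitespace or letter case.
        report[column] = normalized.get(column.strip().lower())
    return report


sample = "id, fullname ,age\n1,Alice,30\n"
missing = find_missing_columns(sample, ["id", "fullname", "full_name"])
print(missing)
```

A value of `None` means no similar header exists. A string value is the raw header cell that nearly matches, which usually points to stray spaces or a case difference.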
Problem: Import is Very Slow
Symptoms:
Import takes >10 seconds for 1000 rows
Throughput <50 rows/second
Solutions:
Increase batch size:
pipeline = StructuredDataPipeline(
    mapping=schema_mapping,
    graph_store=store,
    batch_size=500,  # Increase from the default of 50
)
Use PostgreSQL for large datasets:
# Switch from InMemory to PostgreSQL
from aiecs.infrastructure.graph_storage.postgresql import PostgreSQLGraphStore
store = PostgreSQLGraphStore(connection_string="postgresql://...")
Enable skip_errors so malformed rows are skipped instead of halting the import:
pipeline = StructuredDataPipeline(
    mapping=schema_mapping,
    graph_store=store,
    skip_errors=True,  # Skip malformed rows
)
Problem: JSON Import Fails with “Invalid JSON”
Symptoms:
Error: Expecting value: line 1 column 1 (char 0)
Solutions:
Validate JSON format:
python -m json.tool data.json
Check for:
Missing commas between objects
Trailing commas
Single quotes instead of double quotes
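Unescaped special characters
An empty file or non-JSON content (the error above points at line 1, column 1, which means the parser found no JSON value at the very start of the input)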
Use newline-delimited JSON for large files:
{"id": "1", "name": "Alice"}
{"id": "2", "name": "Bob"}
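To tell a single JSON document apart from newline-delimited JSON, and to locate the first bad record, a small check like the one below may help. It is a sketch using only the standard `json` module. `diagnose_json` is a hypothetical helper and does not reflect how the import pipeline parses files.

```python
import json


def diagnose_json(text):
    """Classify text as 'json', 'ndjson', or 'invalid' with a short detail string."""
    if not text.strip():
        return "invalid", "input is empty"
    try:
        json.loads(text)
        return "json", "parsed as a single JSON document"
    except json.JSONDecodeError as whole_error:
        first_error = whole_error

    # Fall back to newline-delimited JSON: every non-blank line must parse alone.
    lines = [line for line in text.splitlines() if line.strip()]
    for number, line in enumerate(lines, start=1):
        try:
            json.loads(line)
        except json.JSONDecodeError as line_error:
            return "invalid", (
                f"whole file: {first_error.msg} at line {first_error.lineno}; "
                f"record {number}: {line_error.msg}"
            )
    return "ndjson", f"{len(lines)} records"


ndjson_sample = '{"id": "1", "name": "Alice"}\n{"id": "2", "name": "Bob"}\n'
print(diagnose_json(ndjson_sample))
print(diagnose_json("{'id': 1}"))
```

The record number counts non-blank lines, so it matches the position of the record in a newline-delimited file.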
Search and Reranking Issues
Problem: Search Returns No Results
Symptoms:
Query returns empty list
Expected entities not found
Solutions:
Check entity properties match query:
# Verify entities have searchable text
entity = await store.get_entity("e1")
print(entity.properties) # Should have text fields
Try different search modes:
# Try vector search
result = await tool.run(mode="vector", query="...")
# Try graph search
result = await tool.run(mode="graph", seed_entity_ids=["e1"])
# Try hybrid
result = await tool.run(mode="hybrid", query="...")
Check embeddings are present:
entity = await store.get_entity("e1")
print(entity.embedding) # Should not be None
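When many entities are involved, the property and embedding checks can be combined into one pass. The sketch below assumes each entity exposes `properties` and `embedding` as in the snippets above, plus an `id` attribute, which is an assumption. `find_unsearchable` is a hypothetical helper, and the sample objects are stand-ins built with `SimpleNamespace`.

```python
from types import SimpleNamespace


def find_unsearchable(entities):
    """Return (entity_id, reason) pairs for entities a search is likely to miss."""
    problems = []
    for entity in entities:
        properties = getattr(entity, "properties", None) or {}
        # Text search needs at least one non-empty string property.
        has_text = any(
            isinstance(value, str) and value.strip() for value in properties.values()
        )
        if not has_text:
            problems.append((entity.id, "no non-empty text property"))
        # Vector and hybrid modes need an embedding.
        if getattr(entity, "embedding", None) is None:
            problems.append((entity.id, "embedding is None"))
    return problems


# Stand-in entities; real ones would come from store.get_entity(...).
sample_entities = [
    SimpleNamespace(id="e1", properties={"name": "Alice"}, embedding=[0.1, 0.2]),
    SimpleNamespace(id="e2", properties={"age": 30}, embedding=None),
]
problems = find_unsearchable(sample_entities)
print(problems)
```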
Problem: Reranking is Too Slow
Symptoms:
Search takes >1 second
Latency >500ms
Solutions:
Use faster reranking strategy:
# ❌ Slow - hybrid reranking
rerank_strategy="hybrid" # 150-300ms
# ✅ Fast - text reranking
rerank_strategy="text" # 50-100ms
Reduce top_k:
# ❌ Slow - reranking 200 results
top_k=200
# ✅ Fast - reranking 20 results
top_k=20
Disable reranking for simple queries:
result = await tool.run(
    query="...",
    enable_reranking=False,  # Skip reranking
)
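Before changing settings, it can help to measure latency so each change is compared against a baseline. The following sketch times any awaitable search call and reports the median over several runs. `median_latency_ms` and `fake_search` are hypothetical names, and `fake_search` only sleeps in place of a real `tool.run(...)` call.

```python
import asyncio
import statistics
import time


async def median_latency_ms(search_fn, runs=5):
    """Await search_fn() several times and return the median latency in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        await search_fn()
        samples.append((time.perf_counter() - start) * 1000)
    return statistics.median(samples)


async def fake_search():
    # Stand-in for a real search call; sleeps instead of querying a store.
    await asyncio.sleep(0.01)


latency = asyncio.run(median_latency_ms(fake_search))
print(f"median latency: {latency:.1f} ms")
```

The median is used rather than the mean so that a single slow first call, for example one that warms a cache, does not dominate the result.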
Knowledge Fusion Issues
Problem: Too Many Entities Being Merged
Symptoms:
Unrelated entities are merged
Merge count is unexpectedly high
Solutions:
Increase similarity threshold:
# ❌ Too lenient - merges too many
fusion = KnowledgeFusion(store, similarity_threshold=0.70)
# ✅ More strict - fewer merges
fusion = KnowledgeFusion(store, similarity_threshold=0.90)
Filter by entity type:
# Only merge specific types
stats = await fusion.fuse_cross_document_entities(
    entity_types=["Person"],  # Don't merge other types
)
Review merge results:
# Check what was merged
provenance = await fusion.track_entity_provenance("e1")
print(f"Entity came from: {provenance}")
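The effect of the threshold can be seen on a small sample before running fusion on real data. The sketch below uses `difflib.SequenceMatcher` from the standard library as a stand-in similarity measure. The similarity function that KnowledgeFusion actually uses is not shown in this guide and may score pairs differently, so treat the numbers as illustrative only. `merge_candidates` is a hypothetical helper.

```python
from difflib import SequenceMatcher
from itertools import combinations


def merge_candidates(names, similarity_threshold):
    """Return (name_a, name_b, score) for pairs at or above the threshold."""
    pairs = []
    for name_a, name_b in combinations(names, 2):
        # Stand-in similarity: character-level ratio on lowercased names.
        score = SequenceMatcher(None, name_a.lower(), name_b.lower()).ratio()
        if score >= similarity_threshold:
            pairs.append((name_a, name_b, round(score, 3)))
    return pairs


names = ["Apple Inc", "Apple Inc.", "Apple Pie"]
lenient = merge_candidates(names, 0.70)
strict = merge_candidates(names, 0.90)
print(len(lenient), len(strict))
```

With this sample, the lenient threshold also pairs "Apple Pie" with both company names, while the strict threshold keeps only the punctuation variant.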
Problem: Fusion is Too Slow
Symptoms:
Fusion takes >30 seconds for 200 entities
Throughput <10 entities/second
Solutions:
Increase similarity threshold (fewer candidate pairs qualify for merging):
fusion = KnowledgeFusion(store, similarity_threshold=0.90)
Run fusion periodically, not on every update:
# ❌ Slow - fusion after every import
await pipeline.import_from_csv("data.csv")
await fusion.fuse_cross_document_entities()
# ✅ Fast - fusion once at the end
await pipeline.import_from_csv("data1.csv")
await pipeline.import_from_csv("data2.csv")
await pipeline.import_from_csv("data3.csv")
await fusion.fuse_cross_document_entities() # Once
Use faster conflict resolution:
# ❌ Slower
conflict_resolution_strategy="most_confident"
# ✅ Faster
conflict_resolution_strategy="most_complete"
Performance Issues
Problem: High Memory Usage
Symptoms:
Application using >2GB RAM
Out of memory errors
Solutions:
Switch to SQLite or PostgreSQL:
# ❌ High memory - InMemory
from aiecs.infrastructure.graph_storage.in_memory import InMemoryGraphStore
store = InMemoryGraphStore()
# ✅ Low memory - SQLite
from aiecs.infrastructure.graph_storage.sqlite import SQLiteGraphStore
store = SQLiteGraphStore(db_path="graph.db")
Reduce cache sizes:
# Reduce schema cache
schema_manager = SchemaManager(
    cache_size=100,  # Reduce from 1000
    ttl_seconds=300,
)
Process data in batches:
import pandas as pd

# Process large files in chunks
for chunk in pd.read_csv("large.csv", chunksize=1000):
    await pipeline.import_from_dataframe(chunk)
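If pandas is not available, the same chunking idea can be sketched with the standard library. `iter_csv_chunks` is a hypothetical helper. Passing each chunk on to the pipeline is left out because the matching import method for lists of dicts is not covered in this guide.

```python
import csv
import io
from itertools import islice


def iter_csv_chunks(file_obj, chunk_size):
    """Yield lists of row dicts, at most chunk_size rows per list."""
    reader = csv.DictReader(file_obj)
    while True:
        # islice pulls only the next chunk_size rows, so memory stays bounded.
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            return
        yield chunk


sample = io.StringIO("id,name\n1,Alice\n2,Bob\n3,Carol\n")
sizes = [len(chunk) for chunk in iter_csv_chunks(sample, chunk_size=2)]
print(sizes)
```

Only one chunk is held in memory at a time, which is the property that keeps usage low for large files.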
Problem: Slow Query Performance
Symptoms:
Queries take >500ms
Search is slow
Solutions:
Enable query optimization:
# Enable in configuration
KG_ENABLE_QUERY_OPTIMIZATION=true
KG_QUERY_OPTIMIZATION_STRATEGY=balanced
Enable schema caching:
KG_ENABLE_SCHEMA_CACHE=true
KG_SCHEMA_CACHE_TTL_SECONDS=3600
Use PostgreSQL with pgvector:
KG_STORAGE_BACKEND=postgresql
KG_ENABLE_PGVECTOR=true
Add indexes (PostgreSQL):
CREATE INDEX idx_entity_type ON entities(entity_type);
CREATE INDEX idx_relation_type ON relations(relation_type);
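To confirm that an index is actually used, inspect the query plan before and after creating it. The sketch below demonstrates the idea with an in-memory SQLite database from the standard library as a stand-in. The `entities` table here is a hypothetical minimal schema, not the real one. PostgreSQL offers `EXPLAIN` for the same purpose.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
# Hypothetical minimal table; the real schema may differ.
connection.execute("CREATE TABLE entities (id TEXT PRIMARY KEY, entity_type TEXT)")
connection.executemany(
    "INSERT INTO entities VALUES (?, ?)",
    [(f"e{i}", "Person" if i % 2 else "Company") for i in range(100)],
)


def query_plan(conn, sql, params=()):
    """Return the plan descriptions SQLite reports for a statement."""
    rows = conn.execute(f"EXPLAIN QUERY PLAN {sql}", params).fetchall()
    return [row[-1] for row in rows]  # last column holds the readable detail


lookup = "SELECT id FROM entities WHERE entity_type = ?"
plan_before = query_plan(connection, lookup, ("Person",))
connection.execute("CREATE INDEX idx_entity_type ON entities(entity_type)")
plan_after = query_plan(connection, lookup, ("Person",))
print(plan_before, plan_after, sep="\n")
```

If the plan still shows a full scan after the index exists, the query is probably filtering on a different column or wrapping the column in a function.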
Configuration Issues
Problem: Configuration Not Loading
Symptoms:
Settings not applied
Using default values
Solutions:
Check .env file location:
# Should be in project root
ls -la .env
Verify environment variables:
# Check if variables are set
env | grep KG_
Use explicit configuration:
from aiecs.config import Settings
settings = Settings(
    kg_storage_backend="postgresql",
    kg_enable_reranking=True,
)
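The file and environment checks can also be done from inside Python, which shows what the running process actually sees. The sketch below compares `KG_` entries in `.env` text with a given environment mapping. Both function names are hypothetical, and the parser is deliberately simple: it does not handle `export` prefixes, inline comments, or multi-line values.

```python
import os


def parse_env_text(text):
    """Parse KEY=VALUE lines, ignoring blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def env_file_differences(env_text, environ=None, prefix="KG_"):
    """Return prefixed keys from the file whose value differs from the environment."""
    environ = os.environ if environ is None else environ
    file_values = parse_env_text(env_text)
    return {
        key: value
        for key, value in file_values.items()
        if key.startswith(prefix) and environ.get(key) != value
    }


env_text = "# storage\nKG_STORAGE_BACKEND=postgresql\nKG_ENABLE_PGVECTOR=true\nDEBUG=1\n"
differences = env_file_differences(env_text, environ={"KG_STORAGE_BACKEND": "postgresql"})
print(differences)
```

A key in the result means the file and the process environment disagree on that value, which is one common reason a setting appears not to apply.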
Tool Issues
Problem: Tool Returns “Unsupported Operation” Error
Symptoms:
Error: Unsupported operation: kg_builder
Solutions:
Use correct operation name:
# ❌ Wrong operation name
await tool.run(op="kg_builder", ...)
# ✅ Correct - use tool's registered operations
await tool.run(op="build_from_text", ...)
Check available operations:
print(tool.input_schema()) # Shows available operations
Getting Help
If you’re still experiencing issues:
Check the API Reference
Review the Configuration Guide
Open an issue on GitHub with:
Error message
Minimal reproduction code
Environment details (Python version, OS)
Configuration settings
Performance Benchmarks
Expected performance for reference:
CSV Import: 100-300 rows/second
JSON Import: 100-250 records/second
Text Reranking: 50-100ms
Hybrid Reranking: 150-300ms
Schema Cache Hit: <1ms
Query Optimization: 40-70% improvement
Knowledge Fusion: 10-40 entities/second
If your performance is significantly worse, review the solutions above.
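A measured run can be compared against the lower bounds of the throughput figures listed above with a few lines of code. This is a sketch. `EXPECTED_MIN_THROUGHPUT` and `throughput_report` are hypothetical names, and the item count and elapsed time are assumed to come from your own measurement.

```python
# Lower bounds taken from the throughput figures listed above.
EXPECTED_MIN_THROUGHPUT = {
    "csv_import": 100,       # rows/second
    "json_import": 100,      # records/second
    "knowledge_fusion": 10,  # entities/second
}


def throughput_report(metric, item_count, elapsed_seconds):
    """Return (rate, meets_expectation) for a measured run."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    rate = item_count / elapsed_seconds
    return rate, rate >= EXPECTED_MIN_THROUGHPUT[metric]


print(throughput_report("csv_import", 1000, 12.5))
```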