Knowledge Graph Troubleshooting Guide

Common Issues and Solutions

Import Issues

Problem: CSV Import Fails with “Missing Column” Error

Symptoms:

Error: Column 'name' not found in CSV file

Solution:

  1. Check that column names in schema mapping match CSV headers exactly

  2. Verify CSV file has a header row

  3. Check for extra spaces in column names

Example:

# ❌ Wrong - column name doesn't match
property_mappings={"name": "full_name"}  # CSV has "fullname" not "full_name"

# ✅ Correct
property_mappings={"name": "fullname"}
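
The three checks above can be automated before running an import. The following is a minimal sketch using only the standard library; `missing_columns` is a hypothetical helper, not part of the pipeline API, and it assumes the first row of the file is the header.

```python
import csv
import io


def missing_columns(csv_file, expected_columns):
    """Return expected column names that are absent from the CSV header row.

    Also reports headers that match an expected name only after surrounding
    whitespace is stripped, since those fail an exact comparison.
    """
    reader = csv.reader(csv_file)
    header = next(reader, [])  # empty list if the file has no rows at all
    exact = set(header)
    stripped = {h.strip(): h for h in header}

    missing = [c for c in expected_columns if c not in exact]
    # Map each missing name to the raw header that matches once stripped
    whitespace_hits = {c: stripped[c] for c in missing if c in stripped}
    return missing, whitespace_hits


# In-memory sample; a real file would be opened with newline=""
sample = io.StringIO("id, fullname,email\n1,Alice,a@example.com\n")
missing, hits = missing_columns(sample, ["id", "fullname", "email"])
print(missing)
print(hits)
```

A non-empty second value points at the "extra spaces" case: the header exists but carries leading or trailing whitespace.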

Problem: Import is Very Slow

Symptoms:

  • Import takes >10 seconds for 1000 rows

  • Throughput <50 rows/second

Solutions:

  1. Increase batch size:

pipeline = StructuredDataPipeline(
    mapping=schema_mapping,
    graph_store=store,
    batch_size=500  # Increase from default 50
)

  2. Use PostgreSQL for large datasets:

# Switch from InMemory to PostgreSQL
from aiecs.infrastructure.graph_storage.postgresql import PostgreSQLGraphStore
store = PostgreSQLGraphStore(connection_string="postgresql://...")

  3. Enable skip_errors so malformed rows do not interrupt the import:

pipeline = StructuredDataPipeline(
    mapping=schema_mapping,
    graph_store=store,
    skip_errors=True  # Skip malformed rows
)
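
To tell whether a change actually helped, it is useful to measure throughput the same way before and after. This is a rough sketch: `measure_throughput` is a hypothetical helper, and `fake_import` stands in for a real call such as `pipeline.import_from_csv(...)`.

```python
import asyncio
import time


async def measure_throughput(import_coro_fn, row_count):
    """Await import_coro_fn() and return (elapsed_seconds, rows_per_second)."""
    start = time.perf_counter()
    await import_coro_fn()
    elapsed = time.perf_counter() - start
    rate = row_count / elapsed if elapsed > 0 else float("inf")
    return elapsed, rate


async def fake_import():
    # Stand-in for a real import call; replace with the pipeline call under test
    await asyncio.sleep(0.01)


elapsed, rate = asyncio.run(measure_throughput(fake_import, row_count=1000))
print(f"{elapsed:.3f}s, {rate:.0f} rows/second")
```

The resulting rows/second figure can be compared directly against the symptom threshold above.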

Problem: JSON Import Fails with “Invalid JSON”

Symptoms:

Error: Expecting value: line 1 column 1 (char 0)

Solutions:

  1. Validate JSON format:

python -m json.tool data.json

  2. Check for:

    • Missing commas between objects

    • Trailing commas

    • Single quotes instead of double quotes

    • Unescaped special characters

  3. Use newline-delimited JSON for large files:

{"id": "1", "name": "Alice"}
{"id": "2", "name": "Bob"}
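
For newline-delimited files, parsing line by line makes it easy to see which record is malformed. The sketch below uses only the standard `json` module; `load_ndjson` is a hypothetical helper and not the pipeline's own loader.

```python
import json


def load_ndjson(text):
    """Parse newline-delimited JSON, collecting records and per-line errors."""
    records, errors = [], []
    for line_number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines are skipped rather than reported
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError as exc:
            # exc.colno and exc.msg locate the problem within the line
            errors.append((line_number, exc.colno, exc.msg))
    return records, errors


sample = """{"id": "1", "name": "Alice"}
{'id': '2', 'name': 'Bob'}
{"id": "3", "name": "Carol"}
"""
records, errors = load_ndjson(sample)
print(len(records), "valid records")
for line_number, column, message in errors:
    print(f"line {line_number}, column {column}: {message}")
```

Here the second line uses single quotes, so it is reported while the other two records still load.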

Search and Reranking Issues

Problem: Search Returns No Results

Symptoms:

  • Query returns empty list

  • Expected entities not found

Solutions:

  1. Check entity properties match query:

# Verify entities have searchable text
entity = await store.get_entity("e1")
print(entity.properties)  # Should have text fields

  2. Try different search modes:

# Try vector search
result = await tool.run(mode="vector", query="...")

# Try graph search
result = await tool.run(mode="graph", seed_entity_ids=["e1"])

# Try hybrid
result = await tool.run(mode="hybrid", query="...")

  3. Check embeddings are present:

entity = await store.get_entity("e1")
print(entity.embedding)  # Should not be None
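
Trying the modes one after another can be wrapped in a small loop that stops at the first mode returning anything. This is a sketch under assumptions: `first_non_empty` is a hypothetical helper, `stub_search` stands in for `tool.run`, and an empty result is assumed to be falsy (for example an empty list).

```python
import asyncio


async def first_non_empty(search, attempts):
    """Run each attempt in order; return (kwargs, result) for the first non-empty result."""
    for kwargs in attempts:
        result = await search(**kwargs)
        if result:
            return kwargs, result
    return None, []


async def stub_search(mode, query=None, seed_entity_ids=None):
    # Hypothetical stand-in for tool.run: only graph mode finds anything here
    return ["e2", "e3"] if mode == "graph" else []


attempts = [
    {"mode": "vector", "query": "alice"},
    {"mode": "hybrid", "query": "alice"},
    {"mode": "graph", "seed_entity_ids": ["e1"]},
]
used, found = asyncio.run(first_non_empty(stub_search, attempts))
print(used["mode"], found)
```

If only graph mode returns results, missing embeddings are a likely cause, which ties back to the embedding check above.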

Problem: Reranking is Too Slow

Symptoms:

  • Search takes >1 second

  • Reranking latency >500ms

Solutions:

  1. Use faster reranking strategy:

# ❌ Slow - hybrid reranking
rerank_strategy="hybrid"  # 150-300ms

# ✅ Fast - text reranking
rerank_strategy="text"  # 50-100ms

  2. Reduce top_k:

# ❌ Slow - reranking 200 results
top_k=200

# ✅ Fast - reranking 20 results
top_k=20

  3. Disable reranking for simple queries:

result = await tool.run(
    query="...",
    enable_reranking=False  # Skip reranking
)
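
The effect of top_k can be illustrated with a generic two-stage sketch: a cheap first-stage score selects a shortlist, and the expensive scorer runs only on that shortlist. This is not the library's reranker; `rerank_top_k` and `expensive_score` are hypothetical, and the sketch assumes reranking cost grows with the number of candidates scored.

```python
def rerank_top_k(candidates, top_k, rerank_score):
    """Rerank only the top_k candidates by first-stage score.

    candidates: list of (item, first_stage_score) pairs.
    rerank_score: the expensive scoring function, called once per kept item.
    """
    shortlist = sorted(candidates, key=lambda pair: pair[1], reverse=True)[:top_k]
    rescored = [(item, rerank_score(item)) for item, _ in shortlist]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


call_log = []


def expensive_score(item):
    # Hypothetical stand-in for a costly reranker; records every invocation
    call_log.append(item)
    return len(item)


candidates = [(f"doc{i}", 1.0 / (i + 1)) for i in range(200)]
top = rerank_top_k(candidates, top_k=20, rerank_score=expensive_score)
print(len(call_log), "expensive calls for", len(candidates), "candidates")
```

With top_k=20 the expensive scorer runs 20 times instead of 200, which is where the latency saving comes from under the stated assumption.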

Knowledge Fusion Issues

Problem: Too Many Entities Being Merged

Symptoms:

  • Unrelated entities are merged

  • Merge count is unexpectedly high

Solutions:

  1. Increase similarity threshold:

# ❌ Too lenient - merges too many
fusion = KnowledgeFusion(store, similarity_threshold=0.70)

# ✅ More strict - fewer merges
fusion = KnowledgeFusion(store, similarity_threshold=0.90)

  2. Filter by entity type:

# Only merge specific types
stats = await fusion.fuse_cross_document_entities(
    entity_types=["Person"]  # Don't merge other types
)

  3. Review merge results:

# Check what was merged
provenance = await fusion.track_entity_provenance("e1")
print(f"Entity came from: {provenance}")
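
To get a feel for how the threshold changes merge behavior, the following sketch compares names with `difflib.SequenceMatcher` from the standard library. This is a stand-in similarity measure chosen for illustration; the measure KnowledgeFusion actually uses is not specified here and may score pairs differently.

```python
from difflib import SequenceMatcher
from itertools import combinations


def pairs_above(names, threshold):
    """Return name pairs whose string similarity meets or exceeds the threshold."""
    matches = []
    for a, b in combinations(names, 2):
        score = SequenceMatcher(None, a, b).ratio()
        if score >= threshold:
            matches.append((a, b, round(score, 3)))
    return matches


names = ["Jon Smith", "John Smith", "Jane Doe", "John Doe"]
lenient = pairs_above(names, 0.70)
strict = pairs_above(names, 0.90)
print("threshold 0.70:", lenient)
print("threshold 0.90:", strict)
```

In this sample, "Jane Doe" and "John Doe" pass at 0.70 but not at 0.90, while the likely spelling variants "Jon Smith" and "John Smith" pass at both.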

Problem: Fusion is Too Slow

Symptoms:

  • Fusion takes >30 seconds for 200 entities

  • Throughput <10 entities/second

Solutions:

  1. Increase similarity threshold (fewer comparisons):

fusion = KnowledgeFusion(store, similarity_threshold=0.90)

  2. Run fusion periodically, not on every update:

# ❌ Slow - fusion after every import
await pipeline.import_from_csv("data.csv")
await fusion.fuse_cross_document_entities()

# ✅ Fast - fusion once at the end
await pipeline.import_from_csv("data1.csv")
await pipeline.import_from_csv("data2.csv")
await pipeline.import_from_csv("data3.csv")
await fusion.fuse_cross_document_entities()  # Once

  3. Use faster conflict resolution:

# ❌ Slower
conflict_resolution_strategy="most_confident"

# ✅ Faster
conflict_resolution_strategy="most_complete"
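
A back-of-the-envelope count can help explain why fusion time grows quickly with entity count. The sketch below assumes candidates are compared pairwise, which this guide does not confirm for KnowledgeFusion; `pairwise_comparisons` is a hypothetical helper and the entity mix is invented for illustration.

```python
from collections import Counter
from math import comb


def pairwise_comparisons(entity_types, group_by_type=False):
    """Count candidate pairs, optionally only within the same entity type."""
    if not group_by_type:
        return comb(len(entity_types), 2)
    return sum(comb(count, 2) for count in Counter(entity_types).values())


# Hypothetical mix: 200 entities split evenly across two types
types = ["Person"] * 100 + ["Organization"] * 100
all_pairs = pairwise_comparisons(types)
typed_pairs = pairwise_comparisons(types, group_by_type=True)
print(all_pairs, typed_pairs)
```

Under that assumption, restricting fusion to one entity type at a time roughly halves the work in this example, and the saving grows as the number of types increases.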

Performance Issues

Problem: High Memory Usage

Symptoms:

  • Application uses >2 GB of RAM

  • Out of memory errors

Solutions:

  1. Switch to SQLite or PostgreSQL:

# ❌ High memory - InMemory
from aiecs.infrastructure.graph_storage.in_memory import InMemoryGraphStore
store = InMemoryGraphStore()

# ✅ Low memory - SQLite
from aiecs.infrastructure.graph_storage.sqlite import SQLiteGraphStore
store = SQLiteGraphStore(db_path="graph.db")

  2. Reduce cache sizes:

# Reduce schema cache
schema_manager = SchemaManager(
    cache_size=100,  # Reduce from 1000
    ttl_seconds=300
)

  3. Process data in batches:

# Process large files in chunks (requires pandas)
import pandas as pd

for chunk in pd.read_csv("large.csv", chunksize=1000):
    await pipeline.import_from_dataframe(chunk)
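
If pandas is not available, the same chunking idea can be sketched with the standard library. `iter_csv_chunks` is a hypothetical helper; how each chunk is then handed to the pipeline depends on the import methods your version exposes.

```python
import csv
import io
from itertools import islice


def iter_csv_chunks(csv_file, chunk_size):
    """Yield lists of row dicts, at most chunk_size rows per list."""
    reader = csv.DictReader(csv_file)
    while True:
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            return
        yield chunk


# In-memory sample with 2500 data rows; a real file would be opened with newline=""
sample = io.StringIO("id,name\n" + "".join(f"{i},user{i}\n" for i in range(2500)))
sizes = [len(chunk) for chunk in iter_csv_chunks(sample, chunk_size=1000)]
print(sizes)
```

Only one chunk is held in memory at a time, so peak usage depends on chunk_size rather than on the file size.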

Problem: Slow Query Performance

Symptoms:

  • Queries take >500ms

  • Search is slow

Solutions:

  1. Enable query optimization:

# Enable in configuration
KG_ENABLE_QUERY_OPTIMIZATION=true
KG_QUERY_OPTIMIZATION_STRATEGY=balanced

  2. Enable schema caching:

KG_ENABLE_SCHEMA_CACHE=true
KG_SCHEMA_CACHE_TTL_SECONDS=3600

  3. Use PostgreSQL with pgvector:

KG_STORAGE_BACKEND=postgresql
KG_ENABLE_PGVECTOR=true

  4. Add indexes (PostgreSQL):

CREATE INDEX idx_entity_type ON entities(entity_type);
CREATE INDEX idx_relation_type ON relations(relation_type);

Configuration Issues

Problem: Configuration Not Loading

Symptoms:

  • Settings not applied

  • Using default values

Solutions:

  1. Check .env file location:

# Should be in project root
ls -la .env

  2. Verify environment variables:

# Check if variables are set
env | grep KG_

  3. Use explicit configuration:

from aiecs.config import Settings

settings = Settings(
    kg_storage_backend="postgresql",
    kg_enable_reranking=True
)
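
The first two checks can also be done from Python by comparing what the .env file declares with what the process environment actually contains. This is a simplified sketch: both helpers are hypothetical, and `parse_env_file` handles only plain KEY=VALUE lines, not the full dotenv syntax (no `export` prefixes, inline comments, or multi-line values).

```python
import os


def kg_settings(environ=None, prefix="KG_"):
    """Return the KG_-prefixed variables from an environment mapping."""
    environ = os.environ if environ is None else environ
    return {k: v for k, v in sorted(environ.items()) if k.startswith(prefix)}


def parse_env_file(text):
    """Parse simple KEY=VALUE lines, ignoring blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


# Sample inputs; in practice read the .env file and omit the environ argument
env_file = "# storage\nKG_STORAGE_BACKEND=postgresql\nKG_ENABLE_PGVECTOR=true\n"
process_env = {"KG_STORAGE_BACKEND": "postgresql", "PATH": "/usr/bin"}

declared = parse_env_file(env_file)
loaded = kg_settings(process_env)
not_loaded = sorted(set(declared) - set(loaded))
print("declared but not in environment:", not_loaded)
```

Any name in the final list is declared in the file but never reached the process, which usually means the file was not found or not loaded.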

Tool Issues

Problem: Tool Returns “Unsupported Operation” Error

Symptoms:

Error: Unsupported operation: kg_builder

Solutions:

  1. Use correct operation name:

# ❌ Wrong operation name
await tool.run(op="kg_builder", ...)

# ✅ Correct - use tool's registered operations
await tool.run(op="build_from_text", ...)

  2. Check available operations:

print(tool.input_schema())  # Shows available operations
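
Once the list of registered operations is known, a near-miss name can be caught with the standard library's `difflib.get_close_matches`. In this sketch `suggest_operation` is a hypothetical helper, and apart from `build_from_text` the operation names are invented placeholders; take the real list from the tool itself.

```python
from difflib import get_close_matches


def suggest_operation(requested, available, cutoff=0.6):
    """Return the closest registered operation name, or None if nothing is close."""
    matches = get_close_matches(requested, available, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# Only "build_from_text" appears in this guide; the other names are hypothetical
available_ops = ["build_from_text", "search_entities", "fuse_entities"]
print(suggest_operation("build_from_txt", available_ops))
print(suggest_operation("zzz", available_ops))
```

A suggestion of None means the requested name resembles nothing registered, so the tool's operation list is the place to look next.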

Getting Help

If you’re still experiencing issues:

  1. Check the API Reference

  2. Review the Configuration Guide

  3. See the Performance Guide

  4. Open an issue on GitHub with:

    • Error message

    • Minimal reproduction code

    • Environment details (Python version, OS)

    • Configuration settings

Performance Benchmarks

Expected performance for reference:

  • CSV Import: 100-300 rows/second

  • JSON Import: 100-250 records/second

  • Text Reranking: 50-100ms

  • Hybrid Reranking: 150-300ms

  • Schema Cache Hit: <1ms

  • Query Optimization: 40-70% improvement

  • Knowledge Fusion: 10-40 entities/second

If your performance is significantly worse, review the solutions above.