# End-to-End Knowledge Graph Tutorial

## Overview

This tutorial demonstrates a complete workflow using the AIECS Knowledge Graph system, from data import to advanced search and reasoning.

## Scenario

We'll build a knowledge graph of employees, companies, and projects, then perform various operations:

1. Import structured data from CSV
2. Add unstructured text data
3. Merge duplicate entities
4. Search with reranking
5. Parse logical queries
6. Optimize performance

## Prerequisites

```bash
pip install aiecs
```

## Step 1: Prepare Your Data

### employees.csv

```csv
person_id,full_name,age,role,company_id,company_name,city
1,Alice Smith,30,Engineer,100,Tech Corp,San Francisco
2,Bob Jones,25,Designer,100,Tech Corp,San Francisco
3,Charlie Brown,35,Manager,101,Data Inc,New York
4,Diana Prince,28,Analyst,101,Data Inc,New York
5,Eve Wilson,32,Engineer,100,Tech Corp,San Francisco
```

### projects.json

```json
[
  {
    "project_id": "p1",
    "name": "AI Platform",
    "lead_id": "1",
    "company_id": "100"
  },
  {
    "project_id": "p2",
    "name": "Data Pipeline",
    "lead_id": "3",
    "company_id": "101"
  }
]
```

## Step 2: Initialize the System

```python
import asyncio
from aiecs.tools.knowledge_graph import (
    KnowledgeGraphBuilderTool,
    GraphSearchTool,
    GraphReasoningTool
)
from aiecs.application.knowledge_graph.fusion import KnowledgeFusion
from aiecs.infrastructure.graph_storage.in_memory import InMemoryGraphStore

async def main():
    # Initialize graph store
    store = InMemoryGraphStore()
    await store.initialize()

    # Initialize tools
    builder = KnowledgeGraphBuilderTool()
    await builder._initialize()

    search_tool = GraphSearchTool()
    await search_tool._initialize()

    reasoning_tool = GraphReasoningTool()
    await reasoning_tool._initialize()

    return store, builder, search_tool, reasoning_tool

# Run. Top-level `await` works in Jupyter/IPython or `python -m asyncio`;
# in a plain script, wrap all steps in one coroutine and call asyncio.run() on it.
store, builder, search_tool, reasoning_tool = await main()
```

## Step 3: Import CSV Data

```python
# Define schema mapping
employee_schema = {
    "entity_mappings": [
        {
            "entity_type": "Person",
            "id_column": "person_id",
            "property_mappings": {
                "name": "full_name",
                "age": "age",
                "role": "role"
            }
        },
        {
            "entity_type": "Company",
            "id_column": "company_id",
            "property_mappings": {
                "name": "company_name",
                "location": "city"
            }
        }
    ],
    "relation_mappings": [
        {
            "relation_type": "WORKS_FOR",
            "source_column": "person_id",
            "target_column": "company_id",
            "source_type": "Person",
            "target_type": "Company"
        }
    ]
}

# Import CSV
result = await builder.run(
    op="kg_builder",
    action="build_from_structured_data",
    data_path="employees.csv",
    schema_mapping=employee_schema
)

print(f"✓ Imported {result['entities_added']} entities")
print(f"✓ Created {result['relations_added']} relations")
print(f"✓ Processed {result['rows_processed']} rows in {result['duration_seconds']:.2f}s")

# Output:
# ✓ Imported 7 entities (5 people + 2 companies)
# ✓ Created 5 relations
# ✓ Processed 5 rows in 0.15s
```

## Step 4: Import JSON Data

```python
project_schema = {
    "entity_mappings": [
        {
            "entity_type": "Project",
            "id_column": "project_id",
            "property_mappings": {
                "name": "name"
            }
        }
    ],
    "relation_mappings": [
        {
            "relation_type": "LEADS",
            "source_column": "lead_id",
            "target_column": "project_id",
            "source_type": "Person",
            "target_type": "Project"
        },
        {
            "relation_type": "BELONGS_TO",
            "source_column": "project_id",
            "target_column": "company_id",
            "source_type": "Project",
            "target_type": "Company"
        }
    ]
}

result = await builder.run(
    op="kg_builder",
    action="build_from_structured_data",
    data_path="projects.json",
    schema_mapping=project_schema
)

print(f"✓ Imported {result['entities_added']} projects")
# Output: ✓ Imported 2 projects
```

## Step 5: Add Unstructured Text

```python
# Add information from text
text = """
Alice Smith is a senior engineer at Tech Corp, specializing in machine learning.
She recently led the AI Platform project, which uses deep learning for natural
language processing. Bob Jones, a talented designer, collaborated with Alice on
the user interface.
"""

result = await builder.run(
    op="kg_builder",
    action="build_from_text",
    text=text,
    source="company_blog",
    entity_types=["Person", "Technology", "Skill"]
)

print(f"✓ Extracted {result['entities_added']} entities from text")
print(f"✓ Found {result['relations_added']} relations")

# Output:
# ✓ Extracted 6 entities from text
# ✓ Found 8 relations
```

## Step 6: Merge Duplicate Entities

```python
# Alice Smith appears in both CSV and text - let's merge duplicates
fusion = KnowledgeFusion(
    graph_store=store,
    similarity_threshold=0.85,
    conflict_resolution_strategy="most_complete"
)

stats = await fusion.fuse_cross_document_entities(entity_types=["Person"])

print(f"✓ Analyzed {stats['entities_analyzed']} entities")
print(f"✓ Merged {stats['entities_merged']} duplicates")
print(f"✓ Resolved {stats['conflicts_resolved']} conflicts")

# Output:
# ✓ Analyzed 7 entities
# ✓ Merged 2 duplicates (Alice from CSV and text)
# ✓ Resolved 3 conflicts
```

## Step 7: Search with Reranking

```python
# Search for machine learning experts
result = await search_tool.run(
    op="graph_search",
    mode="hybrid",
    query="machine learning engineer",
    max_results=10,
    enable_reranking=True,
    rerank_strategy="hybrid"
)

print(f"✓ Found {len(result['entities'])} relevant entities")
for entity in result['entities'][:3]:
    print(f"  - {entity['properties']['name']}: {entity['entity_type']}")

# Output:
# ✓ Found 5 relevant entities
#   - Alice Smith: Person
#   - AI Platform: Project
#   - machine learning: Skill
```

## Step 8: Parse Logical Queries

```python
# Convert natural language to logical query
result = await reasoning_tool.run(
    op="graph_reasoning",
    mode="logical_query",
    query="Find all engineers who work for companies in San Francisco"
)

print(f"✓ Query type: {result['query_type']}")
print(f"✓ Variables: {result['variables']}")
print(f"✓ Predicates: {len(result['predicates'])}")

# Output:
# ✓ Query type: FIND
# ✓ Variables: ['?person', '?company']
# ✓ Predicates: 2
```

## Step 9: Get Statistics

```python
stats = await builder.run(op="kg_builder", action="get_stats")

print("\n=== Knowledge Graph Statistics ===")
print(f"Total entities: {stats['num_entities']}")
print(f"Total relations: {stats['num_relations']}")

print("\nEntity types:")
for entity_type, count in stats['entity_types'].items():
    print(f"  {entity_type}: {count}")

print("\nRelation types:")
for relation_type, count in stats['relation_types'].items():
    print(f"  {relation_type}: {count}")

# Output:
# === Knowledge Graph Statistics ===
# Total entities: 15
# Total relations: 18
#
# Entity types:
#   Person: 5
#   Company: 2
#   Project: 2
#   Technology: 3
#   Skill: 3
#
# Relation types:
#   WORKS_FOR: 5
#   LEADS: 2
#   BELONGS_TO: 2
#   SPECIALIZES_IN: 4
#   USES: 5
```

## Step 10: Cleanup

```python
await store.close()
```

## Complete Example

See [CSV-to-Graph Tutorial](../examples/csv_to_graph_tutorial.md) and [JSON-to-Graph Tutorial](../examples/json_to_graph_tutorial.md) for complete working examples.

## Next Steps

- [Configuration Guide](../CONFIGURATION_GUIDE.md) - Optimize performance
- [Performance Guide](../PERFORMANCE_GUIDE.md) - Benchmark and tune
- [API Reference](../API_REFERENCE.md) - Detailed API documentation
- [Troubleshooting](../TROUBLESHOOTING.md) - Common issues and solutions

## Performance Tips

1. **Batch Size**: Use 100-500 for large imports
2. **Reranking**: Use "text" for speed, "hybrid" for precision
3. **Caching**: Enable schema caching in production
4. **Fusion**: Run periodically, not on every update
5. **Storage**: Use PostgreSQL for >1M entities

See [Performance Guide](../PERFORMANCE_GUIDE.md) for details.
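
The batch-size tip above can be illustrated with a minimal sketch. This tutorial does not show how AIECS exposes batching, so the example splits rows on the caller's side using only the Python standard library. `iter_batches` and `SAMPLE_CSV` are hypothetical names introduced for illustration and are not part of AIECS.

```python
import csv
import io
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def iter_batches(items: Iterable[T], batch_size: int = 200) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `batch_size` items.

    Hypothetical helper, not an AIECS function.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        # islice pulls the next `batch_size` items without loading the rest
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Small in-memory stand-in for employees.csv so the sketch runs on its own.
SAMPLE_CSV = """person_id,full_name,company_id
1,Alice Smith,100
2,Bob Jones,100
3,Charlie Brown,101
4,Diana Prince,101
5,Eve Wilson,100
"""

rows = csv.DictReader(io.StringIO(SAMPLE_CSV))
batch_sizes = [len(batch) for batch in iter_batches(rows, batch_size=2)]
print(batch_sizes)  # [2, 2, 1]
```

For a real import, a `batch_size` in the 100-500 range recommended above would replace the toy value of 2. How each batch is then handed to the builder depends on the AIECS API, which this tutorial does not cover.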