# Knowledge Graph Reasoning Engine

**Status**: ✅ Complete
**Version**: 1.0.0
**Phase**: 4 - Reasoning Engine

## Overview

The Knowledge Graph Reasoning Engine provides advanced reasoning capabilities over knowledge graphs. It combines query planning, multi-hop reasoning, logical inference, and evidence synthesis to answer complex questions and discover implicit knowledge.

## Architecture

```
┌─────────────────────────────────────────────────────────────┐
│                      Reasoning Engine                       │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐       │
│  │    Query     │  │  Multi-Hop   │  │  Inference   │       │
│  │   Planner    │→ │  Reasoning   │→ │    Engine    │       │
│  └──────────────┘  └──────────────┘  └──────────────┘       │
│         │                 │                 │               │
│         └─────────────────┴─────────────────┘               │
│                           ↓                                 │
│                  ┌──────────────────┐                       │
│                  │     Evidence     │                       │
│                  │   Synthesizer    │                       │
│                  └──────────────────┘                       │
└─────────────────────────────────────────────────────────────┘
```

## Components

### 1. Query Planner

**Purpose**: Translates natural language queries into optimized execution plans.

**Key Features**:
- Query decomposition into steps
- Dependency resolution
- Cost and latency estimation
- Three optimization strategies (cost, latency, balanced)

**Example**:
```python
from aiecs.application.knowledge_graph.reasoning import QueryPlanner
# OptimizationStrategy must also be in scope (its import path is not shown in this document)

planner = QueryPlanner(graph_store)
plan = planner.plan_query("Who works at companies that Alice knows people at?")

# Optimize for minimal cost
optimized_plan = planner.optimize_plan(plan, OptimizationStrategy.MINIMIZE_COST)
```

### 2. Multi-Hop Reasoning Engine

**Purpose**: Finds and reasons over multi-hop paths in the knowledge graph.
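To make "multi-hop paths" concrete, here is a minimal standalone sketch of depth-limited path finding. It does not use the aiecs API: the adjacency-dict graph, the `find_paths` function, and the entity ids are hypothetical, and the engine's actual traversal may differ. It enumerates simple paths (no repeated entities, relations treated as directed) of at most `max_hops` edges using breadth-first search.

```python
from collections import deque
from typing import Dict, List, Tuple

# Hypothetical in-memory graph: entity id -> list of (relation_type, neighbor_id)
Graph = Dict[str, List[Tuple[str, str]]]


def find_paths(graph: Graph, start: str, target: str, max_hops: int) -> List[List[str]]:
    """Return every simple path from start to target with at most max_hops edges."""
    paths: List[List[str]] = []
    queue = deque([[start]])  # each queue item is a partial path of entity ids
    while queue:
        path = queue.popleft()
        if len(path) > 1 and path[-1] == target:
            paths.append(path)  # reached the target; do not extend further
            continue
        if len(path) - 1 >= max_hops:
            continue  # depth limit reached
        for _relation, neighbor in graph.get(path[-1], []):
            if neighbor not in path:  # skip cycles
                queue.append(path + [neighbor])
    return paths


# Usage with a toy graph (entity ids are illustrative)
graph: Graph = {
    "alice": [("KNOWS", "bob"), ("WORKS_FOR", "company_y")],
    "bob": [("WORKS_FOR", "company_x")],
    "company_y": [("PARTNER_OF", "company_x")],
}
print(find_paths(graph, "alice", "company_x", max_hops=3))
# [['alice', 'bob', 'company_x'], ['alice', 'company_y', 'company_x']]
```

Because BFS expands paths in order of length, the result comes out shortest-first, which is one simple ranking signal; a fuller ranker could also weigh relation types or confidence scores.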
**Key Features**:
- Path finding with depth limits
- Evidence collection from paths
- Path ranking by relevance
- Answer generation from evidence
- Query execution with trace

**Example**:
```python
from aiecs.application.knowledge_graph.reasoning import ReasoningEngine

engine = ReasoningEngine(graph_store)

# reason() is a coroutine: await it from async code (e.g. a function run via asyncio.run)
result = await engine.reason(
    query="How is Alice connected to Company X?",
    context={"start_entity_id": "alice", "target_entity_id": "company_x"},
    max_hops=3
)

print(f"Answer: {result.answer}")
print(f"Confidence: {result.confidence}")
print(f"Evidence: {len(result.evidence)} pieces")
```

### 3. Inference Engine

**Purpose**: Applies logical inference rules to discover implicit knowledge.

**Key Features**:
- Transitive inference (A→B, B→C ⇒ A→C)
- Symmetric inference (A→B ⇒ B→A)
- Rule-based inference
- Inference result caching
- Full explainability (trace inference steps)

**Example**:
```python
from aiecs.application.knowledge_graph.reasoning import InferenceEngine
from aiecs.domain.knowledge_graph.models.inference_rule import InferenceRule, RuleType

inference_engine = InferenceEngine(graph_store)

# Add transitive rule for KNOWS relations
inference_engine.add_rule(InferenceRule(
    rule_id="transitive_knows",
    rule_type=RuleType.TRANSITIVE,
    relation_type="KNOWS",
    description="Transitive closure for KNOWS"
))

# Apply inference
result = await inference_engine.infer_relations(
    relation_type="KNOWS",
    max_steps=5,
    use_cache=True
)

print(f"Inferred {len(result.inferred_relations)} new relations")
```

### 4. Evidence Synthesizer

**Purpose**: Combines evidence from multiple sources for robust conclusions.
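The following standalone sketch illustrates one way grouping, weighted averaging, and agreement boosting can fit together. It is an assumption for illustration, not the `EvidenceSynthesizer` implementation: the `Piece` record, grouping by an exact `claim` string, and the flat per-agreement bonus `boost_per_agreeing` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Piece:
    """Hypothetical, simplified evidence record (not the aiecs Evidence model)."""
    claim: str         # what the evidence supports; used as the grouping key
    confidence: float  # 0.0 to 1.0
    relevance: float   # weight in the weighted average


def synthesize(pieces: List[Piece], boost_per_agreeing: float = 0.05) -> Dict[str, float]:
    """Group pieces by claim, take a relevance-weighted average of their
    confidences, then add a bonus for each extra agreeing piece (capped at 1.0)."""
    groups: Dict[str, List[Piece]] = {}
    for piece in pieces:
        groups.setdefault(piece.claim, []).append(piece)

    combined: Dict[str, float] = {}
    for claim, group in groups.items():
        total_weight = sum(p.relevance for p in group)
        if total_weight > 0:
            average = sum(p.confidence * p.relevance for p in group) / total_weight
        else:
            average = sum(p.confidence for p in group) / len(group)  # unweighted fallback
        boosted = average + boost_per_agreeing * (len(group) - 1)
        combined[claim] = min(1.0, boosted)
    return combined


# Usage: two agreeing pieces for one claim, one piece for another
pieces = [
    Piece("alice WORKS_FOR company_x", confidence=0.8, relevance=1.0),
    Piece("alice WORKS_FOR company_x", confidence=0.6, relevance=1.0),
    Piece("alice KNOWS bob", confidence=0.9, relevance=0.5),
]
print(synthesize(pieces))
```

Here the two agreeing pieces average to 0.7 and receive one 0.05 bonus (about 0.75), while the single piece stays at 0.9. A flat bonus is the simplest boosting rule; a probabilistic alternative such as noisy-OR, 1 - Π(1 - cᵢ), also rises with agreement.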
**Key Features**:
- Evidence grouping by overlap
- Multiple synthesis methods (weighted average, max, voting)
- Confidence boosting from agreement
- Contradiction detection
- Reliability ranking

**Example**:
```python
from aiecs.application.knowledge_graph.reasoning import EvidenceSynthesizer

synthesizer = EvidenceSynthesizer(
    confidence_threshold=0.7,
    contradiction_threshold=0.3
)

# Synthesize overlapping evidence (evidence_list: a list of Evidence objects)
synthesized = synthesizer.synthesize_evidence(
    evidence_list,
    method="weighted_average"
)

# Estimate overall confidence
overall_confidence = synthesizer.estimate_overall_confidence(synthesized)

# Rank by reliability
ranked = synthesizer.rank_by_reliability(synthesized)
```

## Reasoning Workflow

### Complete Reasoning Pipeline

```python
from aiecs.tools.knowledge_graph import GraphReasoningTool

tool = GraphReasoningTool(graph_store)

# Full reasoning with all components
result = await tool._execute(GraphReasoningInput(
    mode="full_reasoning",
    query="How is Alice connected to Company X?",
    start_entity_id="alice",
    target_entity_id="company_x",
    max_hops=3,
    apply_inference=True,
    inference_relation_type="KNOWS",
    synthesize_evidence=True,
    confidence_threshold=0.7
))

# Results include:
# - Query plan steps
# - Multi-hop reasoning results
# - Inferred relations (if enabled)
# - Synthesized evidence
# - Final answer with confidence
# - Complete reasoning trace
```

### Step-by-Step Workflow

```
1. Query Planning
   ├─ Parse natural language query
   ├─ Identify query type (vector, traversal, path finding, etc.)
   ├─ Decompose into executable steps
   └─ Optimize for cost/latency

2. Multi-Hop Reasoning
   ├─ Execute query plan
   ├─ Find paths in knowledge graph
   ├─ Collect evidence from paths
   ├─ Rank evidence by relevance
   └─ Generate answer

3. Logical Inference (Optional)
   ├─ Apply inference rules
   ├─ Discover implicit relations
   ├─ Track inference steps
   └─ Cache results

4. Evidence Synthesis
   ├─ Group overlapping evidence
   ├─ Combine using synthesis method
   ├─ Boost confidence from agreement
   ├─ Detect contradictions
   └─ Rank by reliability

5. Answer Generation
   ├─ Combine evidence and inferences
   ├─ Calculate overall confidence
   ├─ Generate natural language answer
   └─ Provide reasoning trace
```

## Domain Models

### QueryPlan
```python
class QueryPlan(BaseModel):
    plan_id: str
    original_query: str
    steps: List[QueryStep]
    total_estimated_cost: float
    optimized: bool
    explanation: str
```

### QueryStep
```python
class QueryStep(BaseModel):
    step_id: str
    operation: QueryOperation
    query: GraphQuery
    depends_on: List[str]
    description: str
    estimated_cost: float
```

### ReasoningResult
```python
class ReasoningResult(BaseModel):
    query: str
    evidence: List[Evidence]
    answer: str
    confidence: float
    reasoning_trace: List[str]
    execution_time_ms: float
```

### Evidence
```python
class Evidence(BaseModel):
    evidence_id: str
    evidence_type: EvidenceType
    entities: List[Entity]
    relations: List[Relation]
    paths: List[Path]
    confidence: float
    relevance_score: float
    explanation: str
    source: str
```

### InferenceResult
```python
class InferenceResult(BaseModel):
    inferred_relations: List[Relation]
    inference_steps: List[InferenceStep]
    confidence: float
    total_steps: int
```

## Use Cases

### 1. Complex Question Answering
```python
# Multi-hop question with inference
result = await engine.reason(
    query="Who are the most influential people connected to Alice?",
    context={"start_entity_id": "alice"},
    max_hops=4
)
```

### 2. Relationship Discovery
```python
# Find all transitive connections
result = await inference_engine.infer_relations(
    relation_type="KNOWS",
    max_steps=10,
    use_cache=True
)
```

### 3. Evidence-Based Decision Making
```python
# Collect and synthesize evidence
# (collect_evidence and make_decision stand in for application-specific logic)
evidence = await collect_evidence(query)
synthesized = synthesizer.synthesize_evidence(evidence)
ranked = synthesizer.rank_by_reliability(synthesized)

# Make decision based on top evidence, if any was found
if ranked:
    decision = make_decision(ranked[0])
```

### 4. Knowledge Graph Completion
```python
# Infer missing relations
symmetric_rule = InferenceRule(
    rule_id="symmetric_friend",
    rule_type=RuleType.SYMMETRIC,
    relation_type="FRIEND_OF"
)
inference_engine.add_rule(symmetric_rule)

result = await inference_engine.infer_relations("FRIEND_OF")
# Adds reverse friendship relations
```

## Performance

### Benchmarks

| Operation | Graph Size | Time (ms) | Throughput |
|-----------|------------|-----------|------------|
| Query Planning | Any | <10 | >100 queries/sec |
| Multi-Hop (3 hops) | 1K entities | 20-50 | ~20 queries/sec |
| Multi-Hop (3 hops) | 10K entities | 50-150 | ~7 queries/sec |
| Inference (Transitive) | 100 relations | 10-30 | ~30 ops/sec |
| Inference (Transitive) | 1K relations | 50-200 | ~5 ops/sec |
| Evidence Synthesis | 10 pieces | <5 | >200 ops/sec |

### Optimization Tips

1. **Use Caching**:
   - Enable inference result caching
   - Cache query plans for repeated queries
   - Use retrieval cache for frequent lookups

2. **Limit Depth**:
   - Set `max_hops` appropriately (3-4 is usually sufficient)
   - Use `max_evidence` to limit evidence collection

3. **Optimize Inference**:
   - Set `max_steps` based on graph size
   - Enable only needed inference rules
   - Use cache for repeated relation types

4. **Parallel Execution**:
   - Query planner identifies parallel steps
   - Use `execution_order` for optimal parallelization

## Best Practices

### Query Writing

```python
# Good: Specific and focused
"How is Alice connected to Company X?"

# Better: With constraints
"How is Alice connected to Company X through WORKS_FOR relations?"
# Best: With context
context = {
    "start_entity_id": "alice",
    "target_entity_id": "company_x",
    "relation_types": ["WORKS_FOR", "KNOWS"]
}
```

### Inference Rules

```python
# Enable only the rules you need and disable the rest
needed_rule_ids = {"transitive_knows"}
for rule in inference_engine.get_rules("KNOWS"):
    rule.enabled = rule.rule_id in needed_rule_ids

# Set appropriate confidence decay
inference_engine.add_rule(InferenceRule(
    rule_id="transitive_knows",
    rule_type=RuleType.TRANSITIVE,
    relation_type="KNOWS",
    confidence_decay=0.1  # 10% decay per hop
))
```

### Evidence Synthesis

```python
# Filter before synthesis
high_confidence = synthesizer.filter_by_confidence(
    evidence_list,
    threshold=0.7
)

# Use appropriate method
synthesized = synthesizer.synthesize_evidence(
    high_confidence,
    method="weighted_average"  # Balanced approach
)

# Check for contradictions
contradictions = synthesizer.detect_contradictions(synthesized)
if contradictions:
    # Handle contradictions
    pass
```

## Error Handling

```python
from aiecs.application.knowledge_graph.reasoning import (
    QueryPlanner,
    ReasoningEngine,
    InferenceEngine
)

try:
    # Query planning
    plan = planner.plan_query(query)

    # Multi-hop reasoning
    result = await engine.reason(query, context, max_hops=3)

    # Inference
    inferred = await inference_engine.infer_relations(
        relation_type="KNOWS",
        max_steps=5
    )
except ValueError as e:
    print(f"Invalid parameter: {e}")
except Exception as e:
    print(f"Reasoning error: {e}")
```

## Testing

All reasoning components are thoroughly tested:
- **Query Planning**: 22 unit tests
- **Multi-Hop Reasoning**: 20 unit tests
- **Logical Inference**: 21 unit tests
- **Evidence Synthesis**: 14 unit tests
- **Reasoning Tools**: 11 unit tests

**Total**: 88 tests passing

## API Reference

### QueryPlanner
- `plan_query(query, context) -> QueryPlan`
- `optimize_plan(plan, strategy) -> QueryPlan`
- `translate_to_graph_query(query) -> GraphQuery`

### ReasoningEngine
- `reason(query, context, max_hops, max_evidence) -> ReasoningResult`
- `find_multi_hop_paths(start_id, target_id, max_hops) -> List[Path]`
- `collect_evidence_from_paths(paths) -> List[Evidence]`
- `rank_evidence(evidence) -> List[Evidence]`

### InferenceEngine
- `infer_relations(relation_type, max_steps, use_cache) -> InferenceResult`
- `add_rule(rule) -> None`
- `remove_rule(rule_id) -> None`
- `get_rules(relation_type) -> List[InferenceRule]`

### EvidenceSynthesizer
- `synthesize_evidence(evidence_list, method) -> List[Evidence]`
- `filter_by_confidence(evidence_list, threshold) -> List[Evidence]`
- `detect_contradictions(evidence_list) -> List[Dict]`
- `estimate_overall_confidence(evidence_list) -> float`
- `rank_by_reliability(evidence_list) -> List[Evidence]`

## Examples

See `docs/knowledge_graph/examples/` for complete examples:
- `09_multi_hop_qa.py` - Multi-hop question answering
- `10_logical_inference.py` - Logical inference over knowledge
- `11_evidence_reasoning.py` - Evidence-based reasoning

## Related Documentation

- [Multi-Hop Reasoning Tutorial](../tutorials/MULTI_HOP_REASONING_TUTORIAL.md)
- [Logic Query Parser](./logic_query_parser.md)
- [Graph Reasoning Tool](../tools/GRAPH_REASONING_TOOL.md)

---

**Status**: ✅ Complete
**Phase**: 4 - Reasoning Engine
**Tests**: 88/88 passing
**Coverage**: >80%
**Ready for**: Production use