Knowledge Graph Search Strategies

Overview

The knowledge graph system provides seven search strategies for querying entities and relationships. Each suits a different use case, and strategies can be chained or combined when a single one is not enough.

Search Strategy Types

1. Vector Search

Purpose: Find entities semantically similar to a query using vector embeddings.

How It Works:

Converts query to embedding vector
Computes cosine similarity with all entity embeddings
Returns top-k most similar entities
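
The three steps above can be sketched in plain Python. This is a minimal brute-force illustration, not the store's actual implementation: the function names, the dictionary of embeddings, and the threshold handling are all assumptions made for the example.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_similar(
    query: List[float],
    embeddings: Dict[str, List[float]],  # hypothetical: entity id -> embedding
    k: int = 10,
    score_threshold: float = 0.0,
) -> List[Tuple[str, float]]:
    """Score every embedding against the query and keep the k best above the threshold."""
    scored = [(eid, cosine_similarity(query, vec)) for eid, vec in embeddings.items()]
    kept = [(eid, score) for eid, score in scored if score >= score_threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:k]
```

Because every stored embedding is compared with the query, the cost grows linearly with the number of entities, which is why narrowing the candidates by entity type helps.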

Use Cases:

Content discovery
Semantic similarity matching
Finding related concepts

Example:

results = await graph_store.vector_search(
    query_embedding=[0.1, 0.2, 0.3, ...],
    entity_type="Article",
    max_results=10,
    score_threshold=0.7
)

Performance: O(n) where n = number of entities with embeddings

2. Graph Structure Search

Purpose: Explore graph structure from seed entities.

How It Works:

Starts from seed entities
Traverses graph edges (BFS)
Scores by depth (closer = higher score)

Use Cases:

Relationship exploration
Network analysis
Connected entity discovery

Example:

neighbors = await graph_store.get_neighbors(
    entity_id="person_1",
    relation_type="KNOWS",
    direction="outgoing"
)
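
One way to picture a neighbor lookup like the call above is as a filter over an edge list. The sketch below is illustrative only: the `Edge` tuple, the `neighbors` function, and the `"incoming"` and `"both"` direction values are assumptions, since the example shows only `"outgoing"`.

```python
from typing import List, NamedTuple, Optional


class Edge(NamedTuple):
    source: str
    relation_type: str
    target: str


def neighbors(
    edges: List[Edge],
    entity_id: str,
    relation_type: Optional[str] = None,
    direction: str = "outgoing",
) -> List[str]:
    """Return ids one hop away from entity_id, filtered by relation type and direction."""
    if direction not in {"outgoing", "incoming", "both"}:
        raise ValueError(f"unknown direction: {direction}")
    found: List[str] = []
    for edge in edges:
        if relation_type is not None and edge.relation_type != relation_type:
            continue
        if direction in ("outgoing", "both") and edge.source == entity_id:
            found.append(edge.target)
        if direction in ("incoming", "both") and edge.target == entity_id:
            found.append(edge.source)
    # Drop duplicates while keeping first-seen order.
    return list(dict.fromkeys(found))
```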

Performance: O(b^d) where b = branching factor, d = depth

3. Hybrid Search

Purpose: Combine vector similarity with graph structure.

How It Works:

Performs vector search
Expands results with graph neighbors
Combines scores with weighted averaging
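
The weighted-averaging step can be sketched as follows. The exact formula used by the hybrid strategy is not stated here, so this assumes a normalized weighted mean in which an entity missing from one result set contributes a score of 0.0 for that side; `combine_scores` is a hypothetical helper name.

```python
from typing import Dict, List, Tuple


def combine_scores(
    vector_scores: Dict[str, float],
    graph_scores: Dict[str, float],
    vector_weight: float = 0.6,
    graph_weight: float = 0.4,
) -> List[Tuple[str, float]]:
    """Merge two score maps into one ranking using a normalized weighted mean."""
    total = vector_weight + graph_weight
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    combined = {}
    for entity_id in set(vector_scores) | set(graph_scores):
        v = vector_scores.get(entity_id, 0.0)  # assumption: missing score counts as 0.0
        g = graph_scores.get(entity_id, 0.0)
        combined[entity_id] = (vector_weight * v + graph_weight * g) / total
    # Highest score first; ties broken by id so the order is deterministic.
    return sorted(combined.items(), key=lambda pair: (-pair[1], pair[0]))
```

Under this assumption, an entity that is only a graph neighbor can still outrank a weak vector match, which is the point of expanding vector results through the graph.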

Use Cases:

Comprehensive search
Context-aware retrieval
Balanced semantic + structural results

Example:

from aiecs.application.knowledge_graph.search.hybrid_search import (
    HybridSearchStrategy,
    HybridSearchConfig,
    SearchMode
)

strategy = HybridSearchStrategy(graph_store)

config = HybridSearchConfig(
    mode=SearchMode.HYBRID,
    vector_weight=0.6,
    graph_weight=0.4,
    max_results=10
)

results = await strategy.search(
    query_embedding=[0.1, 0.2, ...],
    config=config
)

Performance: O(n + b^d) - combines vector and graph search

4. Personalized PageRank

Purpose: Rank entities by their importance relative to a set of seed entities, using a random walk with restart.

How It Works:

Random walk with restart at seed entities
Iterates until convergence
Ranks entities by visit frequency
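
The random walk with restart is commonly computed by power iteration rather than by simulating walks. The sketch below assumes that approach, with `alpha` as the restart probability as in the example further down; the adjacency-dict input and the handling of nodes without outgoing edges (their probability mass returns to the seeds) are assumptions of this illustration, not a description of the `PersonalizedPageRank` class.

```python
from typing import Dict, List


def personalized_pagerank(
    adjacency: Dict[str, List[str]],  # hypothetical: node id -> outgoing neighbor ids
    seeds: List[str],
    alpha: float = 0.15,
    tol: float = 1e-6,
    max_iter: int = 100,
) -> Dict[str, float]:
    """Approximate the stationary visit probabilities of a walk that restarts at the seeds."""
    seed_set = set(seeds)
    if not seed_set:
        raise ValueError("at least one seed is required")
    nodes = set(adjacency) | seed_set
    for targets in adjacency.values():
        nodes.update(targets)

    restart = {n: (1.0 / len(seed_set) if n in seed_set else 0.0) for n in nodes}
    rank = dict(restart)
    for _ in range(max_iter):
        # With probability alpha the walker jumps back to a seed.
        nxt = {n: alpha * restart[n] for n in nodes}
        for node in nodes:
            mass = (1.0 - alpha) * rank[node]
            targets = adjacency.get(node, [])
            if targets:
                for target in targets:
                    nxt[target] += mass / len(targets)
            else:
                # Assumption: a dead-end node sends its mass back to the seeds.
                for seed in seed_set:
                    nxt[seed] += mass / len(seed_set)
        delta = sum(abs(nxt[n] - rank[n]) for n in nodes)
        rank = nxt
        if delta < tol:  # converged: scores changed by less than tol in total
            break
    return rank
```

The returned scores sum to 1, so they can be read as the fraction of time the walker spends at each entity.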

Use Cases:

Influence analysis
Authority ranking
Central entity identification

Example:

from aiecs.application.knowledge_graph.retrieval.retrieval_strategies import (
    PersonalizedPageRank
)

ppr = PersonalizedPageRank(graph_store)

results = await ppr.retrieve(
    seed_entity_ids=["key_person"],
    max_results=20,
    alpha=0.15  # restart probability
)

Performance: O(iterations × edges) - typically 10-50 iterations

5. Multi-Hop Retrieval

Purpose: Discover entities within N hops from seeds.

How It Works:

Breadth-first expansion from seeds
Scores decay with hop distance
Configurable depth limit
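
A minimal sketch of the expansion, assuming the score of an entity first reached at hop h is `score_decay ** h` and that seeds score 1.0. The real decay formula may differ, and `multi_hop` with its adjacency-dict input is a hypothetical stand-in for the retrieval class.

```python
from typing import Dict, List, Tuple


def multi_hop(
    adjacency: Dict[str, List[str]],
    seeds: List[str],
    max_hops: int = 2,
    score_decay: float = 0.5,
    max_results: int = 50,
) -> List[Tuple[str, float]]:
    """Expand hop by hop from the seeds, scoring each entity by the hop where it first appears."""
    scores = {seed: 1.0 for seed in seeds}
    frontier = list(scores)
    for hop in range(1, max_hops + 1):
        next_frontier = []
        for node in frontier:
            for neighbor in adjacency.get(node, []):
                if neighbor not in scores:  # keep the score of the shortest route
                    scores[neighbor] = score_decay ** hop
                    next_frontier.append(neighbor)
        if not next_frontier:
            break
        frontier = next_frontier
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return ranked[:max_results]
```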

Use Cases:

Friend-of-friend discovery
Local network exploration
Proximity-based search

Example:

from aiecs.application.knowledge_graph.retrieval.retrieval_strategies import (
    MultiHopRetrieval
)

retrieval = MultiHopRetrieval(graph_store)

results = await retrieval.retrieve(
    seed_entity_ids=["start_node"],
    max_hops=2,
    score_decay=0.5,
    max_results=50
)

Performance: O(b^d) where b = branching factor, d = max_hops

6. Filtered Retrieval

Purpose: Precise entity selection by properties.

How It Works:

Filters entities by type and properties
Supports exact matches and custom functions
Scores by match quality
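
The filtering step can be illustrated with a small stand-alone function. The `Entity` dataclass, the `predicate` parameter (standing in for "custom functions"), and the all-filters-must-match rule are assumptions of this sketch; scoring by match quality is left out because its definition is not given here.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional

_MISSING = object()  # sentinel so a stored None is not confused with an absent property


@dataclass
class Entity:
    id: str
    entity_type: str
    properties: Dict[str, Any] = field(default_factory=dict)


def filter_entities(
    entities: List[Entity],
    entity_type: Optional[str] = None,
    property_filters: Optional[Dict[str, Any]] = None,
    predicate: Optional[Callable[[Entity], bool]] = None,
) -> List[Entity]:
    """Keep entities that match the type, every exact-match filter, and the predicate."""
    filters = property_filters or {}
    selected = []
    for entity in entities:
        if entity_type is not None and entity.entity_type != entity_type:
            continue
        if any(entity.properties.get(k, _MISSING) != v for k, v in filters.items()):
            continue
        if predicate is not None and not predicate(entity):
            continue
        selected.append(entity)
    return selected
```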

Use Cases:

Attribute-based selection
Data validation queries
Precise entity lookup

Example:

from aiecs.application.knowledge_graph.retrieval.retrieval_strategies import (
    FilteredRetrieval
)

retrieval = FilteredRetrieval(graph_store)

results = await retrieval.retrieve(
    entity_type="Person",
    property_filters={"role": "Engineer", "level": "Senior"},
    max_results=100
)

Performance: O(n) where n = candidate entities

7. Pattern-Based Traversal

Purpose: Follow specific relationship patterns.

How It Works:

Uses PathPattern to specify constraints
Traverses graph matching pattern
Returns paths and entities
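
As a rough illustration of pattern-constrained traversal, the sketch below runs a depth-first search that only follows edges whose relation type is in an allowed set, stops at a maximum depth, and optionally refuses to revisit a node already on the current path. It assumes `relation_types` is an allowed set rather than an ordered sequence; `PathPattern` may interpret it differently, and `find_paths` is a hypothetical name.

```python
from typing import Dict, List, Tuple

# Hypothetical graph shape: node id -> list of (relation_type, target id)
EdgeMap = Dict[str, List[Tuple[str, str]]]


def find_paths(
    edges: EdgeMap,
    start: str,
    relation_types: List[str],
    max_depth: int,
    allow_cycles: bool = False,
    max_results: int = 10,
) -> List[List[str]]:
    """Collect node paths from start that use only the allowed relation types."""
    allowed = set(relation_types)
    results: List[List[str]] = []

    def dfs(path: List[str]) -> None:
        if len(results) >= max_results:
            return
        if len(path) > 1:
            results.append(list(path))  # record every non-trivial prefix as a path
        if len(path) - 1 >= max_depth:
            return
        for relation, target in edges.get(path[-1], []):
            if relation not in allowed:
                continue
            if not allow_cycles and target in path:
                continue
            path.append(target)
            dfs(path)
            path.pop()

    dfs([start])
    return results
```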

Use Cases:

Pattern matching
Path discovery
Relationship chain exploration

Example:

from aiecs.application.knowledge_graph.traversal.enhanced_traversal import (
    EnhancedTraversal
)
from aiecs.domain.knowledge_graph.models.path_pattern import PathPattern

traversal = EnhancedTraversal(graph_store)

pattern = PathPattern(
    relation_types=["WORKS_FOR", "LOCATED_IN"],
    max_depth=2,
    allow_cycles=False
)

paths = await traversal.traverse_with_pattern(
    start_entity_id="person_1",
    pattern=pattern,
    max_results=10
)

Performance: O(paths × depth) - depends on pattern complexity

Strategy Comparison

Strategy	Best For	Speed	Precision	Scalability
Vector	Semantic similarity	Fast	High	Medium
Graph	Structure exploration	Fast	Medium	High
Hybrid	Balanced search	Medium	High	Medium
PageRank	Influence ranking	Medium	High	Medium
Multi-Hop	Local exploration	Fast	Medium	High
Filtered	Precise selection	Fast	Very High	High
Traverse	Pattern matching	Medium	High	Medium

Combining Strategies

Pattern 1: Vector → Graph

# Find semantically similar entities
vector_results = await graph_store.vector_search(
    query_embedding=query,
    max_results=5
)

# Explore their graph neighbors
seeds = [e.id for e, _ in vector_results]
graph_results = await multihop.retrieve(
    seed_entity_ids=seeds,
    max_hops=2
)

Pattern 2: PageRank → Filter

# Find influential entities
pagerank_results = await ppr.retrieve(
    seed_entity_ids=["key_node"],
    max_results=50
)

# Filter by properties
filtered = [
    e for e, _score in pagerank_results
    if e.properties.get("verified") is True
]

Pattern 3: Hybrid with Caching

from aiecs.application.knowledge_graph.retrieval.retrieval_strategies import (
    RetrievalCache
)

cache = RetrievalCache(max_size=100, ttl=300)

results = await cache.get_or_compute(
    cache_key="frequent_query",
    compute_fn=lambda: hybrid_strategy.search(...)
)
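
To make the `get_or_compute` pattern concrete, here is a minimal sketch of a size-bounded cache with time-based expiry. It is not the `RetrievalCache` implementation: the class name `SimpleTTLCache`, the least-recently-used eviction, and the use of `time.monotonic` are assumptions for illustration.

```python
import asyncio
import time
from collections import OrderedDict
from typing import Any, Awaitable, Callable, Tuple


class SimpleTTLCache:
    """Hypothetical cache: at most max_size entries, each valid for ttl seconds."""

    def __init__(self, max_size: int = 100, ttl: float = 300.0) -> None:
        self._max_size = max_size
        self._ttl = ttl
        self._entries: "OrderedDict[str, Tuple[float, Any]]" = OrderedDict()

    async def get_or_compute(
        self, cache_key: str, compute_fn: Callable[[], Awaitable[Any]]
    ) -> Any:
        entry = self._entries.get(cache_key)
        if entry is not None and entry[0] > time.monotonic():
            self._entries.move_to_end(cache_key)  # mark as recently used
            return entry[1]
        value = await compute_fn()  # miss or expired: run the search
        self._entries[cache_key] = (time.monotonic() + self._ttl, value)
        self._entries.move_to_end(cache_key)
        while len(self._entries) > self._max_size:
            self._entries.popitem(last=False)  # evict the least recently used entry
        return value
```

Note that `compute_fn` is called without arguments and its result is awaited, which is why the example above wraps the search call in a `lambda`.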

Performance Optimization

1. Use Appropriate Strategy

Small graphs (< 1K entities): Any strategy works well
Medium graphs (1K-10K): Vector, Graph, Multi-Hop, Filtered
Large graphs (> 10K): Filtered, Graph (with depth limits)

2. Limit Search Scope

# Use entity_type filter
results = await graph_store.vector_search(
    query_embedding=query,
    entity_type="Person",  # Reduces search space
    max_results=10
)

3. Set Reasonable Limits

# Limit depth for graph operations
results = await multihop.retrieve(
    seed_entity_ids=seeds,
    max_hops=2,  # Keep ≤ 3 for performance
    max_results=50  # Reasonable limit
)
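
The depth limit matters because traversal cost grows as O(b^d). A quick worked calculation shows the worst-case number of entities reachable within a given depth, assuming every entity has the same number of neighbors and none are shared; the helper name is hypothetical.

```python
def max_nodes_visited(branching_factor: int, depth: int) -> int:
    """Worst-case node count within `depth` hops: 1 + b + b^2 + ... + b^depth."""
    return sum(branching_factor ** level for level in range(depth + 1))


for depth in (2, 3, 4):
    print(depth, max_nodes_visited(20, depth))
# 2 421
# 3 8421
# 4 168421
```

With 20 neighbors per entity, each extra hop multiplies the work by roughly 20, so going from depth 3 to depth 4 is far more expensive than going from depth 2 to depth 3.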

4. Enable Caching

# Cache frequent queries
cache = RetrievalCache(max_size=100, ttl=300)
results = await cache.get_or_compute(...)

5. Batch Operations

# Group multiple writes into one transaction
async with graph_store.transaction():
    await graph_store.add_entity(e1)
    await graph_store.add_entity(e2)
    # One commit for the whole batch is more efficient than one per write

Best Practices

Choose the Right Strategy: Match strategy to use case
Set Thresholds: Use similarity thresholds to filter low-quality results
Limit Depth: Keep graph traversal depth ≤ 3 for performance
Use Filters: Entity type and property filters reduce search space
Cache Results: Enable caching for repeated queries
Combine Strategies: Use multiple strategies for comprehensive results
