Reranking Strategies Guide

Version: 1.0
Date: 2025-11-14
Module: aiecs.application.knowledge_graph.search.reranker_strategies

Overview

This guide covers the built-in reranking strategies available in the knowledge graph search system. Each strategy scores entity relevance from a different signal (text, embeddings, or graph structure), and strategies can be combined when a single signal is not enough.

Built-in Strategies

1. TextSimilarityReranker

Scores entities based on text similarity using BM25 and Jaccard similarity.

from aiecs.application.knowledge_graph.search.reranker_strategies import TextSimilarityReranker

How It Works

BM25: Term-based relevance scoring (TF-IDF variant)
Jaccard: Set overlap between query and entity text
Combination: Weighted combination of both scores
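
The three steps above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the whitespace tokenizer, the `k1` and `b` defaults, and the max-normalization of BM25 before weighting are all assumptions made for the example, and the function names are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a simplifying assumption)."""
    return text.lower().split()


def jaccard(query, text):
    """Set overlap between query tokens and text tokens, in [0, 1]."""
    q, t = set(tokenize(query)), set(tokenize(text))
    union = q | t
    return len(q & t) / len(union) if union else 0.0


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of each document for the query (non-negative, unbounded)."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avg_len = sum(len(d) for d in tokenized) / n if n else 0.0
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in tokenized for term in set(d))
    query_terms = set(tokenize(query))
    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


def combined_text_scores(query, docs, bm25_weight=0.7, jaccard_weight=0.3):
    """Weighted sum of max-normalized BM25 and Jaccard (normalization is assumed)."""
    raw = bm25_scores(query, docs)
    top = max(raw, default=0.0)
    bm25_norm = [s / top if top > 0 else 0.0 for s in raw]
    return [
        bm25_weight * bm + jaccard_weight * jaccard(query, doc)
        for bm, doc in zip(bm25_norm, docs)
    ]


docs = ["machine learning algorithms", "cooking recipes and food"]
scores = combined_text_scores("machine learning", docs)
```

With the default weights, a document that shares no terms with the query scores 0 on both signals, which is why this strategy struggles when the query and the entity text use different vocabulary.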

Constructor

def __init__(
    self,
    bm25_weight: float = 0.7,
    jaccard_weight: float = 0.3,
    text_fields: Optional[List[str]] = None
)

Parameters:

bm25_weight (float): Weight for BM25 score (default: 0.7)
jaccard_weight (float): Weight for Jaccard score (default: 0.3)
text_fields (Optional[List[str]]): Entity fields to use for text (default: ["name", "description"])

Example

# Create text similarity reranker
text_reranker = TextSimilarityReranker(
    bm25_weight=0.7,
    jaccard_weight=0.3,
    text_fields=["name", "description", "content"]
)

# Score entities
scores = await text_reranker.score(
    query="machine learning algorithms",
    entities=search_results
)

When to Use

✅ Query contains specific keywords
✅ Exact term matching is important
✅ Entity text is rich and descriptive
❌ Query is very short or generic
❌ Semantic meaning is more important than keywords

Performance

Speed: Fast (no external calls)
Memory: Low
Accuracy: Good for keyword-based queries

2. SemanticReranker

Scores entities based on semantic similarity using vector embeddings.

from aiecs.application.knowledge_graph.search.reranker_strategies import SemanticReranker

How It Works

Embeddings: Uses entity embedding vectors
Similarity: Computes cosine similarity with query embedding
Fallback: Returns 0.5 if embeddings are missing
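
As a rough sketch of the scoring rule described above, the snippet below computes cosine similarity and applies the 0.5 fallback when either vector is missing. The helper names are hypothetical, and whether the library rescales cosine values (which range over [-1, 1]) is not stated in this guide, so no rescaling is done here.

```python
import math
from typing import Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def semantic_score(
    query_embedding: Optional[Sequence[float]],
    entity_embedding: Optional[Sequence[float]],
    fallback: float = 0.5,
) -> float:
    """Cosine similarity, or the neutral fallback when an embedding is missing."""
    if not query_embedding or not entity_embedding:
        return fallback
    return cosine_similarity(query_embedding, entity_embedding)


same = semantic_score([1.0, 0.0], [1.0, 0.0])
orthogonal = semantic_score([1.0, 0.0], [0.0, 1.0])
missing = semantic_score([1.0, 0.0], None)
```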

Constructor

def __init__(self)

No parameters required.

Example

# Create semantic reranker
semantic_reranker = SemanticReranker()

# Score entities (requires query_embedding)
scores = await semantic_reranker.score(
    query="machine learning",
    entities=search_results,
    query_embedding=[0.1, 0.2, 0.3, ...]  # Query vector
)

When to Use

✅ Semantic meaning is important
✅ Query and entities have embeddings
✅ Handling synonyms and related concepts
✅ Cross-lingual search
❌ Embeddings are not available
❌ Exact keyword matching is critical

Performance

Speed: Fast (vector operations)
Memory: Medium (stores embeddings)
Accuracy: Excellent for semantic queries

3. StructuralReranker

Scores entities based on graph structure (PageRank, centrality).

from aiecs.application.knowledge_graph.search.reranker_strategies import StructuralReranker

How It Works

PageRank: Scores based on entity importance in graph
Degree Centrality: Scores based on number of connections
Combination: Weighted combination of both metrics
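
To make the two metrics concrete, here is a small self-contained sketch that runs power-iteration PageRank and a degree count over an adjacency dictionary, then combines them with the default weights. The graph representation, the damping factor, the iteration count, and the max-normalization step are assumptions for illustration; the real strategy reads from a GraphStore and may compute these differently.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank over {node: [outgoing neighbours]}."""
    nodes = list(graph)
    if not nodes:
        return {}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node, targets in graph.items():
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


def degree_counts(graph):
    """Total number of connections (incoming plus outgoing) per node."""
    degree = {node: len(targets) for node, targets in graph.items()}
    for targets in graph.values():
        for target in targets:
            degree[target] += 1
    return degree


def structural_scores(graph, pagerank_weight=0.7, centrality_weight=0.3):
    """Weighted sum of max-normalized PageRank and degree."""
    pr = pagerank(graph)
    deg = degree_counts(graph)
    max_pr = max(pr.values(), default=0.0) or 1.0
    max_deg = max(deg.values(), default=0) or 1
    return {
        node: pagerank_weight * pr[node] / max_pr
        + centrality_weight * deg[node] / max_deg
        for node in graph
    }


# Every edge target must also appear as a key.
graph = {"a": ["c"], "b": ["c"], "c": ["a"], "d": []}
scores = structural_scores(graph)
```

In this toy graph, node "c" receives links from both "a" and "b", so it ranks highest on both metrics, while the isolated node "d" ranks lowest.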

Constructor

def __init__(
    self,
    graph_store: GraphStore,
    pagerank_weight: float = 0.7,
    centrality_weight: float = 0.3,
    use_cache: bool = True
)

Parameters:

graph_store (GraphStore): Graph storage backend
pagerank_weight (float): Weight for PageRank score (default: 0.7)
centrality_weight (float): Weight for centrality score (default: 0.3)
use_cache (bool): Whether to cache PageRank scores (default: True)

Example

# Create structural reranker
structural_reranker = StructuralReranker(
    graph_store=store,
    pagerank_weight=0.7,
    centrality_weight=0.3
)

# Score entities
scores = await structural_reranker.score(
    query="important entities",
    entities=search_results
)

When to Use

✅ Entity importance matters
✅ Well-connected entities are more relevant
✅ Graph structure is meaningful
❌ All entities are equally important
❌ Graph is sparse or disconnected

Performance

Speed: Medium (requires graph queries)
Memory: Medium (caches PageRank)
Accuracy: Good for authority-based ranking

4. HybridReranker

Combines text, semantic, and structural signals into a single strategy.

from aiecs.application.knowledge_graph.search.reranker_strategies import HybridReranker

How It Works

Multi-Signal: Combines all three reranking approaches
Weighted: Configurable weights for each signal
Normalized: Normalizes scores before combining
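
The normalize-then-weight idea can be sketched as follows. Min-max normalization and the handling of constant score lists are assumptions here (the guide does not say which normalization HybridReranker uses internally), and the function names are hypothetical.

```python
def min_max(scores):
    """Rescale scores to [0, 1]; a constant list maps to 0.5 (an assumed convention)."""
    if not scores:
        return []
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.5] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def hybrid_scores(
    text,
    semantic,
    structural,
    text_weight=0.4,
    semantic_weight=0.4,
    structural_weight=0.2,
):
    """Weighted average of the three normalized signals, one value per entity."""
    signals = [
        (text, text_weight),
        (semantic, semantic_weight),
        (structural, structural_weight),
    ]
    total = sum(weight for _, weight in signals)
    combined = [0.0] * len(text)
    for scores, weight in signals:
        for i, value in enumerate(min_max(scores)):
            combined[i] += weight * value / total
    return combined


# Raw signals on very different scales for three entities.
combined = hybrid_scores(
    text=[2.0, 1.0, 0.0],
    semantic=[0.9, 0.5, 0.1],
    structural=[0.0, 0.0, 10.0],
)
```

Without the normalization step, the structural value of 10.0 would dominate the third entity's score even though its weight is only 0.2.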

Constructor

def __init__(
    self,
    graph_store: GraphStore,
    text_weight: float = 0.4,
    semantic_weight: float = 0.4,
    structural_weight: float = 0.2,
    text_fields: Optional[List[str]] = None
)

Parameters:

graph_store (GraphStore): Graph storage backend
text_weight (float): Weight for text similarity (default: 0.4)
semantic_weight (float): Weight for semantic similarity (default: 0.4)
structural_weight (float): Weight for structural importance (default: 0.2)
text_fields (Optional[List[str]]): Entity fields for text similarity

Example

# Create hybrid reranker
hybrid_reranker = HybridReranker(
    graph_store=store,
    text_weight=0.4,
    semantic_weight=0.4,
    structural_weight=0.2
)

# Score entities
scores = await hybrid_reranker.score(
    query="machine learning",
    entities=search_results,
    query_embedding=[0.1, 0.2, ...]
)

When to Use

✅ Want comprehensive ranking
✅ Multiple signals are available
✅ Balanced approach is needed
❌ Only one signal is available
❌ Need fine-grained control over strategies

Performance

Speed: Medium (combines all strategies)
Memory: Medium
Accuracy: Excellent for general-purpose ranking

Strategy Comparison

Strategy	Speed	Memory	Best For	Requires
TextSimilarity	Fast	Low	Keyword queries	Entity text
Semantic	Fast	Medium	Semantic queries	Embeddings
Structural	Medium	Medium	Authority ranking	Graph structure
Hybrid	Medium	Medium	General purpose	All of the above

Combining Strategies

Using ResultReranker

Combine multiple strategies with custom weights:

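from aiecs.application.knowledge_graph.search.reranker import (
    ResultReranker,
    ScoreCombinationMethod
)
from aiecs.application.knowledge_graph.search.reranker_strategies import (
    TextSimilarityReranker,
    SemanticReranker,
    StructuralReranker
)

# Create individual strategies (graph_store is your GraphStore instance)
text_reranker = TextSimilarityReranker()
semantic_reranker = SemanticReranker()
structural_reranker = StructuralReranker(graph_store)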

# Combine with weighted average
reranker = ResultReranker(
    strategies=[text_reranker, semantic_reranker, structural_reranker],
    combination_method=ScoreCombinationMethod.WEIGHTED_AVERAGE,
    weights={
        "text": 0.4,
        "semantic": 0.4,
        "structural": 0.2
    }
)

Combination Methods

1. Weighted Average (Recommended)

Combines scores using weighted average:

combination_method=ScoreCombinationMethod.WEIGHTED_AVERAGE
weights={"text": 0.6, "semantic": 0.4}

When to use: Most cases, allows fine-tuning importance

2. Reciprocal Rank Fusion (RRF)

Combines based on ranks rather than scores:

combination_method=ScoreCombinationMethod.RRF

When to use: Scores are on different scales, want rank-based fusion
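
For reference, rank-based fusion can be sketched in a few lines. Each strategy contributes 1 / (k + rank) for every entity, where rank 1 is that strategy's best entity. The constant k = 60 is the value commonly used for RRF; whether ScoreCombinationMethod.RRF uses the same constant is an assumption, and the function name is hypothetical.

```python
def reciprocal_rank_fusion(score_lists, k=60):
    """Fuse per-strategy score lists into one list using ranks only."""
    n = len(score_lists[0]) if score_lists else 0
    fused = [0.0] * n
    for scores in score_lists:
        # Indices ordered from best to worst score for this strategy.
        order = sorted(range(n), key=lambda i: scores[i], reverse=True)
        for rank, index in enumerate(order, start=1):
            fused[index] += 1.0 / (k + rank)
    return fused


# Two strategies on incompatible scales: only the ordering matters.
text_scores = [0.9, 0.1, 0.5]
structural_scores = [120.0, 300.0, 5.0]
fused = reciprocal_rank_fusion([text_scores, structural_scores])
```

Because only ranks are used, multiplying one strategy's scores by a constant leaves the fused result unchanged.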

3. Max Score

Takes maximum score across strategies:

combination_method=ScoreCombinationMethod.MAX

When to use: Want entities that excel in any strategy

4. Min Score

Takes minimum score across strategies:

combination_method=ScoreCombinationMethod.MIN

When to use: Want entities that score well in all strategies
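
The difference between the max and min methods is easiest to see on a small example. The sketch below is a plain-Python illustration with a hypothetical helper name; it assumes the per-strategy scores are already on a comparable scale.

```python
def combine(score_lists, method):
    """Combine per-strategy scores entity by entity using max or min."""
    reducer = {"max": max, "min": min}[method]
    return [reducer(column) for column in zip(*score_lists)]


text_scores = [0.9, 0.2, 0.6]
semantic_scores = [0.1, 0.3, 0.7]

by_max = combine([text_scores, semantic_scores], "max")
by_min = combine([text_scores, semantic_scores], "min")

best_by_max = by_max.index(max(by_max))
best_by_min = by_min.index(min(by_min) if False else max(by_min))
```

Here the first entity wins under max because it excels on text alone, while the third entity wins under min because it is solid on both signals.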

Best Practices

1. Choose the Right Strategy

# For keyword-heavy queries
if query_has_specific_terms:
    strategy = TextSimilarityReranker()

# For semantic/conceptual queries
elif query_is_conceptual:
    strategy = SemanticReranker()

# For authority-based ranking
elif importance_matters:
    strategy = StructuralReranker(graph_store)

# For general purpose
else:
    strategy = HybridReranker(graph_store)

2. Tune Weights

Start with default weights and adjust based on results:

# Default balanced weights
weights = {"text": 0.4, "semantic": 0.4, "structural": 0.2}

# Keyword-focused
weights = {"text": 0.7, "semantic": 0.2, "structural": 0.1}

# Semantic-focused
weights = {"text": 0.2, "semantic": 0.7, "structural": 0.1}

# Authority-focused
weights = {"text": 0.3, "semantic": 0.3, "structural": 0.4}

3. Normalize Scores

Always normalize scores when combining strategies:

reranker = ResultReranker(
    strategies=[...],
    normalize_scores=True,  # Important!
    normalization_method="min_max"
)

4. Use Top-K Limiting

Limit results for better performance:

reranked = await reranker.rerank(
    query=query,
    entities=entities,
    top_k=20  # Only return top 20
)
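
Selecting the top k does not require sorting every scored entity. The following sketch shows one way to do it with the standard library; it illustrates the idea only and is not a description of how ResultReranker implements top_k. The helper name is hypothetical.

```python
import heapq


def top_k(entities, scores, k):
    """Return the k best (entity, score) pairs, highest score first."""
    pairs = zip(entities, scores)
    return heapq.nlargest(k, pairs, key=lambda pair: pair[1])


best = top_k(["a", "b", "c", "d"], [0.2, 0.9, 0.5, 0.7], k=2)
```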

5. Cache When Possible

Enable caching for structural reranker:

structural_reranker = StructuralReranker(
    graph_store=store,
    use_cache=True  # Cache PageRank scores
)

Custom Strategies

Creating a Custom Strategy

Implement the RerankerStrategy interface:

import time
from typing import List

from aiecs.application.knowledge_graph.search.reranker import RerankerStrategy
from aiecs.domain.knowledge_graph.models.entity import Entity

class RecencyReranker(RerankerStrategy):
    """Rerank based on entity recency"""

    @property
    def name(self) -> str:
        return "recency"

    async def score(
        self,
        query: str,
        entities: List[Entity],
        **kwargs
    ) -> List[float]:
        """Score based on creation/update time"""
        scores = []
        for entity in entities:
            # Get timestamp from entity metadata
            timestamp = entity.metadata.get("updated_at", 0)
            # Normalize to [0, 1] based on age
            age_days = (time.time() - timestamp) / 86400
            score = 1.0 / (1.0 + age_days / 365)  # Decay over year
            scores.append(score)
        return scores
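
To see how the decay formula behaves, it can be pulled out into a standalone function and evaluated at a few ages. This is a sketch: the function name and the explicit `now` parameter are introduced here for testability and are not part of the strategy interface.

```python
SECONDS_PER_DAY = 86400


def recency_score(updated_at, now, scale_days=365.0):
    """1.0 for a brand-new entity, 0.5 at scale_days old, approaching 0 after that."""
    age_days = max(0.0, (now - updated_at) / SECONDS_PER_DAY)
    return 1.0 / (1.0 + age_days / scale_days)


now = 1_700_000_000.0  # fixed reference time so the example is reproducible

fresh = recency_score(now, now)
one_year_old = recency_score(now - 365 * SECONDS_PER_DAY, now)
no_timestamp = recency_score(0, now)  # the default used when "updated_at" is missing
```

Note that a missing timestamp defaults to 0, the Unix epoch, so such entities are treated as decades old and score close to zero. If that is not the intent, a neutral default may be a better choice.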

Using Custom Strategy

# Create custom strategy
recency_reranker = RecencyReranker()

# Use with ResultReranker
reranker = ResultReranker(
    strategies=[text_reranker, recency_reranker],
    weights={"text": 0.7, "recency": 0.3}
)

Use Cases

Use Case 1: Academic Paper Search

Goal: Find relevant papers with high citation count

Strategy:

# Combine semantic similarity with structural importance
reranker = ResultReranker(
    strategies=[
        SemanticReranker(),
        StructuralReranker(graph_store)  # Citations = high PageRank
    ],
    weights={
        "semantic": 0.6,
        "structural": 0.4  # Emphasize citations
    }
)

Use Case 2: Product Search

Goal: Find products matching keywords with good reviews

Strategy:

# Combine text matching with custom review score
class ReviewReranker(RerankerStrategy):
    @property
    def name(self) -> str:
        return "reviews"

    async def score(self, query, entities, **kwargs):
        return [
            entity.metadata.get("review_score", 2.5) / 5.0  # 0-5 scale; missing reviews map to a neutral 0.5
            for entity in entities
        ]

reranker = ResultReranker(
    strategies=[
        TextSimilarityReranker(),
        ReviewReranker()
    ],
    weights={"text": 0.7, "reviews": 0.3}
)

Use Case 3: Expert Finding

Goal: Find experts in a domain

Strategy:

# Emphasize structural importance (connections, collaborations)
reranker = ResultReranker(
    strategies=[
        SemanticReranker(),
        StructuralReranker(graph_store)
    ],
    weights={
        "semantic": 0.3,
        "structural": 0.7  # Emphasize network position
    }
)

Use Case 4: News Article Search

Goal: Find relevant recent articles

Strategy:

# Combine relevance with recency
reranker = ResultReranker(
    strategies=[
        TextSimilarityReranker(),
        SemanticReranker(),
        RecencyReranker()
    ],
    weights={
        "text": 0.4,
        "semantic": 0.3,
        "recency": 0.3  # Boost recent articles
    }
)

Performance Optimization

1. Batch Processing

Process multiple queries efficiently:

import asyncio

async def rerank_batch(queries, entities_list):
    """Rerank multiple query-entity pairs"""
    tasks = [
        reranker.rerank(query, entities)
        for query, entities in zip(queries, entities_list)
    ]
    return await asyncio.gather(*tasks)
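
When the batch is large, launching every rerank call at once can overload the graph store or embedding backend. The sketch below bounds concurrency with a semaphore. It takes the rerank coroutine as a parameter, and `fake_rerank` is a hypothetical stand-in used only to keep the example runnable.

```python
import asyncio


async def rerank_batch_bounded(rerank, queries, entities_list, max_concurrency=8):
    """Rerank many query/entity pairs with at most max_concurrency calls in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(query, entities):
        async with semaphore:
            return await rerank(query, entities)

    tasks = [run_one(q, e) for q, e in zip(queries, entities_list)]
    # gather returns results in the same order as the input pairs.
    return await asyncio.gather(*tasks)


async def fake_rerank(query, entities):
    """Stand-in for reranker.rerank: longer names score higher."""
    await asyncio.sleep(0)
    ranked = sorted(entities, key=len, reverse=True)
    return [(name, float(len(name))) for name in ranked]


results = asyncio.run(
    rerank_batch_bounded(
        fake_rerank,
        queries=["q1", "q2"],
        entities_list=[["a", "bbb"], ["cc"]],
        max_concurrency=1,
    )
)
```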

2. Caching

Cache expensive computations:

class CachedStructuralReranker(StructuralReranker):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._pagerank_cache = {}

    async def _get_pagerank_scores(self, entity_ids):
        # Check cache first
        cache_key = tuple(sorted(entity_ids))
        if cache_key in self._pagerank_cache:
            return self._pagerank_cache[cache_key]

        # Compute and cache
        scores = await super()._get_pagerank_scores(entity_ids)
        self._pagerank_cache[cache_key] = scores
        return scores

3. Parallel Strategy Execution

Execute strategies in parallel:

import asyncio

async def parallel_rerank(query, entities, **kwargs):
    """Execute all strategies in parallel"""
    # Get scores from all strategies concurrently;
    # kwargs (e.g. query_embedding) are forwarded to each strategy
    score_tasks = [
        strategy.score(query, entities, **kwargs)
        for strategy in reranker.strategies
    ]
    all_scores = await asyncio.gather(*score_tasks)

    # Combine scores
    # ... (combination logic)
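
One possible way to fill in the combination step is a weighted average keyed by strategy name, sketched below. This is an assumption about what the elided logic might look like, not the library's code. `ConstantStrategy` is a hypothetical stub that mimics the `name` and `score` members of a strategy, and weights are assumed to be positive.

```python
import asyncio


class ConstantStrategy:
    """Hypothetical stand-in for a RerankerStrategy that returns fixed scores."""

    def __init__(self, name, scores):
        self.name = name
        self._scores = scores

    async def score(self, query, entities, **kwargs):
        await asyncio.sleep(0)  # simulate asynchronous work
        return list(self._scores)


async def parallel_weighted_scores(strategies, weights, query, entities, **kwargs):
    """Score with all strategies concurrently, then take a weighted average."""
    all_scores = await asyncio.gather(
        *(s.score(query, entities, **kwargs) for s in strategies)
    )
    total = sum(weights[s.name] for s in strategies)
    combined = [0.0] * len(entities)
    for strategy, scores in zip(strategies, all_scores):
        for i, value in enumerate(scores):
            combined[i] += weights[strategy.name] * value / total
    return combined


strategies = [
    ConstantStrategy("text", [1.0, 0.0]),
    ConstantStrategy("semantic", [0.5, 1.0]),
]
combined = asyncio.run(
    parallel_weighted_scores(
        strategies,
        weights={"text": 0.6, "semantic": 0.4},
        query="machine learning",
        entities=["entity-1", "entity-2"],
    )
)
```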

4. Early Stopping

Return a shorter list when the top result is clearly confident. Note that the full rerank still runs; only the returned list is trimmed:

async def rerank_with_early_stop(
    query,
    entities,
    confidence_threshold=0.9
):
    """Stop if top result has high confidence"""
    reranked = await reranker.rerank(query, entities)

    if reranked and reranked[0][1] > confidence_threshold:
        # Top result is very confident, return early
        return reranked[:10]

    return reranked

Troubleshooting

Problem: Low Scores for All Entities

Cause: Normalization issue or missing data

Solution:

# Check raw scores before normalization
reranker = ResultReranker(
    strategies=[...],
    normalize_scores=False  # Disable to debug
)

# Or use different normalization
reranker = ResultReranker(
    strategies=[...],
    normalization_method="softmax"  # Try different method
)

Problem: One Strategy Dominates

Cause: Scores on different scales

Solution:

# Always normalize scores
reranker = ResultReranker(
    strategies=[...],
    normalize_scores=True,  # Enable normalization
    normalization_method="min_max"
)

# Or adjust weights
weights = {
    "dominant_strategy": 0.3,  # Reduce weight
    "other_strategy": 0.7      # Increase weight
}

Problem: Slow Performance

Cause: Expensive strategy computations

Solution:

# Use caching
structural_reranker = StructuralReranker(
    graph_store=store,
    use_cache=True
)

# Limit results early
reranked = await reranker.rerank(
    query=query,
    entities=entities[:100],  # Limit input size
    top_k=20
)

# Use faster strategies
# Replace SemanticReranker with TextSimilarityReranker if embeddings are slow

Problem: Missing Embeddings

Cause: Entities don’t have embedding vectors

Solution:

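# Provide fallback score
import math

def cosine_similarity(a, b):
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

class SafeSemanticReranker(SemanticReranker):
    async def score(self, query, entities, **kwargs):
        # The query vector is passed as a keyword argument, as in the examples above
        query_emb = kwargs.get("query_embedding")
        scores = []
        for entity in entities:
            if query_emb is not None and entity.embedding is not None:
                score = cosine_similarity(query_emb, entity.embedding)
            else:
                score = 0.5  # Neutral score for missing embeddings
            scores.append(score)
        return scores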

Testing Strategies

Unit Testing

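import pytest

from aiecs.application.knowledge_graph.search.reranker_strategies import TextSimilarityReranker
from aiecs.domain.knowledge_graph.models.entity import Entity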

@pytest.mark.asyncio
async def test_text_similarity_reranker():
    """Test text similarity reranker"""
    reranker = TextSimilarityReranker()

    # Create test entities
    entities = [
        Entity(id="1", name="Machine Learning", description="ML algorithms"),
        Entity(id="2", name="Deep Learning", description="Neural networks"),
        Entity(id="3", name="Cooking", description="Recipes and food")
    ]

    # Score entities
    scores = await reranker.score("machine learning", entities)

    # Verify scores
    assert len(scores) == 3
    assert scores[0] > scores[2]  # ML more relevant than cooking
    assert all(0 <= s <= 1 for s in scores)  # Scores in valid range

Integration Testing

@pytest.mark.asyncio
async def test_result_reranker_integration():
    """Test full reranker pipeline"""
    reranker = ResultReranker(
        strategies=[
            TextSimilarityReranker(),
            SemanticReranker()
        ],
        weights={"text": 0.6, "semantic": 0.4}
    )

    # Rerank entities (test_entities is assumed to be a fixture or module-level list of Entity objects)
    reranked = await reranker.rerank(
        query="machine learning",
        entities=test_entities,
        top_k=10
    )

    # Verify results
    assert len(reranked) <= 10
    assert all(isinstance(item, tuple) for item in reranked)
    assert all(0 <= score <= 1 for _, score in reranked)
    # Verify sorted descending
    scores = [score for _, score in reranked]
    assert scores == sorted(scores, reverse=True)

Conclusion

The reranking framework provides flexible, composable strategies for improving search result relevance. Key takeaways:

✅ Choose the right strategy for your use case
✅ Combine strategies for better results
✅ Tune weights based on your data
✅ Normalize scores when combining
✅ Cache expensive computations
✅ Test thoroughly before production

For more information, see the ResultReranker API Documentation.