Reranking Strategies Guide

Version: 1.0
Date: 2025-11-14
Module: aiecs.application.knowledge_graph.search.reranker_strategies

Overview

This guide covers the built-in reranking strategies available in the knowledge graph search system. Each strategy uses different signals to score entity relevance, and they can be combined for optimal results.

Built-in Strategies

1. TextSimilarityReranker

Scores entities based on text similarity using BM25 and Jaccard similarity.

from aiecs.application.knowledge_graph.search.reranker_strategies import TextSimilarityReranker

How It Works

  • BM25: Term-based relevance scoring (TF-IDF variant)

  • Jaccard: Set overlap between query and entity text

  • Combination: Weighted combination of both scores

Constructor

def __init__(
    self,
    bm25_weight: float = 0.7,
    jaccard_weight: float = 0.3,
    text_fields: Optional[List[str]] = None
)

Parameters:

  • bm25_weight (float): Weight for BM25 score (default: 0.7)

  • jaccard_weight (float): Weight for Jaccard score (default: 0.3)

  • text_fields (Optional[List[str]]): Entity fields to use for text (default: [“name”, “description”])

Example

# Create text similarity reranker
text_reranker = TextSimilarityReranker(
    bm25_weight=0.7,
    jaccard_weight=0.3,
    text_fields=["name", "description", "content"]
)

# Score entities
scores = await text_reranker.score(
    query="machine learning algorithms",
    entities=search_results
)

When to Use

  • ✅ Query contains specific keywords

  • ✅ Exact term matching is important

  • ✅ Entity text is rich and descriptive

  • ❌ Query is very short or generic

  • ❌ Semantic meaning is more important than keywords

Performance

  • Speed: Fast (no external calls)

  • Memory: Low

  • Accuracy: Good for keyword-based queries


2. SemanticReranker

Scores entities based on semantic similarity using vector embeddings.

from aiecs.application.knowledge_graph.search.reranker_strategies import SemanticReranker

How It Works

  • Embeddings: Uses entity embedding vectors

  • Similarity: Computes cosine similarity with query embedding

  • Fallback: Returns 0.5 if embeddings are missing

Constructor

def __init__(self)

No parameters required.

Example

# Create semantic reranker
semantic_reranker = SemanticReranker()

# Score entities (requires query_embedding)
scores = await semantic_reranker.score(
    query="machine learning",
    entities=search_results,
    query_embedding=[0.1, 0.2, 0.3, ...]  # Query vector
)

When to Use

  • ✅ Semantic meaning is important

  • ✅ Query and entities have embeddings

  • ✅ Handling synonyms and related concepts

  • ✅ Cross-lingual search

  • ❌ Embeddings are not available

  • ❌ Exact keyword matching is critical

Performance

  • Speed: Fast (vector operations)

  • Memory: Medium (stores embeddings)

  • Accuracy: Excellent for semantic queries


3. StructuralReranker

Scores entities based on graph structure (PageRank, centrality).

from aiecs.application.knowledge_graph.search.reranker_strategies import StructuralReranker

How It Works

  • PageRank: Scores based on entity importance in graph

  • Degree Centrality: Scores based on number of connections

  • Combination: Weighted combination of both metrics

Constructor

def __init__(
    self,
    graph_store: GraphStore,
    pagerank_weight: float = 0.7,
    centrality_weight: float = 0.3,
    use_cache: bool = True
)

Parameters:

  • graph_store (GraphStore): Graph storage backend

  • pagerank_weight (float): Weight for PageRank score (default: 0.7)

  • centrality_weight (float): Weight for centrality score (default: 0.3)

  • use_cache (bool): Whether to cache PageRank scores (default: True)

Example

# Create structural reranker
structural_reranker = StructuralReranker(
    graph_store=store,
    pagerank_weight=0.7,
    centrality_weight=0.3
)

# Score entities
scores = await structural_reranker.score(
    query="important entities",
    entities=search_results
)

When to Use

  • ✅ Entity importance matters

  • ✅ Well-connected entities are more relevant

  • ✅ Graph structure is meaningful

  • ❌ All entities are equally important

  • ❌ Graph is sparse or disconnected

Performance

  • Speed: Medium (requires graph queries)

  • Memory: Medium (caches PageRank)

  • Accuracy: Good for authority-based ranking


4. HybridReranker

Combines text, semantic, and structural signals into a single strategy.

from aiecs.application.knowledge_graph.search.reranker_strategies import HybridReranker

How It Works

  • Multi-Signal: Combines all three reranking approaches

  • Weighted: Configurable weights for each signal

  • Normalized: Normalizes scores before combining

Constructor

def __init__(
    self,
    graph_store: GraphStore,
    text_weight: float = 0.4,
    semantic_weight: float = 0.4,
    structural_weight: float = 0.2,
    text_fields: Optional[List[str]] = None
)

Parameters:

  • graph_store (GraphStore): Graph storage backend

  • text_weight (float): Weight for text similarity (default: 0.4)

  • semantic_weight (float): Weight for semantic similarity (default: 0.4)

  • structural_weight (float): Weight for structural importance (default: 0.2)

  • text_fields (Optional[List[str]]): Entity fields for text similarity

Example

# Create hybrid reranker
hybrid_reranker = HybridReranker(
    graph_store=store,
    text_weight=0.4,
    semantic_weight=0.4,
    structural_weight=0.2
)

# Score entities
scores = await hybrid_reranker.score(
    query="machine learning",
    entities=search_results,
    query_embedding=[0.1, 0.2, ...]
)

When to Use

  • ✅ Want comprehensive ranking

  • ✅ Multiple signals are available

  • ✅ Balanced approach is needed

  • ❌ Only one signal is available

  • ❌ Need fine-grained control over strategies

Performance

  • Speed: Medium (combines all strategies)

  • Memory: Medium

  • Accuracy: Excellent for general-purpose ranking


Strategy Comparison

Strategy

Speed

Memory

Best For

Requires

TextSimilarity

Fast

Low

Keyword queries

Entity text

Semantic

Fast

Medium

Semantic queries

Embeddings

Structural

Medium

Medium

Authority ranking

Graph structure

Hybrid

Medium

Medium

General purpose

All of above


Combining Strategies

Using ResultReranker

Combine multiple strategies with custom weights:

from aiecs.application.knowledge_graph.search.reranker import (
    ResultReranker,
    ScoreCombinationMethod
)

# Create individual strategies
text_reranker = TextSimilarityReranker()
semantic_reranker = SemanticReranker()
structural_reranker = StructuralReranker(graph_store)

# Combine with weighted average
reranker = ResultReranker(
    strategies=[text_reranker, semantic_reranker, structural_reranker],
    combination_method=ScoreCombinationMethod.WEIGHTED_AVERAGE,
    weights={
        "text": 0.4,
        "semantic": 0.4,
        "structural": 0.2
    }
)

Combination Methods

2. Reciprocal Rank Fusion (RRF)

Combines based on ranks rather than scores:

combination_method=ScoreCombinationMethod.RRF

When to use: Scores are on different scales, want rank-based fusion

3. Max Score

Takes maximum score across strategies:

combination_method=ScoreCombinationMethod.MAX

When to use: Want entities that excel in any strategy

4. Min Score

Takes minimum score across strategies:

combination_method=ScoreCombinationMethod.MIN

When to use: Want entities that score well in all strategies


Best Practices

1. Choose the Right Strategy

# For keyword-heavy queries
if query_has_specific_terms:
    use TextSimilarityReranker

# For semantic/conceptual queries
if query_is_conceptual:
    use SemanticReranker

# For authority-based ranking
if importance_matters:
    use StructuralReranker

# For general purpose
else:
    use HybridReranker

2. Tune Weights

Start with default weights and adjust based on results:

# Default balanced weights
weights = {"text": 0.4, "semantic": 0.4, "structural": 0.2}

# Keyword-focused
weights = {"text": 0.7, "semantic": 0.2, "structural": 0.1}

# Semantic-focused
weights = {"text": 0.2, "semantic": 0.7, "structural": 0.1}

# Authority-focused
weights = {"text": 0.3, "semantic": 0.3, "structural": 0.4}

3. Normalize Scores

Always normalize scores when combining strategies:

reranker = ResultReranker(
    strategies=[...],
    normalize_scores=True,  # Important!
    normalization_method="min_max"
)

4. Use Top-K Limiting

Limit results for better performance:

reranked = await reranker.rerank(
    query=query,
    entities=entities,
    top_k=20  # Only return top 20
)

5. Cache When Possible

Enable caching for structural reranker:

structural_reranker = StructuralReranker(
    graph_store=store,
    use_cache=True  # Cache PageRank scores
)

Custom Strategies

Creating a Custom Strategy

Implement the RerankerStrategy interface:

from aiecs.application.knowledge_graph.search.reranker import RerankerStrategy
from typing import List
from aiecs.domain.knowledge_graph.models.entity import Entity

class RecencyReranker(RerankerStrategy):
    """Rerank based on entity recency"""

    @property
    def name(self) -> str:
        return "recency"

    async def score(
        self,
        query: str,
        entities: List[Entity],
        **kwargs
    ) -> List[float]:
        """Score based on creation/update time"""
        scores = []
        for entity in entities:
            # Get timestamp from entity metadata
            timestamp = entity.metadata.get("updated_at", 0)
            # Normalize to [0, 1] based on age
            age_days = (time.time() - timestamp) / 86400
            score = 1.0 / (1.0 + age_days / 365)  # Decay over year
            scores.append(score)
        return scores

Using Custom Strategy

# Create custom strategy
recency_reranker = RecencyReranker()

# Use with ResultReranker
reranker = ResultReranker(
    strategies=[text_reranker, recency_reranker],
    weights={"text": 0.7, "recency": 0.3}
)

Use Cases

Use Case 3: Expert Finding

Goal: Find experts in a domain

Strategy:

# Emphasize structural importance (connections, collaborations)
reranker = ResultReranker(
    strategies=[
        SemanticReranker(),
        StructuralReranker(graph_store)
    ],
    weights={
        "semantic": 0.3,
        "structural": 0.7  # Emphasize network position
    }
)

Use Case 4: News Article Search

Goal: Find relevant recent articles

Strategy:

# Combine relevance with recency
reranker = ResultReranker(
    strategies=[
        TextSimilarityReranker(),
        SemanticReranker(),
        RecencyReranker()
    ],
    weights={
        "text": 0.4,
        "semantic": 0.3,
        "recency": 0.3  # Boost recent articles
    }
)

Performance Optimization

1. Batch Processing

Process multiple queries efficiently:

async def rerank_batch(queries, entities_list):
    """Rerank multiple query-entity pairs"""
    tasks = [
        reranker.rerank(query, entities)
        for query, entities in zip(queries, entities_list)
    ]
    return await asyncio.gather(*tasks)

2. Caching

Cache expensive computations:

from functools import lru_cache

class CachedStructuralReranker(StructuralReranker):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._pagerank_cache = {}

    async def _get_pagerank_scores(self, entity_ids):
        # Check cache first
        cache_key = tuple(sorted(entity_ids))
        if cache_key in self._pagerank_cache:
            return self._pagerank_cache[cache_key]

        # Compute and cache
        scores = await super()._get_pagerank_scores(entity_ids)
        self._pagerank_cache[cache_key] = scores
        return scores

3. Parallel Strategy Execution

Execute strategies in parallel:

import asyncio

async def parallel_rerank(query, entities):
    """Execute all strategies in parallel"""
    # Get scores from all strategies concurrently
    score_tasks = [
        strategy.score(query, entities)
        for strategy in reranker.strategies
    ]
    all_scores = await asyncio.gather(*score_tasks)

    # Combine scores
    # ... (combination logic)

4. Early Stopping

Stop processing if top results are clear:

async def rerank_with_early_stop(
    query,
    entities,
    confidence_threshold=0.9
):
    """Stop if top result has high confidence"""
    reranked = await reranker.rerank(query, entities)

    if reranked and reranked[0][1] > confidence_threshold:
        # Top result is very confident, return early
        return reranked[:10]

    return reranked

Troubleshooting

Problem: Low Scores for All Entities

Cause: Normalization issue or missing data

Solution:

# Check raw scores before normalization
reranker = ResultReranker(
    strategies=[...],
    normalize_scores=False  # Disable to debug
)

# Or use different normalization
reranker = ResultReranker(
    strategies=[...],
    normalization_method="softmax"  # Try different method
)

Problem: One Strategy Dominates

Cause: Scores on different scales

Solution:

# Always normalize scores
reranker = ResultReranker(
    strategies=[...],
    normalize_scores=True,  # Enable normalization
    normalization_method="min_max"
)

# Or adjust weights
weights = {
    "dominant_strategy": 0.3,  # Reduce weight
    "other_strategy": 0.7      # Increase weight
}

Problem: Slow Performance

Cause: Expensive strategy computations

Solution:

# Use caching
structural_reranker = StructuralReranker(
    graph_store=store,
    use_cache=True
)

# Limit results early
reranked = await reranker.rerank(
    query=query,
    entities=entities[:100],  # Limit input size
    top_k=20
)

# Use faster strategies
# Replace SemanticReranker with TextSimilarityReranker if embeddings are slow

Problem: Missing Embeddings

Cause: Entities don’t have embedding vectors

Solution:

# Provide fallback score
class SafeSemanticReranker(SemanticReranker):
    async def score(self, query, entities, **kwargs):
        scores = []
        for entity in entities:
            if entity.embedding:
                score = compute_similarity(query_emb, entity.embedding)
            else:
                score = 0.5  # Neutral score for missing embeddings
            scores.append(score)
        return scores

Testing Strategies

Unit Testing

import pytest

@pytest.mark.asyncio
async def test_text_similarity_reranker():
    """Test text similarity reranker"""
    reranker = TextSimilarityReranker()

    # Create test entities
    entities = [
        Entity(id="1", name="Machine Learning", description="ML algorithms"),
        Entity(id="2", name="Deep Learning", description="Neural networks"),
        Entity(id="3", name="Cooking", description="Recipes and food")
    ]

    # Score entities
    scores = await reranker.score("machine learning", entities)

    # Verify scores
    assert len(scores) == 3
    assert scores[0] > scores[2]  # ML more relevant than cooking
    assert all(0 <= s <= 1 for s in scores)  # Scores in valid range

Integration Testing

@pytest.mark.asyncio
async def test_result_reranker_integration():
    """Test full reranker pipeline"""
    reranker = ResultReranker(
        strategies=[
            TextSimilarityReranker(),
            SemanticReranker()
        ],
        weights={"text": 0.6, "semantic": 0.4}
    )

    # Rerank entities
    reranked = await reranker.rerank(
        query="machine learning",
        entities=test_entities,
        top_k=10
    )

    # Verify results
    assert len(reranked) <= 10
    assert all(isinstance(item, tuple) for item in reranked)
    assert all(0 <= score <= 1 for _, score in reranked)
    # Verify sorted descending
    scores = [score for _, score in reranked]
    assert scores == sorted(scores, reverse=True)

Conclusion

The reranking framework provides flexible, composable strategies for improving search result relevance. Key takeaways:

  • Choose the right strategy for your use case

  • Combine strategies for better results

  • Tune weights based on your data

  • Normalize scores when combining

  • Cache expensive computations

  • Test thoroughly before production

For more information, see the ResultReranker API Documentation.