Knowledge Retrieval Metrics and Monitoring

This guide documents the metrics and monitoring capabilities for knowledge retrieval in KnowledgeAwareAgent.

Table of Contents

  1. Overview

  2. Available Metrics

  3. Accessing Metrics

  4. Monitoring Patterns

  5. Performance Analysis

  6. Examples


Overview

The KnowledgeAwareAgent tracks metrics for every knowledge retrieval operation, including:

  • Query counts and performance

  • Cache hit/miss rates

  • Strategy usage statistics

  • Entity extraction metrics

  • Relationship traversal metrics

All metrics are exposed through the GraphMetrics model, available as agent.get_metrics().graph_metrics or directly from agent.get_graph_metrics().


Available Metrics

Query Metrics

total_graph_queries

Type: int
Description: Total number of graph queries executed
Incremented: On each knowledge retrieval call

total_entities_retrieved

Type: int
Description: Total number of entities retrieved across all queries
Incremented: By the number of entities each query returns

total_relationships_traversed

Type: int
Description: Total number of relationships traversed during graph searches
Incremented: During graph traversal operations
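Because these three counters only ever grow, per-query averages are often easier to read than the raw totals. The following is a minimal sketch that assumes nothing beyond the three field names above; the per_query_averages helper is hypothetical and not part of KnowledgeAwareAgent.

```python
def per_query_averages(total_graph_queries: int,
                       total_entities_retrieved: int,
                       total_relationships_traversed: int) -> dict:
    """Derive per-query averages from the cumulative counters (hypothetical helper)."""
    if total_graph_queries == 0:
        # No queries yet: report zeros instead of dividing by zero.
        return {"entities_per_query": 0.0, "relationships_per_query": 0.0}
    return {
        "entities_per_query": total_entities_retrieved / total_graph_queries,
        "relationships_per_query": total_relationships_traversed / total_graph_queries,
    }
```

In practice the three arguments would be read from the GraphMetrics object, for example graph_metrics.total_graph_queries.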


Performance Metrics

average_graph_query_time

Type: float
Description: Average graph query time in seconds
Calculated: total_graph_query_time / total_graph_queries

total_graph_query_time

Type: float
Description: Cumulative time spent on graph queries in seconds

min_graph_query_time

Type: Optional[float]
Description: Minimum graph query time in seconds

max_graph_query_time

Type: Optional[float]
Description: Maximum graph query time in seconds
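To make the relationship between the four timing fields concrete, here is an illustrative sketch of how such running statistics could be maintained. QueryTimingSketch is a hypothetical stand-in, not the actual GraphMetrics implementation; the assumptions that min and max stay None until the first query and that the average is 0.0 before any query are labeled in the code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QueryTimingSketch:
    """Illustrative stand-in; field names mirror the metrics documented above."""
    total_graph_queries: int = 0
    total_graph_query_time: float = 0.0
    min_graph_query_time: Optional[float] = None  # assumption: None until the first query
    max_graph_query_time: Optional[float] = None  # assumption: None until the first query

    def record(self, seconds: float) -> None:
        """Fold one query duration (in seconds) into the running statistics."""
        self.total_graph_queries += 1
        self.total_graph_query_time += seconds
        if self.min_graph_query_time is None or seconds < self.min_graph_query_time:
            self.min_graph_query_time = seconds
        if self.max_graph_query_time is None or seconds > self.max_graph_query_time:
            self.max_graph_query_time = seconds

    @property
    def average_graph_query_time(self) -> float:
        # total_graph_query_time / total_graph_queries, guarded for the empty case
        # (assumption: 0.0 is reported before any query has run).
        if self.total_graph_queries == 0:
            return 0.0
        return self.total_graph_query_time / self.total_graph_queries
```

Keeping only the count, the running total, and the two extremes means the average can be derived on demand without storing individual durations.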


Cache Metrics

cache_hits

Type: int
Description: Number of cache hits
Incremented: When cached knowledge is returned

cache_misses

Type: int
Description: Number of cache misses
Incremented: When knowledge is retrieved from the graph store

cache_hit_rate

Type: float
Description: Cache hit rate (0.0 to 1.0)
Calculated: cache_hits / (cache_hits + cache_misses)

Example: 0.75 means 75% of queries hit the cache
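The formula above is undefined before the first lookup, when both counters are zero. A small sketch of the calculation with that case guarded; returning 0.0 for the empty case is an assumption of this example, not confirmed behavior of GraphMetrics.

```python
def cache_hit_rate(cache_hits: int, cache_misses: int) -> float:
    """cache_hits / (cache_hits + cache_misses), guarded against division by zero."""
    total = cache_hits + cache_misses
    if total == 0:
        return 0.0  # assumption: report 0.0 before any cache lookup
    return cache_hits / total


print(f"{cache_hit_rate(75, 25):.2%}")  # prints 75.00%
```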


Strategy Metrics

vector_search_count

Type: int
Description: Number of vector-only searches performed

graph_search_count

Type: int
Description: Number of graph-only searches performed

hybrid_search_count

Type: int
Description: Number of hybrid searches performed
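The three strategy counters are easiest to interpret as shares of all searches. The sketch below shows one way to compute that distribution; strategy_shares is a hypothetical helper, and only the three counter names come from this guide.

```python
def strategy_shares(vector_search_count: int,
                    graph_search_count: int,
                    hybrid_search_count: int) -> dict:
    """Return each strategy's fraction of all searches (hypothetical helper)."""
    counts = {
        "vector": vector_search_count,
        "graph": graph_search_count,
        "hybrid": hybrid_search_count,
    }
    total = sum(counts.values())
    if total == 0:
        # No searches recorded yet.
        return {name: 0.0 for name in counts}
    return {name: count / total for name, count in counts.items()}
```

For example, strategy_shares(2, 1, 1) yields 0.5 for vector and 0.25 each for graph and hybrid.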


Entity Extraction Metrics

entity_extraction_count

Type: int
Description: Number of entity extractions performed

average_extraction_time

Type: float
Description: Average entity extraction time in seconds
Calculated: total_extraction_time / entity_extraction_count

total_extraction_time

Type: float
Description: Cumulative time spent on entity extraction in seconds
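As an illustration of how the extraction count and cumulative time could be gathered together, the sketch below wraps an extraction step in a timing context manager. ExtractionStatsSketch and timed_extraction are hypothetical names for this example, and the capitalized-word "extractor" is only a placeholder for real entity extraction.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass


@dataclass
class ExtractionStatsSketch:
    """Illustrative stand-in; field names mirror the metrics documented above."""
    entity_extraction_count: int = 0
    total_extraction_time: float = 0.0

    @property
    def average_extraction_time(self) -> float:
        # total_extraction_time / entity_extraction_count, guarded for the empty case
        if self.entity_extraction_count == 0:
            return 0.0
        return self.total_extraction_time / self.entity_extraction_count


@contextmanager
def timed_extraction(stats: ExtractionStatsSketch):
    """Time the wrapped block and add it to the stats, even if the block raises."""
    start = time.perf_counter()
    try:
        yield
    finally:
        stats.total_extraction_time += time.perf_counter() - start
        stats.entity_extraction_count += 1


stats = ExtractionStatsSketch()
with timed_extraction(stats):
    # Placeholder extraction: keep capitalized words.
    entities = [w for w in "Alice works at Acme".split() if w[0].isupper()]

print(entities, stats.entity_extraction_count)
```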


Accessing Metrics

Method 1: Via get_metrics()

# Get all agent metrics (includes graph metrics)
metrics = agent.get_metrics()

# Access graph metrics
graph_metrics = metrics.graph_metrics

print(f"Total queries: {graph_metrics.total_graph_queries}")
print(f"Cache hit rate: {graph_metrics.cache_hit_rate:.2%}")
print(f"Average query time: {graph_metrics.average_graph_query_time:.3f}s")

Method 2: Direct Access

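# Access graph metrics directly via the underscore-prefixed attribute; treat it
# as internal and prefer get_metrics() or get_graph_metrics() where possible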
graph_metrics = agent._graph_metrics

print(f"Vector searches: {graph_metrics.vector_search_count}")
print(f"Graph searches: {graph_metrics.graph_search_count}")
print(f"Hybrid searches: {graph_metrics.hybrid_search_count}")

Method 3: Via get_graph_metrics()

# Get graph metrics explicitly
graph_metrics = agent.get_graph_metrics()

print(f"Entities retrieved: {graph_metrics.total_entities_retrieved}")
print(f"Relationships traversed: {graph_metrics.total_relationships_traversed}")
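Whichever access method is used, it is often convenient to copy the values into a plain dictionary for logging or export. The sketch below assumes only the field names listed under Available Metrics; snapshot_graph_metrics is a hypothetical helper, and a SimpleNamespace stands in for the real GraphMetrics object so the example runs without an agent.

```python
import json
from types import SimpleNamespace

# Field names taken from the "Available Metrics" section of this guide.
GRAPH_METRIC_FIELDS = (
    "total_graph_queries", "total_entities_retrieved", "total_relationships_traversed",
    "average_graph_query_time", "total_graph_query_time",
    "min_graph_query_time", "max_graph_query_time",
    "cache_hits", "cache_misses", "cache_hit_rate",
    "vector_search_count", "graph_search_count", "hybrid_search_count",
    "entity_extraction_count", "average_extraction_time", "total_extraction_time",
)


def snapshot_graph_metrics(graph_metrics) -> dict:
    """Copy the documented fields into a plain dict; absent fields become None."""
    return {name: getattr(graph_metrics, name, None) for name in GRAPH_METRIC_FIELDS}


# Stand-in object; with a real agent this would be agent.get_graph_metrics().
fake_metrics = SimpleNamespace(total_graph_queries=3, cache_hits=2, cache_misses=1)
snapshot = snapshot_graph_metrics(fake_metrics)
print(json.dumps(snapshot, indent=2))
```

Using getattr with a default keeps the helper tolerant of fields that a given version of the model might not define.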

Monitoring Patterns

Pattern 1: Basic Metrics Tracking

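from aiecs.domain.agent import KnowledgeAwareAgent, AgentConfiguration
from aiecs.infrastructure.graph_storage import InMemoryGraphStore

# Run this inside an async function (it uses await).
# llm_client is assumed to be an LLM client configured elsewhere.
graph_store = InMemoryGraphStore()

# Create agent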
agent = KnowledgeAwareAgent(
    agent_id="monitored_agent",
    name="Monitored Agent",
    llm_client=llm_client,
    tools=[],
    config=AgentConfiguration(),
    graph_store=graph_store
)
await agent.initialize()

# Execute some tasks
for i in range(10):
    await agent.execute_task({"description": f"Query {i}"})

# Get metrics
metrics = agent.get_metrics()
graph_metrics = metrics.graph_metrics

# Print summary
print(f"Total queries: {graph_metrics.total_graph_queries}")
print(f"Cache hit rate: {graph_metrics.cache_hit_rate:.2%}")
print(f"Average query time: {graph_metrics.average_graph_query_time * 1000:.2f} ms")
print(f"Entities retrieved: {graph_metrics.total_entities_retrieved}")

Pattern 2: Performance Monitoring

import time

# Assumes a freshly initialized agent, because the counters are cumulative,
# and that queries is a list of query strings defined elsewhere
start_time = time.perf_counter()  # monotonic clock suited to measuring intervals

# Execute operations
for query in queries:
    await agent.execute_task({"description": query})

elapsed_time = time.perf_counter() - start_time

# Get metrics
metrics = agent.get_metrics()
graph_metrics = metrics.graph_metrics

# Calculate throughput
queries_per_second = graph_metrics.total_graph_queries / elapsed_time
entities_per_second = graph_metrics.total_entities_retrieved / elapsed_time

print(f"Throughput: {queries_per_second:.2f} queries/second")
print(f"Entity retrieval rate: {entities_per_second:.2f} entities/second")
print(f"Average latency: {graph_metrics.average_graph_query_time * 1000:.2f} ms")
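On a long-running agent the totals include work done before the measurement window, so dividing them by the elapsed time overstates throughput. One way around this, sketched below, is to compare two readings of the counters. interval_throughput is a hypothetical helper; the readings are modeled with SimpleNamespace, and in real use the "before" values should be copied into a separate object in case the metrics object is updated in place (an assumption, not confirmed behavior).

```python
from types import SimpleNamespace


def interval_throughput(before, after, elapsed_seconds: float) -> dict:
    """Compute rates for one interval from two cumulative counter readings."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    queries = after.total_graph_queries - before.total_graph_queries
    entities = after.total_entities_retrieved - before.total_entities_retrieved
    return {
        "queries_per_second": queries / elapsed_seconds,
        "entities_per_second": entities / elapsed_seconds,
    }


# Stand-in readings taken 4 seconds apart.
before = SimpleNamespace(total_graph_queries=100, total_entities_retrieved=400)
after = SimpleNamespace(total_graph_queries=120, total_entities_retrieved=500)
rates = interval_throughput(before, after, elapsed_seconds=4.0)
print(rates)  # {'queries_per_second': 5.0, 'entities_per_second': 25.0}
```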

Pattern 3: Cache Effectiveness Monitoring

# Monitor cache performance
metrics = agent.get_metrics()
graph_metrics = metrics.graph_metrics

cache_hits = graph_metrics.cache_hits
cache_misses = graph_metrics.cache_misses
total_requests = cache_hits + cache_misses

if total_requests > 0:
    hit_rate = graph_metrics.cache_hit_rate
    miss_rate = 1.0 - hit_rate
    
    print("Cache Performance:")
    print(f"  Hit rate: {hit_rate:.2%}")
    print(f"  Miss rate: {miss_rate:.2%}")
    print(f"  Total requests: {total_requests}")
    print(f"  Hits: {cache_hits}")
    print(f"  Misses: {cache_misses}")
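The same counters can drive a simple health check that flags a low hit rate once enough requests have been observed. This is a sketch only: cache_warnings is a hypothetical helper, and the default thresholds (50% hit rate, 20 requests) are illustrative values rather than recommendations from the library.

```python
def cache_warnings(cache_hits: int, cache_misses: int,
                   min_hit_rate: float = 0.5, min_requests: int = 20) -> list:
    """Return warning messages when the hit rate falls below min_hit_rate."""
    total = cache_hits + cache_misses
    if total == 0 or total < min_requests:
        return []  # too few samples to judge the cache
    hit_rate = cache_hits / total
    if hit_rate < min_hit_rate:
        return [f"cache hit rate {hit_rate:.2%} is below target {min_hit_rate:.2%}"]
    return []


for message in cache_warnings(cache_hits=5, cache_misses=95):
    print(message)
```

Requiring a minimum number of requests avoids false alarms right after startup, when the cache is still cold.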