# Knowledge Retrieval Metrics and Monitoring

This guide documents the metrics and monitoring capabilities for knowledge retrieval in `KnowledgeAwareAgent`.

## Table of Contents

1. [Overview](#overview)
2. [Available Metrics](#available-metrics)
3. [Accessing Metrics](#accessing-metrics)
4. [Monitoring Patterns](#monitoring-patterns)
5. [Performance Analysis](#performance-analysis)
6. [Examples](#examples)

---

## Overview

`KnowledgeAwareAgent` tracks metrics for its knowledge retrieval operations in these areas:

- Query counts and performance
- Cache hit/miss rates
- Strategy usage statistics
- Entity extraction metrics
- Relationship traversal metrics

All of these metrics live on the `GraphMetrics` model, which is available as the `graph_metrics` attribute of the object returned by `agent.get_metrics()`.

---

## Available Metrics

### Query Metrics

#### `total_graph_queries`
**Type**: `int`  
**Description**: Total number of graph queries executed  
**Incremented**: On each knowledge retrieval call

#### `total_entities_retrieved`
**Type**: `int`  
**Description**: Total number of entities retrieved across all queries  
**Incremented**: By the number of entities returned from each query

#### `total_relationships_traversed`
**Type**: `int`  
**Description**: Total number of relationships traversed during graph searches  
**Incremented**: For each relationship traversed during a graph search
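
The three query counters are cumulative, so per-query averages can be derived by dividing by `total_graph_queries`. A minimal sketch, using a hypothetical `derived_query_stats` helper that takes plain integers (it is not part of the `KnowledgeAwareAgent` API):

```python
def derived_query_stats(
    total_graph_queries: int,
    total_entities_retrieved: int,
    total_relationships_traversed: int,
) -> dict[str, float]:
    """Per-query averages from the cumulative query counters (hypothetical helper)."""
    if total_graph_queries == 0:
        # No queries yet: report zeros instead of dividing by zero
        return {"entities_per_query": 0.0, "relationships_per_query": 0.0}
    return {
        "entities_per_query": total_entities_retrieved / total_graph_queries,
        "relationships_per_query": total_relationships_traversed / total_graph_queries,
    }


stats = derived_query_stats(10, 45, 120)
print(stats)  # {'entities_per_query': 4.5, 'relationships_per_query': 12.0}
```

In practice the three arguments would come from the matching `graph_metrics` fields.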

---

### Performance Metrics

#### `average_graph_query_time`
**Type**: `float`  
**Description**: Average graph query time in seconds  
**Calculated**: `total_graph_query_time / total_graph_queries`

#### `total_graph_query_time`
**Type**: `float`  
**Description**: Cumulative time spent on graph queries in seconds

#### `min_graph_query_time`
**Type**: `Optional[float]`  
**Description**: Minimum graph query time in seconds

#### `max_graph_query_time`
**Type**: `Optional[float]`  
**Description**: Maximum graph query time in seconds
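
The four timing fields fit together as a running accumulator: each completed query adds its duration to the total and updates the extremes, and the average is derived on demand. The sketch below is a hypothetical `QueryTimer` dataclass that mirrors the field names above. It assumes, as the `Optional[float]` types suggest, that min and max stay `None` until the first query; it illustrates the bookkeeping and is not the library's implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QueryTimer:
    """Hypothetical accumulator mirroring the GraphMetrics timing fields."""
    total_graph_queries: int = 0
    total_graph_query_time: float = 0.0
    min_graph_query_time: Optional[float] = None
    max_graph_query_time: Optional[float] = None

    def record(self, duration: float) -> None:
        # Accumulate count and total time for one completed query
        self.total_graph_queries += 1
        self.total_graph_query_time += duration
        # min/max start as None and take the first observed duration
        if self.min_graph_query_time is None or duration < self.min_graph_query_time:
            self.min_graph_query_time = duration
        if self.max_graph_query_time is None or duration > self.max_graph_query_time:
            self.max_graph_query_time = duration

    @property
    def average_graph_query_time(self) -> float:
        # total_graph_query_time / total_graph_queries, guarded for zero queries
        if self.total_graph_queries == 0:
            return 0.0
        return self.total_graph_query_time / self.total_graph_queries


timer = QueryTimer()
for d in (0.02, 0.05, 0.08):
    timer.record(d)
print(timer.min_graph_query_time, timer.max_graph_query_time)  # 0.02 0.08
```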

---

### Cache Metrics

#### `cache_hits`
**Type**: `int`  
**Description**: Number of cache hits  
**Incremented**: When cached knowledge is returned

#### `cache_misses`
**Type**: `int`  
**Description**: Number of cache misses  
**Incremented**: When knowledge is retrieved from graph store

#### `cache_hit_rate`
**Type**: `float`  
**Description**: Cache hit rate (0.0 to 1.0)  
**Calculated**: `cache_hits / (cache_hits + cache_misses)`

**Example**: `0.75` means 75% of cache lookups were served from the cache
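
The formula above divides by zero before any lookups have happened. A minimal sketch of a guarded version, assuming that reporting `0.0` in that case is acceptable (the guide does not say what the library itself reports):

```python
def cache_hit_rate(cache_hits: int, cache_misses: int) -> float:
    """cache_hits / (cache_hits + cache_misses), 0.0 when nothing was looked up."""
    total = cache_hits + cache_misses
    if total == 0:
        # Assumption: report 0.0 rather than raising before any lookups
        return 0.0
    return cache_hits / total


print(cache_hit_rate(3, 1))  # 0.75
```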

---

### Strategy Metrics

#### `vector_search_count`
**Type**: `int`  
**Description**: Number of vector-only searches performed

#### `graph_search_count`
**Type**: `int`  
**Description**: Number of graph-only searches performed

#### `hybrid_search_count`
**Type**: `int`  
**Description**: Number of hybrid searches performed
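
The raw strategy counts are easier to compare as proportions of all searches. A sketch with a hypothetical `strategy_mix` helper that takes the three counters as plain integers:

```python
def strategy_mix(
    vector_search_count: int,
    graph_search_count: int,
    hybrid_search_count: int,
) -> dict[str, float]:
    """Share of searches handled by each retrieval strategy (hypothetical helper)."""
    counts = {
        "vector": vector_search_count,
        "graph": graph_search_count,
        "hybrid": hybrid_search_count,
    }
    total = sum(counts.values())
    if total == 0:
        # No searches yet: every share is zero
        return {name: 0.0 for name in counts}
    return {name: count / total for name, count in counts.items()}


print(strategy_mix(2, 3, 5))  # {'vector': 0.2, 'graph': 0.3, 'hybrid': 0.5}
```

A sudden shift in this mix, for example hybrid searches dropping to zero, can be a quicker signal of a configuration change than the raw counts.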

---

### Entity Extraction Metrics

#### `entity_extraction_count`
**Type**: `int`  
**Description**: Number of entity extractions performed

#### `average_extraction_time`
**Type**: `float`  
**Description**: Average entity extraction time in seconds  
**Calculated**: `total_extraction_time / entity_extraction_count`

#### `total_extraction_time`
**Type**: `float`  
**Description**: Cumulative time spent on entity extraction in seconds
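
Cumulative timers such as `total_extraction_time` are commonly fed by wrapping the timed call. A minimal sketch using a hypothetical `ExtractionStats` class with a `contextlib.contextmanager` timer built on `time.perf_counter()`, a monotonic clock intended for measuring intervals. It illustrates the pattern and is not how `KnowledgeAwareAgent` works internally.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass


@dataclass
class ExtractionStats:
    """Hypothetical mirror of the entity extraction fields."""
    entity_extraction_count: int = 0
    total_extraction_time: float = 0.0

    @property
    def average_extraction_time(self) -> float:
        # total_extraction_time / entity_extraction_count, 0.0 before any extraction
        if self.entity_extraction_count == 0:
            return 0.0
        return self.total_extraction_time / self.entity_extraction_count

    @contextmanager
    def timed(self):
        # Time the wrapped block and record it even if the block raises
        start = time.perf_counter()
        try:
            yield
        finally:
            self.total_extraction_time += time.perf_counter() - start
            self.entity_extraction_count += 1


stats = ExtractionStats()
with stats.timed():
    entities = "Alice works at Acme".split()  # stand-in for a real extraction call
print(stats.entity_extraction_count)  # 1
```

Recording in `finally` counts failed extractions too; move the bookkeeping after the `yield` if only successful runs should be included.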

---

## Accessing Metrics

### Method 1: Via `get_metrics()`

```python
# Get all agent metrics (includes graph metrics)
metrics = agent.get_metrics()

# Access graph metrics
graph_metrics = metrics.graph_metrics

print(f"Total queries: {graph_metrics.total_graph_queries}")
print(f"Cache hit rate: {graph_metrics.cache_hit_rate:.2%}")
print(f"Average query time: {graph_metrics.average_graph_query_time:.3f}s")
```

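### Method 2: Direct Access

This reads the underscore-prefixed `_graph_metrics` attribute, which by Python convention is internal and may change between releases; prefer `get_metrics()` or `get_graph_metrics()` in application code.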

```python
# Access graph metrics directly
graph_metrics = agent._graph_metrics

print(f"Vector searches: {graph_metrics.vector_search_count}")
print(f"Graph searches: {graph_metrics.graph_search_count}")
print(f"Hybrid searches: {graph_metrics.hybrid_search_count}")
```

### Method 3: Via `get_graph_metrics()`

```python
# Get graph metrics explicitly
graph_metrics = agent.get_graph_metrics()

print(f"Entities retrieved: {graph_metrics.total_entities_retrieved}")
print(f"Relationships traversed: {graph_metrics.total_relationships_traversed}")
```
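
Whichever access method is used, it is often convenient to flatten the metrics into a plain dict for a log line or an external metrics backend; a copy also stays fixed even if the live metrics object keeps changing. The sketch relies only on attribute access, so it should work with any object exposing the field names documented above. The `snapshot` helper and `GRAPH_METRIC_FIELDS` tuple are illustrative, and `SimpleNamespace` stands in for a real `GraphMetrics` instance.

```python
from types import SimpleNamespace
from typing import Any

# Field names as documented in "Available Metrics"
GRAPH_METRIC_FIELDS = (
    "total_graph_queries",
    "total_entities_retrieved",
    "total_relationships_traversed",
    "average_graph_query_time",
    "cache_hits",
    "cache_misses",
    "cache_hit_rate",
    "vector_search_count",
    "graph_search_count",
    "hybrid_search_count",
)


def snapshot(graph_metrics: Any) -> dict[str, Any]:
    """Copy the listed fields into a dict; missing fields become None."""
    return {name: getattr(graph_metrics, name, None) for name in GRAPH_METRIC_FIELDS}


# Stand-in object; in practice pass agent.get_metrics().graph_metrics
fake = SimpleNamespace(total_graph_queries=4, cache_hits=3, cache_misses=1, cache_hit_rate=0.75)
print(snapshot(fake)["cache_hit_rate"])  # 0.75
```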

---

## Monitoring Patterns

### Pattern 1: Basic Metrics Tracking

```python
import asyncio

from aiecs.domain.agent import KnowledgeAwareAgent, AgentConfiguration
from aiecs.infrastructure.graph_storage import InMemoryGraphStore


async def main(llm_client):
    # In-memory graph store backing knowledge retrieval
    graph_store = InMemoryGraphStore()

    # Create and initialize the agent
    agent = KnowledgeAwareAgent(
        agent_id="monitored_agent",
        name="Monitored Agent",
        llm_client=llm_client,
        tools=[],
        config=AgentConfiguration(),
        graph_store=graph_store,
    )
    await agent.initialize()

    # Execute some tasks
    for i in range(10):
        await agent.execute_task({"description": f"Query {i}"})

    # Get metrics
    graph_metrics = agent.get_metrics().graph_metrics

    # Print summary
    print(f"Total queries: {graph_metrics.total_graph_queries}")
    print(f"Cache hit rate: {graph_metrics.cache_hit_rate:.2%}")
    print(f"Average query time: {graph_metrics.average_graph_query_time*1000:.2f} ms")
    print(f"Entities retrieved: {graph_metrics.total_entities_retrieved}")


# llm_client: an already-configured LLM client (construction not shown here)
asyncio.run(main(llm_client))
```

---

### Pattern 2: Performance Monitoring

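```python
import time


async def measure_throughput(agent, queries):
    # Metrics are cumulative over the agent's lifetime, so copy the counters
    # before the run and report only the difference
    before = agent.get_metrics().graph_metrics
    start_queries = before.total_graph_queries
    start_entities = before.total_entities_retrieved

    # perf_counter is monotonic and suited to measuring elapsed time
    start_time = time.perf_counter()
    for query in queries:
        await agent.execute_task({"description": query})
    elapsed_time = time.perf_counter() - start_time

    graph_metrics = agent.get_metrics().graph_metrics
    window_queries = graph_metrics.total_graph_queries - start_queries
    window_entities = graph_metrics.total_entities_retrieved - start_entities

    # Rates over the wall-clock time of the whole tasks (includes non-graph work)
    queries_per_second = window_queries / elapsed_time
    entities_per_second = window_entities / elapsed_time

    print(f"Throughput: {queries_per_second:.2f} queries/second")
    print(f"Entity retrieval rate: {entities_per_second:.2f} entities/second")
    # Lifetime average, not limited to this run
    print(f"Average latency: {graph_metrics.average_graph_query_time*1000:.2f} ms")
```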
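
Per the formula in "Performance Metrics", `average_graph_query_time` is computed from cumulative totals, so it reflects the agent's whole lifetime and reacts slowly to recent changes. One way to get the average for just a measurement window is to difference `total_graph_query_time` and `total_graph_queries` between two readings. A sketch with a hypothetical `window_average_latency` helper taking plain numbers:

```python
def window_average_latency(
    queries_before: int,
    time_before: float,
    queries_after: int,
    time_after: float,
) -> float:
    """Average graph query time (seconds) for queries run between two readings."""
    window_queries = queries_after - queries_before
    if window_queries <= 0:
        # No graph queries in the window
        return 0.0
    return (time_after - time_before) / window_queries


# Readings of (total_graph_queries, total_graph_query_time) before and after a batch
latency = window_average_latency(100, 5.0, 120, 6.0)
print(f"{latency * 1000:.1f} ms")  # 50.0 ms
```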

---

### Pattern 3: Cache Effectiveness Monitoring

```python
# Monitor cache performance
metrics = agent.get_metrics()
graph_metrics = metrics.graph_metrics

cache_hits = graph_metrics.cache_hits
cache_misses = graph_metrics.cache_misses
total_requests = cache_hits + cache_misses

if total_requests > 0:
    hit_rate = graph_metrics.cache_hit_rate
    miss_rate = 1.0 - hit_rate
    
    print("Cache Performance:")
    print(f"  Hit rate: {hit_rate:.2%}")
    print(f"  Miss rate: {miss_rate:.2%}")
    print(f"  Total requests: {total_requests}")
    print(f"  Hits: {cache_hits}")
    print(f"  Misses: {cache_misses}")
```
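
A hit rate only means something once enough lookups have accumulated. A sketch of a threshold check that stays quiet until a minimum sample size is reached; the `check_cache_health` name and the default thresholds (`min_hit_rate=0.5`, `min_requests=20`) are hypothetical and should be tuned for the workload.

```python
from typing import Optional


def check_cache_health(
    cache_hits: int,
    cache_misses: int,
    min_hit_rate: float = 0.5,
    min_requests: int = 20,
) -> Optional[str]:
    """Return a warning message when the hit rate is too low, else None."""
    total_requests = cache_hits + cache_misses
    if total_requests < min_requests:
        # Too few lookups for the rate to be meaningful
        return None
    hit_rate = cache_hits / total_requests
    if hit_rate < min_hit_rate:
        return (
            f"Low cache hit rate: {hit_rate:.2%} over {total_requests} requests "
            f"(threshold {min_hit_rate:.0%})"
        )
    return None


print(check_cache_health(cache_hits=5, cache_misses=25))
```

With an agent, pass `graph_metrics.cache_hits` and `graph_metrics.cache_misses` and send any non-`None` result to your logger or alerting system.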

```