Tool Result Caching Best Practices

This guide shows how to cache tool execution results to reduce API calls, improve response times, and lower costs.

Table of Contents

Overview
Basic Caching
Cache Configuration
Per-Tool TTL Configuration
Cache Management
Cache Invalidation
Performance Monitoring
Best Practices

Overview

Tool result caching provides:

Cost Reduction: 30-50% lower API costs in workloads that repeat identical calls, because cached results avoid redundant executions
Performance Improvement: Faster responses for cached results
Configurable TTL: Different cache durations for different tools
Automatic Cleanup: The cache is cleaned up automatically when it reaches its capacity threshold
Memory Management: Size limits to prevent memory exhaustion

When to Use Caching

✅ Expensive API calls (search, weather, translation)
✅ Results don’t change frequently
✅ Same queries repeated often
✅ Cost reduction is important

When NOT to Use Caching

❌ Time-sensitive data (real-time prices, live data)
❌ Results change frequently
❌ Unique queries each time
❌ Memory is tightly constrained

Basic Caching

Pattern 1: Enable Caching

Enable caching with a 5-minute default TTL.

import asyncio

from aiecs.domain.agent import HybridAgent, AgentConfiguration, CacheConfig
from aiecs.llm import OpenAIClient


async def main():
    # Configure caching
    cache_config = CacheConfig(
        enabled=True,
        default_ttl=300  # 5 minutes default
    )

    agent = HybridAgent(
        agent_id="agent-1",
        name="My Agent",
        llm_client=OpenAIClient(),
        tools=["search", "calculator", "weather"],
        config=AgentConfiguration(),
        cache_config=cache_config
    )

    await agent.initialize()

    # First call: executes the tool and caches the result
    result1 = await agent.execute_tool_with_cache("search", {"query": "Python"})

    # Second call with the same parameters: served from the cache,
    # so no API call is made
    result2 = await agent.execute_tool_with_cache("search", {"query": "Python"})


asyncio.run(main())
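The second call is a cache hit only because it maps to the same cache key as the first. This guide does not describe how the agent builds its keys, so the sketch below uses a hypothetical scheme (`make_cache_key` is an illustrative name, not an aiecs function): the tool name plus a SHA-256 hash of the parameters serialized as JSON with sorted keys, which makes parameter order irrelevant.

```python
import hashlib
import json


def make_cache_key(tool_name: str, params: dict) -> str:
    """Build a deterministic cache key (hypothetical scheme, not the aiecs one).

    Sorting keys makes {"a": 1, "b": 2} and {"b": 2, "a": 1} serialize identically.
    """
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f"{tool_name}:{digest}"


key1 = make_cache_key("search", {"query": "Python", "limit": 10})
key2 = make_cache_key("search", {"limit": 10, "query": "Python"})
key3 = make_cache_key("search", {"query": "Rust", "limit": 10})
print(key1 == key2)  # True: same tool and parameters, different order
print(key1 == key3)  # False: different query
```

This scheme assumes parameters are JSON-serializable; any parameter that differs, even trivially (for example `"python"` versus `"Python"`), produces a different key and therefore a cache miss.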

Pattern 2: Disable Caching

Disable caching for specific use cases.

# Disable caching
cache_config = CacheConfig(enabled=False)

agent = HybridAgent(
    agent_id="agent-1",
    llm_client=llm_client,
    tools=["search"],
    config=config,
    cache_config=cache_config
)

# All tool calls execute directly (no caching)
result = await agent.execute_tool_with_cache("search", {"query": "Python"})

Pattern 3: Automatic Caching

When caching is enabled, the agent also caches the results of tools it calls while executing a task.

# Agent automatically caches results
result = await agent.execute_task(
    {"description": "Search for Python"},
    {}
)
# Tool result cached automatically
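Conceptually, automatic caching means every tool call passes through a cache lookup before the tool runs. A minimal sketch of that idea as a decorator around an async tool function follows; `cached_tool` and `fake_search` are hypothetical names, the in-memory dict is an assumption about storage, and TTLs are left out for brevity.

```python
import asyncio
import functools
import json

call_count = 0  # counts real executions of the tool


def cached_tool(func):
    """Wrap an async tool so repeated calls with the same kwargs reuse the result."""
    cache = {}

    @functools.wraps(func)
    async def wrapper(**params):
        key = json.dumps(params, sort_keys=True)
        if key not in cache:  # miss: run the tool and store the result
            cache[key] = await func(**params)
        return cache[key]  # cached (or freshly stored) result

    return wrapper


@cached_tool
async def fake_search(query: str) -> str:
    global call_count
    call_count += 1
    await asyncio.sleep(0)  # stand-in for a network call
    return f"results for {query}"


async def main():
    first = await fake_search(query="Python")
    second = await fake_search(query="Python")
    return first, second


first, second = asyncio.run(main())
print(first == second, call_count)  # True 1
```

The caller's code does not change between cached and uncached tools, which is the point of doing the caching inside the agent rather than at each call site.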

Cache Configuration

Pattern 1: Basic Configuration

Configure basic caching settings.

cache_config = CacheConfig(
    enabled=True,
    default_ttl=300,  # 5 minutes
    max_cache_size=1000,  # Maximum 1000 cached entries
    max_memory_mb=100  # Maximum 100MB cache memory
)

agent = HybridAgent(
    agent_id="agent-1",
    llm_client=llm_client,
    tools=["search"],
    config=config,
    cache_config=cache_config
)
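`max_cache_size` caps the number of stored entries. The guide does not say which entries the agent evicts once that cap is reached; the sketch below assumes least-recently-used (LRU) eviction built on `collections.OrderedDict`, purely to illustrate what an entry limit does. `BoundedCache` is a hypothetical class.

```python
from collections import OrderedDict


class BoundedCache:
    """Entry-count-limited cache with LRU eviction (illustrative assumption)."""

    def __init__(self, max_cache_size: int):
        self.max_cache_size = max_cache_size
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        while len(self._data) > self.max_cache_size:
            self._data.popitem(last=False)  # evict the least recently used entry

    def __len__(self):
        return len(self._data)


cache = BoundedCache(max_cache_size=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")      # "a" is now the most recently used entry
cache.put("c", 3)   # exceeds the cap, so "b" is evicted
print(len(cache), cache.get("b"), cache.get("a"))  # 2 None 1
```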

Pattern 2: Aggressive Caching

Configure aggressive caching for expensive operations.

cache_config = CacheConfig(
    enabled=True,
    default_ttl=3600,  # 1 hour default
    max_cache_size=5000,  # Larger cache
    max_memory_mb=500,  # More memory
    cleanup_threshold=0.95  # Cleanup at 95% capacity
)

agent = HybridAgent(
    agent_id="agent-1",
    llm_client=llm_client,
    tools=["search"],
    config=config,
    cache_config=cache_config
)

Pattern 3: Conservative Caching

Configure conservative caching for frequently changing data.

cache_config = CacheConfig(
    enabled=True,
    default_ttl=60,  # 1 minute (short TTL)
    max_cache_size=100,  # Small cache
    cleanup_threshold=0.8  # Cleanup at 80% capacity
)

agent = HybridAgent(
    agent_id="agent-1",
    llm_client=llm_client,
    tools=["weather"],  # Weather changes frequently
    config=config,
    cache_config=cache_config
)
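`cleanup_threshold` is a fraction of `max_cache_size`. Assuming cleanup starts once the entry count reaches `cleanup_threshold * max_cache_size` (the exact trigger rule is not spelled out in this guide), the aggressive and conservative configurations above start cleaning at these sizes. `cleanup_trigger_size` is a hypothetical helper.

```python
import math


def cleanup_trigger_size(max_cache_size: int, cleanup_threshold: float) -> int:
    """Entry count at which cleanup would start, assuming a fraction-of-capacity rule."""
    return math.ceil(max_cache_size * cleanup_threshold)


aggressive = cleanup_trigger_size(5000, 0.95)   # Pattern 2 above
conservative = cleanup_trigger_size(100, 0.8)   # Pattern 3 above
print(aggressive, conservative)  # 4750 80
```

A higher threshold lets the cache fill further before cleanup runs, trading memory headroom for fewer cleanup passes.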

Per-Tool TTL Configuration

Pattern 1: Tool-Specific TTL

Configure a different TTL for each tool.

cache_config = CacheConfig(
    enabled=True,
    default_ttl=300,  # 5 minutes default
    tool_specific_ttl={
        "search": 600,  # Search cached for 10 minutes
        "calculator": 3600,  # Calculator cached for 1 hour
        "weather": 1800,  # Weather cached for 30 minutes
        "translation": 7200  # Translation cached for 2 hours
    }
)

agent = HybridAgent(
    agent_id="agent-1",
    llm_client=llm_client,
    tools=["search", "calculator", "weather", "translation"],
    config=config,
    cache_config=cache_config
)
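Per-tool TTLs amount to a lookup with a fallback: a tool listed in `tool_specific_ttl` uses its own value, and every other tool uses `default_ttl`. The sketch below shows that lookup and the expiry check it feeds, assuming each entry records the time it was cached; `resolve_ttl` and `is_expired` are hypothetical helpers.

```python
DEFAULT_TTL = 300
TOOL_SPECIFIC_TTL = {"search": 600, "calculator": 3600, "weather": 1800, "translation": 7200}


def resolve_ttl(tool_name: str) -> int:
    """Return the tool's own TTL if configured, else the default."""
    return TOOL_SPECIFIC_TTL.get(tool_name, DEFAULT_TTL)


def is_expired(tool_name: str, cached_at: float, now: float) -> bool:
    """An entry expires once its age (in seconds) exceeds the tool's TTL."""
    return now - cached_at > resolve_ttl(tool_name)


print(resolve_ttl("weather"), resolve_ttl("news"))   # 1800 300
print(is_expired("search", cached_at=0, now=599))    # False: 599 s < 600 s TTL
print(is_expired("search", cached_at=0, now=601))    # True
```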

Pattern 2: Disable Caching for Specific Tools

Set a tool's TTL to 0 to disable caching for that tool while keeping it enabled for the others.

cache_config = CacheConfig(
    enabled=True,
    default_ttl=300,
    tool_specific_ttl={
        "live_data": 0,  # Disable caching (0 TTL)
        "real_time": 0
    }
)

agent = HybridAgent(
    agent_id="agent-1",
    llm_client=llm_client,
    tools=["search", "live_data", "real_time"],
    config=config,
    cache_config=cache_config
)
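One way to picture a zero TTL, assuming it means "never store" rather than "store and expire immediately", is a guard in front of the cache write. `store_result` is a hypothetical helper.

```python
DEFAULT_TTL = 300
TOOL_SPECIFIC_TTL = {"live_data": 0, "real_time": 0}

cache = {}


def store_result(tool_name: str, key: str, result) -> bool:
    """Store a result unless the tool's TTL is 0; return whether it was cached."""
    ttl = TOOL_SPECIFIC_TTL.get(tool_name, DEFAULT_TTL)
    if ttl <= 0:
        return False  # caching disabled for this tool
    cache[key] = result
    return True


print(store_result("search", "search:Python", "..."))   # True
print(store_result("live_data", "live_data:AAPL", 1))   # False
print(len(cache))                                        # 1
```

Using `dict.get` with a default matters here: writing `TOOL_SPECIFIC_TTL.get(name) or DEFAULT_TTL` would treat the configured 0 as missing and silently re-enable caching with the default TTL.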

Pattern 3: Long-Term Caching

Configure long-term caching for stable data.

cache_config = CacheConfig(
    enabled=True,
    default_ttl=300,
    tool_specific_ttl={
        "dictionary": 86400,  # 24 hours
        "encyclopedia": 86400,  # 24 hours
        "historical_data": 604800  # 7 days
    }
)

agent = HybridAgent(
    agent_id="agent-1",
    llm_client=llm_client,
    tools=["dictionary", "encyclopedia", "historical_data"],
    config=config,
    cache_config=cache_config
)
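TTL values are in seconds, so long durations are easy to mistype. Deriving them from the standard library's `datetime.timedelta` keeps the intent readable and confirms the numbers used above; the resulting dict can be passed as `tool_specific_ttl`.

```python
from datetime import timedelta

ttl = {
    "dictionary": int(timedelta(hours=24).total_seconds()),
    "encyclopedia": int(timedelta(hours=24).total_seconds()),
    "historical_data": int(timedelta(days=7).total_seconds()),
}
print(ttl)  # {'dictionary': 86400, 'encyclopedia': 86400, 'historical_data': 604800}
```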

Cache Management

Pattern 1: Cache Statistics

Get cache statistics to monitor performance.

# Get cache statistics
stats = agent.get_cache_stats()

print(f"Cache size: {stats['size']}")
print(f"Cache hits: {stats['hits']}")
print(f"Cache misses: {stats['misses']}")
print(f"Hit rate: {stats['hit_rate']:.1%}")
print(f"Memory usage: {stats['memory_mb']:.1f}MB")

Pattern 2: Cache Cleanup

Manually trigger cache cleanup.

# Clean up expired entries
cleaned_count = agent.cleanup_cache()

print(f"Cleaned up {cleaned_count} expired entries")
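A cleanup pass of this kind only needs to drop entries whose age exceeds their TTL. The sketch below works on a hypothetical store that keeps `(value, cached_at, ttl)` per key, which is not necessarily the agent's internal layout; `cleanup_expired` is an illustrative name.

```python
def cleanup_expired(store: dict, now: float) -> int:
    """Remove expired entries in place and return how many were removed."""
    expired = [key for key, (_, cached_at, ttl) in store.items() if now - cached_at > ttl]
    for key in expired:
        del store[key]
    return len(expired)


store = {
    "search:Python": ("...", 0.0, 600),   # cached at t=0, 10-minute TTL
    "weather:Paris": ("...", 0.0, 60),    # cached at t=0, 1-minute TTL
}
removed = cleanup_expired(store, now=120.0)
print(removed, list(store))  # 1 ['search:Python']
```

Collecting the expired keys first and deleting afterwards avoids mutating the dict while iterating over it, which would raise `RuntimeError`.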

Pattern 3: Cache Size Monitoring

Monitor cache size and trigger cleanup when needed.

stats = agent.get_cache_stats()

if stats['size'] > cache_config.max_cache_size * 0.9:
    # Cache is above 90% of its entry limit: remove expired entries
    cleaned_count = agent.cleanup_cache()
    print(f"Cleaned up {cleaned_count} entries")

Cache Invalidation

Pattern 1: Invalidate Specific Tool

Invalidate cache for a specific tool.

# Invalidate all cache entries for "search" tool
invalidated_count = agent.invalidate_cache(tool_name="search")

print(f"Invalidated {invalidated_count} cache entries")

Pattern 2: Invalidate by Pattern

Invalidate cache entries matching a pattern.

# Invalidate cache entries matching pattern
invalidated_count = agent.invalidate_cache(pattern="query:Python*")

print(f"Invalidated {invalidated_count} cache entries")

Pattern 3: Clear All Cache

Clear entire cache.

# Clear all cache
invalidated_count = agent.invalidate_cache()

print(f"Cleared {invalidated_count} cache entries")
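The three invalidation calls differ only in which keys they select. The guide does not specify the key format or pattern syntax, so the sketch below assumes hypothetical keys of the form `tool:param:value` and shell-style wildcards matched with the standard library's `fnmatch.fnmatchcase` against the part after the tool name. `invalidate` is an illustrative stand-in, not the agent method.

```python
from fnmatch import fnmatchcase


def invalidate(store, tool_name=None, pattern=None) -> int:
    """Delete matching keys and return the count; no filters clears everything.

    Keys are assumed to look like "<tool>:<param>:<value>"; `pattern` is matched
    against the part after the tool name.
    """
    selected = []
    for key in store:
        tool, _, rest = key.partition(":")
        if tool_name is not None and tool != tool_name:
            continue
        if pattern is not None and not fnmatchcase(rest, pattern):
            continue
        selected.append(key)
    for key in selected:
        del store[key]
    return len(selected)


store = {
    "search:query:Python": 1,
    "search:query:Python tutorials": 2,
    "search:query:Rust": 3,
    "weather:city:Paris": 4,
}
by_pattern = invalidate(store, pattern="query:Python*")  # 2
by_tool = invalidate(store, tool_name="search")          # 1 (only "Rust" was left)
cleared = invalidate(store)                              # 1 (the weather entry)
print(by_pattern, by_tool, cleared, store)               # 2 1 1 {}
```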

Pattern 4: Time-Based Invalidation

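Invalidate entries based on age. An entry expires once it is older than its TTL, so the simplest way to invalidate entries older than 1 hour is a 1-hour TTL, combined with a manual cleanup when expired entries should be purged immediately.

# Entries older than 1 hour expire
cache_config = CacheConfig(
    enabled=True,
    default_ttl=3600  # 1 hour
)

# Purge expired entries now instead of waiting for automatic cleanup
cleaned_count = agent.cleanup_cache()
print(f"Removed {cleaned_count} expired entries")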

Performance Monitoring

Pattern 1: Cache Hit Rate Monitoring

Monitor cache hit rate to optimize configuration.

import logging

logger = logging.getLogger(__name__)

# Get cache statistics
stats = agent.get_cache_stats()

if stats['hit_rate'] < 0.5:  # Less than 50% hit rate
    logger.warning("Low cache hit rate - consider adjusting TTL")
elif stats['hit_rate'] > 0.9:  # More than 90% hit rate
    logger.info("High cache hit rate - caching is effective")

Pattern 2: Cost Savings Calculation

Calculate cost savings from caching.

stats = agent.get_cache_stats()

# Estimate cost savings
api_call_cost = 0.01  # Placeholder: $0.01 per API call; use your tool's actual cost
cache_hits = stats['hits']
cost_saved = cache_hits * api_call_cost

print(f"Cache hits: {cache_hits}")
print(f"Estimated cost saved: ${cost_saved:.2f}")
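Because each cache hit replaces one tool execution, the fraction of cost saved equals the hit rate when every call costs the same, so the 30-50% reduction cited in the overview corresponds to a 30-50% hit rate. A worked example with illustrative (not measured) counts and the same placeholder price:

```python
hits, misses = 400, 600          # illustrative counts, not measured data
api_call_cost = 0.01             # placeholder price per tool call

calls_without_cache = hits + misses
cost_without_cache = calls_without_cache * api_call_cost
cost_with_cache = misses * api_call_cost   # only misses reach the API
savings_fraction = 1 - cost_with_cache / cost_without_cache

print(f"${cost_without_cache:.2f} -> ${cost_with_cache:.2f} ({savings_fraction:.0%} saved)")
# $10.00 -> $6.00 (40% saved)
```

If tools have different prices, weight each tool's hits by its own cost instead of using a single `api_call_cost`.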

Pattern 3: Performance Impact

Measure performance impact of caching.

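import time

# Start from a cold cache so the first call is a guaranteed miss
agent.invalidate_cache(tool_name="search")

# First call: cache miss, so the tool executes (and its result is cached)
start = time.perf_counter()
result1 = await agent.execute_tool_with_cache("search", {"query": "Python"})
time_without_cache = time.perf_counter() - start

# Second call with the same parameters: served from the cache
start = time.perf_counter()
result2 = await agent.execute_tool_with_cache("search", {"query": "Python"})
time_with_cache = time.perf_counter() - start

speedup = time_without_cache / time_with_cache
print(f"Speedup: {speedup:.1f}x faster with cache")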
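The speedup depends entirely on how slow the tool is compared with a cache lookup. A self-contained simulation, using a hypothetical tool that sleeps 50 ms and a plain dict as the cache, shows the same measurement pattern without any network calls:

```python
import asyncio
import time

cache = {}


async def slow_tool(query: str) -> str:
    await asyncio.sleep(0.05)  # simulate a 50 ms API call
    return f"results for {query}"


async def call_with_cache(query: str) -> str:
    if query not in cache:
        cache[query] = await slow_tool(query)
    return cache[query]


async def measure() -> tuple:
    start = time.perf_counter()
    await call_with_cache("Python")          # miss: runs the tool
    miss_time = time.perf_counter() - start

    start = time.perf_counter()
    await call_with_cache("Python")          # hit: dict lookup only
    hit_time = time.perf_counter() - start
    return miss_time, hit_time


miss_time, hit_time = asyncio.run(measure())
print(f"miss {miss_time * 1000:.1f} ms, hit {hit_time * 1000:.3f} ms")
```

`time.perf_counter()` is used instead of `time.time()` because it is monotonic and has higher resolution, which matters when a cache hit takes only microseconds.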

Best Practices

1. Configure Appropriate TTL

Set TTL based on data freshness requirements:

# Stable data: Long TTL
cache_config = CacheConfig(
    tool_specific_ttl={
        "dictionary": 86400,  # 24 hours
        "encyclopedia": 86400
    }
)

# Frequently changing data: Short TTL
cache_config = CacheConfig(
    tool_specific_ttl={
        "weather": 1800,  # 30 minutes
        "stock_prices": 60  # 1 minute
    }
)

2. Monitor Cache Performance

Regularly monitor cache performance:

import logging

logger = logging.getLogger(__name__)

stats = agent.get_cache_stats()

if stats['hit_rate'] < 0.3:
    # Low hit rate - consider disabling caching or adjusting TTL
    logger.warning("Low cache hit rate")

3. Set Appropriate Cache Size

Set cache size based on available memory:

cache_config = CacheConfig(
    max_cache_size=1000,  # Adjust based on memory
    max_memory_mb=100  # Monitor memory usage
)
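If both limits are enforced, they interact: the average entry has to stay under `max_memory_mb / max_cache_size`, or the memory limit is reached before the entry limit. For the values above, and assuming MB means 2^20 bytes:

```python
max_cache_size = 1000
max_memory_mb = 100

bytes_per_entry = max_memory_mb * 1024 * 1024 / max_cache_size
print(f"{bytes_per_entry / 1024:.1f} KiB per entry on average")  # 102.4 KiB per entry on average
```

If typical tool results are much larger than this budget (for example, full web pages from a search tool), raise `max_memory_mb` or lower `max_cache_size` so the two limits agree.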

4. Invalidate Stale Data

Invalidate cache when data changes:

# After updating data
agent.invalidate_cache(tool_name="search")

5. Use for Expensive Operations

Use caching for expensive operations:

# Good: Expensive API calls
cache_config = CacheConfig(
    tool_specific_ttl={
        "search": 600,  # Cache expensive search
        "translation": 3600  # Cache expensive translation
    }
)

# Less useful: Cheap operations
cache_config = CacheConfig(
    tool_specific_ttl={
        "calculator": 0  # Don't cache cheap calculations
    }
)

6. Handle Cache Errors Gracefully

Handle cache errors without breaking functionality:

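import logging

logger = logging.getLogger(__name__)

# CacheError stands for the exception your cache layer raises;
# import it from wherever your aiecs version defines it
try:
    result = await agent.execute_tool_with_cache("search", {"query": "Python"})
except CacheError as e:
    logger.error(f"Cache error: {e}")
    # Fall back to direct (uncached) execution
    result = await agent.execute_tool("search", {"query": "Python"})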
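A useful refinement is to catch cache failures separately from tool failures, so that a tool error propagates instead of triggering a second execution, and a failed cache write does not re-run a tool that already succeeded. A self-contained sketch of that separation, with hypothetical `CacheError`, `FlakyCache`, and `run_tool` stand-ins rather than aiecs classes:

```python
import asyncio
import logging

logger = logging.getLogger(__name__)


class CacheError(Exception):
    """Hypothetical error raised by the cache layer."""


class FlakyCache:
    """Cache stand-in whose operations always fail, to exercise the fallback."""

    def get(self, key):
        raise CacheError("cache backend unavailable")

    def set(self, key, value):
        raise CacheError("cache backend unavailable")


async def run_tool(query: str) -> str:
    return f"results for {query}"


async def execute_with_fallback(cache, query: str) -> str:
    try:
        cached = cache.get(query)
        if cached is not None:
            return cached
    except CacheError as exc:
        logger.warning("Cache lookup failed, executing directly: %s", exc)

    result = await run_tool(query)  # tool errors propagate normally

    try:
        cache.set(query, result)
    except CacheError as exc:
        logger.warning("Could not cache result: %s", exc)
    return result


result = asyncio.run(execute_with_fallback(FlakyCache(), "Python"))
print(result)  # results for Python
```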

Summary

Tool result caching provides:

✅ 30-50% lower API costs when identical calls repeat
✅ Faster responses for cached results
✅ Configurable TTL per tool
✅ Automatic cleanup
✅ Memory management

Key Takeaways:

Use for expensive operations
Configure appropriate TTL
Monitor cache performance
Invalidate stale data
Set appropriate cache size

For more details, see: