Tool Result Caching Best Practices

This guide covers how to use tool result caching to reduce API calls, improve performance, and lower costs by caching tool execution results.

Table of Contents

  1. Overview

  2. Basic Caching

  3. Cache Configuration

  4. Per-Tool TTL Configuration

  5. Cache Management

  6. Cache Invalidation

  7. Performance Monitoring

  8. Best Practices

Overview

Tool result caching provides:

  • Cost Reduction: 30-50% reduction in API costs by avoiding redundant calls

  • Performance Improvement: Faster responses for cached results

  • Configurable TTL: Different cache durations for different tools

  • Automatic Cleanup: Automatic cache cleanup when capacity threshold reached

  • Memory Management: Size limits to prevent memory exhaustion

When to Use Caching

  • ✅ Expensive API calls (search, weather, translation)

  • ✅ Results don’t change frequently

  • ✅ Same queries repeated often

  • ✅ Cost reduction is important

When NOT to Use Caching

  • ❌ Time-sensitive data (real-time prices, live data)

  • ❌ Results change frequently

  • ❌ Unique queries each time

  • ❌ Memory constraints

Basic Caching

Pattern 1: Enable Caching

Enable caching with default configuration.

from aiecs.domain.agent import HybridAgent, AgentConfiguration, CacheConfig
from aiecs.llm import OpenAIClient

# Configure caching
cache_config = CacheConfig(
    enabled=True,
    default_ttl=300  # 5 minutes default
)

agent = HybridAgent(
    agent_id="agent-1",
    name="My Agent",
    llm_client=OpenAIClient(),
    tools=["search", "calculator", "weather"],
    config=AgentConfiguration(),
    cache_config=cache_config
)

await agent.initialize()

# First call - executes tool and caches result
result1 = await agent.execute_tool_with_cache("search", {"query": "Python"})

# Second call with same parameters - uses cache!
result2 = await agent.execute_tool_with_cache("search", {"query": "Python"})
# No API call made - result from cache

Pattern 2: Disable Caching

Disable caching for specific use cases.

# Disable caching
cache_config = CacheConfig(enabled=False)

agent = HybridAgent(
    agent_id="agent-1",
    llm_client=llm_client,
    tools=["search"],
    config=config,
    cache_config=cache_config
)

# All tool calls execute directly (no caching)
result = await agent.execute_tool_with_cache("search", {"query": "Python"})

Pattern 3: Automatic Caching

Agent automatically caches tool results when caching is enabled.

# Agent automatically caches results
result = await agent.execute_task(
    {"description": "Search for Python"},
    {}
)
# Tool result cached automatically

Cache Configuration

Pattern 1: Basic Configuration

Configure basic caching settings.

cache_config = CacheConfig(
    enabled=True,
    default_ttl=300,  # 5 minutes
    max_cache_size=1000,  # Maximum 1000 cached entries
    max_memory_mb=100  # Maximum 100MB cache memory
)

agent = HybridAgent(
    agent_id="agent-1",
    llm_client=llm_client,
    tools=["search"],
    config=config,
    cache_config=cache_config
)

Pattern 2: Aggressive Caching

Configure aggressive caching for expensive operations.

cache_config = CacheConfig(
    enabled=True,
    default_ttl=3600,  # 1 hour default
    max_cache_size=5000,  # Larger cache
    max_memory_mb=500,  # More memory
    cleanup_threshold=0.95  # Cleanup at 95% capacity
)

agent = HybridAgent(
    agent_id="agent-1",
    llm_client=llm_client,
    tools=["search"],
    config=config,
    cache_config=cache_config
)

Pattern 3: Conservative Caching

Configure conservative caching for frequently changing data.

cache_config = CacheConfig(
    enabled=True,
    default_ttl=60,  # 1 minute (short TTL)
    max_cache_size=100,  # Small cache
    cleanup_threshold=0.8  # Cleanup at 80% capacity
)

agent = HybridAgent(
    agent_id="agent-1",
    llm_client=llm_client,
    tools=["weather"],  # Weather changes frequently
    config=config,
    cache_config=cache_config
)

Per-Tool TTL Configuration

Pattern 1: Tool-Specific TTL

Configure different TTL for different tools.

cache_config = CacheConfig(
    enabled=True,
    default_ttl=300,  # 5 minutes default
    tool_specific_ttl={
        "search": 600,  # Search cached for 10 minutes
        "calculator": 3600,  # Calculator cached for 1 hour
        "weather": 1800,  # Weather cached for 30 minutes
        "translation": 7200  # Translation cached for 2 hours
    }
)

agent = HybridAgent(
    agent_id="agent-1",
    llm_client=llm_client,
    tools=["search", "calculator", "weather", "translation"],
    config=config,
    cache_config=cache_config
)

Pattern 2: Disable Caching for Specific Tools

Disable caching for specific tools.

cache_config = CacheConfig(
    enabled=True,
    default_ttl=300,
    tool_specific_ttl={
        "live_data": 0,  # Disable caching (0 TTL)
        "real_time": 0
    }
)

agent = HybridAgent(
    agent_id="agent-1",
    llm_client=llm_client,
    tools=["search", "live_data", "real_time"],
    config=config,
    cache_config=cache_config
)

Pattern 3: Long-Term Caching

Configure long-term caching for stable data.

cache_config = CacheConfig(
    enabled=True,
    default_ttl=300,
    tool_specific_ttl={
        "dictionary": 86400,  # 24 hours
        "encyclopedia": 86400,  # 24 hours
        "historical_data": 604800  # 7 days
    }
)

agent = HybridAgent(
    agent_id="agent-1",
    llm_client=llm_client,
    tools=["dictionary", "encyclopedia", "historical_data"],
    config=config,
    cache_config=cache_config
)

Cache Management

Pattern 1: Cache Statistics

Get cache statistics to monitor performance.

# Get cache statistics
stats = agent.get_cache_stats()

print(f"Cache size: {stats['size']}")
print(f"Cache hits: {stats['hits']}")
print(f"Cache misses: {stats['misses']}")
print(f"Hit rate: {stats['hit_rate']:.1%}")
print(f"Memory usage: {stats['memory_mb']:.1f}MB")

Pattern 2: Cache Cleanup

Manually trigger cache cleanup.

# Clean up expired entries
cleaned_count = agent.cleanup_cache()

print(f"Cleaned up {cleaned_count} expired entries")

Pattern 3: Cache Size Monitoring

Monitor cache size and trigger cleanup when needed.

stats = agent.get_cache_stats()

if stats['size'] > cache_config.max_cache_size * 0.9:
    # Cache approaching limit - cleanup
    cleaned_count = agent.cleanup_cache()
    print(f"Cleaned up {cleaned_count} entries")

Cache Invalidation

Pattern 1: Invalidate Specific Tool

Invalidate cache for a specific tool.

# Invalidate all cache entries for "search" tool
invalidated_count = agent.invalidate_cache(tool_name="search")

print(f"Invalidated {invalidated_count} cache entries")

Pattern 2: Invalidate by Pattern

Invalidate cache entries matching a pattern.

# Invalidate cache entries matching pattern
invalidated_count = agent.invalidate_cache(pattern="query:Python*")

print(f"Invalidated {invalidated_count} cache entries")

Pattern 3: Clear All Cache

Clear entire cache.

# Clear all cache
invalidated_count = agent.invalidate_cache()

print(f"Cleared {invalidated_count} cache entries")

Pattern 4: Time-Based Invalidation

Invalidate cache based on age.

import time

# Invalidate entries older than 1 hour
stats = agent.get_cache_stats()
current_time = time.time()

# Get cache timestamps and invalidate old entries
# (Implementation depends on agent's cache structure)

Performance Monitoring

Pattern 1: Cache Hit Rate Monitoring

Monitor cache hit rate to optimize configuration.

# Get cache statistics
stats = agent.get_cache_stats()

if stats['hit_rate'] < 0.5:  # Less than 50% hit rate
    logger.warning("Low cache hit rate - consider adjusting TTL")
elif stats['hit_rate'] > 0.9:  # More than 90% hit rate
    logger.info("High cache hit rate - caching is effective")

Pattern 2: Cost Savings Calculation

Calculate cost savings from caching.

stats = agent.get_cache_stats()

# Estimate cost savings
api_call_cost = 0.01  # $0.01 per API call
cache_hits = stats['hits']
cost_saved = cache_hits * api_call_cost

print(f"Cache hits: {cache_hits}")
print(f"Estimated cost saved: ${cost_saved:.2f}")

Pattern 3: Performance Impact

Measure performance impact of caching.

import time

# Without cache
start = time.time()
result1 = await agent.execute_tool("search", {"query": "Python"})
time_without_cache = time.time() - start

# With cache (second call)
start = time.time()
result2 = await agent.execute_tool_with_cache("search", {"query": "Python"})
time_with_cache = time.time() - start

speedup = time_without_cache / time_with_cache
print(f"Speedup: {speedup:.1f}x faster with cache")

Best Practices

1. Configure Appropriate TTL

Set TTL based on data freshness requirements:

# Stable data: Long TTL
cache_config = CacheConfig(
    tool_specific_ttl={
        "dictionary": 86400,  # 24 hours
        "encyclopedia": 86400
    }
)

# Frequently changing data: Short TTL
cache_config = CacheConfig(
    tool_specific_ttl={
        "weather": 1800,  # 30 minutes
        "stock_prices": 60  # 1 minute
    }
)

2. Monitor Cache Performance

Regularly monitor cache performance:

stats = agent.get_cache_stats()

if stats['hit_rate'] < 0.3:
    # Low hit rate - consider disabling caching or adjusting TTL
    logger.warning("Low cache hit rate")

3. Set Appropriate Cache Size

Set cache size based on available memory:

cache_config = CacheConfig(
    max_cache_size=1000,  # Adjust based on memory
    max_memory_mb=100  # Monitor memory usage
)

4. Invalidate Stale Data

Invalidate cache when data changes:

# After updating data
agent.invalidate_cache(tool_name="search")

5. Use for Expensive Operations

Use caching for expensive operations:

# Good: Expensive API calls
cache_config = CacheConfig(
    tool_specific_ttl={
        "search": 600,  # Cache expensive search
        "translation": 3600  # Cache expensive translation
    }
)

# Less useful: Cheap operations
cache_config = CacheConfig(
    tool_specific_ttl={
        "calculator": 0  # Don't cache cheap calculations
    }
)

6. Handle Cache Errors Gracefully

Handle cache errors without breaking functionality:

try:
    result = await agent.execute_tool_with_cache("search", {"query": "Python"})
except CacheError as e:
    logger.error(f"Cache error: {e}")
    # Fall back to direct execution
    result = await agent.execute_tool("search", {"query": "Python"})

Summary

Tool result caching provides:

  • ✅ 30-50% cost reduction

  • ✅ Faster responses for cached results

  • ✅ Configurable TTL per tool

  • ✅ Automatic cleanup

  • ✅ Memory management

Key Takeaways:

  • Use for expensive operations

  • Configure appropriate TTL

  • Monitor cache performance

  • Invalidate stale data

  • Set appropriate cache size

For more details, see: