Resource Management Configuration

This guide covers how to configure and use resource management features including rate limiting, quotas, and throttling to ensure stable operation in production environments.

Table of Contents

  1. Overview

  2. Basic Configuration

  3. Rate Limiting

  4. Concurrent Task Limits

  5. Memory Limits

  6. Timeout Configuration

  7. Enforcement Modes

  8. Resource Monitoring

  9. Best Practices

Overview

Resource management provides:

  • Rate Limiting: Token bucket algorithm for token and tool call rate limiting

  • Concurrent Limits: Maximum concurrent task execution limits

  • Memory Limits: Memory usage constraints

  • Timeouts: Task execution timeouts

  • Enforcement: Configurable enforcement (enforce vs monitor)

  • Throttling: Automatic throttling as limits are approached

When to Use Resource Management

  • ✅ Production deployments

  • ✅ API rate limit compliance

  • ✅ Resource exhaustion prevention

  • ✅ Cost control

  • ✅ Stability requirements

Basic Configuration

Pattern 1: Basic Rate Limiting

Configure basic rate limiting.

from aiecs.domain.agent import HybridAgent, AgentConfiguration
from aiecs.domain.agent.models import ResourceLimits
from aiecs.llm import OpenAIClient

# Configure resource limits
resource_limits = ResourceLimits(
    max_concurrent_tasks=5,
    max_tokens_per_minute=10000,
    max_tool_calls_per_minute=100
)

agent = HybridAgent(
    agent_id="agent-1",
    name="My Agent",
    llm_client=OpenAIClient(),
    tools=["search"],
    config=AgentConfiguration(),
    resource_limits=resource_limits
)

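# Top-level await: run this (and later snippets) inside an async function
# or an async-aware REPL such as IPython/Jupyter
await agent.initialize()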

Pattern 2: Production Configuration

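Configure strict limits for production. This and later examples assume llm_client and config already exist (for example, llm_client = OpenAIClient() and config = AgentConfiguration(), as in Pattern 1).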

resource_limits = ResourceLimits(
    max_concurrent_tasks=10,
    max_tokens_per_minute=50000,
    max_tokens_per_hour=2000000,
    max_tool_calls_per_minute=500,
    max_tool_calls_per_hour=30000,
    max_memory_mb=2048,
    task_timeout_seconds=300,
    enforce_limits=True,
    reject_on_limit=True  # Reject instead of waiting
)

agent = HybridAgent(
    agent_id="agent-1",
    llm_client=llm_client,
    tools=["search"],
    config=config,
    resource_limits=resource_limits
)

Pattern 3: Monitoring Mode

Monitor resources without enforcement.

resource_limits = ResourceLimits(
    max_concurrent_tasks=10,
    max_tokens_per_minute=10000,
    enforce_limits=False  # Monitor but don't enforce
)

agent = HybridAgent(
    agent_id="agent-1",
    llm_client=llm_client,
    tools=["search"],
    config=config,
    resource_limits=resource_limits
)

Rate Limiting

Pattern 1: Token Rate Limiting

Configure token rate limits.

resource_limits = ResourceLimits(
    max_tokens_per_minute=10000,  # 10K tokens per minute
    max_tokens_per_hour=500000  # 500K tokens per hour
)

agent = HybridAgent(
    agent_id="agent-1",
    llm_client=llm_client,
    tools=["search"],
    config=config,
    resource_limits=resource_limits
)

# Agent automatically enforces token limits
result = await agent.execute_task(task, context)
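With both caps set, the tighter one governs sustained throughput: 500K tokens per hour averages out to about 8,333 tokens per minute, which is below the 10K per-minute cap. The small helper below makes that arithmetic explicit. It assumes each limit is enforced independently, so the long-run rate can't exceed either max_tokens_per_minute or max_tokens_per_hour / 60; effective_tokens_per_minute is a hypothetical name, not part of aiecs.

```python
def effective_tokens_per_minute(per_minute: int, per_hour: int) -> float:
    """Sustained per-minute throughput once both caps apply.

    Assumes the two limits are enforced independently, so the long-run
    average can't exceed per_minute or per_hour / 60.
    """
    return min(per_minute, per_hour / 60)


rate = effective_tokens_per_minute(10_000, 500_000)
print(f"{rate:.0f} tokens/minute sustained")  # 8333 tokens/minute sustained
```

Running the same check on the Production Configuration values (50K per minute, 2M per hour) gives roughly 33,333 tokens per minute, so there too the hourly cap is the one you will hit over long runs.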

Pattern 2: Token Bucket with Burst

Configure token bucket with burst support.

resource_limits = ResourceLimits(
    max_tokens_per_minute=10000,
    token_burst_size=20000  # Allow 2x burst (20K tokens)
)

agent = HybridAgent(
    agent_id="agent-1",
    llm_client=llm_client,
    tools=["search"],
    config=config,
    resource_limits=resource_limits
)

# Short spikes may exceed the steady 10K/minute rate, up to token_burst_size (20K tokens)
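To see what token_burst_size buys you, here is a minimal token bucket, assuming the usual semantics: the bucket holds up to token_burst_size tokens and refills continuously at max_tokens_per_minute / 60 tokens per second. This is an illustrative sketch with a hypothetical TokenBucket class, not the aiecs implementation; the injectable clock only makes the timing deterministic in the example.

```python
import time


class TokenBucket:
    """Minimal token bucket: holds up to `burst_size` tokens, refilled at a steady rate.

    Illustrative sketch only; not the aiecs implementation.
    """

    def __init__(self, rate_per_minute, burst_size, clock=time.monotonic):
        self.capacity = float(burst_size)
        self.refill_per_second = rate_per_minute / 60.0
        self.tokens = float(burst_size)  # start full, so an initial burst is allowed
        self.clock = clock
        self.last = clock()

    def _refill(self):
        now = self.clock()
        elapsed = now - self.last
        self.last = now
        # Credit tokens for the elapsed time, never exceeding the burst capacity
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)

    def try_consume(self, amount):
        """Take `amount` tokens if available; otherwise take none and return False."""
        self._refill()
        if amount <= self.tokens:
            self.tokens -= amount
            return True
        return False


# Fake clock so the example is deterministic
fake_now = [0.0]
bucket = TokenBucket(rate_per_minute=10_000, burst_size=20_000, clock=lambda: fake_now[0])

print(bucket.try_consume(20_000))  # True: a full bucket absorbs a 2x burst
print(bucket.try_consume(1))       # False: the bucket is now empty
fake_now[0] = 30.0                 # 30 s later, about 5,000 tokens have refilled
print(bucket.try_consume(4_000))   # True
```

Because the bucket starts full, an idle agent can spend 20K tokens at once; after that it is held to the steady 10K/minute refill rate.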

Pattern 3: Tool Call Rate Limiting

Configure tool call rate limits.

resource_limits = ResourceLimits(
    max_tool_calls_per_minute=100,
    max_tool_calls_per_hour=5000
)

agent = HybridAgent(
    agent_id="agent-1",
    llm_client=llm_client,
    tools=["search", "calculator"],
    config=config,
    resource_limits=resource_limits
)

Concurrent Task Limits

Pattern 1: Basic Concurrent Limits

Set maximum concurrent tasks.

resource_limits = ResourceLimits(
    max_concurrent_tasks=5  # Maximum 5 concurrent tasks
)

agent = HybridAgent(
    agent_id="agent-1",
    llm_client=llm_client,
    tools=["search"],
    config=config,
    resource_limits=resource_limits
)

# Agent enforces concurrent task limit
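Concurrency caps like max_concurrent_tasks are often built on a semaphore. The sketch below shows the idea with asyncio.Semaphore sized to 5; run_task and the asyncio.sleep stand-in for real work are hypothetical, and this is not a claim about how aiecs implements the limit.

```python
import asyncio

MAX_CONCURRENT_TASKS = 5


async def run_task(sem, task_id, stats):
    # Only MAX_CONCURRENT_TASKS coroutines can hold the semaphore at once;
    # the rest wait here until a slot frees up.
    async with sem:
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(0.01)  # stand-in for real work (LLM call, tool call)
        stats["active"] -= 1
        return task_id


async def main():
    sem = asyncio.Semaphore(MAX_CONCURRENT_TASKS)
    stats = {"active": 0, "peak": 0}
    results = await asyncio.gather(*(run_task(sem, i, stats) for i in range(20)))
    return results, stats["peak"]


results, peak = asyncio.run(main())
print(f"completed {len(results)} tasks, peak concurrency {peak}")
```

All 20 tasks complete, but the recorded peak never exceeds 5: the remaining tasks queue at "async with sem" until a slot frees up.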

Pattern 2: Check Resource Availability

Check resources before executing.

# Check if resources are available
status = await agent.check_resource_availability()

if status['available']:
    result = await agent.execute_task(task, context)
else:
    print(f"Resources unavailable: {status['reason']}")
    # Wait for resources or reject

Pattern 3: Wait for Resources

Wait for resources to become available.

# Wait for resources with timeout
available = await agent.wait_for_resources(timeout=30.0)

if available:
    result = await agent.execute_task(task, context)
else:
    print("Resources not available within timeout")
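A bounded wait like wait_for_resources(timeout=30.0) can be modeled as acquiring a concurrency slot with a deadline. The sketch below assumes that model; wait_for_slot is a hypothetical helper built on asyncio.wait_for and asyncio.Semaphore, not the agent's actual method.

```python
import asyncio


async def wait_for_slot(sem: asyncio.Semaphore, timeout: float) -> bool:
    """Try to acquire a slot within `timeout` seconds (hypothetical helper).

    Returns True if the slot was acquired (the caller must release it),
    False if the timeout expired first.
    """
    try:
        await asyncio.wait_for(sem.acquire(), timeout=timeout)
        return True
    except asyncio.TimeoutError:
        return False


async def demo():
    sem = asyncio.Semaphore(1)
    await sem.acquire()  # the only slot is now taken
    got_slot = await wait_for_slot(sem, timeout=0.05)
    sem.release()  # free the slot
    got_slot_after = await wait_for_slot(sem, timeout=0.05)
    return got_slot, got_slot_after


print(asyncio.run(demo()))  # (False, True)
```

The first call times out because the only slot is held; after release(), the second call gets the slot immediately. A successful call leaves the slot acquired, so the caller is responsible for releasing it.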

Memory Limits

Pattern 1: Memory Constraints

Configure memory limits.

resource_limits = ResourceLimits(
    max_memory_mb=1024  # Maximum 1GB memory
)

agent = HybridAgent(
    agent_id="agent-1",
    llm_client=llm_client,
    tools=["search"],
    config=config,
    resource_limits=resource_limits
)

Pattern 2: Memory Monitoring

Monitor memory usage.

# Get resource status
status = await agent.check_resource_availability()

if 'memory_usage_mb' in status:
    print(f"Memory usage: {status['memory_usage_mb']} MB")
    # The limit key is only present when max_memory_mb is configured
    if 'memory_limit_mb' in status:
        print(f"Memory limit: {status['memory_limit_mb']} MB")
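The guide doesn't specify how memory usage is measured. As one possible approach, Python's standard tracemalloc module reports the memory currently allocated by Python code, which can be compared against a budget. The hypothetical memory_status helper below returns keys named after the status fields above; note that tracemalloc does not see memory allocated by native extensions, so it can understate true process memory.

```python
import tracemalloc


def memory_status(limit_mb: float) -> dict:
    """Compare memory traced by tracemalloc against a limit (illustrative sketch).

    tracemalloc.start() must have been called beforehand; otherwise usage reads as 0.
    """
    current_bytes, peak_bytes = tracemalloc.get_traced_memory()
    current_mb = current_bytes / (1024 * 1024)
    return {
        "memory_usage_mb": round(current_mb, 1),
        "memory_peak_mb": round(peak_bytes / (1024 * 1024), 1),
        "memory_limit_mb": limit_mb,
        "within_limit": current_mb <= limit_mb,
    }


tracemalloc.start()
buffer = bytearray(20 * 1024 * 1024)  # allocate ~20 MB so there is something to measure
print(memory_status(limit_mb=1024))   # within_limit: True
print(memory_status(limit_mb=10))     # within_limit: False
del buffer
tracemalloc.stop()
```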

Timeout Configuration

Pattern 1: Task Timeout

Configure task execution timeout.

resource_limits = ResourceLimits(
    task_timeout_seconds=300  # 5 minute timeout
)

agent = HybridAgent(
    agent_id="agent-1",
    llm_client=llm_client,
    tools=["search"],
    config=config,
    resource_limits=resource_limits
)

# Tasks time out after 5 minutes (300 seconds)
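A per-task timeout such as task_timeout_seconds is commonly implemented by wrapping the task coroutine in asyncio.wait_for, which cancels the coroutine once the deadline passes. The sketch below assumes that approach; slow_task and run_with_timeout are hypothetical names.

```python
import asyncio


async def slow_task(duration: float) -> str:
    await asyncio.sleep(duration)  # stand-in for a long-running agent task
    return "done"


async def run_with_timeout(coro, timeout_seconds: float):
    """Run `coro`, cancelling it if it exceeds `timeout_seconds`.

    Returns (succeeded, result_or_None).
    """
    try:
        return True, await asyncio.wait_for(coro, timeout=timeout_seconds)
    except asyncio.TimeoutError:
        # wait_for has already cancelled the underlying task at this point
        return False, None


async def demo():
    fast = await run_with_timeout(slow_task(0.01), timeout_seconds=1.0)
    slow = await run_with_timeout(slow_task(5.0), timeout_seconds=0.05)
    return fast, slow


print(asyncio.run(demo()))  # ((True, 'done'), (False, None))
```

The short task finishes normally; the 5-second task is cancelled after 0.05 seconds and reported as a failure instead of blocking the caller.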

Pattern 2: Resource Wait Timeout

Configure resource wait timeout.

resource_limits = ResourceLimits(
    max_concurrent_tasks=5,
    resource_wait_timeout_seconds=120  # Wait up to 2 minutes for resources
)

agent = HybridAgent(
    agent_id="agent-1",
    llm_client=llm_client,
    tools=["search"],
    config=config,
    resource_limits=resource_limits
)

Enforcement Modes

Pattern 1: Enforce Limits

Enforce limits strictly.

resource_limits = ResourceLimits(
    max_concurrent_tasks=5,
    max_tokens_per_minute=10000,
    enforce_limits=True,  # Enforce limits
    reject_on_limit=True  # Reject when limit reached
)

agent = HybridAgent(
    agent_id="agent-1",
    llm_client=llm_client,
    tools=["search"],
    config=config,
    resource_limits=resource_limits
)

# Requests are rejected immediately once a limit is reached

Pattern 2: Wait Mode

Wait for resources instead of rejecting.

resource_limits = ResourceLimits(
    max_concurrent_tasks=5,
    max_tokens_per_minute=10000,
    enforce_limits=True,
    reject_on_limit=False,  # Wait instead of reject
    resource_wait_timeout_seconds=60
)

agent = HybridAgent(
    agent_id="agent-1",
    llm_client=llm_client,
    tools=["search"],
    config=config,
    resource_limits=resource_limits
)

# Requests wait up to 60 seconds for resources to become available

Pattern 3: Monitor Only

Monitor without enforcement.

resource_limits = ResourceLimits(
    max_concurrent_tasks=10,
    max_tokens_per_minute=10000,
    enforce_limits=False  # Monitor but don't enforce
)

agent = HybridAgent(
    agent_id="agent-1",
    llm_client=llm_client,
    tools=["search"],
    config=config,
    resource_limits=resource_limits
)

# Limits monitored but not enforced
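The three modes differ only in what happens once a limit is hit. Assuming the enforce_limits and reject_on_limit semantics described in this section, that decision can be sketched as a small pure function; decide and its return strings are hypothetical, not aiecs API.

```python
import logging

logger = logging.getLogger(__name__)


def decide(limit_reached: bool, enforce_limits: bool, reject_on_limit: bool) -> str:
    """Return 'proceed', 'reject', or 'wait' for an incoming request."""
    if not limit_reached:
        return "proceed"
    if not enforce_limits:
        # Monitor-only: record the breach but let the request through
        logger.warning("Resource limit exceeded (monitor mode, not enforced)")
        return "proceed"
    # Enforcing: either fail fast or queue until resources free up
    return "reject" if reject_on_limit else "wait"


for enforce, reject in [(True, True), (True, False), (False, True)]:
    print(f"enforce_limits={enforce}, reject_on_limit={reject} -> {decide(True, enforce, reject)}")
```

Note that reject_on_limit has no effect in monitor-only mode: with enforce_limits=False, every request proceeds and the breach is only logged.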

Resource Monitoring

Pattern 1: Check Resource Status

Check current resource status.

# Check resource availability
status = await agent.check_resource_availability()

print(f"Available: {status['available']}")
print(f"Active tasks: {status.get('active_tasks', 0)}")
print(f"Max tasks: {status.get('max_tasks', 0)}")

if not status['available']:
    print(f"Reason: {status['reason']}")

Pattern 2: Monitor Resource Usage

Monitor resource usage over time.

import asyncio
import logging

logger = logging.getLogger(__name__)

async def monitor_resources():
    """Log resource usage once a minute"""
    while True:
        status = await agent.check_resource_availability()

        logger.info(f"Active tasks: {status.get('active_tasks', 0)}/{status.get('max_tasks', 0)}")

        if not status['available']:
            logger.warning(f"Resource limit reached: {status['reason']}")

        await asyncio.sleep(60)  # Check every minute

# Start monitoring from inside a running event loop; keep a reference so the
# task isn't garbage-collected, and call monitor_task.cancel() on shutdown
monitor_task = asyncio.create_task(monitor_resources())

Pattern 3: Resource Alerts

Alert when resources are constrained.

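status = await agent.check_resource_availability()

# Alert when approaching limits (80% of the concurrent task limit);
# skip the check if max_tasks isn't reported, to avoid false alarms
max_tasks = status.get('max_tasks', 0)
if max_tasks and status.get('active_tasks', 0) >= max_tasks * 0.8:
    logger.warning("Approaching concurrent task limit")

if not status['available']:
    # send_alert is a placeholder for your own alerting hook
    await send_alert(f"Resource limit reached: {status['reason']}")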

Best Practices

1. Set Appropriate Limits

Set limits based on your infrastructure:

# Production limits
resource_limits = ResourceLimits(
    max_concurrent_tasks=10,  # Based on CPU cores
    max_tokens_per_minute=50000,  # Based on API limits
    max_memory_mb=2048  # Based on available memory
)
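When several agents share one provider account, their combined limits need to fit under the provider's quota. A back-of-the-envelope helper, assuming an even split across agents and a safety margin (headroom); both the helper and the 250,000 tokens/minute quota in the example are hypothetical:

```python
def per_agent_limit(provider_limit_per_minute: int, num_agents: int, headroom: float = 0.8) -> int:
    """Split a shared provider rate limit evenly across agents, keeping a safety margin.

    headroom=0.8 reserves 20% of the provider quota for retries and other clients.
    """
    if num_agents <= 0:
        raise ValueError("num_agents must be positive")
    if not 0 < headroom <= 1:
        raise ValueError("headroom must be in (0, 1]")
    return int(provider_limit_per_minute * headroom // num_agents)


# Hypothetical example: a 250,000 tokens/minute provider quota shared by 4 agents
print(per_agent_limit(250_000, num_agents=4))  # 50000
```

Here 80% of a 250K/minute quota split across 4 agents yields 50,000 tokens per minute per agent, matching the max_tokens_per_minute value above.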

2. Use Token Bucket for Burst

Give the token bucket a burst size larger than the per-minute rate to absorb short spikes:

resource_limits = ResourceLimits(
    max_tokens_per_minute=10000,
    token_burst_size=20000  # Allow 2x burst
)

3. Monitor Before Enforcing

Start in monitoring mode to learn your typical usage, then enforce limits in production:

# Development: Monitor only
resource_limits = ResourceLimits(
    enforce_limits=False
)

# Production: Enforce limits
resource_limits = ResourceLimits(
    enforce_limits=True,
    reject_on_limit=True
)

4. Handle Resource Unavailability

Handle resource unavailability gracefully:

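status = await agent.check_resource_availability()

if not status['available']:
    if resource_limits.reject_on_limit:
        # Reject the request immediately
        # (ResourceLimitExceeded stands in for your application's exception type)
        raise ResourceLimitExceeded(status['reason'])
    # Otherwise wait, but only for a bounded time
    if not await agent.wait_for_resources(timeout=30.0):
        raise ResourceLimitExceeded("Resources not available within 30 seconds")

result = await agent.execute_task(task, context)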

5. Set Appropriate Timeouts

Set timeouts based on task complexity:

# Short timeout for simple tasks
resource_limits = ResourceLimits(
    task_timeout_seconds=60
)

# Long timeout for complex tasks
resource_limits = ResourceLimits(
    task_timeout_seconds=600
)

Summary

Resource management provides:

  • ✅ Rate limiting (tokens, tool calls)

  • ✅ Concurrent task limits

  • ✅ Memory constraints

  • ✅ Timeout configuration

  • ✅ Enforcement modes

  • ✅ Resource monitoring

Key Takeaways:

  • Set limits based on infrastructure

  • Use token bucket for burst handling

  • Monitor before enforcing

  • Handle unavailability gracefully

  • Set appropriate timeouts

For more details, see: