Session Management Best Practices

This guide covers best practices for managing conversation sessions with AIECS agents, including lifecycle management, metrics tracking, cleanup strategies, and production patterns.

Table of Contents

  1. Overview

  2. Session Lifecycle

  3. Session Identification

  4. Metrics Tracking

  5. Session Cleanup

  6. Error Handling

  7. Production Patterns

  8. Best Practices

Overview

Sessions provide:

  • Conversation Isolation: Each user gets their own conversation history

  • Lifecycle Management: Track session states (active, completed, failed, expired)

  • Metrics Tracking: Monitor request count, errors, processing time

  • Automatic Cleanup: Remove inactive sessions automatically

Session States

  • active: Session is active and receiving requests

  • completed: Session ended successfully

  • failed: Session ended due to error

  • expired: Session expired due to inactivity
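The four states above can be pictured as a small enumeration. This is only an illustrative sketch: `SessionState` and `is_terminal` are hypothetical names, not part of the AIECS API, and the rule that every state other than `active` marks an ended session is inferred from the descriptions above.

```python
from enum import Enum


class SessionState(str, Enum):
    """Hypothetical enum mirroring the four session states listed above."""

    ACTIVE = "active"
    COMPLETED = "completed"
    FAILED = "failed"
    EXPIRED = "expired"


def is_terminal(state: SessionState) -> bool:
    """Return True for states that mark a session as ended."""
    return state is not SessionState.ACTIVE
```

The string values match the `status` arguments used in the examples in this guide, so `SessionState("expired")` resolves to `SessionState.EXPIRED`.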

Session Lifecycle

Pattern 1: Basic Lifecycle

Standard session lifecycle management.

from aiecs.domain.agent import HybridAgent, AgentConfiguration
from aiecs.domain.context import ContextEngine
from aiecs.llm import OpenAIClient

context_engine = ContextEngine()
await context_engine.initialize()

agent = HybridAgent(
    agent_id="agent-1",
    llm_client=OpenAIClient(),
    tools=["search"],
    config=AgentConfiguration(),
    context_engine=context_engine
)

await agent.initialize()

# 1. Create session (automatic on first request)
session_id = "user-123"
result = await agent.execute_task(
    {"description": "Hello"},
    {"session_id": session_id}
)
# The session is created automatically if it does not exist yet

# 2. Use session for multiple requests
for i in range(5):
    result = await agent.execute_task(
        {"description": f"Request {i}"},
        {"session_id": session_id}
    )
    # All requests tracked in same session

# 3. End session explicitly
await context_engine.end_session(session_id, status="completed")

Pattern 2: Explicit Session Creation

Create sessions explicitly for better control.

# Create session explicitly
session_metrics = await context_engine.create_session(
    session_id="user-123",
    user_id="user-456",
    metadata={
        "source": "web",
        "device": "mobile",
        "ip_address": "192.168.1.1"
    }
)

# Use session
agent = HybridAgent(
    agent_id="agent-1",
    llm_client=llm_client,
    tools=["search"],
    config=config,
    context_engine=context_engine
)

result = await agent.execute_task(
    {"description": "Hello"},
    {"session_id": "user-123"}
)

# End session
await context_engine.end_session("user-123", status="completed")

Pattern 3: Session Status Management

Manage session status throughout the lifecycle.

# Create session
session = await context_engine.create_session(
    session_id="user-123",
    user_id="user-456"
)

# Check status
assert session.status == "active"

# Process requests
for i in range(10):
    result = await agent.execute_task(
        {"description": f"Request {i}"},
        {"session_id": "user-123"}
    )

# Check whether the session has expired
session = await context_engine.get_session("user-123")
if session and session.is_expired(max_idle_seconds=1800):
    await context_engine.end_session("user-123", status="expired")
else:
    # Otherwise end the session explicitly
    await context_engine.end_session("user-123", status="completed")

Session Identification

Pattern 1: User-Based Sessions

Use user ID as session identifier.

# Good: User-based session ID
user_id = "456"
session_id = f"user-{user_id}"  # "user-456"

# Use consistent session ID
result = await agent.execute_task(
    {"description": "Hello"},
    {"session_id": session_id}
)

Pattern 2: Device-Based Sessions

Use device ID for multi-device support.

# Device-based session ID
user_id = "456"
device_id = "789"
session_id = f"user-{user_id}-device-{device_id}"  # "user-456-device-789"

# Each device gets its own session
result = await agent.execute_task(
    {"description": "Hello"},
    {"session_id": session_id}
)

Pattern 3: Application-Based Sessions

Use application context for session ID.

# Application-based session ID
user_id = "456"
app_id = "web-app"
session_id = f"user-{user_id}-app-{app_id}"  # "user-456-app-web-app"

# Different apps get different sessions
result = await agent.execute_task(
    {"description": "Hello"},
    {"session_id": session_id}
)

Pattern 4: Temporary Sessions

Use temporary sessions for one-off interactions.

import uuid

# Temporary session for one-off interaction
temp_session_id = f"temp-{uuid.uuid4()}"

result = await agent.execute_task(
    {"description": "One-time question"},
    {"session_id": temp_session_id}
)

# Clean up temporary session
await context_engine.end_session(temp_session_id, status="completed")
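The identification patterns above all build an ID from the same kinds of parts, so they can be folded into one helper. The sketch below is a hypothetical convenience function (`build_session_id` is not an AIECS API); it assumes the raw IDs are passed without a `user-` or `device-` prefix.

```python
from typing import Optional


def build_session_id(
    user_id: str,
    device_id: Optional[str] = None,
    app_id: Optional[str] = None,
) -> str:
    """Compose a deterministic session ID from the parts that are provided."""
    if not user_id:
        raise ValueError("user_id must be a non-empty string")

    parts = [f"user-{user_id}"]
    if device_id:
        parts.append(f"device-{device_id}")
    if app_id:
        parts.append(f"app-{app_id}")
    return "-".join(parts)
```

Because the function is deterministic, the same user, device, and application always map to the same session, which is what keeps the conversation history attached to them.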

Metrics Tracking

Pattern 1: Request Tracking

Track requests automatically with agent execution.

# Agent automatically tracks requests
for i in range(10):
    result = await agent.execute_task(
        {"description": f"Request {i}"},
        {"session_id": "user-123"}
    )
    # Each request tracked automatically

# Get session metrics
session = await context_engine.get_session("user-123")
print(f"Request count: {session.request_count}")  # 10
print(f"Error count: {session.error_count}")
print(f"Total processing time: {session.total_processing_time}s")

Pattern 2: Manual Metrics Update

Update metrics manually for custom tracking.

# Update session metrics manually
await context_engine.update_session(
    session_id="user-123",
    increment_requests=True,
    add_processing_time=1.5,
    mark_error=False
)

# Track error
await context_engine.update_session(
    session_id="user-123",
    increment_requests=True,
    add_processing_time=0.1,
    mark_error=True
)

# Get metrics
session = await context_engine.get_session("user-123")
print(f"Requests: {session.request_count}")
print(f"Errors: {session.error_count}")
print(f"Error rate: {session.error_count / session.request_count:.1%}")

Pattern 3: Metrics Aggregation

Aggregate metrics across multiple sessions.

# Get all sessions for a user
all_sessions = await context_engine.list_sessions(user_id="user-456")

# Aggregate metrics
total_requests = sum(s.request_count for s in all_sessions)
total_errors = sum(s.error_count for s in all_sessions)
total_time = sum(s.total_processing_time for s in all_sessions)

print(f"Total requests: {total_requests}")
print(f"Total errors: {total_errors}")
if total_requests:  # Avoid division by zero when no requests were recorded
    print(f"Average time: {total_time / total_requests:.2f}s")
    print(f"Error rate: {total_errors / total_requests:.1%}")
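The aggregation above can also be wrapped in a reusable function that tolerates an empty result. The following is a minimal sketch: `SessionStats` is a stand-in dataclass that carries only the three fields used in this guide, not the actual AIECS session type, and `aggregate` is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class SessionStats:
    """Stand-in for a session object with the metric fields used in this guide."""

    request_count: int = 0
    error_count: int = 0
    total_processing_time: float = 0.0


def aggregate(sessions: Iterable[SessionStats]) -> Dict[str, float]:
    """Sum metrics across sessions and derive average time and error rate."""
    total_requests = 0
    total_errors = 0
    total_time = 0.0
    for s in sessions:
        total_requests += s.request_count
        total_errors += s.error_count
        total_time += s.total_processing_time

    if total_requests == 0:
        # No requests recorded: report zeros instead of dividing by zero.
        return {"requests": 0, "errors": 0, "avg_time": 0.0, "error_rate": 0.0}

    return {
        "requests": total_requests,
        "errors": total_errors,
        "avg_time": total_time / total_requests,
        "error_rate": 100.0 * total_errors / total_requests,  # percent
    }
```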

Pattern 4: Performance Monitoring

Monitor session performance metrics.

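# Track processing time
import time

# perf_counter() is monotonic, so it is better suited to measuring durations than time.time()
start = time.perf_counter()
result = await agent.execute_task(
    {"description": "Complex task"},
    {"session_id": "user-123"}
)
duration = time.perf_counter() - start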

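# Update metrics with processing time
# (skip this if the agent already tracks the request automatically, as in Pattern 1,
# otherwise the request is counted twice)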
await context_engine.update_session(
    session_id="user-123",
    increment_requests=True,
    add_processing_time=duration,
    mark_error=False
)

# Get performance metrics
session = await context_engine.get_session("user-123")
if session.request_count:  # Avoid division by zero on a fresh session
    avg_time = session.total_processing_time / session.request_count
    print(f"Average processing time: {avg_time:.2f}s")
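Timing and error flagging tend to be repeated around every call, so a small context manager can capture both. This is a hedged sketch: `timed` is a hypothetical helper, not part of AIECS, and the resulting values would still have to be passed to `update_session` by the caller.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator, Union


@contextmanager
def timed() -> Iterator[Dict[str, Union[float, bool]]]:
    """Yield a record whose 'duration' and 'error' fields are filled in on exit."""
    record: Dict[str, Union[float, bool]] = {"duration": 0.0, "error": False}
    start = time.perf_counter()
    try:
        yield record
    except Exception:
        # Flag the failure, then let the exception propagate to the caller.
        record["error"] = True
        raise
    finally:
        record["duration"] = time.perf_counter() - start
```

After the `with timed() as record:` block finishes, `record["duration"]` and `record["error"]` can be used as the `add_processing_time` and `mark_error` arguments shown above.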

Session Cleanup

Pattern 1: Automatic Cleanup

Use automatic cleanup for inactive sessions.

# Clean up sessions inactive for 30 minutes
cleaned_count = await context_engine.cleanup_inactive_sessions(
    max_idle_seconds=1800
)

print(f"Cleaned up {cleaned_count} inactive sessions")

Pattern 2: Scheduled Cleanup

Schedule cleanup at regular intervals.

import asyncio
import logging

logger = logging.getLogger(__name__)

async def cleanup_sessions_periodically():
    """Clean up inactive sessions every hour"""
    while True:
        await asyncio.sleep(3600)  # 1 hour

        cleaned_count = await context_engine.cleanup_inactive_sessions(
            max_idle_seconds=1800  # 30 minutes
        )

        logger.info(f"Cleaned up {cleaned_count} inactive sessions")

# Start cleanup task; keep a reference so the task is not garbage-collected
cleanup_task = asyncio.create_task(cleanup_sessions_periodically())
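A loop that runs forever is awkward to stop cleanly at shutdown. One possible variant, sketched below, waits on an `asyncio.Event` so the loop can exit early. `run_periodically` and its parameters are hypothetical names; the `job` argument stands in for a coroutine function that would call `cleanup_inactive_sessions`.

```python
import asyncio
from typing import Awaitable, Callable


async def run_periodically(
    job: Callable[[], Awaitable[int]],
    interval_seconds: float,
    stop_event: asyncio.Event,
) -> int:
    """Run `job` every `interval_seconds` until `stop_event` is set.

    Returns the number of completed runs.
    """
    runs = 0
    while not stop_event.is_set():
        try:
            # Wake up early if shutdown is requested during the wait.
            await asyncio.wait_for(stop_event.wait(), timeout=interval_seconds)
        except asyncio.TimeoutError:
            # The interval elapsed without a stop request: run the job.
            await job()
            runs += 1
    return runs
```

At shutdown, the application would call `stop_event.set()` and then await the task, rather than cancelling it in the middle of a cleanup pass.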

Pattern 3: Custom Cleanup Logic

Implement custom cleanup logic.

# Get all sessions
all_sessions = await context_engine.list_sessions()

# Custom cleanup: Remove sessions with high error rate
for session in all_sessions:
    if session.request_count > 0:
        error_rate = session.error_count / session.request_count
        if error_rate > 0.5:  # More than 50% errors
            await context_engine.end_session(
                session.session_id,
                status="failed"
            )
            logger.warning(
                f"Ended session {session.session_id} "
                f"due to high error rate: {error_rate}"
            )

Pattern 4: Cleanup by Age

Clean up sessions older than a certain age.

from datetime import datetime, timedelta

# Get all sessions
all_sessions = await context_engine.list_sessions()

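# Clean up sessions older than 7 days
# (assumes created_at is a naive UTC datetime; datetime.utcnow() is deprecated
# since Python 3.12, so use datetime.now(timezone.utc) if created_at is timezone-aware)
cutoff_date = datetime.utcnow() - timedelta(days=7)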
cleaned_count = 0

for session in all_sessions:
    if session.created_at < cutoff_date:
        await context_engine.end_session(
            session.session_id,
            status="expired"
        )
        cleaned_count += 1

print(f"Cleaned up {cleaned_count} old sessions")

Error Handling

Pattern 1: Error Tracking

Track errors in sessions.

try:
    result = await agent.execute_task(
        {"description": "Task"},
        {"session_id": "user-123"}
    )
except Exception as e:
    # Track error in session
    await context_engine.update_session(
        session_id="user-123",
        increment_requests=True,
        add_processing_time=0.1,
        mark_error=True
    )
    logger.error(f"Task failed: {e}")
    raise

Pattern 2: Error Recovery

Recover from errors and continue the session.

max_retries = 3
retry_count = 0

while retry_count < max_retries:
    try:
        result = await agent.execute_task(
            {"description": "Task"},
            {"session_id": "user-123"}
        )
        break  # Success
    except Exception as e:
        retry_count += 1
        await context_engine.update_session(
            session_id="user-123",
            increment_requests=True,
            mark_error=True
        )
        
        if retry_count >= max_retries:
            # End session on failure
            await context_engine.end_session(
                "user-123",
                status="failed"
            )
            raise
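The retry loop above retries immediately, which can hammer a struggling backend. A common refinement is exponential backoff between attempts. The sketch below is a generic helper under stated assumptions: `retry_with_backoff` is a hypothetical name, and session bookkeeping such as `update_session` or `end_session` is left to the caller.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def retry_with_backoff(
    operation: Callable[[], Awaitable[T]],
    max_retries: int = 3,
    base_delay: float = 0.5,
) -> T:
    """Call `operation` up to `max_retries` times, doubling the delay after each failure."""
    if max_retries < 1:
        raise ValueError("max_retries must be at least 1")

    for attempt in range(1, max_retries + 1):
        try:
            return await operation()
        except Exception:
            if attempt == max_retries:
                raise  # Out of attempts: surface the last error
            # Delays grow as base_delay, 2 * base_delay, 4 * base_delay, ...
            await asyncio.sleep(base_delay * 2 ** (attempt - 1))

    raise AssertionError("unreachable")
```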

Pattern 3: Session Health Monitoring

Monitor session health and take action.

# Get session
session = await context_engine.get_session("user-123")

# Check health
if session and session.request_count > 0:
    error_rate = session.error_count / session.request_count
    
    if error_rate > 0.3:  # More than 30% errors
        logger.warning(
            f"Session {session.session_id} has high error rate: {error_rate}"
        )
        
        # Take action: End session or alert
        if error_rate > 0.5:
            await context_engine.end_session(
                session.session_id,
                status="failed"
            )
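The two thresholds used above (30% for a warning, 50% for ending the session) can be isolated in a pure function, which makes them easy to unit-test and tune. This is a minimal sketch; `classify_health` and its labels are hypothetical and not part of AIECS.

```python
def classify_health(
    request_count: int,
    error_count: int,
    warn_at: float = 0.3,
    fail_at: float = 0.5,
) -> str:
    """Map a session's error rate to 'healthy', 'warning', or 'failed'."""
    if request_count <= 0:
        # No requests yet, so there is no error rate to judge.
        return "healthy"

    error_rate = error_count / request_count
    if error_rate > fail_at:
        return "failed"
    if error_rate > warn_at:
        return "warning"
    return "healthy"
```

A caller could log a warning on `"warning"` and call `end_session(..., status="failed")` on `"failed"`, mirroring the logic above.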

Production Patterns

Pattern 1: Session Limits

Enforce session limits to prevent resource exhaustion.

MAX_SESSIONS = 1000  # Example limit; tune for your deployment

# Check session count before creating new session
active_sessions = await context_engine.list_sessions(status="active")

if len(active_sessions) >= MAX_SESSIONS:
    # Clean up sessions that have been idle for more than 15 minutes
    await context_engine.cleanup_inactive_sessions(
        max_idle_seconds=900  # 15 minutes
    )

    # Check again
    active_sessions = await context_engine.list_sessions(status="active")
    if len(active_sessions) >= MAX_SESSIONS:
        raise RuntimeError("Maximum session limit reached")

# Create session
session = await context_engine.create_session(
    session_id="user-123",
    user_id="user-456"
)

Pattern 2: Session Timeout

Implement session timeout.

# Check if session expired
session = await context_engine.get_session("user-123")

if session and session.is_expired(max_idle_seconds=1800):
    # Session expired, create new session
    await context_engine.end_session("user-123", status="expired")
    session = await context_engine.create_session(
        session_id="user-123-new",
        user_id="user-456"
    )

Pattern 3: Session Pooling

Reuse sessions for better performance.

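# Session pool (entries are never re-validated here; remove an entry when
# its session ends or expires so a stale session is not reused)
session_pool = {}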

async def get_or_create_session(user_id: str) -> str:
    """Get existing session or create new one"""
    session_id = f"user-{user_id}"
    
    if session_id not in session_pool:
        session = await context_engine.get_session(session_id)
        
        if not session or session.is_expired(max_idle_seconds=1800):
            # Create new session
            session = await context_engine.create_session(
                session_id=session_id,
                user_id=user_id
            )
        
        session_pool[session_id] = session
    
    return session_id

# Use session pool
session_id = await get_or_create_session("456")  # "user-456"
result = await agent.execute_task(
    {"description": "Task"},
    {"session_id": session_id}
)
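When several requests for the same user arrive concurrently, a plain dictionary check can let two coroutines create the same session at once. The sketch below illustrates one way to serialize creation with an `asyncio.Lock`. `SessionPool` is a hypothetical class, its `create` callback stands in for a call such as `create_session`, and expiry handling is deliberately left out to keep the focus on concurrency.

```python
import asyncio
from typing import Awaitable, Callable, Set


class SessionPool:
    """Tracks known session IDs and serializes creation of new ones."""

    def __init__(self, create: Callable[[str], Awaitable[None]]) -> None:
        self._create = create
        self._known: Set[str] = set()
        self._lock = asyncio.Lock()

    async def get_or_create(self, user_id: str) -> str:
        session_id = f"user-{user_id}"
        # Hold the lock across check-and-create so only one caller creates.
        async with self._lock:
            if session_id not in self._known:
                await self._create(session_id)
                self._known.add(session_id)
        return session_id
```

A single lock is the simplest choice; a per-session lock would allow more parallelism at the cost of extra bookkeeping.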

Best Practices

1. Use Consistent Session IDs

Always use consistent session IDs:

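# Good: Consistent session ID, so a returning user keeps their history
session_id = f"user-{user_id}"

# Bad: A new random ID on every request loses the conversation history
session_id = str(uuid.uuid4())  # Reserve this for one-off temporary sessions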

2. End Sessions Explicitly

Always end sessions when done:

status = "completed"
try:
    # Use session
    result = await agent.execute_task(
        {"description": "Task"},
        {"session_id": session_id}
    )
except Exception:
    status = "failed"
    raise
finally:
    # End session with a status that reflects the outcome
    await context_engine.end_session(session_id, status=status)

3. Track Metrics

Track metrics for monitoring:

# Update metrics after each request
await context_engine.update_session(
    session_id=session_id,
    increment_requests=True,
    add_processing_time=duration,
    mark_error=is_error
)

4. Clean Up Inactive Sessions

Regularly clean up inactive sessions:

# Clean up sessions inactive for 30 minutes
await context_engine.cleanup_inactive_sessions(max_idle_seconds=1800)

5. Monitor Session Health

Monitor session health and take action:

session = await context_engine.get_session(session_id)
if session and session.error_count > 10:
    logger.warning(f"Session {session_id} has high error count")

6. Handle Errors Gracefully

Always handle errors:

try:
    result = await agent.execute_task(
        {"description": "Task"},
        {"session_id": session_id}
    )
except Exception as e:
    # Track error
    await context_engine.update_session(
        session_id=session_id,
        mark_error=True
    )
    # Handle error
    logger.error(f"Task failed: {e}")

7. Use Appropriate Timeouts

Set appropriate timeouts for sessions:

# Check expiration with appropriate timeout
if session.is_expired(max_idle_seconds=1800):  # 30 minutes
    await context_engine.end_session(session_id, status="expired")
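To make the meaning of `max_idle_seconds` concrete, the sketch below shows how an idle-timeout check could be computed from a last-activity timestamp. It is an assumption for illustration only: `is_idle_expired` is a hypothetical function and does not claim to reproduce how `is_expired` is implemented in AIECS.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_idle_expired(
    last_activity: datetime,
    max_idle_seconds: int,
    now: Optional[datetime] = None,
) -> bool:
    """Return True if more than `max_idle_seconds` have passed since `last_activity`.

    Both datetimes are expected to be timezone-aware.
    """
    current = now if now is not None else datetime.now(timezone.utc)
    return (current - last_activity) > timedelta(seconds=max_idle_seconds)
```

Passing `now` explicitly keeps the check deterministic in tests. Shorter timeouts free resources sooner, while longer ones preserve context for users who pause mid-conversation.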

Summary

Session management best practices:

  • ✅ Use consistent session IDs

  • ✅ Track metrics for monitoring

  • ✅ Clean up inactive sessions regularly

  • ✅ Monitor session health

  • ✅ Handle errors gracefully

  • ✅ End sessions explicitly

  • ✅ Use appropriate timeouts

For more details, see: