AI Document Orchestrator Configuration Guide

Overview

The AI Document Orchestrator coordinates document parsing with AI analysis, manages AI provider interactions, and handles multi-step document processing workflows. It integrates with DocumentParserTool for parsing and with AI providers for content analysis and extraction. It supports seven processing modes (summarize, extract_info, analyze, translate, classify, answer_questions, custom), four AI providers (OpenAI, Vertex AI, xAI, Local), and both synchronous and asynchronous processing. Configure it through environment variables that use the AI_DOC_ORCHESTRATOR_ prefix, or programmatically when initializing the tool.

Using .env Files in Your Project

When using aiecs as a dependency in your project, you can store configuration in a .env file for convenience. The AI Document Orchestrator reads from environment variables that are already loaded into the process, so you need to load the .env file in your application before importing aiecs tools.

Setting Up .env Files

1. Install python-dotenv:

pip install python-dotenv

2. Create a .env file in your project root:

# .env file in your project root
AI_DOC_ORCHESTRATOR_DEFAULT_AI_PROVIDER=openai
AI_DOC_ORCHESTRATOR_MAX_CHUNK_SIZE=4000
AI_DOC_ORCHESTRATOR_MAX_CONCURRENT_REQUESTS=5
AI_DOC_ORCHESTRATOR_DEFAULT_TEMPERATURE=0.1
AI_DOC_ORCHESTRATOR_MAX_TOKENS=2000
AI_DOC_ORCHESTRATOR_TIMEOUT=60

3. Load the .env file in your application:

# main.py or app.py - at the top of your entry point
from dotenv import load_dotenv

# Load environment variables from .env file
# This must be done BEFORE importing aiecs tools
load_dotenv()

# Now import and use aiecs tools
from aiecs.tools.docs.ai_document_orchestrator import AIDocumentOrchestrator

# The tool will automatically use the environment variables
orchestrator = AIDocumentOrchestrator()
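
Why the load order matters can be sketched with the standard library alone. The loader below is a simplified stand-in for python-dotenv (KEY=VALUE lines only), and FakeTool is a hypothetical class that reads its settings once, when it is constructed. Whether the real tool reads them at import or at construction is not stated here, so treat this as an illustration of the principle rather than of the actual implementation.

```python
import os


def load_env_text(text, environ=None):
    """Simplified stand-in for load_dotenv(): handles KEY=VALUE lines only."""
    environ = os.environ if environ is None else environ
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Keep variables that are already set, as load_dotenv() does by default.
        environ.setdefault(key.strip(), value.strip())
    return environ


class FakeTool:
    """Hypothetical tool that captures its settings at construction time."""

    def __init__(self, environ):
        self.timeout = int(environ.get("AI_DOC_ORCHESTRATOR_TIMEOUT", "60"))


env = {}  # isolated mapping so the sketch does not touch the real environment
created_before = FakeTool(env)  # created before the .env content is loaded
load_env_text("# sample .env\nAI_DOC_ORCHESTRATOR_TIMEOUT=120\n", env)
created_after = FakeTool(env)   # created after loading

print(created_before.timeout, created_after.timeout)
```

The instance created before loading falls back to the built-in default, while the one created afterwards picks up the value from the file.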

Multiple Environment Files

You can use different .env files for different environments:

import os
from dotenv import load_dotenv

# Load environment-specific configuration
env = os.getenv('APP_ENV', 'development')

if env == 'production':
    load_dotenv('.env.production')
elif env == 'staging':
    load_dotenv('.env.staging')
else:
    load_dotenv('.env.development')

from aiecs.tools.docs.ai_document_orchestrator import AIDocumentOrchestrator
orchestrator = AIDocumentOrchestrator()

Example .env.production:

# Production settings - optimized for performance and reliability
AI_DOC_ORCHESTRATOR_DEFAULT_AI_PROVIDER=openai
AI_DOC_ORCHESTRATOR_MAX_CHUNK_SIZE=8000
AI_DOC_ORCHESTRATOR_MAX_CONCURRENT_REQUESTS=10
AI_DOC_ORCHESTRATOR_DEFAULT_TEMPERATURE=0.1
AI_DOC_ORCHESTRATOR_MAX_TOKENS=4000
AI_DOC_ORCHESTRATOR_TIMEOUT=120

Example .env.development:

# Development settings - optimized for testing and debugging
AI_DOC_ORCHESTRATOR_DEFAULT_AI_PROVIDER=local
AI_DOC_ORCHESTRATOR_MAX_CHUNK_SIZE=2000
AI_DOC_ORCHESTRATOR_MAX_CONCURRENT_REQUESTS=2
AI_DOC_ORCHESTRATOR_DEFAULT_TEMPERATURE=0.3
AI_DOC_ORCHESTRATOR_MAX_TOKENS=1000
AI_DOC_ORCHESTRATOR_TIMEOUT=30

Best Practices for .env Files

Never commit .env files to version control - Add .env to your .gitignore:

# .gitignore
.env
.env.local
.env.*.local
.env.production
.env.staging

Provide a template - Create .env.example with documented dummy values:

# .env.example
# AI Document Orchestrator Configuration

# Default AI provider to use
AI_DOC_ORCHESTRATOR_DEFAULT_AI_PROVIDER=openai

# Maximum chunk size for AI processing
AI_DOC_ORCHESTRATOR_MAX_CHUNK_SIZE=4000

# Maximum concurrent AI requests
AI_DOC_ORCHESTRATOR_MAX_CONCURRENT_REQUESTS=5

# Default temperature for AI model
AI_DOC_ORCHESTRATOR_DEFAULT_TEMPERATURE=0.1

# Maximum tokens for AI response
AI_DOC_ORCHESTRATOR_MAX_TOKENS=2000

# Timeout in seconds for AI operations
AI_DOC_ORCHESTRATOR_TIMEOUT=60

Document your variables - Add comments explaining each setting
Use load_dotenv() early - Call it at the very top of your entry point, before any aiecs imports
Format values correctly:
- Strings: plain text, for example openai or vertex_ai
- Integers: plain numbers, for example 4000, 5, 2000, 60
- Floats: decimal numbers, for example 0.1, 0.3

Configuration Options

1. Default AI Provider

Environment Variable: AI_DOC_ORCHESTRATOR_DEFAULT_AI_PROVIDER

Type: String

Default: "openai"

Description: Default AI provider to use for document processing operations. This provider is used when no specific provider is specified in the processing request.

Supported Providers:

openai - OpenAI API (default)
vertex_ai - Google Vertex AI
xai - xAI
local - Local AI model

Example:

export AI_DOC_ORCHESTRATOR_DEFAULT_AI_PROVIDER=vertex_ai

Provider Note: Ensure the selected provider is properly configured with API keys and credentials.

2. Max Chunk Size

Environment Variable: AI_DOC_ORCHESTRATOR_MAX_CHUNK_SIZE

Type: Integer

Default: 4000

Description: Maximum chunk size for AI processing. Documents larger than this size will be chunked before being sent to AI providers.

Common Values:

2000 - Small chunks (faster processing, more API calls)
4000 - Default chunks (balanced)
8000 - Large chunks (fewer API calls, more memory)
16000 - Very large chunks (fewest API calls, most likely to hit token limits)

Example:

export AI_DOC_ORCHESTRATOR_MAX_CHUNK_SIZE=8000

Chunking Note: Larger chunks reduce API calls but may hit token limits. Smaller chunks provide better granularity but increase costs.
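
To make the trade-off concrete, here is a minimal sketch that counts how many requests a document needs at different chunk sizes. It assumes the chunk size is measured in characters and that chunks are simple fixed-size slices. Neither the unit nor the splitting strategy is specified in this guide, and split_into_chunks is a hypothetical helper, not part of the tool.

```python
import math


def split_into_chunks(text: str, max_chunk_size: int) -> list[str]:
    """Split text into consecutive pieces of at most max_chunk_size characters."""
    if max_chunk_size <= 0:
        raise ValueError("max_chunk_size must be a positive integer")
    return [text[i:i + max_chunk_size] for i in range(0, len(text), max_chunk_size)]


document = "x" * 18_000  # stand-in for the text of a parsed document

for size in (2000, 4000, 8000):
    chunks = split_into_chunks(document, size)
    # One AI request per chunk, so the request count is ceil(length / size).
    assert len(chunks) == math.ceil(len(document) / size)
    print(f"max_chunk_size={size}: {len(chunks)} requests")
```

For this 18,000-character input, the loop reports 9, 5, and 3 requests respectively.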

3. Max Concurrent Requests

Environment Variable: AI_DOC_ORCHESTRATOR_MAX_CONCURRENT_REQUESTS

Type: Integer

Default: 5

Description: Maximum number of AI requests that can run at the same time. This controls the parallelism of batch processing operations.

Common Values:

2 - Conservative (low resource usage)
5 - Default (balanced)
10 - Aggressive (high throughput)
20 - Very aggressive (requires high resources)

Example:

export AI_DOC_ORCHESTRATOR_MAX_CONCURRENT_REQUESTS=10

Concurrency Note: Higher values increase throughput but may hit API rate limits or resource constraints.
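
A common way to enforce such a limit is an asyncio.Semaphore. The sketch below is a generic illustration of that technique and not the tool's actual implementation. fake_ai_request is a hypothetical stand-in for a provider call, and the peak counter exists only to show that the limit holds.

```python
import asyncio


async def process_all(items, max_concurrent_requests: int = 5):
    """Process items with at most max_concurrent_requests calls in flight."""
    semaphore = asyncio.Semaphore(max_concurrent_requests)
    in_flight = 0
    peak = 0

    async def fake_ai_request(item):
        nonlocal in_flight, peak
        async with semaphore:  # waits here while the limit is reached
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # stand-in for a real provider call
            in_flight -= 1
            return f"processed:{item}"

    # gather() returns results in the same order as the inputs.
    results = await asyncio.gather(*(fake_ai_request(i) for i in items))
    return results, peak


results, peak = asyncio.run(process_all(range(12), max_concurrent_requests=5))
print(f"{len(results)} results, peak concurrency {peak}")
```

Raising the limit shortens the total run time only until the provider's rate limit or local resources become the bottleneck.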

4. Default Temperature

Environment Variable: AI_DOC_ORCHESTRATOR_DEFAULT_TEMPERATURE

Type: Float

Default: 0.1

Description: Default temperature setting for AI models. Controls the randomness and creativity of AI responses.

Temperature Ranges:

0.0 - Deterministic (most focused)
0.1 - Low creativity (default, good for factual tasks)
0.3 - Moderate creativity
0.7 - High creativity
1.0 - Very high creativity (validation accepts values up to 2.0)

Example:

export AI_DOC_ORCHESTRATOR_DEFAULT_TEMPERATURE=0.3

Temperature Note: Lower values are better for factual extraction, higher values for creative tasks.

5. Max Tokens

Environment Variable: AI_DOC_ORCHESTRATOR_MAX_TOKENS

Type: Integer

Default: 2000

Description: Maximum number of tokens for AI response generation. This limits the length of AI-generated content.

Common Values:

1000 - Short responses
2000 - Default responses
4000 - Long responses
8000 - Very long responses

Example:

export AI_DOC_ORCHESTRATOR_MAX_TOKENS=4000

Token Note: Higher values allow longer responses but increase costs and processing time.

6. Timeout

Environment Variable: AI_DOC_ORCHESTRATOR_TIMEOUT

Type: Integer

Default: 60

Description: Timeout in seconds for AI operations. Operations that exceed this timeout will be cancelled.

Common Values:

30 - Fast timeout (quick operations)
60 - Default timeout (balanced)
120 - Long timeout (complex operations)
300 - Very long timeout (batch operations)

Example:

export AI_DOC_ORCHESTRATOR_TIMEOUT=120

Timeout Note: Increase for complex documents or slow AI providers.
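
The cancel-on-timeout behaviour can be illustrated with asyncio.wait_for. This is a sketch of the general technique under stated assumptions, not the tool's internals: call_with_timeout and slow_provider_call are hypothetical names, and returning None on timeout is an arbitrary choice made for the example.

```python
import asyncio


async def call_with_timeout(coro, timeout: float):
    """Return the coroutine's result, or None if it exceeds timeout seconds."""
    try:
        # wait_for() cancels the wrapped coroutine when the timeout expires.
        return await asyncio.wait_for(coro, timeout=timeout)
    except asyncio.TimeoutError:
        return None


async def slow_provider_call(delay: float) -> str:
    await asyncio.sleep(delay)  # stand-in for a slow AI provider
    return "summary"


async def main():
    fast = await call_with_timeout(slow_provider_call(0.01), timeout=0.5)
    slow = await call_with_timeout(slow_provider_call(0.5), timeout=0.05)
    return fast, slow


fast, slow = asyncio.run(main())
print(fast, slow)
```

The first call finishes within its budget and returns its result. The second exceeds it and is cancelled.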

Usage Examples

Example 1: Basic Environment Configuration

# Set basic AI processing parameters
export AI_DOC_ORCHESTRATOR_DEFAULT_AI_PROVIDER=openai
export AI_DOC_ORCHESTRATOR_MAX_CHUNK_SIZE=4000
export AI_DOC_ORCHESTRATOR_MAX_CONCURRENT_REQUESTS=5
export AI_DOC_ORCHESTRATOR_DEFAULT_TEMPERATURE=0.1
export AI_DOC_ORCHESTRATOR_MAX_TOKENS=2000
export AI_DOC_ORCHESTRATOR_TIMEOUT=60

# Run your application
python app.py

Example 2: High-Performance Configuration

# Optimized for high throughput
export AI_DOC_ORCHESTRATOR_DEFAULT_AI_PROVIDER=openai
export AI_DOC_ORCHESTRATOR_MAX_CHUNK_SIZE=8000
export AI_DOC_ORCHESTRATOR_MAX_CONCURRENT_REQUESTS=10
export AI_DOC_ORCHESTRATOR_DEFAULT_TEMPERATURE=0.1
export AI_DOC_ORCHESTRATOR_MAX_TOKENS=4000
export AI_DOC_ORCHESTRATOR_TIMEOUT=120

Example 3: Development Configuration

# Development-friendly settings
export AI_DOC_ORCHESTRATOR_DEFAULT_AI_PROVIDER=local
export AI_DOC_ORCHESTRATOR_MAX_CHUNK_SIZE=2000
export AI_DOC_ORCHESTRATOR_MAX_CONCURRENT_REQUESTS=2
export AI_DOC_ORCHESTRATOR_DEFAULT_TEMPERATURE=0.3
export AI_DOC_ORCHESTRATOR_MAX_TOKENS=1000
export AI_DOC_ORCHESTRATOR_TIMEOUT=30

Example 4: Programmatic Configuration

from aiecs.tools.docs.ai_document_orchestrator import AIDocumentOrchestrator

# Initialize with custom configuration
orchestrator = AIDocumentOrchestrator(config={
    'default_ai_provider': 'openai',
    'max_chunk_size': 4000,
    'max_concurrent_requests': 5,
    'default_temperature': 0.1,
    'max_tokens': 2000,
    'timeout': 60
})

Example 5: Mixed Configuration

Environment variables are used as defaults, but can be overridden programmatically:

# Shell: set environment defaults
export AI_DOC_ORCHESTRATOR_MAX_CHUNK_SIZE=4000
export AI_DOC_ORCHESTRATOR_DEFAULT_TEMPERATURE=0.1

# Python: override for a specific instance
from aiecs.tools.docs.ai_document_orchestrator import AIDocumentOrchestrator

orchestrator = AIDocumentOrchestrator(config={
    'max_chunk_size': 8000,  # Overrides the environment variable
    'default_temperature': 0.3  # Overrides the environment variable
})

Configuration Priority

When the AI Document Orchestrator is initialized, configuration values are resolved in the following order (highest to lowest priority):

Programmatic config - Values passed to the constructor
Environment variables - Values set via AI_DOC_ORCHESTRATOR_* variables
Default values - Built-in defaults as specified above
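
The resolution order above can be sketched with collections.ChainMap, which looks keys up in its maps from left to right. This is an illustration of the stated priority rather than the tool's actual code. resolve_config is a hypothetical helper, and the DEFAULTS values are copied from the option descriptions in this guide.

```python
import os
from collections import ChainMap

ENV_PREFIX = "AI_DOC_ORCHESTRATOR_"
DEFAULTS = {
    "default_ai_provider": "openai",
    "max_chunk_size": 4000,
    "max_concurrent_requests": 5,
    "default_temperature": 0.1,
    "max_tokens": 2000,
    "timeout": 60,
}


def resolve_config(config=None, environ=None):
    """Merge settings: programmatic config > environment variables > defaults."""
    environ = os.environ if environ is None else environ
    from_env = {}
    for key, default in DEFAULTS.items():
        raw = environ.get(ENV_PREFIX + key.upper())
        if raw is not None:
            # Cast the environment string to the type of the built-in default.
            from_env[key] = type(default)(raw)
    # Earlier maps win, so config overrides from_env, which overrides DEFAULTS.
    return dict(ChainMap(config or {}, from_env, DEFAULTS))


env = {
    "AI_DOC_ORCHESTRATOR_MAX_CHUNK_SIZE": "4000",
    "AI_DOC_ORCHESTRATOR_DEFAULT_TEMPERATURE": "0.3",
}
resolved = resolve_config({"max_chunk_size": 8000}, env)
print(resolved["max_chunk_size"], resolved["default_temperature"], resolved["timeout"])
```

Here max_chunk_size comes from the programmatic config, default_temperature from the environment, and timeout from the built-in defaults.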

Data Type Parsing

String Values

Strings should be provided as plain text without quotes:

export AI_DOC_ORCHESTRATOR_DEFAULT_AI_PROVIDER=openai
export AI_DOC_ORCHESTRATOR_DEFAULT_AI_PROVIDER=vertex_ai

Integer Values

Integers should be provided as numeric strings:

export AI_DOC_ORCHESTRATOR_MAX_CHUNK_SIZE=4000
export AI_DOC_ORCHESTRATOR_MAX_CONCURRENT_REQUESTS=5
export AI_DOC_ORCHESTRATOR_MAX_TOKENS=2000
export AI_DOC_ORCHESTRATOR_TIMEOUT=60

Float Values

Floats should be provided as decimal strings:

export AI_DOC_ORCHESTRATOR_DEFAULT_TEMPERATURE=0.1
export AI_DOC_ORCHESTRATOR_DEFAULT_TEMPERATURE=0.3
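
Environment variables always arrive as strings, so each value has to be converted to its declared type. A minimal sketch of that conversion follows. parse_env_value is a hypothetical helper shown only for illustration, since the tool itself relies on Pydantic for this step.

```python
def parse_env_value(name: str, raw: str, expected_type: type):
    """Convert an environment string to expected_type, with a readable error."""
    try:
        return expected_type(raw.strip())
    except ValueError as exc:
        raise ValueError(
            f"{name}={raw!r} is not a valid {expected_type.__name__}"
        ) from exc


chunk_size = parse_env_value("AI_DOC_ORCHESTRATOR_MAX_CHUNK_SIZE", "4000", int)
temperature = parse_env_value("AI_DOC_ORCHESTRATOR_DEFAULT_TEMPERATURE", "0.1", float)
print(chunk_size, temperature)

try:
    # A unit suffix makes the value unparseable as an integer.
    parse_env_value("AI_DOC_ORCHESTRATOR_TIMEOUT", "60s", int)
except ValueError as exc:
    print(exc)
```

Values such as 60s or 4.0 for an integer setting fail in this sketch, which is why integers should be written as plain digits.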

Validation

Automatic Type Validation

Pydantic automatically validates configuration values:

default_ai_provider must be a valid provider string
max_chunk_size must be a positive integer
max_concurrent_requests must be a positive integer
default_temperature must be a float between 0.0 and 2.0
max_tokens must be a positive integer
timeout must be a positive integer
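
The rules above can be mirrored in plain Python to show what they accept and reject. The dataclass below is a standard-library stand-in, not the tool's Pydantic model. OrchestratorSettings is a hypothetical name, and treating "valid provider string" as membership in the supported provider list is an assumption.

```python
from dataclasses import dataclass

SUPPORTED_PROVIDERS = {"openai", "vertex_ai", "xai", "local"}


@dataclass(frozen=True)
class OrchestratorSettings:
    """Stand-in settings object that applies the documented validation rules."""

    default_ai_provider: str = "openai"
    max_chunk_size: int = 4000
    max_concurrent_requests: int = 5
    default_temperature: float = 0.1
    max_tokens: int = 2000
    timeout: int = 60

    def __post_init__(self):
        if self.default_ai_provider not in SUPPORTED_PROVIDERS:
            raise ValueError(f"unsupported provider: {self.default_ai_provider!r}")
        for name in ("max_chunk_size", "max_concurrent_requests", "max_tokens", "timeout"):
            value = getattr(self, name)
            # bool is a subclass of int, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
                raise ValueError(f"{name} must be a positive integer, got {value!r}")
        if not 0.0 <= self.default_temperature <= 2.0:
            raise ValueError("default_temperature must be between 0.0 and 2.0")


settings = OrchestratorSettings(max_chunk_size=8000)
print(settings)

try:
    OrchestratorSettings(default_temperature=2.5)
except ValueError as exc:
    print(f"rejected: {exc}")
```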

Runtime Validation

When processing documents, the tool validates:

AI Provider availability - Selected provider must be configured
Chunk size limits - Content must fit within chunk size
Concurrency limits - Request count must not exceed limits
Token limits - Responses must not exceed token limits
Timeout limits - Operations must complete within timeout

Processing Modes

The AI Document Orchestrator supports various processing modes:

Basic Modes

Summarize - Create concise document summaries
Extract Info - Extract specific information from documents
Analyze - Provide thorough document analysis
Translate - Translate document content
Classify - Classify documents into categories
Answer Questions - Answer questions based on document content

Advanced Modes

Custom - Use custom processing templates and prompts

AI Providers

Supported Providers

OpenAI - OpenAI API integration
Vertex AI - Google Cloud Vertex AI
XAI - xAI integration
Local - Local AI model integration

Provider Configuration

Each provider requires specific configuration:

OpenAI:

export OPENAI_API_KEY=your-api-key
export OPENAI_ORG_ID=your-org-id  # Optional

Vertex AI:

export GOOGLE_APPLICATION_CREDENTIALS=path/to/service-account.json
export GOOGLE_CLOUD_PROJECT=your-project-id

XAI:

export XAI_API_KEY=your-api-key

Local:

export LOCAL_MODEL_PATH=path/to/model
export LOCAL_MODEL_TYPE=llama2  # or other model type
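
Before starting a long job, it can help to check that the variables for the chosen provider are actually set. The sketch below uses the variable names listed above. missing_provider_settings is a hypothetical helper, and treating every variable except OPENAI_ORG_ID as required is an assumption.

```python
import os

# Variables each provider needs, taken from the lists above.
# OPENAI_ORG_ID is marked optional there, so it is left out.
REQUIRED_ENV = {
    "openai": ["OPENAI_API_KEY"],
    "vertex_ai": ["GOOGLE_APPLICATION_CREDENTIALS", "GOOGLE_CLOUD_PROJECT"],
    "xai": ["XAI_API_KEY"],
    "local": ["LOCAL_MODEL_PATH", "LOCAL_MODEL_TYPE"],
}


def missing_provider_settings(provider, environ=None):
    """Return the names of required variables that are unset or empty."""
    environ = os.environ if environ is None else environ
    if provider not in REQUIRED_ENV:
        raise ValueError(f"Unsupported AI provider: {provider!r}")
    return [name for name in REQUIRED_ENV[provider] if not environ.get(name)]


# Example with an explicit mapping instead of the real environment.
sample_env = {"GOOGLE_CLOUD_PROJECT": "your-project-id"}
print(missing_provider_settings("vertex_ai", sample_env))
```

Calling the helper without a second argument checks the current process environment instead.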

Operations Supported

The AI Document Orchestrator supports the following document processing operations:

Basic Processing

process_document - Process a single document with AI
analyze_document - Perform AI-first document analysis
batch_process_documents - Process multiple documents in batch

Async Processing

process_document_async - Async version of document processing
_batch_process_async - Async batch processing with concurrency control

Custom Processing

create_custom_processor - Create custom processing functions
get_processing_stats - Get processing statistics

Document Integration

Integration with DocumentParserTool for document parsing
Support for various document formats (PDF, DOCX, TXT, HTML, etc.)
Intelligent content chunking and preparation

AI Integration

Integration with AIECS client for AI operations
Support for multiple AI providers
Intelligent prompt templating and formatting
Response validation and post-processing

Troubleshooting

Issue: AI Provider not available

Error: AIProviderError when calling AI providers

Solutions:

# Check provider configuration
export AI_DOC_ORCHESTRATOR_DEFAULT_AI_PROVIDER=openai

# Verify API keys
export OPENAI_API_KEY=your-valid-api-key

# Test with local provider
export AI_DOC_ORCHESTRATOR_DEFAULT_AI_PROVIDER=local

Issue: Document parsing fails

Error: ProcessingError during document parsing

Solutions:

Check DocumentParserTool availability
Verify document format support
Check file accessibility and permissions
Validate document content

Issue: Timeout errors

Error: Operations time out before completion

Solutions:

# Increase timeout
export AI_DOC_ORCHESTRATOR_TIMEOUT=120

# Reduce chunk size
export AI_DOC_ORCHESTRATOR_MAX_CHUNK_SIZE=2000

# Reduce concurrent requests
export AI_DOC_ORCHESTRATOR_MAX_CONCURRENT_REQUESTS=2

Issue: Memory issues

Error: Out of memory during processing

Solutions:

# Reduce chunk size
export AI_DOC_ORCHESTRATOR_MAX_CHUNK_SIZE=2000

# Reduce concurrent requests
export AI_DOC_ORCHESTRATOR_MAX_CONCURRENT_REQUESTS=2

# Reduce max tokens
export AI_DOC_ORCHESTRATOR_MAX_TOKENS=1000

Issue: Concurrency limits

Error: Too many concurrent requests

Solutions:

# Reduce concurrent requests
export AI_DOC_ORCHESTRATOR_MAX_CONCURRENT_REQUESTS=2

# Check API rate limits
# Adjust based on provider limits

Issue: Token limit exceeded

Error: Response exceeds token limits

Solutions:

# Reduce max tokens
export AI_DOC_ORCHESTRATOR_MAX_TOKENS=1000

# Reduce chunk size
export AI_DOC_ORCHESTRATOR_MAX_CHUNK_SIZE=2000

# Use more specific prompts

Issue: Invalid AI provider

Error: Unsupported AI provider

Solutions:

# Use supported provider
export AI_DOC_ORCHESTRATOR_DEFAULT_AI_PROVIDER=openai

# Check provider availability
# Verify provider configuration

Best Practices

Performance Optimization

Chunk Size Management - Balance chunk size for optimal processing
Concurrency Control - Set appropriate concurrent request limits
Provider Selection - Choose providers based on task requirements
Timeout Configuration - Set reasonable timeouts for operations
Token Management - Optimize token usage for cost efficiency

Error Handling

Graceful Degradation - Handle AI provider failures gracefully
Retry Logic - Implement retry for transient failures
Fallback Strategies - Provide fallback processing methods
Error Logging - Log errors for debugging and monitoring
User Feedback - Provide clear error messages
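
Retry logic and fallback strategies can be combined in a small wrapper. The following is a generic sketch and not something the tool provides. TransientError, call_with_retry, and process_with_fallback are hypothetical names, and the provider functions are stand-ins for real calls.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (for example rate limits)."""


def call_with_retry(func, attempts=3, base_delay=0.01, sleep=time.sleep):
    """Call func, retrying on TransientError with exponential backoff."""
    for attempt in range(attempts):
        try:
            return func()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of attempts, let the caller decide what to do
            sleep(base_delay * 2 ** attempt)


def process_with_fallback(providers, attempts=3):
    """Try each (name, func) pair in order and return the first success."""
    errors = {}
    for name, func in providers:
        try:
            return name, call_with_retry(func, attempts=attempts)
        except Exception as exc:  # record the failure and move to the next provider
            errors[name] = exc
    raise RuntimeError(f"All providers failed: {errors}")


calls = {"n": 0}


def flaky_provider():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError("rate limited")
    return "summary"


def unavailable_provider():
    raise TransientError("unavailable")


used, result = process_with_fallback(
    [("primary", unavailable_provider), ("backup", flaky_provider)]
)
print(used, result)
```

The primary provider exhausts its attempts, so the wrapper moves on to the backup, which succeeds on its third attempt.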

Security

API Key Management - Secure storage of API keys
Content Validation - Validate document content before processing
Access Control - Control access to AI providers
Data Privacy - Ensure data privacy in AI processing
Audit Logging - Log processing activities for compliance

Resource Management

Memory Usage - Monitor memory consumption during processing
API Rate Limits - Respect provider rate limits
Cost Management - Monitor and control AI processing costs
Processing Time - Set reasonable timeouts
Cleanup - Clean up resources after processing

Integration

Tool Dependencies - Ensure required tools are available
API Compatibility - Maintain API compatibility
Error Propagation - Properly propagate errors
Logging Integration - Integrate with logging systems
Monitoring - Monitor tool performance and usage

Development vs Production

Development:

AI_DOC_ORCHESTRATOR_DEFAULT_AI_PROVIDER=local
AI_DOC_ORCHESTRATOR_MAX_CHUNK_SIZE=2000
AI_DOC_ORCHESTRATOR_MAX_CONCURRENT_REQUESTS=2
AI_DOC_ORCHESTRATOR_DEFAULT_TEMPERATURE=0.3
AI_DOC_ORCHESTRATOR_MAX_TOKENS=1000
AI_DOC_ORCHESTRATOR_TIMEOUT=30

Production:

AI_DOC_ORCHESTRATOR_DEFAULT_AI_PROVIDER=openai
AI_DOC_ORCHESTRATOR_MAX_CHUNK_SIZE=8000
AI_DOC_ORCHESTRATOR_MAX_CONCURRENT_REQUESTS=10
AI_DOC_ORCHESTRATOR_DEFAULT_TEMPERATURE=0.1
AI_DOC_ORCHESTRATOR_MAX_TOKENS=4000
AI_DOC_ORCHESTRATOR_TIMEOUT=120

Error Handling

Always wrap AI processing operations in try-except blocks:

from aiecs.tools.docs.ai_document_orchestrator import (
    AIDocumentOrchestrator,
    AIDocumentOrchestratorError,
    AIProviderError,
    ProcessingError,
)

orchestrator = AIDocumentOrchestrator()

try:
    result = orchestrator.process_document(
        source="document.pdf",
        processing_mode="summarize",
        ai_provider="openai"
    )
except AIProviderError as e:
    print(f"AI provider error: {e}")
except ProcessingError as e:
    print(f"Processing error: {e}")
except AIDocumentOrchestratorError as e:
    print(f"Orchestrator error: {e}")
except Exception as e:
    print(f"Unexpected error: {e}")

Dependencies

Core Dependencies

# Install core dependencies
pip install pydantic python-dotenv

# Install AI provider dependencies
pip install openai google-cloud-aiplatform

# Install document processing dependencies
pip install python-docx openpyxl python-pptx

Optional Dependencies

# For advanced AI providers
pip install anthropic cohere

# For local AI models
pip install transformers torch

# For enhanced document processing
pip install PyPDF2 pdfplumber

# For async processing (asyncio ships with the Python standard library and needs no install)
pip install aiohttp

Verification

# Test dependency availability
try:
    import pydantic
    import openai
    import asyncio
    print("Core dependencies available")
except ImportError as e:
    print(f"Missing dependency: {e}")

# Test AI provider availability
try:
    import openai
    print("OpenAI available")
except ImportError:
    print("OpenAI not available")

try:
    from google.cloud import aiplatform
    print("Vertex AI available")
except ImportError:
    print("Vertex AI not available")

# Test document processing availability
try:
    from aiecs.tools.docs.document_parser_tool import DocumentParserTool
    print("DocumentParserTool available")
except ImportError:
    print("DocumentParserTool not available")

Support

For issues or questions about AI Document Orchestrator configuration:

Check the tool source code for implementation details
Review AI provider documentation for specific features
Consult the main aiecs documentation for architecture overview
Test with simple documents first to isolate configuration vs. processing issues
Monitor API rate limits and costs
Verify AI provider configuration and credentials
Ensure proper chunk size and timeout limits
Check concurrency and token limits
Validate processing mode and provider compatibility
