AI Document Orchestrator Configuration Guide
Overview
The AI Document Orchestrator coordinates document parsing with AI analysis, manages AI provider interactions, and handles multi-step document processing workflows. It integrates with DocumentParserTool for parsing and with AI providers for content analysis and extraction. It supports seven processing modes (summarize, extract_info, analyze, translate, classify, answer_questions, custom), four AI providers (OpenAI, Vertex AI, XAI, Local), and both synchronous and asynchronous processing. Configure it through environment variables with the AI_DOC_ORCHESTRATOR_ prefix or programmatically when initializing the tool.
Using .env Files in Your Project
When using aiecs as a dependency in your project, you can store configuration in a .env file for convenience. The AI Document Orchestrator reads from environment variables that are already loaded into the process, so you need to load the .env file in your application before importing aiecs tools.
Setting Up .env Files
1. Install python-dotenv:
pip install python-dotenv
2. Create a .env file in your project root:
# .env file in your project root
AI_DOC_ORCHESTRATOR_DEFAULT_AI_PROVIDER=openai
AI_DOC_ORCHESTRATOR_MAX_CHUNK_SIZE=4000
AI_DOC_ORCHESTRATOR_MAX_CONCURRENT_REQUESTS=5
AI_DOC_ORCHESTRATOR_DEFAULT_TEMPERATURE=0.1
AI_DOC_ORCHESTRATOR_MAX_TOKENS=2000
AI_DOC_ORCHESTRATOR_TIMEOUT=60
3. Load the .env file in your application:
# main.py or app.py - at the top of your entry point
from dotenv import load_dotenv
# Load environment variables from .env file
# This must be done BEFORE importing aiecs tools
load_dotenv()
# Now import and use aiecs tools
from aiecs.tools.docs.ai_document_orchestrator import AIDocumentOrchestrator
# The tool will automatically use the environment variables
orchestrator = AIDocumentOrchestrator()
Multiple Environment Files
You can use different .env files for different environments:
import os
from dotenv import load_dotenv
# Load environment-specific configuration
env = os.getenv('APP_ENV', 'development')
if env == 'production':
    load_dotenv('.env.production')
elif env == 'staging':
    load_dotenv('.env.staging')
else:
    load_dotenv('.env.development')
from aiecs.tools.docs.ai_document_orchestrator import AIDocumentOrchestrator
orchestrator = AIDocumentOrchestrator()
Example .env.production:
# Production settings - optimized for performance and reliability
AI_DOC_ORCHESTRATOR_DEFAULT_AI_PROVIDER=openai
AI_DOC_ORCHESTRATOR_MAX_CHUNK_SIZE=8000
AI_DOC_ORCHESTRATOR_MAX_CONCURRENT_REQUESTS=10
AI_DOC_ORCHESTRATOR_DEFAULT_TEMPERATURE=0.1
AI_DOC_ORCHESTRATOR_MAX_TOKENS=4000
AI_DOC_ORCHESTRATOR_TIMEOUT=120
Example .env.development:
# Development settings - optimized for testing and debugging
AI_DOC_ORCHESTRATOR_DEFAULT_AI_PROVIDER=local
AI_DOC_ORCHESTRATOR_MAX_CHUNK_SIZE=2000
AI_DOC_ORCHESTRATOR_MAX_CONCURRENT_REQUESTS=2
AI_DOC_ORCHESTRATOR_DEFAULT_TEMPERATURE=0.3
AI_DOC_ORCHESTRATOR_MAX_TOKENS=1000
AI_DOC_ORCHESTRATOR_TIMEOUT=30
Best Practices for .env Files
Never commit .env files to version control - Add .env to your .gitignore:
# .gitignore
.env
.env.local
.env.*.local
.env.production
.env.staging
Provide a template - Create .env.example with documented dummy values:
# .env.example
# AI Document Orchestrator Configuration
# Default AI provider to use
AI_DOC_ORCHESTRATOR_DEFAULT_AI_PROVIDER=openai
# Maximum chunk size for AI processing
AI_DOC_ORCHESTRATOR_MAX_CHUNK_SIZE=4000
# Maximum concurrent AI requests
AI_DOC_ORCHESTRATOR_MAX_CONCURRENT_REQUESTS=5
# Default temperature for AI model
AI_DOC_ORCHESTRATOR_DEFAULT_TEMPERATURE=0.1
# Maximum tokens for AI response
AI_DOC_ORCHESTRATOR_MAX_TOKENS=2000
# Timeout in seconds for AI operations
AI_DOC_ORCHESTRATOR_TIMEOUT=60
Document your variables - Add comments explaining each setting
Use load_dotenv() early - Call it at the very top of your entry point, before any aiecs imports
Format values correctly:
Strings: Plain text: openai, vertex_ai
Integers: Plain numbers: 4000, 5, 2000, 60
Floats: Decimal numbers: 0.1, 0.3
Configuration Options
1. Default AI Provider
Environment Variable: AI_DOC_ORCHESTRATOR_DEFAULT_AI_PROVIDER
Type: String
Default: "openai"
Description: Default AI provider to use for document processing operations. This provider is used when no specific provider is specified in the processing request.
Supported Providers:
openai - OpenAI API (default)
vertex_ai - Google Vertex AI
xai - XAI (xAI)
local - Local AI model
Example:
export AI_DOC_ORCHESTRATOR_DEFAULT_AI_PROVIDER=vertex_ai
Provider Note: Ensure the selected provider is properly configured with API keys and credentials.
2. Max Chunk Size
Environment Variable: AI_DOC_ORCHESTRATOR_MAX_CHUNK_SIZE
Type: Integer
Default: 4000
Description: Maximum chunk size for AI processing. Documents larger than this size will be chunked before being sent to AI providers.
Common Values:
2000 - Small chunks (faster processing, more API calls)
4000 - Default chunks (balanced)
8000 - Large chunks (fewer API calls, more memory)
16000 - Very large chunks (maximum efficiency)
Example:
export AI_DOC_ORCHESTRATOR_MAX_CHUNK_SIZE=8000
Chunking Note: Larger chunks reduce API calls but may hit token limits. Smaller chunks provide better granularity but increase costs.
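The trade-off above is easier to see with a concrete splitter. The following is a minimal sketch, not the orchestrator's actual chunking logic: the function name chunk_text is hypothetical, and it assumes the chunk size is measured in characters, which this guide does not specify.

```python
from typing import List


def chunk_text(text: str, max_chunk_size: int = 4000) -> List[str]:
    """Split text into pieces of at most max_chunk_size characters.

    Hypothetical helper for illustration; sizes are assumed to be characters.
    """
    if max_chunk_size <= 0:
        raise ValueError("max_chunk_size must be positive")
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_chunk_size, len(text))
        if end < len(text):
            # Prefer to break at the last space inside the window
            # so words are not cut in half.
            split_at = text.rfind(" ", start, end)
            if split_at > start:
                end = split_at
        chunks.append(text[start:end])
        start = end
    return chunks
```

Halving max_chunk_size roughly doubles the number of chunks, and therefore the number of AI requests, for the same document.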
3. Max Concurrent Requests
Environment Variable: AI_DOC_ORCHESTRATOR_MAX_CONCURRENT_REQUESTS
Type: Integer
Default: 5
Description: Maximum number of concurrent AI requests that can be processed simultaneously. This controls the parallelism of batch processing operations.
Common Values:
2 - Conservative (low resource usage)
5 - Default (balanced)
10 - Aggressive (high throughput)
20 - Maximum (requires high resources)
Example:
export AI_DOC_ORCHESTRATOR_MAX_CONCURRENT_REQUESTS=10
Concurrency Note: Higher values increase throughput but may hit API rate limits or resource constraints.
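One common way to cap parallel requests is a semaphore. The sketch below shows the general technique with asyncio.Semaphore; it is not taken from the orchestrator's source, and run_bounded and worker are hypothetical names.

```python
import asyncio


async def run_bounded(items, worker, max_concurrent_requests=5):
    """Run worker(item) for every item, at most N at a time.

    Hypothetical helper: `worker` is any async callable, for example
    a function that sends one chunk to an AI provider.
    """
    semaphore = asyncio.Semaphore(max_concurrent_requests)

    async def guarded(item):
        # Only max_concurrent_requests coroutines may hold the semaphore.
        async with semaphore:
            return await worker(item)

    # gather preserves input order in its result list.
    return await asyncio.gather(*(guarded(item) for item in items))
```

Raising the limit shortens total wall-clock time only until the provider's rate limit becomes the bottleneck.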
4. Default Temperature
Environment Variable: AI_DOC_ORCHESTRATOR_DEFAULT_TEMPERATURE
Type: Float
Default: 0.1
Description: Default temperature setting for AI models. Controls the randomness and creativity of AI responses.
Temperature Ranges:
0.0 - Deterministic (most focused)
0.1 - Low creativity (default, good for factual tasks)
0.3 - Moderate creativity
0.7 - High creativity
1.0 - Very high creativity (values up to 2.0 are accepted)
Example:
export AI_DOC_ORCHESTRATOR_DEFAULT_TEMPERATURE=0.3
Temperature Note: Lower values are better for factual extraction, higher values for creative tasks.
5. Max Tokens
Environment Variable: AI_DOC_ORCHESTRATOR_MAX_TOKENS
Type: Integer
Default: 2000
Description: Maximum number of tokens for AI response generation. This limits the length of AI-generated content.
Common Values:
1000 - Short responses
2000 - Default responses
4000 - Long responses
8000 - Very long responses
Example:
export AI_DOC_ORCHESTRATOR_MAX_TOKENS=4000
Token Note: Higher values allow longer responses but increase costs and processing time.
6. Timeout
Environment Variable: AI_DOC_ORCHESTRATOR_TIMEOUT
Type: Integer
Default: 60
Description: Timeout in seconds for AI operations. Operations that exceed this timeout will be cancelled.
Common Values:
30 - Fast timeout (quick operations)
60 - Default timeout (balanced)
120 - Long timeout (complex operations)
300 - Very long timeout (batch operations)
Example:
export AI_DOC_ORCHESTRATOR_TIMEOUT=120
Timeout Note: Increase for complex documents or slow AI providers.
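To illustrate what cancelling an operation after a timeout looks like in async code, here is a small sketch built on asyncio.wait_for. It shows the general pattern only; call_with_timeout is a hypothetical name and the orchestrator may enforce its timeout differently.

```python
import asyncio


async def call_with_timeout(coro, timeout_seconds=60):
    """Await coro, cancelling it if it runs longer than timeout_seconds.

    Hypothetical helper for illustration.
    """
    try:
        # wait_for cancels the wrapped coroutine when the timeout expires.
        return await asyncio.wait_for(coro, timeout=timeout_seconds)
    except asyncio.TimeoutError as exc:
        raise TimeoutError(
            f"AI operation exceeded {timeout_seconds} seconds"
        ) from exc
```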
Usage Examples
Example 1: Basic Environment Configuration
# Set basic AI processing parameters
export AI_DOC_ORCHESTRATOR_DEFAULT_AI_PROVIDER=openai
export AI_DOC_ORCHESTRATOR_MAX_CHUNK_SIZE=4000
export AI_DOC_ORCHESTRATOR_MAX_CONCURRENT_REQUESTS=5
export AI_DOC_ORCHESTRATOR_DEFAULT_TEMPERATURE=0.1
export AI_DOC_ORCHESTRATOR_MAX_TOKENS=2000
export AI_DOC_ORCHESTRATOR_TIMEOUT=60
# Run your application
python app.py
Example 2: High-Performance Configuration
# Optimized for high throughput
export AI_DOC_ORCHESTRATOR_DEFAULT_AI_PROVIDER=openai
export AI_DOC_ORCHESTRATOR_MAX_CHUNK_SIZE=8000
export AI_DOC_ORCHESTRATOR_MAX_CONCURRENT_REQUESTS=10
export AI_DOC_ORCHESTRATOR_DEFAULT_TEMPERATURE=0.1
export AI_DOC_ORCHESTRATOR_MAX_TOKENS=4000
export AI_DOC_ORCHESTRATOR_TIMEOUT=120
Example 3: Development Configuration
# Development-friendly settings
export AI_DOC_ORCHESTRATOR_DEFAULT_AI_PROVIDER=local
export AI_DOC_ORCHESTRATOR_MAX_CHUNK_SIZE=2000
export AI_DOC_ORCHESTRATOR_MAX_CONCURRENT_REQUESTS=2
export AI_DOC_ORCHESTRATOR_DEFAULT_TEMPERATURE=0.3
export AI_DOC_ORCHESTRATOR_MAX_TOKENS=1000
export AI_DOC_ORCHESTRATOR_TIMEOUT=30
Example 4: Programmatic Configuration
from aiecs.tools.docs.ai_document_orchestrator import AIDocumentOrchestrator
# Initialize with custom configuration
orchestrator = AIDocumentOrchestrator(config={
    'default_ai_provider': 'openai',
    'max_chunk_size': 4000,
    'max_concurrent_requests': 5,
    'default_temperature': 0.1,
    'max_tokens': 2000,
    'timeout': 60
})
Example 5: Mixed Configuration
Environment variables are used as defaults, but can be overridden programmatically:
# Shell: set environment defaults
export AI_DOC_ORCHESTRATOR_MAX_CHUNK_SIZE=4000
export AI_DOC_ORCHESTRATOR_DEFAULT_TEMPERATURE=0.1
# Python: override for a specific instance
from aiecs.tools.docs.ai_document_orchestrator import AIDocumentOrchestrator
orchestrator = AIDocumentOrchestrator(config={
    'max_chunk_size': 8000,  # This overrides the environment variable
    'default_temperature': 0.3  # This overrides the environment variable
})
Configuration Priority
When the AI Document Orchestrator is initialized, configuration values are resolved in the following order (highest to lowest priority):
Programmatic config - Values passed to the constructor
Environment variables - Values set via AI_DOC_ORCHESTRATOR_* variables
Default values - Built-in defaults as specified above
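The resolution order can be sketched in a few lines. This is an illustration of the priority rules only, not the tool's implementation: resolve_config and DEFAULTS are hypothetical names, and the default values are copied from the Configuration Options section.

```python
import os

ENV_PREFIX = "AI_DOC_ORCHESTRATOR_"

# Built-in defaults as listed in this guide.
DEFAULTS = {
    "default_ai_provider": "openai",
    "max_chunk_size": 4000,
    "max_concurrent_requests": 5,
    "default_temperature": 0.1,
    "max_tokens": 2000,
    "timeout": 60,
}


def resolve_config(config=None, environ=None):
    """Resolve settings: programmatic config > environment > defaults."""
    environ = os.environ if environ is None else environ
    config = config or {}
    resolved = {}
    for key, default in DEFAULTS.items():
        env_value = environ.get(ENV_PREFIX + key.upper())
        if key in config:
            resolved[key] = config[key]
        elif env_value is not None:
            # Environment values arrive as strings; cast to the default's type.
            resolved[key] = type(default)(env_value)
        else:
            resolved[key] = default
    return resolved
```

Passing environ explicitly, as in resolve_config({"max_chunk_size": 2000}, {"AI_DOC_ORCHESTRATOR_MAX_CHUNK_SIZE": "8000"}), makes the precedence easy to check without touching the real process environment.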
Data Type Parsing
String Values
Strings should be provided as plain text without quotes:
export AI_DOC_ORCHESTRATOR_DEFAULT_AI_PROVIDER=openai
export AI_DOC_ORCHESTRATOR_DEFAULT_AI_PROVIDER=vertex_ai
Integer Values
Integers should be provided as numeric strings:
export AI_DOC_ORCHESTRATOR_MAX_CHUNK_SIZE=4000
export AI_DOC_ORCHESTRATOR_MAX_CONCURRENT_REQUESTS=5
export AI_DOC_ORCHESTRATOR_MAX_TOKENS=2000
export AI_DOC_ORCHESTRATOR_TIMEOUT=60
Float Values
Floats should be provided as decimal strings:
export AI_DOC_ORCHESTRATOR_DEFAULT_TEMPERATURE=0.1
export AI_DOC_ORCHESTRATOR_DEFAULT_TEMPERATURE=0.3
Validation
Automatic Type Validation
Pydantic automatically validates configuration values:
default_ai_provider must be a valid provider string
max_chunk_size must be a positive integer
max_concurrent_requests must be a positive integer
default_temperature must be a float between 0.0 and 2.0
max_tokens must be a positive integer
timeout must be a positive integer
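The rules above can be expressed as a small settings class. The sketch below uses a standard-library dataclass as a stand-in for the Pydantic model so that it runs without extra packages; OrchestratorSettings and SUPPORTED_PROVIDERS are hypothetical names, not the tool's actual classes.

```python
from dataclasses import dataclass

SUPPORTED_PROVIDERS = {"openai", "vertex_ai", "xai", "local"}


@dataclass
class OrchestratorSettings:
    """Stand-in settings object that enforces the documented rules."""
    default_ai_provider: str = "openai"
    max_chunk_size: int = 4000
    max_concurrent_requests: int = 5
    default_temperature: float = 0.1
    max_tokens: int = 2000
    timeout: int = 60

    def __post_init__(self):
        if self.default_ai_provider not in SUPPORTED_PROVIDERS:
            raise ValueError(
                f"unsupported provider: {self.default_ai_provider!r}"
            )
        for name in ("max_chunk_size", "max_concurrent_requests",
                     "max_tokens", "timeout"):
            value = getattr(self, name)
            # bool is a subclass of int, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
                raise ValueError(f"{name} must be a positive integer, got {value!r}")
        if not 0.0 <= float(self.default_temperature) <= 2.0:
            raise ValueError("default_temperature must be between 0.0 and 2.0")
```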
Runtime Validation
When processing documents, the tool validates:
AI Provider availability - Selected provider must be configured
Chunk size limits - Content must fit within chunk size
Concurrency limits - Request count must not exceed limits
Token limits - Responses must not exceed token limits
Timeout limits - Operations must complete within timeout
Processing Modes
The AI Document Orchestrator supports various processing modes:
Basic Modes
Summarize - Create concise document summaries
Extract Info - Extract specific information from documents
Analyze - Provide thorough document analysis
Translate - Translate document content
Classify - Classify documents into categories
Answer Questions - Answer questions based on document content
Advanced Modes
Custom - Use custom processing templates and prompts
AI Providers
Supported Providers
OpenAI - OpenAI API integration
Vertex AI - Google Cloud Vertex AI
XAI - xAI integration
Local - Local AI model integration
Provider Configuration
Each provider requires specific configuration:
OpenAI:
export OPENAI_API_KEY=your-api-key
export OPENAI_ORG_ID=your-org-id # Optional
Vertex AI:
export GOOGLE_APPLICATION_CREDENTIALS=path/to/service-account.json
export GOOGLE_CLOUD_PROJECT=your-project-id
XAI:
export XAI_API_KEY=your-api-key
Local:
export LOCAL_MODEL_PATH=path/to/model
export LOCAL_MODEL_TYPE=llama2 # or other model type
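A quick pre-flight check can catch missing credentials before the first request. The sketch below is an assumption-laden illustration: missing_provider_env is a hypothetical helper, the variable names come from the examples above, OPENAI_ORG_ID is left out because it is marked optional, and treating both LOCAL_MODEL_* variables as required is an assumption.

```python
import os

# Variables taken from the provider examples in this guide.
REQUIRED_ENV = {
    "openai": ["OPENAI_API_KEY"],
    "vertex_ai": ["GOOGLE_APPLICATION_CREDENTIALS", "GOOGLE_CLOUD_PROJECT"],
    "xai": ["XAI_API_KEY"],
    "local": ["LOCAL_MODEL_PATH", "LOCAL_MODEL_TYPE"],  # assumed required
}


def missing_provider_env(provider, environ=None):
    """Return the names of unset or empty variables for a provider."""
    environ = os.environ if environ is None else environ
    if provider not in REQUIRED_ENV:
        raise ValueError(f"unsupported provider: {provider!r}")
    return [name for name in REQUIRED_ENV[provider] if not environ.get(name)]
```

Calling missing_provider_env("openai") at startup and failing fast on a non-empty result gives a clearer message than a provider error raised mid-batch.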
Operations Supported
The AI Document Orchestrator supports comprehensive document processing operations:
Basic Processing
process_document - Process a single document with AI
analyze_document - Perform AI-first document analysis
batch_process_documents - Process multiple documents in batch
Async Processing
process_document_async - Async version of document processing
_batch_process_async - Async batch processing with concurrency control
Custom Processing
create_custom_processor - Create custom processing functions
get_processing_stats - Get processing statistics
Document Integration
Integration with DocumentParserTool for document parsing
Support for various document formats (PDF, DOCX, TXT, HTML, etc.)
Intelligent content chunking and preparation
AI Integration
Integration with AIECS client for AI operations
Support for multiple AI providers
Intelligent prompt templating and formatting
Response validation and post-processing
Troubleshooting
Issue: AI Provider not available
Error: AIProviderError when calling AI providers
Solutions:
# Check provider configuration
export AI_DOC_ORCHESTRATOR_DEFAULT_AI_PROVIDER=openai
# Verify API keys
export OPENAI_API_KEY=your-valid-api-key
# Test with local provider
export AI_DOC_ORCHESTRATOR_DEFAULT_AI_PROVIDER=local
Issue: Document parsing fails
Error: ProcessingError during document parsing
Solutions:
Check DocumentParserTool availability
Verify document format support
Check file accessibility and permissions
Validate document content
Issue: Timeout errors
Error: Operations time out before completion
Solutions:
# Increase timeout
export AI_DOC_ORCHESTRATOR_TIMEOUT=120
# Reduce chunk size
export AI_DOC_ORCHESTRATOR_MAX_CHUNK_SIZE=2000
# Reduce concurrent requests
export AI_DOC_ORCHESTRATOR_MAX_CONCURRENT_REQUESTS=2
Issue: Memory issues
Error: Out of memory during processing
Solutions:
# Reduce chunk size
export AI_DOC_ORCHESTRATOR_MAX_CHUNK_SIZE=2000
# Reduce concurrent requests
export AI_DOC_ORCHESTRATOR_MAX_CONCURRENT_REQUESTS=2
# Reduce max tokens
export AI_DOC_ORCHESTRATOR_MAX_TOKENS=1000
Issue: Concurrency limits
Error: Too many concurrent requests
Solutions:
# Reduce concurrent requests
export AI_DOC_ORCHESTRATOR_MAX_CONCURRENT_REQUESTS=2
# Check API rate limits
# Adjust based on provider limits
Issue: Token limit exceeded
Error: Response exceeds token limits
Solutions:
# Reduce max tokens
export AI_DOC_ORCHESTRATOR_MAX_TOKENS=1000
# Reduce chunk size
export AI_DOC_ORCHESTRATOR_MAX_CHUNK_SIZE=2000
# Use more specific prompts
Issue: Invalid AI provider
Error: Unsupported AI provider
Solutions:
# Use supported provider
export AI_DOC_ORCHESTRATOR_DEFAULT_AI_PROVIDER=openai
# Check provider availability
# Verify provider configuration
Best Practices
Performance Optimization
Chunk Size Management - Balance chunk size for optimal processing
Concurrency Control - Set appropriate concurrent request limits
Provider Selection - Choose providers based on task requirements
Timeout Configuration - Set reasonable timeouts for operations
Token Management - Optimize token usage for cost efficiency
Error Handling
Graceful Degradation - Handle AI provider failures gracefully
Retry Logic - Implement retry for transient failures
Fallback Strategies - Provide fallback processing methods
Error Logging - Log errors for debugging and monitoring
User Feedback - Provide clear error messages
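The retry item above can be sketched as a small wrapper with exponential backoff. This is a generic pattern rather than a feature of the orchestrator: retry and its parameters are hypothetical, and in practice retry_on would be narrowed to transient errors such as AIProviderError.

```python
import time


def retry(operation, max_attempts=3, base_delay=1.0,
          retry_on=(Exception,), sleep=time.sleep):
    """Call operation(), retrying on failure with exponential backoff.

    Hypothetical helper. Delays are base_delay, 2 * base_delay, 4 * base_delay, ...
    The last failure is re-raised once max_attempts is reached.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```

A call might look like retry(lambda: orchestrator.process_document(source="document.pdf", processing_mode="summarize")), which reuses the call shown in the Error Handling section.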
Security
API Key Management - Secure storage of API keys
Content Validation - Validate document content before processing
Access Control - Control access to AI providers
Data Privacy - Ensure data privacy in AI processing
Audit Logging - Log processing activities for compliance
Resource Management
Memory Usage - Monitor memory consumption during processing
API Rate Limits - Respect provider rate limits
Cost Management - Monitor and control AI processing costs
Processing Time - Set reasonable timeouts
Cleanup - Clean up resources after processing
Integration
Tool Dependencies - Ensure required tools are available
API Compatibility - Maintain API compatibility
Error Propagation - Properly propagate errors
Logging Integration - Integrate with logging systems
Monitoring - Monitor tool performance and usage
Development vs Production
Development:
AI_DOC_ORCHESTRATOR_DEFAULT_AI_PROVIDER=local
AI_DOC_ORCHESTRATOR_MAX_CHUNK_SIZE=2000
AI_DOC_ORCHESTRATOR_MAX_CONCURRENT_REQUESTS=2
AI_DOC_ORCHESTRATOR_DEFAULT_TEMPERATURE=0.3
AI_DOC_ORCHESTRATOR_MAX_TOKENS=1000
AI_DOC_ORCHESTRATOR_TIMEOUT=30
Production:
AI_DOC_ORCHESTRATOR_DEFAULT_AI_PROVIDER=openai
AI_DOC_ORCHESTRATOR_MAX_CHUNK_SIZE=8000
AI_DOC_ORCHESTRATOR_MAX_CONCURRENT_REQUESTS=10
AI_DOC_ORCHESTRATOR_DEFAULT_TEMPERATURE=0.1
AI_DOC_ORCHESTRATOR_MAX_TOKENS=4000
AI_DOC_ORCHESTRATOR_TIMEOUT=120
Error Handling
Always wrap AI processing operations in try-except blocks:
from aiecs.tools.docs.ai_document_orchestrator import AIDocumentOrchestrator, AIDocumentOrchestratorError, AIProviderError, ProcessingError
orchestrator = AIDocumentOrchestrator()
try:
    result = orchestrator.process_document(
        source="document.pdf",
        processing_mode="summarize",
        ai_provider="openai"
    )
except AIProviderError as e:
    print(f"AI provider error: {e}")
except ProcessingError as e:
    print(f"Processing error: {e}")
except AIDocumentOrchestratorError as e:
    print(f"Orchestrator error: {e}")
except Exception as e:
    print(f"Unexpected error: {e}")
Dependencies
Core Dependencies
# Install core dependencies
pip install pydantic python-dotenv
# Install AI provider dependencies
pip install openai google-cloud-aiplatform
# Install document processing dependencies
pip install python-docx openpyxl python-pptx
Optional Dependencies
# For advanced AI providers
pip install anthropic cohere
# For local AI models
pip install transformers torch
# For enhanced document processing
pip install PyPDF2 pdfplumber
# For async processing (asyncio is part of the Python standard library and needs no install)
pip install aiohttp
Verification
# Test dependency availability
try:
    import pydantic
    import openai
    import asyncio
    print("Core dependencies available")
except ImportError as e:
    print(f"Missing dependency: {e}")
# Test AI provider availability
try:
    import openai
    print("OpenAI available")
except ImportError:
    print("OpenAI not available")
try:
    from google.cloud import aiplatform
    print("Vertex AI available")
except ImportError:
    print("Vertex AI not available")
# Test document processing availability
try:
    from aiecs.tools.docs.document_parser_tool import DocumentParserTool
    print("DocumentParserTool available")
except ImportError:
    print("DocumentParserTool not available")
Support
For issues or questions about AI Document Orchestrator configuration:
Check the tool source code for implementation details
Review AI provider documentation for specific features
Consult the main aiecs documentation for architecture overview
Test with simple documents first to isolate configuration vs. processing issues
Monitor API rate limits and costs
Verify AI provider configuration and credentials
Ensure proper chunk size and timeout limits
Check concurrency and token limits
Validate processing mode and provider compatibility