# AI Document Orchestrator Configuration Guide

## Overview

The AI Document Orchestrator is a powerful tool that coordinates document parsing with AI analysis, manages AI provider interactions, and handles complex document processing workflows. It provides intelligent content analysis and extraction capabilities, integrating with DocumentParserTool for document parsing and various AI providers for content analysis. The tool supports multiple processing modes (summarize, extract_info, analyze, translate, classify, answer_questions, custom), multiple AI providers (OpenAI, Vertex AI, XAI, Local), and both synchronous and asynchronous processing. The tool can be configured via environment variables using the `AI_DOC_ORCHESTRATOR_` prefix or through programmatic configuration when initializing the tool.

## Using .env Files in Your Project

When using aiecs as a dependency in your project, you can store configuration in a `.env` file for convenience. The AI Document Orchestrator reads from environment variables that are already loaded into the process, so you need to load the `.env` file in your application before importing aiecs tools.

### Setting Up .env Files

**1. Install python-dotenv:**

```bash
pip install python-dotenv
```

**2. Create a `.env` file in your project root:**

```bash
# .env file in your project root
AI_DOC_ORCHESTRATOR_DEFAULT_AI_PROVIDER=openai
AI_DOC_ORCHESTRATOR_MAX_CHUNK_SIZE=4000
AI_DOC_ORCHESTRATOR_MAX_CONCURRENT_REQUESTS=5
AI_DOC_ORCHESTRATOR_DEFAULT_TEMPERATURE=0.1
AI_DOC_ORCHESTRATOR_MAX_TOKENS=2000
AI_DOC_ORCHESTRATOR_TIMEOUT=60
```

**3. Load the .env file in your application:**

```python
# main.py or app.py - at the top of your entry point
from dotenv import load_dotenv

# Load environment variables from .env file
# This must be done BEFORE importing aiecs tools
load_dotenv()

# Now import and use aiecs tools
from aiecs.tools.docs.ai_document_orchestrator import AIDocumentOrchestrator

# The tool will automatically use the environment variables
orchestrator = AIDocumentOrchestrator()
```

### Multiple Environment Files

You can use different `.env` files for different environments:

```python
import os
from dotenv import load_dotenv

# Load environment-specific configuration
env = os.getenv('APP_ENV', 'development')

if env == 'production':
    load_dotenv('.env.production')
elif env == 'staging':
    load_dotenv('.env.staging')
else:
    load_dotenv('.env.development')

from aiecs.tools.docs.ai_document_orchestrator import AIDocumentOrchestrator
orchestrator = AIDocumentOrchestrator()
```

**Example `.env.production`:**
```bash
# Production settings - optimized for performance and reliability
AI_DOC_ORCHESTRATOR_DEFAULT_AI_PROVIDER=openai
AI_DOC_ORCHESTRATOR_MAX_CHUNK_SIZE=8000
AI_DOC_ORCHESTRATOR_MAX_CONCURRENT_REQUESTS=10
AI_DOC_ORCHESTRATOR_DEFAULT_TEMPERATURE=0.1
AI_DOC_ORCHESTRATOR_MAX_TOKENS=4000
AI_DOC_ORCHESTRATOR_TIMEOUT=120
```

**Example `.env.development`:**
```bash
# Development settings - optimized for testing and debugging
AI_DOC_ORCHESTRATOR_DEFAULT_AI_PROVIDER=local
AI_DOC_ORCHESTRATOR_MAX_CHUNK_SIZE=2000
AI_DOC_ORCHESTRATOR_MAX_CONCURRENT_REQUESTS=2
AI_DOC_ORCHESTRATOR_DEFAULT_TEMPERATURE=0.3
AI_DOC_ORCHESTRATOR_MAX_TOKENS=1000
AI_DOC_ORCHESTRATOR_TIMEOUT=30
```

### Best Practices for .env Files

1. **Never commit .env files to version control** - Add `.env` to your `.gitignore`:
   ```gitignore
   # .gitignore
   .env
   .env.local
   .env.*.local
   .env.production
   .env.staging
   ```

2. **Provide a template** - Create `.env.example` with documented dummy values:
   ```bash
   # .env.example
   # AI Document Orchestrator Configuration
   
   # Default AI provider to use
   AI_DOC_ORCHESTRATOR_DEFAULT_AI_PROVIDER=openai
   
   # Maximum chunk size for AI processing
   AI_DOC_ORCHESTRATOR_MAX_CHUNK_SIZE=4000
   
   # Maximum concurrent AI requests
   AI_DOC_ORCHESTRATOR_MAX_CONCURRENT_REQUESTS=5
   
   # Default temperature for AI model
   AI_DOC_ORCHESTRATOR_DEFAULT_TEMPERATURE=0.1
   
   # Maximum tokens for AI response
   AI_DOC_ORCHESTRATOR_MAX_TOKENS=2000
   
   # Timeout in seconds for AI operations
   AI_DOC_ORCHESTRATOR_TIMEOUT=60
   ```

3. **Document your variables** - Add comments explaining each setting

4. **Use load_dotenv() early** - Call it at the very top of your entry point, before any aiecs imports

5. **Format values correctly**:
   - Strings: Plain text: `openai`, `vertex_ai`
   - Integers: Plain numbers: `4000`, `5`, `2000`, `60`
   - Floats: Decimal numbers: `0.1`, `0.3`

## Configuration Options

### 1. Default AI Provider

**Environment Variable:** `AI_DOC_ORCHESTRATOR_DEFAULT_AI_PROVIDER`

**Type:** String

**Default:** `"openai"`

**Description:** Default AI provider to use for document processing operations. This provider is used when no specific provider is specified in the processing request.

**Supported Providers:**
- `openai` - OpenAI API (default)
- `vertex_ai` - Google Vertex AI
- `xai` - XAI (xAI)
- `local` - Local AI model

**Example:**
```bash
export AI_DOC_ORCHESTRATOR_DEFAULT_AI_PROVIDER=vertex_ai
```

**Provider Note:** Ensure the selected provider is properly configured with API keys and credentials.

### 2. Max Chunk Size

**Environment Variable:** `AI_DOC_ORCHESTRATOR_MAX_CHUNK_SIZE`

**Type:** Integer

**Default:** `4000`

**Description:** Maximum chunk size for AI processing. Documents larger than this size will be chunked before being sent to AI providers.

**Common Values:**
- `2000` - Small chunks (faster processing, more API calls)
- `4000` - Default chunks (balanced)
- `8000` - Large chunks (fewer API calls, more memory)
- `16000` - Very large chunks (maximum efficiency)

**Example:**
```bash
export AI_DOC_ORCHESTRATOR_MAX_CHUNK_SIZE=8000
```

**Chunking Note:** Larger chunks reduce API calls but may hit token limits. Smaller chunks provide better granularity but increase costs.

### 3. Max Concurrent Requests

**Environment Variable:** `AI_DOC_ORCHESTRATOR_MAX_CONCURRENT_REQUESTS`

**Type:** Integer

**Default:** `5`

**Description:** Maximum number of concurrent AI requests that can be processed simultaneously. This controls the parallelism of batch processing operations.

**Common Values:**
- `2` - Conservative (low resource usage)
- `5` - Default (balanced)
- `10` - Aggressive (high throughput)
- `20` - Maximum (requires high resources)

**Example:**
```bash
export AI_DOC_ORCHESTRATOR_MAX_CONCURRENT_REQUESTS=10
```

**Concurrency Note:** Higher values increase throughput but may hit API rate limits or resource constraints.

### 4. Default Temperature

**Environment Variable:** `AI_DOC_ORCHESTRATOR_DEFAULT_TEMPERATURE`

**Type:** Float

**Default:** `0.1`

**Description:** Default temperature setting for AI models. Controls the randomness and creativity of AI responses.

**Temperature Ranges:**
- `0.0` - Deterministic (most focused)
- `0.1` - Low creativity (default, good for factual tasks)
- `0.3` - Moderate creativity
- `0.7` - High creativity
- `1.0` - Maximum creativity

**Example:**
```bash
export AI_DOC_ORCHESTRATOR_DEFAULT_TEMPERATURE=0.3
```

**Temperature Note:** Lower values are better for factual extraction, higher values for creative tasks.

### 5. Max Tokens

**Environment Variable:** `AI_DOC_ORCHESTRATOR_MAX_TOKENS`

**Type:** Integer

**Default:** `2000`

**Description:** Maximum number of tokens for AI response generation. This limits the length of AI-generated content.

**Common Values:**
- `1000` - Short responses
- `2000` - Default responses
- `4000` - Long responses
- `8000` - Very long responses

**Example:**
```bash
export AI_DOC_ORCHESTRATOR_MAX_TOKENS=4000
```

**Token Note:** Higher values allow longer responses but increase costs and processing time.

### 6. Timeout

**Environment Variable:** `AI_DOC_ORCHESTRATOR_TIMEOUT`

**Type:** Integer

**Default:** `60`

**Description:** Timeout in seconds for AI operations. Operations that exceed this timeout will be cancelled.

**Common Values:**
- `30` - Fast timeout (quick operations)
- `60` - Default timeout (balanced)
- `120` - Long timeout (complex operations)
- `300` - Very long timeout (batch operations)

**Example:**
```bash
export AI_DOC_ORCHESTRATOR_TIMEOUT=120
```

**Timeout Note:** Increase for complex documents or slow AI providers.

## Usage Examples

### Example 1: Basic Environment Configuration

```bash
# Set basic AI processing parameters
export AI_DOC_ORCHESTRATOR_DEFAULT_AI_PROVIDER=openai
export AI_DOC_ORCHESTRATOR_MAX_CHUNK_SIZE=4000
export AI_DOC_ORCHESTRATOR_MAX_CONCURRENT_REQUESTS=5
export AI_DOC_ORCHESTRATOR_DEFAULT_TEMPERATURE=0.1
export AI_DOC_ORCHESTRATOR_MAX_TOKENS=2000
export AI_DOC_ORCHESTRATOR_TIMEOUT=60

# Run your application
python app.py
```

### Example 2: High-Performance Configuration

```bash
# Optimized for high throughput
export AI_DOC_ORCHESTRATOR_DEFAULT_AI_PROVIDER=openai
export AI_DOC_ORCHESTRATOR_MAX_CHUNK_SIZE=8000
export AI_DOC_ORCHESTRATOR_MAX_CONCURRENT_REQUESTS=10
export AI_DOC_ORCHESTRATOR_DEFAULT_TEMPERATURE=0.1
export AI_DOC_ORCHESTRATOR_MAX_TOKENS=4000
export AI_DOC_ORCHESTRATOR_TIMEOUT=120
```

### Example 3: Development Configuration

```bash
# Development-friendly settings
export AI_DOC_ORCHESTRATOR_DEFAULT_AI_PROVIDER=local
export AI_DOC_ORCHESTRATOR_MAX_CHUNK_SIZE=2000
export AI_DOC_ORCHESTRATOR_MAX_CONCURRENT_REQUESTS=2
export AI_DOC_ORCHESTRATOR_DEFAULT_TEMPERATURE=0.3
export AI_DOC_ORCHESTRATOR_MAX_TOKENS=1000
export AI_DOC_ORCHESTRATOR_TIMEOUT=30
```

### Example 4: Programmatic Configuration

```python
from aiecs.tools.docs.ai_document_orchestrator import AIDocumentOrchestrator

# Initialize with custom configuration
orchestrator = AIDocumentOrchestrator(config={
    'default_ai_provider': 'openai',
    'max_chunk_size': 4000,
    'max_concurrent_requests': 5,
    'default_temperature': 0.1,
    'max_tokens': 2000,
    'timeout': 60
})
```

### Example 5: Mixed Configuration

Environment variables are used as defaults, but can be overridden programmatically:

```bash
# Set environment defaults
export AI_DOC_ORCHESTRATOR_MAX_CHUNK_SIZE=4000
export AI_DOC_ORCHESTRATOR_DEFAULT_TEMPERATURE=0.1
```

```python
# Override for specific instance
orchestrator = AIDocumentOrchestrator(config={
    'max_chunk_size': 8000,  # This overrides the environment variable
    'default_temperature': 0.3  # This overrides the environment variable
})
```

## Configuration Priority

When the AI Document Orchestrator is initialized, configuration values are resolved in the following order (highest to lowest priority):

1. **Programmatic config** - Values passed to the constructor
2. **Environment variables** - Values set via `AI_DOC_ORCHESTRATOR_*` variables
3. **Default values** - Built-in defaults as specified above

## Data Type Parsing

### String Values

Strings should be provided as plain text without quotes:

```bash
export AI_DOC_ORCHESTRATOR_DEFAULT_AI_PROVIDER=openai
export AI_DOC_ORCHESTRATOR_DEFAULT_AI_PROVIDER=vertex_ai
```

### Integer Values

Integers should be provided as numeric strings:

```bash
export AI_DOC_ORCHESTRATOR_MAX_CHUNK_SIZE=4000
export AI_DOC_ORCHESTRATOR_MAX_CONCURRENT_REQUESTS=5
export AI_DOC_ORCHESTRATOR_MAX_TOKENS=2000
export AI_DOC_ORCHESTRATOR_TIMEOUT=60
```

### Float Values

Floats should be provided as decimal strings:

```bash
export AI_DOC_ORCHESTRATOR_DEFAULT_TEMPERATURE=0.1
export AI_DOC_ORCHESTRATOR_DEFAULT_TEMPERATURE=0.3
```

## Validation

### Automatic Type Validation

Pydantic automatically validates configuration values:

- `default_ai_provider` must be a valid provider string
- `max_chunk_size` must be a positive integer
- `max_concurrent_requests` must be a positive integer
- `default_temperature` must be a float between 0.0 and 2.0
- `max_tokens` must be a positive integer
- `timeout` must be a positive integer

### Runtime Validation

When processing documents, the tool validates:

1. **AI Provider availability** - Selected provider must be configured
2. **Chunk size limits** - Content must fit within chunk size
3. **Concurrency limits** - Request count must not exceed limits
4. **Token limits** - Responses must not exceed token limits
5. **Timeout limits** - Operations must complete within timeout

## Processing Modes

The AI Document Orchestrator supports various processing modes:

### Basic Modes
- **Summarize** - Create concise document summaries
- **Extract Info** - Extract specific information from documents
- **Analyze** - Provide thorough document analysis
- **Translate** - Translate document content
- **Classify** - Classify documents into categories
- **Answer Questions** - Answer questions based on document content

### Advanced Modes
- **Custom** - Use custom processing templates and prompts

## AI Providers

### Supported Providers
- **OpenAI** - OpenAI API integration
- **Vertex AI** - Google Cloud Vertex AI
- **XAI** - xAI integration
- **Local** - Local AI model integration

### Provider Configuration

Each provider requires specific configuration:

**OpenAI:**
```bash
export OPENAI_API_KEY=your-api-key
export OPENAI_ORG_ID=your-org-id  # Optional
```

**Vertex AI:**
```bash
export GOOGLE_APPLICATION_CREDENTIALS=path/to/service-account.json
export GOOGLE_CLOUD_PROJECT=your-project-id
```

**XAI:**
```bash
export XAI_API_KEY=your-api-key
```

**Local:**
```bash
export LOCAL_MODEL_PATH=path/to/model
export LOCAL_MODEL_TYPE=llama2  # or other model type
```

## Operations Supported

The AI Document Orchestrator supports comprehensive document processing operations:

### Basic Processing
- `process_document` - Process a single document with AI
- `analyze_document` - Perform AI-first document analysis
- `batch_process_documents` - Process multiple documents in batch

### Async Processing
- `process_document_async` - Async version of document processing
- `_batch_process_async` - Async batch processing with concurrency control

### Custom Processing
- `create_custom_processor` - Create custom processing functions
- `get_processing_stats` - Get processing statistics

### Document Integration
- Integration with DocumentParserTool for document parsing
- Support for various document formats (PDF, DOCX, TXT, HTML, etc.)
- Intelligent content chunking and preparation

### AI Integration
- Integration with AIECS client for AI operations
- Support for multiple AI providers
- Intelligent prompt templating and formatting
- Response validation and post-processing

## Troubleshooting

### Issue: AI Provider not available

**Error:** `AIProviderError` when calling AI providers

**Solutions:**
```bash
# Check provider configuration
export AI_DOC_ORCHESTRATOR_DEFAULT_AI_PROVIDER=openai

# Verify API keys
export OPENAI_API_KEY=your-valid-api-key

# Test with local provider
export AI_DOC_ORCHESTRATOR_DEFAULT_AI_PROVIDER=local
```

### Issue: Document parsing fails

**Error:** `ProcessingError` during document parsing

**Solutions:**
1. Check DocumentParserTool availability
2. Verify document format support
3. Check file accessibility and permissions
4. Validate document content

### Issue: Timeout errors

**Error:** Operations timeout before completion

**Solutions:**
```bash
# Increase timeout
export AI_DOC_ORCHESTRATOR_TIMEOUT=120

# Reduce chunk size
export AI_DOC_ORCHESTRATOR_MAX_CHUNK_SIZE=2000

# Reduce concurrent requests
export AI_DOC_ORCHESTRATOR_MAX_CONCURRENT_REQUESTS=2
```

### Issue: Memory issues

**Error:** Out of memory during processing

**Solutions:**
```bash
# Reduce chunk size
export AI_DOC_ORCHESTRATOR_MAX_CHUNK_SIZE=2000

# Reduce concurrent requests
export AI_DOC_ORCHESTRATOR_MAX_CONCURRENT_REQUESTS=2

# Reduce max tokens
export AI_DOC_ORCHESTRATOR_MAX_TOKENS=1000
```

### Issue: Concurrency limits

**Error:** Too many concurrent requests

**Solutions:**
```bash
# Reduce concurrent requests
export AI_DOC_ORCHESTRATOR_MAX_CONCURRENT_REQUESTS=2

# Check API rate limits
# Adjust based on provider limits
```

### Issue: Token limit exceeded

**Error:** Response exceeds token limits

**Solutions:**
```bash
# Reduce max tokens
export AI_DOC_ORCHESTRATOR_MAX_TOKENS=1000

# Reduce chunk size
export AI_DOC_ORCHESTRATOR_MAX_CHUNK_SIZE=2000

# Use more specific prompts
```

### Issue: Invalid AI provider

**Error:** Unsupported AI provider

**Solutions:**
```bash
# Use supported provider
export AI_DOC_ORCHESTRATOR_DEFAULT_AI_PROVIDER=openai

# Check provider availability
# Verify provider configuration
```

## Best Practices

### Performance Optimization

1. **Chunk Size Management** - Balance chunk size for optimal processing
2. **Concurrency Control** - Set appropriate concurrent request limits
3. **Provider Selection** - Choose providers based on task requirements
4. **Timeout Configuration** - Set reasonable timeouts for operations
5. **Token Management** - Optimize token usage for cost efficiency

### Error Handling

1. **Graceful Degradation** - Handle AI provider failures gracefully
2. **Retry Logic** - Implement retry for transient failures
3. **Fallback Strategies** - Provide fallback processing methods
4. **Error Logging** - Log errors for debugging and monitoring
5. **User Feedback** - Provide clear error messages

### Security

1. **API Key Management** - Secure storage of API keys
2. **Content Validation** - Validate document content before processing
3. **Access Control** - Control access to AI providers
4. **Data Privacy** - Ensure data privacy in AI processing
5. **Audit Logging** - Log processing activities for compliance

### Resource Management

1. **Memory Usage** - Monitor memory consumption during processing
2. **API Rate Limits** - Respect provider rate limits
3. **Cost Management** - Monitor and control AI processing costs
4. **Processing Time** - Set reasonable timeouts
5. **Cleanup** - Clean up resources after processing

### Integration

1. **Tool Dependencies** - Ensure required tools are available
2. **API Compatibility** - Maintain API compatibility
3. **Error Propagation** - Properly propagate errors
4. **Logging Integration** - Integrate with logging systems
5. **Monitoring** - Monitor tool performance and usage

### Development vs Production

**Development:**
```bash
AI_DOC_ORCHESTRATOR_DEFAULT_AI_PROVIDER=local
AI_DOC_ORCHESTRATOR_MAX_CHUNK_SIZE=2000
AI_DOC_ORCHESTRATOR_MAX_CONCURRENT_REQUESTS=2
AI_DOC_ORCHESTRATOR_DEFAULT_TEMPERATURE=0.3
AI_DOC_ORCHESTRATOR_MAX_TOKENS=1000
AI_DOC_ORCHESTRATOR_TIMEOUT=30
```

**Production:**
```bash
AI_DOC_ORCHESTRATOR_DEFAULT_AI_PROVIDER=openai
AI_DOC_ORCHESTRATOR_MAX_CHUNK_SIZE=8000
AI_DOC_ORCHESTRATOR_MAX_CONCURRENT_REQUESTS=10
AI_DOC_ORCHESTRATOR_DEFAULT_TEMPERATURE=0.1
AI_DOC_ORCHESTRATOR_MAX_TOKENS=4000
AI_DOC_ORCHESTRATOR_TIMEOUT=120
```

### Error Handling

Always wrap AI processing operations in try-except blocks:

```python
from aiecs.tools.docs.ai_document_orchestrator import AIDocumentOrchestrator, AIDocumentOrchestratorError, AIProviderError, ProcessingError

orchestrator = AIDocumentOrchestrator()

try:
    result = orchestrator.process_document(
        source="document.pdf",
        processing_mode="summarize",
        ai_provider="openai"
    )
except AIProviderError as e:
    print(f"AI provider error: {e}")
except ProcessingError as e:
    print(f"Processing error: {e}")
except AIDocumentOrchestratorError as e:
    print(f"Orchestrator error: {e}")
except Exception as e:
    print(f"Unexpected error: {e}")
```

## Dependencies

### Core Dependencies

```bash
# Install core dependencies
pip install pydantic python-dotenv

# Install AI provider dependencies
pip install openai google-cloud-aiplatform

# Install document processing dependencies
pip install python-docx openpyxl python-pptx
```

### Optional Dependencies

```bash
# For advanced AI providers
pip install anthropic cohere

# For local AI models
pip install transformers torch

# For enhanced document processing
pip install PyPDF2 pdfplumber

# For async processing
pip install aiohttp asyncio
```

### Verification

```python
# Test dependency availability
try:
    import pydantic
    import openai
    import asyncio
    print("Core dependencies available")
except ImportError as e:
    print(f"Missing dependency: {e}")

# Test AI provider availability
try:
    import openai
    print("OpenAI available")
except ImportError:
    print("OpenAI not available")

try:
    from google.cloud import aiplatform
    print("Vertex AI available")
except ImportError:
    print("Vertex AI not available")

# Test document processing availability
try:
    from aiecs.tools.docs.document_parser_tool import DocumentParserTool
    print("DocumentParserTool available")
except ImportError:
    print("DocumentParserTool not available")
```

## Related Documentation

- Tool implementation details in the source code
- DocumentParserTool documentation for document parsing
- AIECS client documentation for AI operations
- Main aiecs documentation for architecture overview

## Support

For issues or questions about AI Document Orchestrator configuration:
- Check the tool source code for implementation details
- Review AI provider documentation for specific features
- Consult the main aiecs documentation for architecture overview
- Test with simple documents first to isolate configuration vs. processing issues
- Monitor API rate limits and costs
- Verify AI provider configuration and credentials
- Ensure proper chunk size and timeout limits
- Check concurrency and token limits
- Validate processing mode and provider compatibility