AI Data Analysis Orchestrator Configuration Guide
Overview
The AI Data Analysis Orchestrator is a powerful tool that coordinates multiple foundation tools to provide natural language driven analysis, automated workflow orchestration, multi-tool coordination, and comprehensive analysis execution. It supports various analysis modes (exploratory, diagnostic, predictive, prescriptive, comparative, causal) and coordinates foundation tools including data_loader, data_profiler, data_transformer, data_visualizer, statistical_analyzer, and model_trainer. The tool can be configured via environment variables using the AI_DATA_ORCHESTRATOR_ prefix or through programmatic configuration when initializing the tool.
Using .env Files in Your Project
When using aiecs as a dependency in your project, you can store configuration in a .env file for convenience. The AI Data Analysis Orchestrator reads from environment variables that are already loaded into the process, so you need to load the .env file in your application before importing aiecs tools.
Setting Up .env Files
1. Install python-dotenv:
pip install python-dotenv
2. Create a .env file in your project root:
# .env file in your project root
AI_DATA_ORCHESTRATOR_DEFAULT_MODE=exploratory
AI_DATA_ORCHESTRATOR_MAX_ITERATIONS=10
AI_DATA_ORCHESTRATOR_ENABLE_AUTO_WORKFLOW=true
AI_DATA_ORCHESTRATOR_DEFAULT_AI_PROVIDER=openai
AI_DATA_ORCHESTRATOR_ENABLE_CACHING=true
3. Load the .env file in your application:
# main.py or app.py - at the top of your entry point
from dotenv import load_dotenv
# Load environment variables from .env file
# This must be done BEFORE importing aiecs tools
load_dotenv()
# Now import and use aiecs tools
from aiecs.tools.statistics.ai_data_analysis_orchestrator import AIDataAnalysisOrchestrator
# The tool will automatically use the environment variables
orchestrator = AIDataAnalysisOrchestrator()
Multiple Environment Files
You can use different .env files for different environments:
import os
from dotenv import load_dotenv
# Load environment-specific configuration
env = os.getenv('APP_ENV', 'development')
if env == 'production':
load_dotenv('.env.production')
elif env == 'staging':
load_dotenv('.env.staging')
else:
load_dotenv('.env.development')
from aiecs.tools.statistics.ai_data_analysis_orchestrator import AIDataAnalysisOrchestrator
orchestrator = AIDataAnalysisOrchestrator()
Example .env.production:
# Production settings - optimized for performance and reliability
AI_DATA_ORCHESTRATOR_DEFAULT_MODE=exploratory
AI_DATA_ORCHESTRATOR_MAX_ITERATIONS=20
AI_DATA_ORCHESTRATOR_ENABLE_AUTO_WORKFLOW=true
AI_DATA_ORCHESTRATOR_DEFAULT_AI_PROVIDER=openai
AI_DATA_ORCHESTRATOR_ENABLE_CACHING=true
Example .env.development:
# Development settings - optimized for testing and debugging
AI_DATA_ORCHESTRATOR_DEFAULT_MODE=exploratory
AI_DATA_ORCHESTRATOR_MAX_ITERATIONS=5
AI_DATA_ORCHESTRATOR_ENABLE_AUTO_WORKFLOW=false
AI_DATA_ORCHESTRATOR_DEFAULT_AI_PROVIDER=local
AI_DATA_ORCHESTRATOR_ENABLE_CACHING=false
Best Practices for .env Files
Never commit .env files to version control - Add
.envto your.gitignore:# .gitignore .env .env.local .env.*.local .env.production .env.staging
Provide a template - Create
.env.examplewith documented dummy values:# .env.example # AI Data Analysis Orchestrator Configuration # Default analysis mode to use AI_DATA_ORCHESTRATOR_DEFAULT_MODE=exploratory # Maximum number of analysis iterations AI_DATA_ORCHESTRATOR_MAX_ITERATIONS=10 # Whether to enable automatic workflow generation AI_DATA_ORCHESTRATOR_ENABLE_AUTO_WORKFLOW=true # Default AI provider to use AI_DATA_ORCHESTRATOR_DEFAULT_AI_PROVIDER=openai # Whether to enable result caching AI_DATA_ORCHESTRATOR_ENABLE_CACHING=true
Document your variables - Add comments explaining each setting
Use load_dotenv() early - Call it at the very top of your entry point, before any aiecs imports
Format values correctly:
Strings: Plain text:
exploratory,openaiIntegers: Plain numbers:
10,20Booleans:
trueorfalse
Configuration Options
1. Default Mode
Environment Variable: AI_DATA_ORCHESTRATOR_DEFAULT_MODE
Type: String
Default: "exploratory"
Description: Default analysis mode to use for data analysis operations. This mode is used when no specific mode is specified in the analysis request.
Supported Modes:
exploratory- Exploratory data analysis (default)diagnostic- Diagnostic analysispredictive- Predictive analysisprescriptive- Prescriptive analysiscomparative- Comparative analysiscausal- Causal analysis
Example:
export AI_DATA_ORCHESTRATOR_DEFAULT_MODE=predictive
Mode Note: Choose the mode that best fits your typical analysis requirements.
2. Max Iterations
Environment Variable: AI_DATA_ORCHESTRATOR_MAX_ITERATIONS
Type: Integer
Default: 10
Description: Maximum number of analysis iterations that can be performed in a single analysis workflow. This controls the depth and complexity of analysis operations.
Common Values:
5- Quick analysis (basic insights)10- Standard analysis (default, balanced)20- Deep analysis (comprehensive insights)50- Maximum analysis (exhaustive exploration)
Example:
export AI_DATA_ORCHESTRATOR_MAX_ITERATIONS=20
Iteration Note: Higher values provide more comprehensive analysis but may increase processing time and resource usage.
3. Enable Auto Workflow
Environment Variable: AI_DATA_ORCHESTRATOR_ENABLE_AUTO_WORKFLOW
Type: Boolean
Default: True
Description: Whether to enable automatic workflow generation. When enabled, the orchestrator automatically designs analysis workflows based on the data and requirements.
Values:
true- Enable auto workflow (default)false- Disable auto workflow
Example:
export AI_DATA_ORCHESTRATOR_ENABLE_AUTO_WORKFLOW=true
Workflow Note: Auto workflow provides intelligent analysis design but may require more computational resources.
4. Default AI Provider
Environment Variable: AI_DATA_ORCHESTRATOR_DEFAULT_AI_PROVIDER
Type: String
Default: "openai"
Description: Default AI provider to use for analysis operations. This provider is used when no specific provider is specified in the request.
Supported Providers:
openai- OpenAI API (default)anthropic- Anthropic Claudegoogle- Google AIlocal- Local AI model
Example:
export AI_DATA_ORCHESTRATOR_DEFAULT_AI_PROVIDER=anthropic
Provider Note: Ensure the selected provider is properly configured with API keys and credentials.
5. Enable Caching
Environment Variable: AI_DATA_ORCHESTRATOR_ENABLE_CACHING
Type: Boolean
Default: True
Description: Whether to enable result caching. When enabled, analysis results are cached to improve performance for similar requests.
Values:
true- Enable caching (default)false- Disable caching
Example:
export AI_DATA_ORCHESTRATOR_ENABLE_CACHING=true
Caching Note: Caching improves performance but requires additional memory and storage.
Usage Examples
Example 1: Basic Environment Configuration
# Set basic analysis parameters
export AI_DATA_ORCHESTRATOR_DEFAULT_MODE=exploratory
export AI_DATA_ORCHESTRATOR_MAX_ITERATIONS=10
export AI_DATA_ORCHESTRATOR_ENABLE_AUTO_WORKFLOW=true
export AI_DATA_ORCHESTRATOR_DEFAULT_AI_PROVIDER=openai
export AI_DATA_ORCHESTRATOR_ENABLE_CACHING=true
# Run your application
python app.py
Example 2: High-Performance Configuration
# Optimized for comprehensive analysis
export AI_DATA_ORCHESTRATOR_DEFAULT_MODE=exploratory
export AI_DATA_ORCHESTRATOR_MAX_ITERATIONS=20
export AI_DATA_ORCHESTRATOR_ENABLE_AUTO_WORKFLOW=true
export AI_DATA_ORCHESTRATOR_DEFAULT_AI_PROVIDER=openai
export AI_DATA_ORCHESTRATOR_ENABLE_CACHING=true
Example 3: Development Configuration
# Development-friendly settings
export AI_DATA_ORCHESTRATOR_DEFAULT_MODE=exploratory
export AI_DATA_ORCHESTRATOR_MAX_ITERATIONS=5
export AI_DATA_ORCHESTRATOR_ENABLE_AUTO_WORKFLOW=false
export AI_DATA_ORCHESTRATOR_DEFAULT_AI_PROVIDER=local
export AI_DATA_ORCHESTRATOR_ENABLE_CACHING=false
Example 4: Programmatic Configuration
from aiecs.tools.statistics.ai_data_analysis_orchestrator import AIDataAnalysisOrchestrator
# Initialize with custom configuration
orchestrator = AIDataAnalysisOrchestrator(config={
'default_mode': 'exploratory',
'max_iterations': 10,
'enable_auto_workflow': True,
'default_ai_provider': 'openai',
'enable_caching': True
})
Example 5: Mixed Configuration
Environment variables are used as defaults, but can be overridden programmatically:
# Set environment defaults
export AI_DATA_ORCHESTRATOR_MAX_ITERATIONS=10
export AI_DATA_ORCHESTRATOR_DEFAULT_MODE=exploratory
# Override for specific instance
orchestrator = AIDataAnalysisOrchestrator(config={
'max_iterations': 20, # This overrides the environment variable
'default_mode': 'predictive' # This overrides the environment variable
})
Configuration Priority
When the AI Data Analysis Orchestrator is initialized, configuration values are resolved in the following order (highest to lowest priority):
Programmatic config - Values passed to the constructor
Environment variables - Values set via
AI_DATA_ORCHESTRATOR_*variablesDefault values - Built-in defaults as specified above
Data Type Parsing
String Values
Strings should be provided as plain text without quotes:
export AI_DATA_ORCHESTRATOR_DEFAULT_MODE=exploratory
export AI_DATA_ORCHESTRATOR_DEFAULT_AI_PROVIDER=openai
Integer Values
Integers should be provided as numeric strings:
export AI_DATA_ORCHESTRATOR_MAX_ITERATIONS=10
export AI_DATA_ORCHESTRATOR_MAX_ITERATIONS=20
Boolean Values
Booleans should be provided as lowercase strings:
export AI_DATA_ORCHESTRATOR_ENABLE_AUTO_WORKFLOW=true
export AI_DATA_ORCHESTRATOR_ENABLE_CACHING=false
Validation
Automatic Type Validation
Pydantic automatically validates configuration values:
default_modemust be a valid analysis mode stringmax_iterationsmust be a positive integerenable_auto_workflowmust be a booleandefault_ai_providermust be a valid provider stringenable_cachingmust be a boolean
Runtime Validation
When performing analysis, the tool validates:
Analysis mode - Mode must be supported
Iteration limits - Analysis must not exceed max iterations
AI provider availability - Provider must be configured
Workflow constraints - Auto workflow must be properly configured
Caching requirements - Cache must be accessible if enabled
Analysis Modes
The AI Data Analysis Orchestrator supports various analysis modes:
Basic Modes
Exploratory - Initial data exploration and discovery
Diagnostic - Root cause analysis and problem diagnosis
Predictive - Future trend prediction and forecasting
Prescriptive - Actionable recommendations and solutions
Advanced Modes
Comparative - Compare different datasets or time periods
Causal - Identify cause-and-effect relationships
AI Providers
Supported Providers
OpenAI - OpenAI API integration
Anthropic - Anthropic Claude integration
Google - Google AI integration
Local - Local AI model integration
Provider Configuration
Each provider requires specific configuration:
OpenAI:
export OPENAI_API_KEY=your-api-key
export OPENAI_ORG_ID=your-org-id # Optional
Anthropic:
export ANTHROPIC_API_KEY=your-api-key
Google:
export GOOGLE_APPLICATION_CREDENTIALS=path/to/service-account.json
export GOOGLE_CLOUD_PROJECT=your-project-id
Local:
export LOCAL_MODEL_PATH=path/to/model
export LOCAL_MODEL_TYPE=llama2 # or other model type
Operations Supported
The AI Data Analysis Orchestrator supports comprehensive data analysis operations:
Basic Analysis
analyze_data- Perform comprehensive data analysisexploratory_analysis- Perform exploratory data analysisdiagnostic_analysis- Perform diagnostic analysispredictive_analysis- Perform predictive analysisprescriptive_analysis- Perform prescriptive analysis
Advanced Analysis
comparative_analysis- Compare different datasetscausal_analysis- Identify causal relationshipsworkflow_analysis- Execute custom analysis workflowsiterative_analysis- Perform iterative analysis with feedback
Workflow Management
design_workflow- Design analysis workflowsexecute_workflow- Execute analysis workflowsoptimize_workflow- Optimize workflow performancecache_workflow- Cache workflow results
Tool Coordination
coordinate_tools- Coordinate multiple analysis toolsintegrate_results- Integrate results from multiple toolsvalidate_analysis- Validate analysis resultsgenerate_report- Generate comprehensive analysis reports
Troubleshooting
Issue: AI Provider not available
Error: OrchestratorError when calling AI providers
Solutions:
# Check provider configuration
export AI_DATA_ORCHESTRATOR_DEFAULT_AI_PROVIDER=openai
# Verify API keys
export OPENAI_API_KEY=your-valid-api-key
# Test with local provider
export AI_DATA_ORCHESTRATOR_DEFAULT_AI_PROVIDER=local
Issue: Analysis workflow fails
Error: WorkflowError during workflow execution
Solutions:
Check foundation tool availability
Verify data accessibility
Check workflow configuration
Validate analysis parameters
Issue: Max iterations exceeded
Error: Analysis exceeds maximum iterations
Solutions:
# Increase max iterations
export AI_DATA_ORCHESTRATOR_MAX_ITERATIONS=20
# Or optimize analysis workflow
export AI_DATA_ORCHESTRATOR_ENABLE_AUTO_WORKFLOW=true
Issue: Caching problems
Error: Cache operations fail
Solutions:
# Disable caching for testing
export AI_DATA_ORCHESTRATOR_ENABLE_CACHING=false
# Check cache directory permissions
# Verify cache configuration
Issue: Auto workflow issues
Error: Auto workflow generation fails
Solutions:
# Disable auto workflow for testing
export AI_DATA_ORCHESTRATOR_ENABLE_AUTO_WORKFLOW=false
# Check AI provider configuration
# Verify workflow templates
Issue: Foundation tool errors
Error: Foundation tool operations fail
Solutions:
Check tool availability and dependencies
Verify data format compatibility
Check tool configuration
Validate input data
Best Practices
Performance Optimization
Iteration Management - Set appropriate max iterations
Caching Strategy - Enable caching for repeated analyses
Workflow Optimization - Use auto workflow for efficiency
Provider Selection - Choose providers based on task requirements
Resource Management - Monitor memory and CPU usage
Error Handling
Graceful Degradation - Handle tool failures gracefully
Retry Logic - Implement retry for transient failures
Fallback Strategies - Provide fallback analysis methods
Error Logging - Log errors for debugging and monitoring
User Feedback - Provide clear error messages
Security
API Key Management - Secure storage of API keys
Data Privacy - Ensure data privacy in analysis
Access Control - Control access to analysis tools
Audit Logging - Log analysis activities for compliance
Data Validation - Validate input data before analysis
Resource Management
Memory Usage - Monitor memory consumption during analysis
API Rate Limits - Respect provider rate limits
Cost Management - Monitor and control analysis costs
Processing Time - Set reasonable timeouts
Cleanup - Clean up temporary files and resources
Integration
Tool Dependencies - Ensure required tools are available
API Compatibility - Maintain API compatibility
Error Propagation - Properly propagate errors
Logging Integration - Integrate with logging systems
Monitoring - Monitor tool performance and usage
Development vs Production
Development:
AI_DATA_ORCHESTRATOR_DEFAULT_MODE=exploratory
AI_DATA_ORCHESTRATOR_MAX_ITERATIONS=5
AI_DATA_ORCHESTRATOR_ENABLE_AUTO_WORKFLOW=false
AI_DATA_ORCHESTRATOR_DEFAULT_AI_PROVIDER=local
AI_DATA_ORCHESTRATOR_ENABLE_CACHING=false
Production:
AI_DATA_ORCHESTRATOR_DEFAULT_MODE=exploratory
AI_DATA_ORCHESTRATOR_MAX_ITERATIONS=20
AI_DATA_ORCHESTRATOR_ENABLE_AUTO_WORKFLOW=true
AI_DATA_ORCHESTRATOR_DEFAULT_AI_PROVIDER=openai
AI_DATA_ORCHESTRATOR_ENABLE_CACHING=true
Error Handling
Always wrap analysis operations in try-except blocks:
from aiecs.tools.statistics.ai_data_analysis_orchestrator import AIDataAnalysisOrchestrator, OrchestratorError, WorkflowError
orchestrator = AIDataAnalysisOrchestrator()
try:
result = orchestrator.analyze_data(
data_source="dataset.csv",
analysis_mode="exploratory",
max_iterations=10
)
except WorkflowError as e:
print(f"Workflow error: {e}")
except OrchestratorError as e:
print(f"Orchestrator error: {e}")
except Exception as e:
print(f"Unexpected error: {e}")
Dependencies
Core Dependencies
# Install core dependencies
pip install pydantic python-dotenv pandas
# Install AI provider dependencies
pip install openai anthropic google-cloud-aiplatform
# Install analysis dependencies
pip install numpy scipy scikit-learn matplotlib seaborn
Optional Dependencies
# For advanced analysis
pip install plotly dash streamlit
# For machine learning
pip install xgboost lightgbm catboost
# For statistical analysis
pip install statsmodels pingouin
# For data processing
pip install dask vaex
Verification
# Test dependency availability
try:
import pydantic
import pandas
import numpy
print("Core dependencies available")
except ImportError as e:
print(f"Missing dependency: {e}")
# Test AI provider availability
try:
import openai
print("OpenAI available")
except ImportError:
print("OpenAI not available")
try:
import anthropic
print("Anthropic available")
except ImportError:
print("Anthropic not available")
# Test analysis tool availability
try:
from aiecs.tools.statistics.data_loader import DataLoader
from aiecs.tools.statistics.data_profiler import DataProfiler
print("Foundation tools available")
except ImportError:
print("Foundation tools not available")
Support
For issues or questions about AI Data Analysis Orchestrator configuration:
Check the tool source code for implementation details
Review foundation tool documentation for specific features
Consult the main aiecs documentation for architecture overview
Test with simple datasets first to isolate configuration vs. analysis issues
Monitor API rate limits and costs
Verify AI provider configuration and credentials
Ensure proper iteration and workflow limits
Check foundation tool availability and configuration
Validate analysis mode and provider compatibility