Statistical Analyzer Tool Configuration Guide
Overview
The Statistical Analyzer Tool provides descriptive and inferential statistics, hypothesis testing (t-test, ANOVA, chi-square), regression analysis, time series analysis, and correlation and causal analysis. It integrates with stats_tool for core statistical operations and supports the following analysis types: descriptive, t_test, anova, chi_square, linear_regression, logistic_regression, correlation, and time_series. Configure it through environment variables with the STATISTICAL_ANALYZER_ prefix, or programmatically when initializing the tool.
Using .env Files in Your Project
When using aiecs as a dependency in your project, you can store configuration in a .env file for convenience. The Statistical Analyzer Tool reads from environment variables that are already loaded into the process, so you need to load the .env file in your application before importing aiecs tools.
Setting Up .env Files
1. Install python-dotenv:
pip install python-dotenv
2. Create a .env file in your project root:
# .env file in your project root
STATISTICAL_ANALYZER_SIGNIFICANCE_LEVEL=0.05
STATISTICAL_ANALYZER_CONFIDENCE_LEVEL=0.95
STATISTICAL_ANALYZER_ENABLE_EFFECT_SIZE=true
3. Load the .env file in your application:
# main.py or app.py - at the top of your entry point
from dotenv import load_dotenv
# Load environment variables from .env file
# This must be done BEFORE importing aiecs tools
load_dotenv()
# Now import and use aiecs tools
from aiecs.tools.statistics.statistical_analyzer_tool import StatisticalAnalyzerTool
# The tool will automatically use the environment variables
statistical_analyzer = StatisticalAnalyzerTool()
Multiple Environment Files
You can use different .env files for different environments:
import os
from dotenv import load_dotenv
# Load environment-specific configuration
env = os.getenv('APP_ENV', 'development')
if env == 'production':
    load_dotenv('.env.production')
elif env == 'staging':
    load_dotenv('.env.staging')
else:
    load_dotenv('.env.development')
from aiecs.tools.statistics.statistical_analyzer_tool import StatisticalAnalyzerTool
statistical_analyzer = StatisticalAnalyzerTool()
Example .env.production:
# Production settings - optimized for rigorous statistical analysis
STATISTICAL_ANALYZER_SIGNIFICANCE_LEVEL=0.01
STATISTICAL_ANALYZER_CONFIDENCE_LEVEL=0.99
STATISTICAL_ANALYZER_ENABLE_EFFECT_SIZE=true
Example .env.development:
# Development settings - optimized for testing and debugging
STATISTICAL_ANALYZER_SIGNIFICANCE_LEVEL=0.05
STATISTICAL_ANALYZER_CONFIDENCE_LEVEL=0.95
STATISTICAL_ANALYZER_ENABLE_EFFECT_SIZE=false
Best Practices for .env Files
Never commit .env files to version control - Add .env to your .gitignore:
# .gitignore
.env
.env.local
.env.*.local
.env.production
.env.staging
Provide a template - Create .env.example with documented dummy values:
# .env.example
# Statistical Analyzer Tool Configuration
# Significance level for hypothesis testing
STATISTICAL_ANALYZER_SIGNIFICANCE_LEVEL=0.05
# Confidence level for statistical intervals
STATISTICAL_ANALYZER_CONFIDENCE_LEVEL=0.95
# Whether to calculate effect sizes in analyses
STATISTICAL_ANALYZER_ENABLE_EFFECT_SIZE=true
Document your variables - Add comments explaining each setting
Use load_dotenv() early - Call it at the very top of your entry point, before any aiecs imports
Format values correctly:
Floats: Decimal numbers: 0.05, 0.95
Booleans: true or false
Configuration Options
1. Significance Level
Environment Variable: STATISTICAL_ANALYZER_SIGNIFICANCE_LEVEL
Type: Float
Default: 0.05
Description: Significance level (alpha) for hypothesis testing. This determines the threshold for rejecting the null hypothesis.
Common Values:
0.01 - Very strict significance (1% level)
0.05 - Standard significance (5% level, default)
0.10 - Lenient significance (10% level)
0.001 - Extremely strict significance (0.1% level)
Example:
export STATISTICAL_ANALYZER_SIGNIFICANCE_LEVEL=0.01
Significance Note: Lower values are more strict and require stronger evidence to reject the null hypothesis.
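As a minimal sketch of how a significance level acts as a decision threshold, the snippet below compares one p-value against several alphas. The helper name is_significant is hypothetical and is not part of the tool's API; it only illustrates the rule "reject the null hypothesis when p < alpha".

```python
def is_significant(p_value: float, alpha: float = 0.05) -> bool:
    """Return True when the null hypothesis would be rejected at level alpha."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be strictly between 0 and 1")
    return p_value < alpha


# The same p-value can lead to different decisions under different alphas.
p = 0.03
decisions = {alpha: is_significant(p, alpha) for alpha in (0.10, 0.05, 0.01)}
print(decisions)
```

A result that counts as significant at the 5% level may no longer be significant at the stricter 1% level, which is why the production examples in this guide use 0.01.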
2. Confidence Level
Environment Variable: STATISTICAL_ANALYZER_CONFIDENCE_LEVEL
Type: Float
Default: 0.95
Description: Confidence level for interval estimates. It is the long-run proportion of intervals, constructed this way from repeated samples, that contain the true parameter (for example, 0.95 means about 95% of such intervals would cover it).
Common Values:
0.90 - 90% confidence level
0.95 - 95% confidence level (default)
0.99 - 99% confidence level
0.999 - 99.9% confidence level
Example:
export STATISTICAL_ANALYZER_CONFIDENCE_LEVEL=0.99
Confidence Note: Higher confidence levels provide wider intervals but greater certainty.
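To illustrate why a higher confidence level widens the interval, here is a small standard-library sketch. It uses a normal (z) critical value, which is an approximation; for small samples a t-based interval would be somewhat wider. The function normal_ci and the sample data are illustrative assumptions, not the tool's implementation.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def normal_ci(sample, confidence=0.95):
    """Approximate confidence interval for the mean using a z critical value."""
    n = len(sample)
    center = mean(sample)
    std_err = stdev(sample) / sqrt(n)
    # Two-sided critical value, roughly 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return center - z * std_err, center + z * std_err


sample = [4.8, 5.1, 5.0, 4.7, 5.3, 5.2, 4.9, 5.0]
low95, high95 = normal_ci(sample, 0.95)
low99, high99 = normal_ci(sample, 0.99)
print(f"95% CI: ({low95:.3f}, {high95:.3f})")
print(f"99% CI: ({low99:.3f}, {high99:.3f})")
```

The 99% interval is always wider than the 95% interval for the same data, because its critical value is larger.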
3. Enable Effect Size
Environment Variable: STATISTICAL_ANALYZER_ENABLE_EFFECT_SIZE
Type: Boolean
Default: True
Description: Whether to calculate effect sizes in statistical analyses. Effect sizes provide information about the practical significance of results.
Values:
true - Enable effect size calculation (default)
false - Disable effect size calculation
Example:
export STATISTICAL_ANALYZER_ENABLE_EFFECT_SIZE=true
Effect Size Note: Effect sizes help interpret the practical significance of statistical results.
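As one concrete example of an effect size, the sketch below computes Cohen's d for two independent groups using the pooled standard deviation. The function and the sample data are illustrative assumptions; the measures the tool actually computes may differ.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d for two independent samples, using the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


control = [10.0, 12.0, 11.0, 13.0, 12.0]
treated = [13.0, 15.0, 14.0, 16.0, 15.0]
d = cohens_d(treated, control)
print(f"Cohen's d = {d:.2f}")
```

Unlike a p-value, d does not shrink or grow with sample size alone, which is why it is useful for judging practical significance.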
Usage Examples
Example 1: Basic Environment Configuration
# Set basic statistical analysis parameters
export STATISTICAL_ANALYZER_SIGNIFICANCE_LEVEL=0.05
export STATISTICAL_ANALYZER_CONFIDENCE_LEVEL=0.95
export STATISTICAL_ANALYZER_ENABLE_EFFECT_SIZE=true
# Run your application
python app.py
Example 2: Rigorous Analysis Configuration
# Optimized for rigorous statistical analysis
export STATISTICAL_ANALYZER_SIGNIFICANCE_LEVEL=0.01
export STATISTICAL_ANALYZER_CONFIDENCE_LEVEL=0.99
export STATISTICAL_ANALYZER_ENABLE_EFFECT_SIZE=true
Example 3: Development Configuration
# Development-friendly settings
export STATISTICAL_ANALYZER_SIGNIFICANCE_LEVEL=0.05
export STATISTICAL_ANALYZER_CONFIDENCE_LEVEL=0.95
export STATISTICAL_ANALYZER_ENABLE_EFFECT_SIZE=false
Example 4: Programmatic Configuration
from aiecs.tools.statistics.statistical_analyzer_tool import StatisticalAnalyzerTool
# Initialize with custom configuration
statistical_analyzer = StatisticalAnalyzerTool(config={
    'significance_level': 0.05,
    'confidence_level': 0.95,
    'enable_effect_size': True
})
Example 5: Mixed Configuration
Environment variables are used as defaults, but can be overridden programmatically:
# Shell: set environment defaults
export STATISTICAL_ANALYZER_SIGNIFICANCE_LEVEL=0.05
export STATISTICAL_ANALYZER_ENABLE_EFFECT_SIZE=true
# Python: override for a specific instance
from aiecs.tools.statistics.statistical_analyzer_tool import StatisticalAnalyzerTool
statistical_analyzer = StatisticalAnalyzerTool(config={
    'significance_level': 0.01,  # This overrides the environment variable
    'enable_effect_size': False  # This overrides the environment variable
})
Configuration Priority
When the Statistical Analyzer Tool is initialized, configuration values are resolved in the following order (highest to lowest priority):
1. Programmatic config - Values passed to the constructor
2. Environment variables - Values set via STATISTICAL_ANALYZER_* variables
3. Default values - Built-in defaults as specified above
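The resolution order above can be sketched with a small lookup function. This is only an illustration of the precedence rule; resolve_setting, DEFAULTS, and the fake_env dictionary are hypothetical names, and the tool's real resolution is handled internally.

```python
import os

DEFAULTS = {
    "significance_level": 0.05,
    "confidence_level": 0.95,
    "enable_effect_size": True,
}
ENV_PREFIX = "STATISTICAL_ANALYZER_"


def resolve_setting(name, config=None, environ=None):
    """Resolve one setting: constructor config, then environment, then default."""
    config = config or {}
    environ = os.environ if environ is None else environ
    if name in config:
        return config[name], "programmatic"
    env_key = ENV_PREFIX + name.upper()
    if env_key in environ:
        return environ[env_key], "environment"  # raw string, not yet parsed
    return DEFAULTS[name], "default"


# A dictionary stands in for the process environment in this sketch.
fake_env = {"STATISTICAL_ANALYZER_SIGNIFICANCE_LEVEL": "0.01"}
print(resolve_setting("significance_level", {"significance_level": 0.001}, fake_env))
print(resolve_setting("significance_level", None, fake_env))
print(resolve_setting("confidence_level", None, fake_env))
```

Note that the environment value comes back as a string here; converting it to a float or boolean is a separate step.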
Data Type Parsing
Float Values
Floats should be provided as decimal numbers:
export STATISTICAL_ANALYZER_SIGNIFICANCE_LEVEL=0.05
export STATISTICAL_ANALYZER_CONFIDENCE_LEVEL=0.95
Boolean Values
Booleans should be provided as lowercase strings:
export STATISTICAL_ANALYZER_ENABLE_EFFECT_SIZE=true
export STATISTICAL_ANALYZER_ENABLE_EFFECT_SIZE=false
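For illustration, the conversion from environment strings to typed values might look like the sketch below. The helpers parse_float and parse_bool are hypothetical; the tool itself relies on its settings model for parsing, which may accept more spellings than this strict version does.

```python
def parse_float(raw: str) -> float:
    """Parse a decimal string such as '0.05'."""
    return float(raw.strip())


def parse_bool(raw: str) -> bool:
    """Parse 'true' or 'false' (case-insensitive in this sketch)."""
    value = raw.strip().lower()
    if value == "true":
        return True
    if value == "false":
        return False
    raise ValueError(f"expected 'true' or 'false', got {raw!r}")


print(parse_float("0.05"), parse_bool("true"), parse_bool("false"))
```

Rejecting anything other than true or false makes typos such as "ture" fail loudly instead of silently disabling a feature.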
Validation
Automatic Type Validation
Pydantic automatically validates configuration values:
significance_level must be a float between 0 and 1
confidence_level must be a float between 0 and 1
enable_effect_size must be a boolean
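The constraints listed above can be mimicked with a plain dataclass, shown here as a standard-library stand-in for a Pydantic model. The class AnalyzerSettings is hypothetical, and treating the bounds as inclusive is an assumption; the tool's actual model may be stricter.

```python
from dataclasses import dataclass


@dataclass
class AnalyzerSettings:
    """Hypothetical stand-in for the tool's settings model."""

    significance_level: float = 0.05
    confidence_level: float = 0.95
    enable_effect_size: bool = True

    def __post_init__(self):
        for name in ("significance_level", "confidence_level"):
            value = getattr(self, name)
            # bool is a subclass of int, so exclude it explicitly.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                raise TypeError(f"{name} must be a float")
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be between 0 and 1, got {value}")
        if not isinstance(self.enable_effect_size, bool):
            raise TypeError("enable_effect_size must be a boolean")


settings = AnalyzerSettings(significance_level=0.01, confidence_level=0.99)
try:
    AnalyzerSettings(significance_level=1.5)
except ValueError as exc:
    print(f"Rejected: {exc}")
```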
Runtime Validation
When performing statistical analyses, the tool validates:
Significance level - Level must be appropriate for the analysis type
Confidence level - Level must be reasonable for interval estimation
Data compatibility - Data must be compatible with statistical tests
Sample size - Sample size must be adequate for the analysis
Assumptions - Data must meet test assumptions
Analysis Types
The Statistical Analyzer Tool supports various analysis types:
Descriptive Statistics
Descriptive - Basic descriptive statistics (mean, median, std, etc.)
Summary statistics - Comprehensive data summaries
Distribution analysis - Distribution characteristics
Hypothesis Testing
T-test - Student’s t-test for means
ANOVA - Analysis of variance
Chi-square - Chi-square test for independence
Regression Analysis
Linear Regression - Linear regression analysis
Logistic Regression - Logistic regression analysis
Multiple Regression - Multiple variable regression
Correlation Analysis
Correlation - Correlation analysis
Partial Correlation - Partial correlation analysis
Causality - Causal analysis
Time Series Analysis
Time Series - Time series analysis
Trend Analysis - Trend detection and analysis
Seasonality - Seasonal pattern analysis
Operations Supported
The Statistical Analyzer Tool supports comprehensive statistical analysis operations:
Basic Analysis
analyze_data - Perform comprehensive statistical analysis
descriptive_statistics - Generate descriptive statistics
summary_statistics - Create data summaries
distribution_analysis - Analyze data distributions
correlation_analysis - Perform correlation analysis
Hypothesis Testing
t_test - Perform t-tests
anova_test - Perform ANOVA tests
chi_square_test - Perform chi-square tests
mann_whitney_test - Perform Mann-Whitney U tests
wilcoxon_test - Perform Wilcoxon signed-rank tests
Regression Analysis
linear_regression - Perform linear regression
logistic_regression - Perform logistic regression
multiple_regression - Perform multiple regression
polynomial_regression - Perform polynomial regression
ridge_regression - Perform ridge regression
Advanced Analysis
time_series_analysis - Perform time series analysis
causal_analysis - Perform causal analysis
effect_size_analysis - Calculate effect sizes
power_analysis - Perform statistical power analysis
meta_analysis - Perform meta-analysis
Statistical Tests
normality_tests - Test for normality
homogeneity_tests - Test for homogeneity of variance
independence_tests - Test for independence
stationarity_tests - Test for stationarity
cointegration_tests - Test for cointegration
Reporting Operations
generate_report - Generate statistical analysis report
create_summary - Create analysis summary
export_results - Export analysis results
visualize_results - Create result visualizations
interpret_results - Provide result interpretations
Troubleshooting
Issue: Statistical test assumptions not met
Error: Test assumptions violated
Solutions:
Check data normality
Verify homogeneity of variance
Use non-parametric alternatives
Transform data if needed
Issue: Insufficient sample size
Error: Sample size too small for analysis
Solutions:
Increase sample size
Use appropriate tests for small samples
Adjust significance level
Consider effect size requirements
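To get a rough sense of how sample size, significance level, and effect size interact, the sketch below uses the standard normal-approximation formula for a two-sided, two-sample comparison of means. The function name is hypothetical, the 80% power default is an assumption, and exact t-based calculations give slightly larger numbers.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided, two-sample comparison of means."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_power = z.inv_cdf(power)          # quantile matching the desired power
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: about {sample_size_per_group(d)} per group")
```

Smaller expected effects require much larger samples, so an "insufficient sample size" error is often a prompt to revisit the effect size the study is designed to detect.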
Issue: Multiple comparison problems
Error: Multiple testing issues
Solutions:
Apply Bonferroni correction
Use FDR correction
Adjust significance level
Use appropriate post-hoc tests
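The two corrections named above can be sketched in a few lines of standard-library Python. These helpers are illustrative and are not the tool's API; the p-values are made-up inputs chosen to show where the two procedures disagree.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject H0 for p-values below alpha divided by the number of tests."""
    threshold = alpha / len(p_values)
    return [p < threshold for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure controlling the false discovery rate."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose sorted p-value satisfies p_(k) <= (k / m) * alpha.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank
    rejected = [False] * m
    for idx in order[:max_rank]:
        rejected[idx] = True
    return rejected


p_values = [0.001, 0.008, 0.025, 0.039, 0.20]
print("Bonferroni:", bonferroni(p_values))
print("BH (FDR):  ", benjamini_hochberg(p_values))
```

Bonferroni controls the chance of any false positive and is therefore more conservative; the FDR procedure tolerates a controlled share of false discoveries and typically rejects more hypotheses.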
Issue: Non-normal data
Error: Data not normally distributed
Solutions:
Use non-parametric tests
Transform data
Use robust statistical methods
Check for outliers
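One common way to check for outliers is the interquartile-range (IQR) rule, sketched below with the standard library. The function name, the multiplier k = 1.5, and the sample data are illustrative assumptions rather than the tool's behavior.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the first or third quartile."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


data = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 25.0]
print(iqr_outliers(data))
```

Flagged values deserve inspection before removal: they may be data-entry errors, or they may be genuine observations that call for a robust or non-parametric method.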
Issue: Missing data
Error: Missing values in analysis
Solutions:
Handle missing data appropriately
Use complete case analysis
Apply imputation methods
Use maximum likelihood estimation
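Two of the simplest strategies above, complete case analysis and mean imputation, can be sketched as follows. The helpers are hypothetical and use None to mark missing values; with pandas you would more likely use its built-in missing-data handling.

```python
from statistics import mean


def complete_cases(rows):
    """Drop any row that contains a missing (None) value."""
    return [row for row in rows if all(v is not None for v in row)]


def mean_impute(column):
    """Replace missing (None) entries with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in column]


rows = [(1.0, 2.0), (None, 3.0), (4.0, None), (5.0, 6.0)]
print(complete_cases(rows))
print(mean_impute([1.0, None, 4.0, 5.0]))
```

Complete case analysis discards information, and mean imputation understates variability, so the choice should depend on how much data is missing and why.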
Issue: Correlation vs causation
Error: Confusing correlation with causation
Solutions:
Use causal analysis methods
Control for confounding variables
Apply appropriate statistical techniques
Consider experimental design
Issue: Effect size interpretation
Error: Effect size calculation or interpretation issues
Solutions:
# Shell: enable effect size calculation
export STATISTICAL_ANALYZER_ENABLE_EFFECT_SIZE=true
# Python: use an appropriate effect size measure
statistical_analyzer.calculate_effect_size(data, measure='cohens_d')
Best Practices
Statistical Rigor
Significance Level - Choose appropriate significance level
Effect Size - Always report effect sizes
Assumptions - Check test assumptions
Multiple Testing - Account for multiple comparisons
Sample Size - Ensure adequate sample size
Error Handling
Graceful Degradation - Handle analysis failures gracefully
Validation - Validate data before analysis
Fallback Methods - Provide alternative analysis methods
Error Logging - Log errors for debugging and monitoring
User Feedback - Provide clear error messages
Security
Data Privacy - Ensure data privacy during analysis
Access Control - Control access to analysis results
Audit Logging - Log analysis activities
Data Sanitization - Sanitize sensitive data
Compliance - Ensure compliance with regulations
Resource Management
Memory Monitoring - Monitor memory usage during analysis
Processing Time - Set reasonable timeouts
Storage Optimization - Optimize result storage
Cleanup - Clean up temporary files
Resource Limits - Set appropriate resource limits
Integration
Tool Dependencies - Ensure required tools are available
API Compatibility - Maintain API compatibility
Error Propagation - Properly propagate errors
Logging Integration - Integrate with logging systems
Monitoring - Monitor tool performance and usage
Development vs Production
Development:
STATISTICAL_ANALYZER_SIGNIFICANCE_LEVEL=0.05
STATISTICAL_ANALYZER_CONFIDENCE_LEVEL=0.95
STATISTICAL_ANALYZER_ENABLE_EFFECT_SIZE=false
Production:
STATISTICAL_ANALYZER_SIGNIFICANCE_LEVEL=0.01
STATISTICAL_ANALYZER_CONFIDENCE_LEVEL=0.99
STATISTICAL_ANALYZER_ENABLE_EFFECT_SIZE=true
Error Handling
Always wrap statistical analysis operations in try-except blocks:
from aiecs.tools.statistics.statistical_analyzer_tool import StatisticalAnalyzerTool, StatisticalAnalyzerError, AnalysisError
statistical_analyzer = StatisticalAnalyzerTool()
# df is assumed to be a pandas DataFrame loaded elsewhere in your application
try:
    result = statistical_analyzer.analyze_data(
        data=df,
        analysis_type='t_test',
        significance_level=0.05
    )
except AnalysisError as e:
    print(f"Analysis error: {e}")
except StatisticalAnalyzerError as e:
    print(f"Statistical analyzer error: {e}")
except Exception as e:
    print(f"Unexpected error: {e}")
Dependencies
Core Dependencies
# Install core dependencies
pip install pydantic python-dotenv
# Install data processing dependencies
pip install pandas numpy scipy
# Install statistical analysis dependencies
pip install scipy statsmodels
Optional Dependencies
# For advanced statistical analysis
pip install scikit-learn
# For time series analysis
pip install statsmodels
# For effect size calculations
pip install pingouin
# For power analysis
pip install statsmodels
Verification
# Test dependency availability
try:
    import pandas
    import numpy
    import scipy
    print("Core dependencies available")
except ImportError as e:
    print(f"Missing dependency: {e}")
# Test statistical libraries availability
try:
    from scipy import stats
    import statsmodels
    print("Statistical libraries available")
except ImportError:
    print("Statistical libraries not available")
# Test advanced analysis availability
try:
    import pingouin
    print("Advanced statistical analysis available")
except ImportError:
    print("Advanced statistical analysis not available")
# Test time series analysis availability
try:
    from statsmodels.tsa import seasonal
    print("Time series analysis available")
except ImportError:
    print("Time series analysis not available")
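As a more compact alternative to the checks above, the sketch below uses importlib to test whether each package can be found without actually importing it. The helper check_packages is a hypothetical name, and the package list simply mirrors the dependencies named in this guide.

```python
from importlib.util import find_spec


def check_packages(names):
    """Map each top-level package name to whether it can be imported."""
    return {name: find_spec(name) is not None for name in names}


status = check_packages(["pandas", "numpy", "scipy", "statsmodels", "pingouin"])
for name, available in status.items():
    print(f"{name}: {'available' if available else 'MISSING'}")
```

Because nothing is imported, this check is fast and has no side effects, but it does not confirm that an installed package loads without errors.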
Support
For issues or questions about Statistical Analyzer Tool configuration:
Check the tool source code for implementation details
Review stats tool documentation for core statistical operations
Consult the main aiecs documentation for architecture overview
Test with simple datasets first to isolate configuration vs. analysis issues
Verify data compatibility and format requirements
Check significance and confidence level settings
Ensure proper statistical test assumptions
Validate data quality and statistical requirements