Statistical Analyzer Tool Configuration Guide

Overview

The Statistical Analyzer Tool performs statistical analysis and hypothesis testing: descriptive and inferential statistics, hypothesis tests (t-test, ANOVA, chi-square), regression analysis, time series analysis, and correlation and causality analysis. It integrates with stats_tool for core statistical operations and supports the following analysis types: descriptive, t_test, anova, chi_square, linear_regression, logistic_regression, correlation, and time_series. Configure the tool through environment variables with the STATISTICAL_ANALYZER_ prefix, or programmatically when initializing it.

Using .env Files in Your Project

When using aiecs as a dependency in your project, you can store configuration in a .env file for convenience. The Statistical Analyzer Tool reads from environment variables that are already loaded into the process, so you need to load the .env file in your application before importing aiecs tools.

Setting Up .env Files

1. Install python-dotenv:

pip install python-dotenv

2. Create a .env file in your project root:

# .env file in your project root
STATISTICAL_ANALYZER_SIGNIFICANCE_LEVEL=0.05
STATISTICAL_ANALYZER_CONFIDENCE_LEVEL=0.95
STATISTICAL_ANALYZER_ENABLE_EFFECT_SIZE=true

3. Load the .env file in your application:

# main.py or app.py - at the top of your entry point
from dotenv import load_dotenv

# Load environment variables from .env file
# This must be done BEFORE importing aiecs tools
load_dotenv()

# Now import and use aiecs tools
from aiecs.tools.statistics.statistical_analyzer_tool import StatisticalAnalyzerTool

# The tool will automatically use the environment variables
statistical_analyzer = StatisticalAnalyzerTool()

Multiple Environment Files

You can use different .env files for different environments:

import os
from dotenv import load_dotenv

# Load environment-specific configuration
env = os.getenv('APP_ENV', 'development')

if env == 'production':
    load_dotenv('.env.production')
elif env == 'staging':
    load_dotenv('.env.staging')
else:
    load_dotenv('.env.development')

from aiecs.tools.statistics.statistical_analyzer_tool import StatisticalAnalyzerTool
statistical_analyzer = StatisticalAnalyzerTool()

Example .env.production:

# Production settings - optimized for rigorous statistical analysis
STATISTICAL_ANALYZER_SIGNIFICANCE_LEVEL=0.01
STATISTICAL_ANALYZER_CONFIDENCE_LEVEL=0.99
STATISTICAL_ANALYZER_ENABLE_EFFECT_SIZE=true

Example .env.development:

# Development settings - optimized for testing and debugging
STATISTICAL_ANALYZER_SIGNIFICANCE_LEVEL=0.05
STATISTICAL_ANALYZER_CONFIDENCE_LEVEL=0.95
STATISTICAL_ANALYZER_ENABLE_EFFECT_SIZE=false

Best Practices for .env Files

Never commit .env files to version control - Add .env to your .gitignore:

# .gitignore
.env
.env.local
.env.*.local
.env.production
.env.staging

Provide a template - Create .env.example with documented dummy values:

# .env.example
# Statistical Analyzer Tool Configuration

# Significance level for hypothesis testing
STATISTICAL_ANALYZER_SIGNIFICANCE_LEVEL=0.05

# Confidence level for statistical intervals
STATISTICAL_ANALYZER_CONFIDENCE_LEVEL=0.95

# Whether to calculate effect sizes in analyses
STATISTICAL_ANALYZER_ENABLE_EFFECT_SIZE=true

Document your variables - Add comments explaining each setting
Use load_dotenv() early - Call it at the very top of your entry point, before any aiecs imports
Format values correctly:
- Floats: decimal numbers such as 0.05 or 0.95
- Booleans: true or false

Configuration Options

1. Significance Level

Environment Variable: STATISTICAL_ANALYZER_SIGNIFICANCE_LEVEL

Type: Float

Default: 0.05

Description: Significance level (alpha) for hypothesis testing. This determines the threshold for rejecting the null hypothesis.

Common Values:

0.01 - Very strict significance (1% level)
0.05 - Standard significance (5% level, default)
0.10 - Lenient significance (10% level)
0.001 - Extremely strict significance (0.1% level)

Example:

export STATISTICAL_ANALYZER_SIGNIFICANCE_LEVEL=0.01

Significance Note: Lower values are more strict and require stronger evidence to reject the null hypothesis.
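
To make the role of alpha concrete, here is a minimal standard-library sketch of a two-sided one-sample z-test decision. It does not use the tool's API; the function name z_test_decision and the sample figures are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist


def z_test_decision(sample_mean, pop_mean, pop_sd, n, alpha=0.05):
    """Two-sided one-sample z-test; returns (p_value, reject_null)."""
    z = (sample_mean - pop_mean) / (pop_sd / sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_value, p_value < alpha


# Illustrative numbers only: the same data is judged at two alpha levels.
p, reject_05 = z_test_decision(103.3, 100.0, 15.0, 100, alpha=0.05)
_, reject_01 = z_test_decision(103.3, 100.0, 15.0, 100, alpha=0.01)
print(f"p = {p:.4f}, reject at 0.05: {reject_05}, reject at 0.01: {reject_01}")
```

With these figures the null hypothesis is rejected at 0.05 but not at 0.01, which is the practical effect of choosing a stricter significance level.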

2. Confidence Level

Environment Variable: STATISTICAL_ANALYZER_CONFIDENCE_LEVEL

Type: Float

Default: 0.95

Description: Confidence level used for confidence intervals and other interval estimates. A 95% level means that, over repeated sampling, about 95% of the intervals constructed this way would contain the true parameter.

Common Values:

0.90 - 90% confidence level
0.95 - 95% confidence level (default)
0.99 - 99% confidence level
0.999 - 99.9% confidence level

Example:

export STATISTICAL_ANALYZER_CONFIDENCE_LEVEL=0.99

Confidence Note: Higher confidence levels produce wider intervals. The interval is more likely to cover the true parameter, but the estimate is less precise.
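
The trade-off between confidence level and interval width can be sketched with the standard library alone. The helper mean_confidence_interval is a hypothetical name, it uses a normal approximation, and it is not how the tool necessarily computes its intervals.

```python
from math import sqrt
from statistics import NormalDist


def mean_confidence_interval(mean, sd, n, confidence_level=0.95):
    """Normal-approximation confidence interval for a mean."""
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    margin = z * sd / sqrt(n)
    return mean - margin, mean + margin


# Illustrative summary statistics: mean 50, standard deviation 10, n = 100.
widths = {}
for level in (0.90, 0.95, 0.99):
    low, high = mean_confidence_interval(50.0, 10.0, 100, level)
    widths[level] = high - low
    print(f"{level:.2f}: [{low:.2f}, {high:.2f}] width = {high - low:.2f}")
```

The printed widths grow as the confidence level rises, which is why a 0.99 setting suits rigorous reporting while 0.95 gives tighter intervals.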

3. Enable Effect Size

Environment Variable: STATISTICAL_ANALYZER_ENABLE_EFFECT_SIZE

Type: Boolean

Default: True

Description: Whether to calculate effect sizes in statistical analyses. Effect sizes provide information about the practical significance of results.

Values:

true - Enable effect size calculation (default)
false - Disable effect size calculation

Example:

export STATISTICAL_ANALYZER_ENABLE_EFFECT_SIZE=true

Effect Size Note: Effect sizes help interpret the practical significance of statistical results.
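
As an illustration of what an effect size measures, the sketch below computes Cohen's d for two independent groups using the pooled standard deviation. It uses only the standard library; the function and the sample data are assumptions for illustration and do not reflect the tool's internal implementation.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d using the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Made-up data for illustration.
treatment = [5.0, 6.0, 7.0, 8.0, 9.0]
control = [3.0, 4.0, 5.0, 6.0, 7.0]
d = cohens_d(treatment, control)
print(f"Cohen's d = {d:.3f}")
```

Unlike a p-value, d does not shrink or grow with sample size alone, so it helps separate statistical significance from practical significance.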

Usage Examples

Example 1: Basic Environment Configuration

# Set basic statistical analysis parameters
export STATISTICAL_ANALYZER_SIGNIFICANCE_LEVEL=0.05
export STATISTICAL_ANALYZER_CONFIDENCE_LEVEL=0.95
export STATISTICAL_ANALYZER_ENABLE_EFFECT_SIZE=true

# Run your application
python app.py

Example 2: Rigorous Analysis Configuration

# Optimized for rigorous statistical analysis
export STATISTICAL_ANALYZER_SIGNIFICANCE_LEVEL=0.01
export STATISTICAL_ANALYZER_CONFIDENCE_LEVEL=0.99
export STATISTICAL_ANALYZER_ENABLE_EFFECT_SIZE=true

Example 3: Development Configuration

# Development-friendly settings
export STATISTICAL_ANALYZER_SIGNIFICANCE_LEVEL=0.05
export STATISTICAL_ANALYZER_CONFIDENCE_LEVEL=0.95
export STATISTICAL_ANALYZER_ENABLE_EFFECT_SIZE=false

Example 4: Programmatic Configuration

from aiecs.tools.statistics.statistical_analyzer_tool import StatisticalAnalyzerTool

# Initialize with custom configuration
statistical_analyzer = StatisticalAnalyzerTool(config={
    'significance_level': 0.05,
    'confidence_level': 0.95,
    'enable_effect_size': True
})

Example 5: Mixed Configuration

Environment variables are used as defaults, but can be overridden programmatically:

# Shell: set environment defaults
export STATISTICAL_ANALYZER_SIGNIFICANCE_LEVEL=0.05
export STATISTICAL_ANALYZER_ENABLE_EFFECT_SIZE=true

# Python: override for a specific instance
from aiecs.tools.statistics.statistical_analyzer_tool import StatisticalAnalyzerTool

statistical_analyzer = StatisticalAnalyzerTool(config={
    'significance_level': 0.01,  # This overrides the environment variable
    'enable_effect_size': False  # This overrides the environment variable
})

Configuration Priority

When the Statistical Analyzer Tool is initialized, configuration values are resolved in the following order (highest to lowest priority):

Programmatic config - Values passed to the constructor
Environment variables - Values set via STATISTICAL_ANALYZER_* variables
Default values - Built-in defaults as specified above
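
The resolution order above can be sketched as a small merge function. This is a hypothetical illustration of the priority rules, not the tool's actual code; resolve_config, DEFAULTS, and ENV_PREFIX are names introduced here.

```python
import os

# Defaults as documented in this guide.
DEFAULTS = {
    "significance_level": 0.05,
    "confidence_level": 0.95,
    "enable_effect_size": True,
}
ENV_PREFIX = "STATISTICAL_ANALYZER_"


def resolve_config(config=None, environ=None):
    """Merge defaults, environment variables, then explicit config (last wins)."""
    environ = os.environ if environ is None else environ
    resolved = dict(DEFAULTS)
    for key, default in DEFAULTS.items():
        raw = environ.get(ENV_PREFIX + key.upper())
        if raw is None:
            continue
        if isinstance(default, bool):
            resolved[key] = raw.strip().lower() == "true"
        else:
            resolved[key] = float(raw)
    resolved.update(config or {})
    return resolved


env = {
    "STATISTICAL_ANALYZER_SIGNIFICANCE_LEVEL": "0.01",
    "STATISTICAL_ANALYZER_ENABLE_EFFECT_SIZE": "false",
}
result = resolve_config({"significance_level": 0.001}, environ=env)
print(result)
```

Here the programmatic value wins for significance_level, the environment variable wins for enable_effect_size, and confidence_level falls back to its default.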

Data Type Parsing

Float Values

Floats should be provided as decimal numbers:

export STATISTICAL_ANALYZER_SIGNIFICANCE_LEVEL=0.05
export STATISTICAL_ANALYZER_CONFIDENCE_LEVEL=0.95

Boolean Values

Booleans should be provided as lowercase strings:

export STATISTICAL_ANALYZER_ENABLE_EFFECT_SIZE=true
export STATISTICAL_ANALYZER_ENABLE_EFFECT_SIZE=false
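
If you want to check values before the tool reads them, a strict parser along the following lines may help. These helpers are hypothetical; the tool's own parsing may accept other spellings, so treat this as a pre-flight check under that assumption.

```python
def parse_bool(raw):
    """Parse 'true' or 'false' (case-insensitive); reject anything else."""
    value = raw.strip().lower()
    if value not in ("true", "false"):
        raise ValueError(f"expected 'true' or 'false', got {raw!r}")
    return value == "true"


def parse_float(raw):
    """Parse a decimal string such as '0.05'."""
    try:
        return float(raw.strip())
    except ValueError:
        raise ValueError(f"expected a decimal number, got {raw!r}") from None


print(parse_float("0.05"), parse_bool("true"), parse_bool("false"))
```

Failing early with a clear message is usually easier to debug than a validation error raised later during tool initialization.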

Validation

Automatic Type Validation

Pydantic automatically validates configuration values:

significance_level must be a float between 0 and 1
confidence_level must be a float between 0 and 1
enable_effect_size must be a boolean
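
The same constraints can be expressed in a few lines of standard-library Python. The class below is a hypothetical stand-in written as a dataclass, not the tool's Pydantic model, and it assumes the bounds are exclusive (0 < value < 1).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnalyzerSettings:
    """Hypothetical stand-in for the tool's settings model."""

    significance_level: float = 0.05
    confidence_level: float = 0.95
    enable_effect_size: bool = True

    def __post_init__(self):
        for name in ("significance_level", "confidence_level"):
            value = getattr(self, name)
            # bool is a subclass of int, so exclude it explicitly.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                raise TypeError(f"{name} must be a float")
            if not 0 < value < 1:
                raise ValueError(f"{name} must be between 0 and 1, got {value}")
        if not isinstance(self.enable_effect_size, bool):
            raise TypeError("enable_effect_size must be a boolean")


print(AnalyzerSettings(significance_level=0.01, confidence_level=0.99))
```

Constructing AnalyzerSettings(significance_level=1.5) raises ValueError, which mirrors the kind of rejection to expect from out-of-range configuration.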

Runtime Validation

When performing statistical analyses, the tool validates:

Significance level - Level must be appropriate for the analysis type
Confidence level - Level must be reasonable for interval estimation
Data compatibility - Data must be compatible with statistical tests
Sample size - Sample size must be adequate for the analysis
Assumptions - Data must meet test assumptions

Analysis Types

The Statistical Analyzer Tool supports various analysis types:

Descriptive Statistics

Descriptive - Basic descriptive statistics (mean, median, std, etc.)
Summary statistics - Comprehensive data summaries
Distribution analysis - Distribution characteristics

Hypothesis Testing

T-test - Student’s t-test for means
ANOVA - Analysis of variance
Chi-square - Chi-square test for independence

Regression Analysis

Linear Regression - Linear regression analysis
Logistic Regression - Logistic regression analysis
Multiple Regression - Multiple variable regression

Correlation Analysis

Correlation - Correlation analysis
Partial Correlation - Partial correlation analysis
Causality - Causal analysis

Time Series Analysis

Time Series - Time series analysis
Trend Analysis - Trend detection and analysis
Seasonality - Seasonal pattern analysis

Operations Supported

The Statistical Analyzer Tool supports comprehensive statistical analysis operations:

Basic Analysis

analyze_data - Perform comprehensive statistical analysis
descriptive_statistics - Generate descriptive statistics
summary_statistics - Create data summaries
distribution_analysis - Analyze data distributions
correlation_analysis - Perform correlation analysis

Hypothesis Testing

t_test - Perform t-tests
anova_test - Perform ANOVA tests
chi_square_test - Perform chi-square tests
mann_whitney_test - Perform Mann-Whitney U tests
wilcoxon_test - Perform Wilcoxon signed-rank tests

Regression Analysis

linear_regression - Perform linear regression
logistic_regression - Perform logistic regression
multiple_regression - Perform multiple regression
polynomial_regression - Perform polynomial regression
ridge_regression - Perform ridge regression

Advanced Analysis

time_series_analysis - Perform time series analysis
causal_analysis - Perform causal analysis
effect_size_analysis - Calculate effect sizes
power_analysis - Perform statistical power analysis
meta_analysis - Perform meta-analysis

Statistical Tests

normality_tests - Test for normality
homogeneity_tests - Test for homogeneity of variance
independence_tests - Test for independence
stationarity_tests - Test for stationarity
cointegration_tests - Test for cointegration

Reporting Operations

generate_report - Generate statistical analysis report
create_summary - Create analysis summary
export_results - Export analysis results
visualize_results - Create result visualizations
interpret_results - Provide result interpretations

Troubleshooting

Issue: Statistical test assumptions not met

Error: Test assumptions violated

Solutions:

Check data normality
Verify homogeneity of variance
Use non-parametric alternatives
Transform data if needed

Issue: Insufficient sample size

Error: Sample size too small for analysis

Solutions:

Increase sample size
Use appropriate tests for small samples
Adjust significance level
Consider effect size requirements
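
To judge whether a sample is large enough, a rough planning estimate can be derived from the target effect size. The sketch below uses the normal-approximation formula for a two-sided comparison of two group means; sample_size_per_group is a hypothetical helper and the result is an approximation (exact t-based calculations give slightly larger values).

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided, two-sample comparison of means."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# Conventional small, medium, and large standardized effects (Cohen's d).
for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: about {sample_size_per_group(d)} per group")
```

Smaller effects and stricter significance levels both increase the required sample size, so lowering alpha to 0.01 without more data reduces power.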

Issue: Multiple comparison problems

Error: Multiple testing issues

Solutions:

Apply Bonferroni correction
Use FDR correction
Adjust significance level
Use appropriate post-hoc tests
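
The two corrections named above can be sketched in plain Python. These functions are illustrative and independent of the tool's API; the p-values are made up.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject H0 where p <= alpha / m (controls the family-wise error rate)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure (controls the false discovery rate)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff = rank
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected


p_values = [0.001, 0.012, 0.025, 0.039, 0.20]
print("Bonferroni:", bonferroni(p_values))
print("FDR (BH):  ", benjamini_hochberg(p_values))
```

On this input Bonferroni rejects only the smallest p-value, while the FDR procedure rejects the first four, which illustrates why FDR is often preferred when many tests are run and some false positives are tolerable.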

Issue: Non-normal data

Error: Data not normally distributed

Solutions:

Use non-parametric tests
Transform data
Use robust statistical methods
Check for outliers
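
For the outlier check, one common rule of thumb is Tukey's 1.5 x IQR fence. The sketch below uses only the standard library; iqr_outliers is a hypothetical helper and the data is invented.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


data = [10, 12, 11, 13, 12, 11, 10, 95]
print(iqr_outliers(data))
```

A flagged value is a prompt to investigate, not an automatic reason to delete it; a data entry error and a genuine extreme observation call for different handling.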

Issue: Missing data

Error: Missing values in analysis

Solutions:

Handle missing data appropriately
Use complete case analysis
Apply imputation methods
Use maximum likelihood estimation
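
Two of the simpler options above, complete case analysis and mean imputation, can be sketched as follows. Missing values are represented as None here, which is an assumption for this illustration; with pandas you would typically use dropna and fillna instead.

```python
from statistics import mean


def complete_cases(rows):
    """Drop every row that contains a missing (None) value."""
    return [row for row in rows if None not in row]


def mean_impute(column):
    """Replace None with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in column]


rows = [(1.0, 2.0), (None, 3.0), (4.0, None), (5.0, 6.0)]
print(complete_cases(rows))
print(mean_impute([1.0, None, 3.0, 5.0]))
```

Complete case analysis discards information, and mean imputation understates variance, so both can bias test results when much data is missing.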

Issue: Correlation vs causation

Error: Confusing correlation with causation

Solutions:

Use causal analysis methods
Control for confounding variables
Apply appropriate statistical techniques
Consider experimental design
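
Controlling for a confounder can be illustrated with a first-order partial correlation. In the sketch below, x and y are both driven by a third variable z; the data is constructed for illustration and the functions are not part of the tool's API.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def partial_correlation(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# z is the confounder; x and y each equal z plus unrelated noise.
z = [1, 2, 3, 4, 5, 6, 7, 8]
x = [zi + e for zi, e in zip(z, [0.5, -0.5, 0.5, -0.5, 0.5, -0.5, 0.5, -0.5])]
y = [zi + e for zi, e in zip(z, [0.5, 0.5, -0.5, -0.5, 0.5, 0.5, -0.5, -0.5])]

raw = pearson(x, y)
partial = partial_correlation(x, y, z)
print(f"raw r = {raw:.3f}, partial r given z = {partial:.3f}")
```

The raw correlation is strong, but it nearly vanishes once z is held fixed, which is the pattern to expect when an apparent relationship is explained by a shared cause.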

Issue: Effect size interpretation

Error: Effect size calculation or interpretation issues

Solutions:

# Shell: enable effect size calculation
export STATISTICAL_ANALYZER_ENABLE_EFFECT_SIZE=true

# Python: request an appropriate effect size measure
statistical_analyzer.calculate_effect_size(data, measure='cohens_d')

Best Practices

Statistical Rigor

Significance Level - Choose appropriate significance level
Effect Size - Always report effect sizes
Assumptions - Check test assumptions
Multiple Testing - Account for multiple comparisons
Sample Size - Ensure adequate sample size

Error Handling

Graceful Degradation - Handle analysis failures gracefully
Validation - Validate data before analysis
Fallback Methods - Provide alternative analysis methods
Error Logging - Log errors for debugging and monitoring
User Feedback - Provide clear error messages

Security

Data Privacy - Ensure data privacy during analysis
Access Control - Control access to analysis results
Audit Logging - Log analysis activities
Data Sanitization - Sanitize sensitive data
Compliance - Ensure compliance with regulations

Resource Management

Memory Monitoring - Monitor memory usage during analysis
Processing Time - Set reasonable timeouts
Storage Optimization - Optimize result storage
Cleanup - Clean up temporary files
Resource Limits - Set appropriate resource limits

Integration

Tool Dependencies - Ensure required tools are available
API Compatibility - Maintain API compatibility
Error Propagation - Properly propagate errors
Logging Integration - Integrate with logging systems
Monitoring - Monitor tool performance and usage

Development vs Production

Development:

STATISTICAL_ANALYZER_SIGNIFICANCE_LEVEL=0.05
STATISTICAL_ANALYZER_CONFIDENCE_LEVEL=0.95
STATISTICAL_ANALYZER_ENABLE_EFFECT_SIZE=false

Production:

STATISTICAL_ANALYZER_SIGNIFICANCE_LEVEL=0.01
STATISTICAL_ANALYZER_CONFIDENCE_LEVEL=0.99
STATISTICAL_ANALYZER_ENABLE_EFFECT_SIZE=true

Error Handling

Always wrap statistical analysis operations in try-except blocks:

from aiecs.tools.statistics.statistical_analyzer_tool import (
    StatisticalAnalyzerTool,
    StatisticalAnalyzerError,
    AnalysisError,
)

statistical_analyzer = StatisticalAnalyzerTool()

# df is assumed to be a pandas DataFrame loaded earlier in your application

try:
    result = statistical_analyzer.analyze_data(
        data=df,
        analysis_type='t_test',
        significance_level=0.05
    )
except AnalysisError as e:
    print(f"Analysis error: {e}")
except StatisticalAnalyzerError as e:
    print(f"Statistical analyzer error: {e}")
except Exception as e:
    print(f"Unexpected error: {e}")

Dependencies

Core Dependencies

# Install core dependencies
pip install pydantic python-dotenv

# Install data processing dependencies
pip install pandas numpy

# Install statistical analysis dependencies
pip install scipy statsmodels

Optional Dependencies

# For advanced statistical analysis
pip install scikit-learn

# For effect size calculations
pip install pingouin

# Time series analysis and power analysis use statsmodels,
# which is already installed with the core dependencies

Verification

# Test dependency availability
try:
    import pandas
    import numpy
    import scipy
    print("Core dependencies available")
except ImportError as e:
    print(f"Missing dependency: {e}")

# Test statistical libraries availability
try:
    from scipy import stats
    import statsmodels
    print("Statistical libraries available")
except ImportError:
    print("Statistical libraries not available")

# Test advanced analysis availability
try:
    import pingouin
    print("Advanced statistical analysis available")
except ImportError:
    print("Advanced statistical analysis not available")

# Test time series analysis availability
try:
    from statsmodels.tsa import seasonal
    print("Time series analysis available")
except ImportError:
    print("Time series analysis not available")
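
A more compact variant of the checks above looks packages up without importing them. This is a sketch using importlib from the standard library; missing_packages is a hypothetical helper, and note that scikit-learn is imported under the module name sklearn.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


required = ["pandas", "numpy", "scipy", "statsmodels"]
optional = ["sklearn", "pingouin"]

print("Missing required:", missing_packages(required))
print("Missing optional:", missing_packages(optional))
```

Because find_spec only locates the module, this check is fast, but it will not catch a package that is installed yet fails during import.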

Support

For issues or questions about Statistical Analyzer Tool configuration:

Check the tool source code for implementation details
Review stats tool documentation for core statistical operations
Consult the main aiecs documentation for architecture overview
Test with simple datasets first to isolate configuration vs. analysis issues
Verify data compatibility and format requirements
Check significance and confidence level settings
Ensure proper statistical test assumptions
Validate data quality and statistical requirements
