Statistical Analyzer Tool Configuration Guide

Overview

The Statistical Analyzer Tool is an advanced statistical analysis and hypothesis testing tool that provides comprehensive statistical analysis with descriptive and inferential statistics, hypothesis testing (t-test, ANOVA, chi-square), regression analysis, time series analysis, and correlation and causality analysis. It can perform hypothesis testing, conduct regression analysis, analyze time series, and perform correlation and causal analysis. The tool integrates with stats_tool for core statistical operations and supports various analysis types (descriptive, t_test, anova, chi_square, linear_regression, logistic_regression, correlation, time_series). The tool can be configured via environment variables using the STATISTICAL_ANALYZER_ prefix or through programmatic configuration when initializing the tool.

Using .env Files in Your Project

When using aiecs as a dependency in your project, you can store configuration in a .env file for convenience. The Statistical Analyzer Tool reads from environment variables that are already loaded into the process, so you need to load the .env file in your application before importing aiecs tools.

Setting Up .env Files

1. Install python-dotenv:

pip install python-dotenv

2. Create a .env file in your project root:

# .env file in your project root
STATISTICAL_ANALYZER_SIGNIFICANCE_LEVEL=0.05
STATISTICAL_ANALYZER_CONFIDENCE_LEVEL=0.95
STATISTICAL_ANALYZER_ENABLE_EFFECT_SIZE=true

3. Load the .env file in your application:

# main.py or app.py - at the top of your entry point
from dotenv import load_dotenv

# Load environment variables from .env file
# This must be done BEFORE importing aiecs tools
load_dotenv()

# Now import and use aiecs tools
from aiecs.tools.statistics.statistical_analyzer_tool import StatisticalAnalyzerTool

# The tool will automatically use the environment variables
statistical_analyzer = StatisticalAnalyzerTool()

Multiple Environment Files

You can use different .env files for different environments:

import os
from dotenv import load_dotenv

# Load environment-specific configuration
env = os.getenv('APP_ENV', 'development')

if env == 'production':
    load_dotenv('.env.production')
elif env == 'staging':
    load_dotenv('.env.staging')
else:
    load_dotenv('.env.development')

from aiecs.tools.statistics.statistical_analyzer_tool import StatisticalAnalyzerTool
statistical_analyzer = StatisticalAnalyzerTool()

Example .env.production:

# Production settings - optimized for rigorous statistical analysis
STATISTICAL_ANALYZER_SIGNIFICANCE_LEVEL=0.01
STATISTICAL_ANALYZER_CONFIDENCE_LEVEL=0.99
STATISTICAL_ANALYZER_ENABLE_EFFECT_SIZE=true

Example .env.development:

# Development settings - optimized for testing and debugging
STATISTICAL_ANALYZER_SIGNIFICANCE_LEVEL=0.05
STATISTICAL_ANALYZER_CONFIDENCE_LEVEL=0.95
STATISTICAL_ANALYZER_ENABLE_EFFECT_SIZE=false

Best Practices for .env Files

  1. Never commit .env files to version control - Add .env to your .gitignore:

    # .gitignore
    .env
    .env.local
    .env.*.local
    .env.production
    .env.staging
    
  2. Provide a template - Create .env.example with documented dummy values:

    # .env.example
    # Statistical Analyzer Tool Configuration
    
    # Significance level for hypothesis testing
    STATISTICAL_ANALYZER_SIGNIFICANCE_LEVEL=0.05
    
    # Confidence level for statistical intervals
    STATISTICAL_ANALYZER_CONFIDENCE_LEVEL=0.95
    
    # Whether to calculate effect sizes in analyses
    STATISTICAL_ANALYZER_ENABLE_EFFECT_SIZE=true
    
  3. Document your variables - Add comments explaining each setting

  4. Use load_dotenv() early - Call it at the very top of your entry point, before any aiecs imports

  5. Format values correctly:

    • Floats: Decimal numbers: 0.05, 0.95

    • Booleans: true or false

Configuration Options

1. Significance Level

Environment Variable: STATISTICAL_ANALYZER_SIGNIFICANCE_LEVEL

Type: Float

Default: 0.05

Description: Significance level (alpha) for hypothesis testing. This determines the threshold for rejecting the null hypothesis.

Common Values:

  • 0.01 - Very strict significance (1% level)

  • 0.05 - Standard significance (5% level, default)

  • 0.10 - Lenient significance (10% level)

  • 0.001 - Extremely strict significance (0.1% level)

Example:

export STATISTICAL_ANALYZER_SIGNIFICANCE_LEVEL=0.01

Significance Note: Lower values are more strict and require stronger evidence to reject the null hypothesis.

2. Confidence Level

Environment Variable: STATISTICAL_ANALYZER_CONFIDENCE_LEVEL

Type: Float

Default: 0.95

Description: Confidence level for statistical intervals and confidence intervals. This determines the probability that the true parameter lies within the calculated interval.

Common Values:

  • 0.90 - 90% confidence level

  • 0.95 - 95% confidence level (default)

  • 0.99 - 99% confidence level

  • 0.999 - 99.9% confidence level

Example:

export STATISTICAL_ANALYZER_CONFIDENCE_LEVEL=0.99

Confidence Note: Higher confidence levels provide wider intervals but greater certainty.

3. Enable Effect Size

Environment Variable: STATISTICAL_ANALYZER_ENABLE_EFFECT_SIZE

Type: Boolean

Default: True

Description: Whether to calculate effect sizes in statistical analyses. Effect sizes provide information about the practical significance of results.

Values:

  • true - Enable effect size calculation (default)

  • false - Disable effect size calculation

Example:

export STATISTICAL_ANALYZER_ENABLE_EFFECT_SIZE=true

Effect Size Note: Effect sizes help interpret the practical significance of statistical results.

Usage Examples

Example 1: Basic Environment Configuration

# Set basic statistical analysis parameters
export STATISTICAL_ANALYZER_SIGNIFICANCE_LEVEL=0.05
export STATISTICAL_ANALYZER_CONFIDENCE_LEVEL=0.95
export STATISTICAL_ANALYZER_ENABLE_EFFECT_SIZE=true

# Run your application
python app.py

Example 2: Rigorous Analysis Configuration

# Optimized for rigorous statistical analysis
export STATISTICAL_ANALYZER_SIGNIFICANCE_LEVEL=0.01
export STATISTICAL_ANALYZER_CONFIDENCE_LEVEL=0.99
export STATISTICAL_ANALYZER_ENABLE_EFFECT_SIZE=true

Example 3: Development Configuration

# Development-friendly settings
export STATISTICAL_ANALYZER_SIGNIFICANCE_LEVEL=0.05
export STATISTICAL_ANALYZER_CONFIDENCE_LEVEL=0.95
export STATISTICAL_ANALYZER_ENABLE_EFFECT_SIZE=false

Example 4: Programmatic Configuration

from aiecs.tools.statistics.statistical_analyzer_tool import StatisticalAnalyzerTool

# Initialize with custom configuration
statistical_analyzer = StatisticalAnalyzerTool(config={
    'significance_level': 0.05,
    'confidence_level': 0.95,
    'enable_effect_size': True
})

Example 5: Mixed Configuration

Environment variables are used as defaults, but can be overridden programmatically:

# Set environment defaults
export STATISTICAL_ANALYZER_SIGNIFICANCE_LEVEL=0.05
export STATISTICAL_ANALYZER_ENABLE_EFFECT_SIZE=true
# Override for specific instance
statistical_analyzer = StatisticalAnalyzerTool(config={
    'significance_level': 0.01,  # This overrides the environment variable
    'enable_effect_size': False  # This overrides the environment variable
})

Configuration Priority

When the Statistical Analyzer Tool is initialized, configuration values are resolved in the following order (highest to lowest priority):

  1. Programmatic config - Values passed to the constructor

  2. Environment variables - Values set via STATISTICAL_ANALYZER_* variables

  3. Default values - Built-in defaults as specified above

Data Type Parsing

Float Values

Floats should be provided as decimal numbers:

export STATISTICAL_ANALYZER_SIGNIFICANCE_LEVEL=0.05
export STATISTICAL_ANALYZER_CONFIDENCE_LEVEL=0.95

Boolean Values

Booleans should be provided as lowercase strings:

export STATISTICAL_ANALYZER_ENABLE_EFFECT_SIZE=true
export STATISTICAL_ANALYZER_ENABLE_EFFECT_SIZE=false

Validation

Automatic Type Validation

Pydantic automatically validates configuration values:

  • significance_level must be a float between 0 and 1

  • confidence_level must be a float between 0 and 1

  • enable_effect_size must be a boolean

Runtime Validation

When performing statistical analyses, the tool validates:

  1. Significance level - Level must be appropriate for the analysis type

  2. Confidence level - Level must be reasonable for interval estimation

  3. Data compatibility - Data must be compatible with statistical tests

  4. Sample size - Sample size must be adequate for the analysis

  5. Assumptions - Data must meet test assumptions

Analysis Types

The Statistical Analyzer Tool supports various analysis types:

Descriptive Statistics

  • Descriptive - Basic descriptive statistics (mean, median, std, etc.)

  • Summary statistics - Comprehensive data summaries

  • Distribution analysis - Distribution characteristics

Hypothesis Testing

  • T-test - Student’s t-test for means

  • ANOVA - Analysis of variance

  • Chi-square - Chi-square test for independence

Regression Analysis

  • Linear Regression - Linear regression analysis

  • Logistic Regression - Logistic regression analysis

  • Multiple Regression - Multiple variable regression

Correlation Analysis

  • Correlation - Correlation analysis

  • Partial Correlation - Partial correlation analysis

  • Causality - Causal analysis

Time Series Analysis

  • Time Series - Time series analysis

  • Trend Analysis - Trend detection and analysis

  • Seasonality - Seasonal pattern analysis

Operations Supported

The Statistical Analyzer Tool supports comprehensive statistical analysis operations:

Basic Analysis

  • analyze_data - Perform comprehensive statistical analysis

  • descriptive_statistics - Generate descriptive statistics

  • summary_statistics - Create data summaries

  • distribution_analysis - Analyze data distributions

  • correlation_analysis - Perform correlation analysis

Hypothesis Testing

  • t_test - Perform t-tests

  • anova_test - Perform ANOVA tests

  • chi_square_test - Perform chi-square tests

  • mann_whitney_test - Perform Mann-Whitney U tests

  • wilcoxon_test - Perform Wilcoxon signed-rank tests

Regression Analysis

  • linear_regression - Perform linear regression

  • logistic_regression - Perform logistic regression

  • multiple_regression - Perform multiple regression

  • polynomial_regression - Perform polynomial regression

  • ridge_regression - Perform ridge regression

Advanced Analysis

  • time_series_analysis - Perform time series analysis

  • causal_analysis - Perform causal analysis

  • effect_size_analysis - Calculate effect sizes

  • power_analysis - Perform statistical power analysis

  • meta_analysis - Perform meta-analysis

Statistical Tests

  • normality_tests - Test for normality

  • homogeneity_tests - Test for homogeneity of variance

  • independence_tests - Test for independence

  • stationarity_tests - Test for stationarity

  • cointegration_tests - Test for cointegration

Reporting Operations

  • generate_report - Generate statistical analysis report

  • create_summary - Create analysis summary

  • export_results - Export analysis results

  • visualize_results - Create result visualizations

  • interpret_results - Provide result interpretations

Troubleshooting

Issue: Statistical test assumptions not met

Error: Test assumptions violated

Solutions:

  1. Check data normality

  2. Verify homogeneity of variance

  3. Use non-parametric alternatives

  4. Transform data if needed

Issue: Insufficient sample size

Error: Sample size too small for analysis

Solutions:

  1. Increase sample size

  2. Use appropriate tests for small samples

  3. Adjust significance level

  4. Consider effect size requirements

Issue: Multiple comparison problems

Error: Multiple testing issues

Solutions:

  1. Apply Bonferroni correction

  2. Use FDR correction

  3. Adjust significance level

  4. Use appropriate post-hoc tests

Issue: Non-normal data

Error: Data not normally distributed

Solutions:

  1. Use non-parametric tests

  2. Transform data

  3. Use robust statistical methods

  4. Check for outliers

Issue: Missing data

Error: Missing values in analysis

Solutions:

  1. Handle missing data appropriately

  2. Use complete case analysis

  3. Apply imputation methods

  4. Use maximum likelihood estimation

Issue: Correlation vs causation

Error: Confusing correlation with causation

Solutions:

  1. Use causal analysis methods

  2. Control for confounding variables

  3. Apply appropriate statistical techniques

  4. Consider experimental design

Issue: Effect size interpretation

Error: Effect size calculation or interpretation issues

Solutions:

# Enable effect size calculation
export STATISTICAL_ANALYZER_ENABLE_EFFECT_SIZE=true

# Use appropriate effect size measures
statistical_analyzer.calculate_effect_size(data, measure='cohens_d')

Best Practices

Statistical Rigor

  1. Significance Level - Choose appropriate significance level

  2. Effect Size - Always report effect sizes

  3. Assumptions - Check test assumptions

  4. Multiple Testing - Account for multiple comparisons

  5. Sample Size - Ensure adequate sample size

Error Handling

  1. Graceful Degradation - Handle analysis failures gracefully

  2. Validation - Validate data before analysis

  3. Fallback Methods - Provide alternative analysis methods

  4. Error Logging - Log errors for debugging and monitoring

  5. User Feedback - Provide clear error messages

Security

  1. Data Privacy - Ensure data privacy during analysis

  2. Access Control - Control access to analysis results

  3. Audit Logging - Log analysis activities

  4. Data Sanitization - Sanitize sensitive data

  5. Compliance - Ensure compliance with regulations

Resource Management

  1. Memory Monitoring - Monitor memory usage during analysis

  2. Processing Time - Set reasonable timeouts

  3. Storage Optimization - Optimize result storage

  4. Cleanup - Clean up temporary files

  5. Resource Limits - Set appropriate resource limits

Integration

  1. Tool Dependencies - Ensure required tools are available

  2. API Compatibility - Maintain API compatibility

  3. Error Propagation - Properly propagate errors

  4. Logging Integration - Integrate with logging systems

  5. Monitoring - Monitor tool performance and usage

Development vs Production

Development:

STATISTICAL_ANALYZER_SIGNIFICANCE_LEVEL=0.05
STATISTICAL_ANALYZER_CONFIDENCE_LEVEL=0.95
STATISTICAL_ANALYZER_ENABLE_EFFECT_SIZE=false

Production:

STATISTICAL_ANALYZER_SIGNIFICANCE_LEVEL=0.01
STATISTICAL_ANALYZER_CONFIDENCE_LEVEL=0.99
STATISTICAL_ANALYZER_ENABLE_EFFECT_SIZE=true

Error Handling

Always wrap statistical analysis operations in try-except blocks:

from aiecs.tools.statistics.statistical_analyzer_tool import StatisticalAnalyzerTool, StatisticalAnalyzerError, AnalysisError

statistical_analyzer = StatisticalAnalyzerTool()

try:
    result = statistical_analyzer.analyze_data(
        data=df,
        analysis_type='t_test',
        significance_level=0.05
    )
except AnalysisError as e:
    print(f"Analysis error: {e}")
except StatisticalAnalyzerError as e:
    print(f"Statistical analyzer error: {e}")
except Exception as e:
    print(f"Unexpected error: {e}")

Dependencies

Core Dependencies

# Install core dependencies
pip install pydantic python-dotenv

# Install data processing dependencies
pip install pandas numpy scipy

# Install statistical analysis dependencies
pip install scipy statsmodels

Optional Dependencies

# For advanced statistical analysis
pip install scikit-learn

# For time series analysis
pip install statsmodels

# For effect size calculations
pip install pingouin

# For power analysis
pip install statsmodels

Verification

# Test dependency availability
try:
    import pandas
    import numpy
    import scipy
    print("Core dependencies available")
except ImportError as e:
    print(f"Missing dependency: {e}")

# Test statistical libraries availability
try:
    from scipy import stats
    import statsmodels
    print("Statistical libraries available")
except ImportError:
    print("Statistical libraries not available")

# Test advanced analysis availability
try:
    import pingouin
    print("Advanced statistical analysis available")
except ImportError:
    print("Advanced statistical analysis not available")

# Test time series analysis availability
try:
    from statsmodels.tsa import seasonal
    print("Time series analysis available")
except ImportError:
    print("Time series analysis not available")

Support

For issues or questions about Statistical Analyzer Tool configuration:

  • Check the tool source code for implementation details

  • Review stats tool documentation for core statistical operations

  • Consult the main aiecs documentation for architecture overview

  • Test with simple datasets first to isolate configuration vs. analysis issues

  • Verify data compatibility and format requirements

  • Check significance and confidence level settings

  • Ensure proper statistical test assumptions

  • Validate data quality and statistical requirements