Search Tool - Configuration Reference
1. Configuration Overview
1.1 Configuration Methods
The Search Tool resolves each setting from four sources, listed from highest to lowest priority:
Programmatic Configuration (Highest Priority)
tool = SearchTool(config={'google_api_key': 'key'})
Environment Variables
export SEARCH_TOOL_GOOGLE_API_KEY="key"
Global Settings (via AIECS config)
export GOOGLE_API_KEY="key"
Default Values (Lowest Priority)
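The priority order above can be sketched as a simple lookup chain. This is an illustration only, not the tool's actual resolution code: `resolve_setting`, `DEFAULTS`, and `GLOBAL_KEYS` are hypothetical names, and environment values are returned as raw strings without type conversion.

```python
import os
from typing import Any, Mapping, Optional

# Hypothetical subset of the defaults from the schema, for illustration.
DEFAULTS = {"google_api_key": None, "timeout": 30}

# Settings that can also be supplied without the SEARCH_TOOL_ prefix.
GLOBAL_KEYS = {"google_api_key", "google_cse_id", "google_application_credentials"}


def resolve_setting(
    name: str,
    config: Optional[Mapping[str, Any]] = None,
    environ: Optional[Mapping[str, str]] = None,
) -> Any:
    """Return the value for `name` from the highest-priority source that has it."""
    environ = os.environ if environ is None else environ

    # 1. Programmatic configuration wins.
    if config and name in config:
        return config[name]

    # 2. Tool-specific environment variable (SEARCH_TOOL_ prefix).
    prefixed = f"SEARCH_TOOL_{name.upper()}"
    if prefixed in environ:
        return environ[prefixed]

    # 3. Global setting (unprefixed), only for the credential settings.
    if name in GLOBAL_KEYS and name.upper() in environ:
        return environ[name.upper()]

    # 4. Built-in default.
    return DEFAULTS.get(name)
```

Passing `environ` explicitly keeps the function easy to test without touching the real process environment.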
1.2 Configuration Schema
class Config(BaseModel):
    # API Credentials
    google_api_key: Optional[str] = None
    google_cse_id: Optional[str] = None
    google_application_credentials: Optional[str] = None
    # Performance
    max_results_per_query: int = 10
    cache_ttl: int = 3600
    timeout: int = 30
    user_agent: str = "AIECS-SearchTool/2.0"
    # Rate Limiting
    rate_limit_requests: int = 100
    rate_limit_window: int = 86400
    # Circuit Breaker
    circuit_breaker_threshold: int = 5
    circuit_breaker_timeout: int = 60
    # Retry Logic
    retry_attempts: int = 3
    retry_backoff: float = 2.0
    # Enhanced Features
    enable_quality_analysis: bool = True
    enable_intent_analysis: bool = True
    enable_deduplication: bool = True
    enable_context_tracking: bool = True
    enable_intelligent_cache: bool = True
    # Tuning
    similarity_threshold: float = 0.85
    max_search_history: int = 10
2. API Credentials
2.1 Google API Key
Parameter: google_api_key
Environment Variable: GOOGLE_API_KEY or SEARCH_TOOL_GOOGLE_API_KEY
Type: Optional[str]
Default: None
Required: Yes (unless using service account)
Description: Google API key for Custom Search API access.
How to Obtain:
Go to Google Cloud Console
Create or select a project
Navigate to “APIs & Services” → “Credentials”
Click “Create Credentials” → “API Key”
Copy the generated key
(Recommended) Restrict the key to “Custom Search API”
Configuration:
# Environment variable
export GOOGLE_API_KEY="AIzaSyBvOkBwv7wjHjf7hK8l9m0n1o2p3q4r5s6t7u8v9w0"
# Or in .env file
GOOGLE_API_KEY=AIzaSyBvOkBwv7wjHjf7hK8l9m0n1o2p3q4r5s6t7u8v9w0
# Programmatic
tool = SearchTool(config={
    'google_api_key': 'AIzaSyBvOkBwv7wjHjf7hK8l9m0n1o2p3q4r5s6t7u8v9w0'
})
Security Best Practices:
Never commit API keys to version control
Use environment variables or secret management
Restrict API key to specific APIs and IPs
Rotate keys regularly
Monitor usage in Google Cloud Console
2.2 Google CSE ID
Parameter: google_cse_id
Environment Variable: GOOGLE_CSE_ID or SEARCH_TOOL_GOOGLE_CSE_ID
Type: Optional[str]
Default: None
Required: Yes
Description: Custom Search Engine ID that identifies your search engine configuration.
How to Obtain:
Click “Add” to create a new search engine
Configure search settings:
Sites to search (leave blank for web-wide)
Language preferences
Search features
Click “Create”
Copy the “Search engine ID”
Configuration:
# Environment variable
export GOOGLE_CSE_ID="012345678901234567890:abcdefghijk"
# Or in .env file
GOOGLE_CSE_ID=012345678901234567890:abcdefghijk
# Programmatic
tool = SearchTool(config={
    'google_cse_id': '012345678901234567890:abcdefghijk'
})
2.3 Service Account Credentials
Parameter: google_application_credentials
Environment Variable: GOOGLE_APPLICATION_CREDENTIALS or SEARCH_TOOL_GOOGLE_APPLICATION_CREDENTIALS
Type: Optional[str]
Default: None
Required: No (alternative to API key)
Description: Path to Google service account JSON file for authentication.
How to Set Up:
Go to Google Cloud Console
Navigate to “IAM & Admin” → “Service Accounts”
Click “Create Service Account”
Fill in details and create
Click on the service account
Go to “Keys” tab
Click “Add Key” → “Create new key”
Choose JSON format and download
Enable “Custom Search API” for the project
Configuration:
# Environment variable
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account.json"
# Or in .env file
GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json
# Programmatic
tool = SearchTool(config={
    'google_application_credentials': '/path/to/service-account.json'
})
When to Use:
Server-to-server applications
Enhanced security requirements
Centralized credential management
Multiple API access needs
3. Performance Settings
3.1 Max Results Per Query
Parameter: max_results_per_query
Environment Variable: SEARCH_TOOL_MAX_RESULTS_PER_QUERY
Type: int
Default: 10
Range: 1-100
Description: Maximum number of results to return per single search query.
Configuration:
# Environment variable
export SEARCH_TOOL_MAX_RESULTS_PER_QUERY=20
# Programmatic
tool = SearchTool(config={'max_results_per_query': 20})
Recommendations:
Development: 5 (faster testing)
Production: 10 (standard)
Data Collection: 20-50 (comprehensive results)
Maximum: 100 (Google API limit)
Impact:
Higher values increase API quota usage
Higher values increase response time
Higher values provide more comprehensive results
3.2 Cache TTL
Parameter: cache_ttl
Environment Variable: SEARCH_TOOL_CACHE_TTL
Type: int
Default: 3600 (1 hour)
Unit: Seconds
Description: Default cache time-to-live for search results.
Configuration:
# Environment variable
export SEARCH_TOOL_CACHE_TTL=7200 # 2 hours
# Programmatic
tool = SearchTool(config={'cache_ttl': 7200})
Recommendations:
News/Real-time: 1800 (30 minutes)
General: 3600 (1 hour)
Stable Content: 7200 (2 hours)
Definitions: 86400 (24 hours)
Note: Intelligent cache uses intent-aware TTL that may override this default.
3.3 Timeout
Parameter: timeout
Environment Variable: SEARCH_TOOL_TIMEOUT
Type: int
Default: 30
Unit: Seconds
Description: API request timeout duration.
Configuration:
export SEARCH_TOOL_TIMEOUT=45
tool = SearchTool(config={'timeout': 45})
Recommendations:
Fast Networks: 15-20 seconds
Standard: 30 seconds
Slow Networks: 45-60 seconds
Enterprise: 60-120 seconds
3.4 User Agent
Parameter: user_agent
Environment Variable: SEARCH_TOOL_USER_AGENT
Type: str
Default: "AIECS-SearchTool/2.0"
Description: User agent string sent with API requests.
Configuration:
export SEARCH_TOOL_USER_AGENT="MyCompanyBot/1.0 (contact@company.com)"
tool = SearchTool(config={
    'user_agent': 'MyCompanyBot/1.0 (contact@company.com)'
})
Best Practices:
Include application name and version
Include contact information
Follow Google’s user agent guidelines
Be descriptive and professional
4. Rate Limiting
4.1 Rate Limit Requests
Parameter: rate_limit_requests
Environment Variable: SEARCH_TOOL_RATE_LIMIT_REQUESTS
Type: int
Default: 100
Description: Maximum number of requests allowed within the rate limit window.
Configuration:
export SEARCH_TOOL_RATE_LIMIT_REQUESTS=50
tool = SearchTool(config={'rate_limit_requests': 50})
Recommendations:
Free Tier: 100 (Google’s free limit)
Paid Tier: 1000-10000 (based on your quota)
Development: 10-20 (conservative)
Production: Match your API quota
Google Custom Search Quotas:
Free: 100 queries/day
Paid: Up to 10,000 queries/day
4.2 Rate Limit Window
Parameter: rate_limit_window
Environment Variable: SEARCH_TOOL_RATE_LIMIT_WINDOW
Type: int
Default: 86400 (24 hours)
Unit: Seconds
Description: Time window for rate limiting.
Configuration:
export SEARCH_TOOL_RATE_LIMIT_WINDOW=3600 # 1 hour
tool = SearchTool(config={'rate_limit_window': 3600})
Common Values:
3600: 1 hour
86400: 24 hours (default, matches Google’s quota reset)
604800: 7 days
Note: Should align with your API quota reset period.
4.3 Rate Limiting Algorithm
The Search Tool uses a Token Bucket algorithm:
Initial tokens: rate_limit_requests
Refill rate: rate_limit_requests / rate_limit_window per second
Each request consumes 1 token
Requests blocked when tokens < 1
Example:
rate_limit_requests = 100
rate_limit_window = 86400 (24 hours)
Refill rate = 100 / 86400 = 0.00116 tokens/second
= 1 token every ~14.4 minutes
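The token bucket rules above can be sketched as follows. This is a minimal, single-threaded illustration under the stated rules, not the tool's actual implementation; the class name `TokenBucket`, the `try_acquire` method, and the injectable `clock` argument are assumptions made for the example.

```python
import time
from typing import Callable


class TokenBucket:
    """Illustrative token bucket: starts full, refills continuously."""

    def __init__(
        self,
        max_requests: int,
        window_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.capacity = float(max_requests)              # initial tokens
        self.refill_rate = max_requests / window_seconds  # tokens per second
        self.tokens = self.capacity
        self._clock = clock
        self._last = clock()

    def _refill(self) -> None:
        # Add tokens for the time elapsed since the last check, capped at capacity.
        now = self._clock()
        self.tokens = min(self.capacity, self.tokens + (now - self._last) * self.refill_rate)
        self._last = now

    def try_acquire(self) -> bool:
        """Consume one token if available; return False when the request is blocked."""
        self._refill()
        if self.tokens < 1:
            return False
        self.tokens -= 1
        return True
```

With the defaults (100 requests per 86400 seconds), a drained bucket in this sketch regains one token roughly every 14.4 minutes, which matches the arithmetic above.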
5. Circuit Breaker
5.1 Circuit Breaker Threshold
Parameter: circuit_breaker_threshold
Environment Variable: SEARCH_TOOL_CIRCUIT_BREAKER_THRESHOLD
Type: int
Default: 5
Description: Number of consecutive failures before opening the circuit breaker.
Configuration:
export SEARCH_TOOL_CIRCUIT_BREAKER_THRESHOLD=3
tool = SearchTool(config={'circuit_breaker_threshold': 3})
Recommendations:
Sensitive: 3 (opens quickly on failures)
Standard: 5 (balanced)
Tolerant: 10 (more permissive)
Impact:
Lower values: Faster failure detection, may open on transient issues
Higher values: More tolerant, slower to detect persistent failures
5.2 Circuit Breaker Timeout
Parameter: circuit_breaker_timeout
Environment Variable: SEARCH_TOOL_CIRCUIT_BREAKER_TIMEOUT
Type: int
Default: 60
Unit: Seconds
Description: Time the circuit breaker stays open before it moves to the half-open state and tests whether the service has recovered.
Configuration:
export SEARCH_TOOL_CIRCUIT_BREAKER_TIMEOUT=120
tool = SearchTool(config={'circuit_breaker_timeout': 120})
Recommendations:
Quick Recovery: 30 seconds
Standard: 60 seconds
Conservative: 120-300 seconds
Circuit Breaker States:
CLOSED: Normal operation, requests pass through
OPEN: Failures exceeded threshold, requests blocked
HALF_OPEN: Testing recovery, limited requests allowed
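The three states and the two settings (`circuit_breaker_threshold`, `circuit_breaker_timeout`) can be sketched as a small state machine. This is a hedged illustration of the general pattern, not the Search Tool's code: the class and method names are hypothetical, and the assumption that a failure in HALF_OPEN reopens the circuit immediately is mine.

```python
import time
from typing import Callable


class CircuitBreaker:
    """Illustrative circuit breaker with CLOSED, OPEN and HALF_OPEN states."""

    def __init__(
        self,
        threshold: int = 5,
        timeout: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.threshold = threshold  # consecutive failures before opening
        self.timeout = timeout      # seconds to stay OPEN before probing
        self._clock = clock
        self.state = "CLOSED"
        self.failures = 0
        self._opened_at = 0.0

    def allow_request(self) -> bool:
        if self.state == "OPEN":
            # After the timeout, let a probe request through (HALF_OPEN).
            if self._clock() - self._opened_at >= self.timeout:
                self.state = "HALF_OPEN"
                return True
            return False
        return True

    def record_success(self) -> None:
        # Any success resets the failure count and closes the circuit.
        self.failures = 0
        self.state = "CLOSED"

    def record_failure(self) -> None:
        self.failures += 1
        # Assumption: a failed probe in HALF_OPEN reopens the circuit at once.
        if self.state == "HALF_OPEN" or self.failures >= self.threshold:
            self.state = "OPEN"
            self._opened_at = self._clock()
```

A caller would check `allow_request()` before each API call and report the outcome with `record_success()` or `record_failure()`.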
6. Enhanced Features
6.1 Quality Analysis
Parameter: enable_quality_analysis
Environment Variable: SEARCH_TOOL_ENABLE_QUALITY_ANALYSIS
Type: bool
Default: True
Description: Enable automatic result quality assessment and scoring.
Configuration:
export SEARCH_TOOL_ENABLE_QUALITY_ANALYSIS=true
tool = SearchTool(config={'enable_quality_analysis': True})
Features When Enabled:
Domain authority scoring
Relevance scoring
Freshness scoring
Credibility level classification
Quality signals analysis
Warning detection
Performance Impact: ~10-20ms per result
6.2 Intent Analysis
Parameter: enable_intent_analysis
Environment Variable: SEARCH_TOOL_ENABLE_INTENT_ANALYSIS
Type: bool
Default: True
Description: Enable query intent detection and automatic query enhancement.
Configuration:
export SEARCH_TOOL_ENABLE_INTENT_ANALYSIS=true
tool = SearchTool(config={'enable_intent_analysis': True})
Features When Enabled:
Intent type detection (definition, how-to, comparison, etc.)
Automatic query enhancement
Parameter suggestions
Entity extraction
Optimization suggestions
Performance Impact: ~5-10ms per query
6.3 Deduplication
Parameter: enable_deduplication
Environment Variable: SEARCH_TOOL_ENABLE_DEDUPLICATION
Type: bool
Default: True
Description: Enable result deduplication to remove duplicate and similar results.
Configuration:
export SEARCH_TOOL_ENABLE_DEDUPLICATION=true
tool = SearchTool(config={'enable_deduplication': True})
Features When Enabled:
URL normalization
Content similarity detection
Configurable similarity threshold
Related Setting: similarity_threshold
6.4 Context Tracking
Parameter: enable_context_tracking
Environment Variable: SEARCH_TOOL_ENABLE_CONTEXT_TRACKING
Type: bool
Default: True
Description: Enable search context tracking and preference learning.
Configuration:
export SEARCH_TOOL_ENABLE_CONTEXT_TRACKING=true
tool = SearchTool(config={'enable_context_tracking': True})
Features When Enabled:
Search history tracking
Topic context awareness
Preference learning
Related query suggestions
Domain preference tracking
Related Setting: max_search_history
6.5 Intelligent Cache
Parameter: enable_intelligent_cache
Environment Variable: SEARCH_TOOL_ENABLE_INTELLIGENT_CACHE
Type: bool
Default: True
Description: Enable Redis-based intelligent caching with intent-aware TTL.
Configuration:
export SEARCH_TOOL_ENABLE_INTELLIGENT_CACHE=true
tool = SearchTool(config={'enable_intelligent_cache': True})
Requirements: Redis server must be available
Features When Enabled:
Intent-aware TTL strategies
Dynamic TTL adjustment
Quality-based caching
Automatic cache invalidation
Intent-Aware TTL:
Definitions: 30 days
Tutorials: 7 days
News: 1 hour
Academic: 30 days
Products: 1 day
General: 1 hour (default)
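Expressed in seconds, the intent-aware TTL table above could look like the lookup below. The intent keys (`"definition"`, `"tutorial"`, and so on) and the function name are hypothetical; the tool's internal intent identifiers may differ.

```python
HOUR = 3600
DAY = 24 * HOUR

# TTLs from the table above, in seconds. Key names are assumed for illustration.
INTENT_TTL_SECONDS = {
    "definition": 30 * DAY,
    "tutorial": 7 * DAY,
    "news": HOUR,
    "academic": 30 * DAY,
    "product": DAY,
    "general": HOUR,
}


def ttl_for_intent(intent: str, default_ttl: int = HOUR) -> int:
    """Return the cache TTL for an intent, falling back to the default."""
    return INTENT_TTL_SECONDS.get(intent, default_ttl)
```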
6.6 Similarity Threshold
Parameter: similarity_threshold
Environment Variable: SEARCH_TOOL_SIMILARITY_THRESHOLD
Type: float
Default: 0.85
Range: 0.0-1.0
Description: Similarity threshold for deduplication (0=different, 1=identical).
Configuration:
export SEARCH_TOOL_SIMILARITY_THRESHOLD=0.90
tool = SearchTool(config={'similarity_threshold': 0.90})
Recommendations:
Strict: 0.90-0.95 (only very similar results removed)
Standard: 0.85 (balanced)
Aggressive: 0.70-0.80 (more deduplication)
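To see how a threshold affects deduplication, here is a minimal sketch that compares text snippets with `difflib.SequenceMatcher` from the Python standard library. The similarity metric the Search Tool actually uses is not stated in this document, so the metric, the `>=` comparison, and the function name `deduplicate` are all assumptions.

```python
from difflib import SequenceMatcher
from typing import List


def deduplicate(snippets: List[str], similarity_threshold: float = 0.85) -> List[str]:
    """Keep the first occurrence of each snippet; drop later near-duplicates."""
    kept: List[str] = []
    for text in snippets:
        is_duplicate = any(
            # ratio() is 1.0 for identical strings and approaches 0.0 for unrelated ones.
            SequenceMatcher(None, text.lower(), other.lower()).ratio() >= similarity_threshold
            for other in kept
        )
        if not is_duplicate:
            kept.append(text)
    return kept
```

Raising the threshold keeps more results (only near-identical entries are dropped); lowering it removes more.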
6.7 Max Search History
Parameter: max_search_history
Environment Variable: SEARCH_TOOL_MAX_SEARCH_HISTORY
Type: int
Default: 10
Description: Maximum number of searches to keep in context history.
Configuration:
export SEARCH_TOOL_MAX_SEARCH_HISTORY=20
tool = SearchTool(config={'max_search_history': 20})
Recommendations:
Minimal: 5 (recent context only)
Standard: 10 (balanced)
Comprehensive: 20-50 (extended context)
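A bounded history like this is commonly built on `collections.deque` with `maxlen`, which discards the oldest entry once the limit is reached. The sketch below shows that behavior; the `SearchHistory` class is hypothetical and is not the tool's context tracker.

```python
from collections import deque
from typing import Deque, List


class SearchHistory:
    """Illustrative bounded history: keeps only the most recent queries."""

    def __init__(self, max_search_history: int = 10) -> None:
        # deque drops the oldest item automatically when maxlen is exceeded.
        self._entries: Deque[str] = deque(maxlen=max_search_history)

    def add(self, query: str) -> None:
        self._entries.append(query)

    def recent(self) -> List[str]:
        """Return stored queries, oldest first."""
        return list(self._entries)
```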
7. Caching Configuration
7.1 Redis Configuration
The intelligent cache requires Redis. Configure Redis connection via AIECS global settings:
# .env file
REDIS_HOST=localhost
REDIS_PORT=6379
REDIS_DB=0
REDIS_PASSWORD=optional_password
REDIS_SSL=false
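As an illustration of how these variables fit together, the sketch below assembles them into a standard Redis connection URL (`redis://` for plain connections, `rediss://` for TLS). How AIECS itself consumes these settings is not described here, so treat `redis_url_from_env` and its fallback values as assumptions.

```python
from typing import Mapping
from urllib.parse import quote


def redis_url_from_env(environ: Mapping[str, str]) -> str:
    """Build a Redis URL from REDIS_* variables (illustrative helper)."""
    host = environ.get("REDIS_HOST", "localhost")
    port = int(environ.get("REDIS_PORT", "6379"))
    db = int(environ.get("REDIS_DB", "0"))
    password = environ.get("REDIS_PASSWORD", "")
    use_ssl = environ.get("REDIS_SSL", "false").strip().lower() in {"1", "true", "yes"}

    scheme = "rediss" if use_ssl else "redis"
    # Percent-encode the password so special characters do not break the URL.
    auth = f":{quote(password, safe='')}@" if password else ""
    return f"{scheme}://{auth}{host}:{port}/{db}"
```

Calling it with `os.environ` would reflect the current process environment.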
7.2 Cache Key Structure
search:{query}:{num_results}:{language}:{country}:{date_restrict}:{file_type}
Example:
search:machine learning:10:en:us::
search:climate change:10:en:us:m6:pdf
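The key pattern can be reproduced with a one-line formatter. `build_cache_key` is a hypothetical helper written for this example; the default values for language and country are taken from the sample keys above and are not confirmed defaults of the tool.

```python
def build_cache_key(
    query: str,
    num_results: int = 10,
    language: str = "en",
    country: str = "us",
    date_restrict: str = "",
    file_type: str = "",
) -> str:
    """Format a cache key following the documented pattern; unset fields stay empty."""
    return f"search:{query}:{num_results}:{language}:{country}:{date_restrict}:{file_type}"


print(build_cache_key("machine learning"))
print(build_cache_key("climate change", date_restrict="m6", file_type="pdf"))
```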
7.3 Cache Invalidation
Caches are automatically invalidated based on:
TTL expiration
Intent-aware TTL strategies
Quality-based adjustments
Manual invalidation (if needed)
8. Environment Variables
8.1 Complete Environment Variable List
# API Credentials
GOOGLE_API_KEY=your_api_key
GOOGLE_CSE_ID=your_cse_id
GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json
# Performance
SEARCH_TOOL_MAX_RESULTS_PER_QUERY=10
SEARCH_TOOL_CACHE_TTL=3600
SEARCH_TOOL_TIMEOUT=30
SEARCH_TOOL_USER_AGENT="AIECS-SearchTool/2.0"
# Rate Limiting
SEARCH_TOOL_RATE_LIMIT_REQUESTS=100
SEARCH_TOOL_RATE_LIMIT_WINDOW=86400
# Circuit Breaker
SEARCH_TOOL_CIRCUIT_BREAKER_THRESHOLD=5
SEARCH_TOOL_CIRCUIT_BREAKER_TIMEOUT=60
# Retry Logic
SEARCH_TOOL_RETRY_ATTEMPTS=3
SEARCH_TOOL_RETRY_BACKOFF=2.0
# Enhanced Features
SEARCH_TOOL_ENABLE_QUALITY_ANALYSIS=true
SEARCH_TOOL_ENABLE_INTENT_ANALYSIS=true
SEARCH_TOOL_ENABLE_DEDUPLICATION=true
SEARCH_TOOL_ENABLE_CONTEXT_TRACKING=true
SEARCH_TOOL_ENABLE_INTELLIGENT_CACHE=true
# Tuning
SEARCH_TOOL_SIMILARITY_THRESHOLD=0.85
SEARCH_TOOL_MAX_SEARCH_HISTORY=10
# Redis (for caching)
REDIS_HOST=localhost
REDIS_PORT=6379
REDIS_DB=0
REDIS_PASSWORD=
8.2 Environment Variable Naming
Pattern: SEARCH_TOOL_{PARAMETER_NAME}
Examples:
google_api_key → SEARCH_TOOL_GOOGLE_API_KEY
max_results_per_query → SEARCH_TOOL_MAX_RESULTS_PER_QUERY
enable_quality_analysis → SEARCH_TOOL_ENABLE_QUALITY_ANALYSIS
Global Overrides (without SEARCH_TOOL_ prefix):
GOOGLE_API_KEY
GOOGLE_CSE_ID
GOOGLE_APPLICATION_CREDENTIALS
9. Configuration Examples
9.1 Development Configuration
# .env.development
GOOGLE_API_KEY=dev_api_key
GOOGLE_CSE_ID=dev_cse_id
# Conservative limits for testing
SEARCH_TOOL_MAX_RESULTS_PER_QUERY=5
SEARCH_TOOL_CACHE_TTL=1800
SEARCH_TOOL_RATE_LIMIT_REQUESTS=10
SEARCH_TOOL_RATE_LIMIT_WINDOW=3600
SEARCH_TOOL_CIRCUIT_BREAKER_THRESHOLD=10
SEARCH_TOOL_TIMEOUT=15
# Enable all features for testing
SEARCH_TOOL_ENABLE_QUALITY_ANALYSIS=true
SEARCH_TOOL_ENABLE_INTENT_ANALYSIS=true
SEARCH_TOOL_ENABLE_DEDUPLICATION=true
SEARCH_TOOL_ENABLE_CONTEXT_TRACKING=true
SEARCH_TOOL_ENABLE_INTELLIGENT_CACHE=true
9.2 Production Configuration
# .env.production
GOOGLE_API_KEY=prod_api_key
GOOGLE_CSE_ID=prod_cse_id
# Optimized for performance and reliability
SEARCH_TOOL_MAX_RESULTS_PER_QUERY=10
SEARCH_TOOL_CACHE_TTL=7200
SEARCH_TOOL_RATE_LIMIT_REQUESTS=100
SEARCH_TOOL_RATE_LIMIT_WINDOW=86400
SEARCH_TOOL_CIRCUIT_BREAKER_THRESHOLD=3
SEARCH_TOOL_CIRCUIT_BREAKER_TIMEOUT=60
SEARCH_TOOL_RETRY_ATTEMPTS=5
SEARCH_TOOL_RETRY_BACKOFF=2.0
SEARCH_TOOL_TIMEOUT=45
# All features enabled
SEARCH_TOOL_ENABLE_QUALITY_ANALYSIS=true
SEARCH_TOOL_ENABLE_INTENT_ANALYSIS=true
SEARCH_TOOL_ENABLE_DEDUPLICATION=true
SEARCH_TOOL_ENABLE_CONTEXT_TRACKING=true
SEARCH_TOOL_ENABLE_INTELLIGENT_CACHE=true
# Redis for caching
REDIS_HOST=redis.production.com
REDIS_PORT=6379
REDIS_PASSWORD=secure_password
REDIS_SSL=true
9.3 High-Volume Configuration
# .env.high-volume
GOOGLE_API_KEY=enterprise_api_key
GOOGLE_CSE_ID=enterprise_cse_id
# High throughput settings
SEARCH_TOOL_MAX_RESULTS_PER_QUERY=20
SEARCH_TOOL_CACHE_TTL=3600
SEARCH_TOOL_RATE_LIMIT_REQUESTS=10000
SEARCH_TOOL_RATE_LIMIT_WINDOW=86400
SEARCH_TOOL_CIRCUIT_BREAKER_THRESHOLD=10
SEARCH_TOOL_TIMEOUT=60
# Aggressive caching
SEARCH_TOOL_ENABLE_INTELLIGENT_CACHE=true
REDIS_HOST=redis-cluster.com
REDIS_PORT=6379
9.4 Minimal Configuration
# .env.minimal
GOOGLE_API_KEY=your_api_key
GOOGLE_CSE_ID=your_cse_id
# Disable enhanced features for minimal overhead
SEARCH_TOOL_ENABLE_QUALITY_ANALYSIS=false
SEARCH_TOOL_ENABLE_INTENT_ANALYSIS=false
SEARCH_TOOL_ENABLE_DEDUPLICATION=false
SEARCH_TOOL_ENABLE_CONTEXT_TRACKING=false
SEARCH_TOOL_ENABLE_INTELLIGENT_CACHE=false
9.5 Programmatic Configuration
from aiecs.tools.search_tool import SearchTool
# Development
dev_tool = SearchTool(config={
    'google_api_key': 'dev_key',
    'google_cse_id': 'dev_cse',
    'max_results_per_query': 5,
    'rate_limit_requests': 10,
    'circuit_breaker_threshold': 10
})
# Production
prod_tool = SearchTool(config={
    'google_api_key': 'prod_key',
    'google_cse_id': 'prod_cse',
    'max_results_per_query': 10,
    'cache_ttl': 7200,
    'rate_limit_requests': 100,
    'circuit_breaker_threshold': 3,
    'retry_attempts': 5,
    'enable_quality_analysis': True,
    'enable_intent_analysis': True,
    'enable_intelligent_cache': True
})
# Custom
custom_tool = SearchTool(config={
    'google_api_key': 'key',
    'google_cse_id': 'cse',
    'user_agent': 'MyBot/1.0 (contact@company.com)',
    'similarity_threshold': 0.90,
    'max_search_history': 20
})
10. Validation & Testing
10.1 Validate Configuration
from aiecs.tools import get_tool
tool = get_tool('search')
# Validate credentials
status = tool.validate_credentials()
print(f"Valid: {status['valid']}")
print(f"Method: {status['method']}")
print(f"CSE ID Present: {status['cse_id_present']}")
if not status['valid']:
    print(f"Error: {status['error']}")
10.2 Test Configuration
# Test basic search
try:
    results = tool.search_web("test", num_results=1)
    print("✓ Search working")
except Exception as e:
    print(f"✗ Search failed: {e}")
# Check quota
quota = tool.get_quota_status()
print(f"Quota: {quota['remaining_quota']}/{quota['quota_limit']}")
print(f"Circuit Breaker: {quota['circuit_breaker_state']}")
# Check features
print(f"Quality Analysis: {tool.config.enable_quality_analysis}")
print(f"Intent Analysis: {tool.config.enable_intent_analysis}")
print(f"Intelligent Cache: {tool.config.enable_intelligent_cache}")
10.3 Configuration Checklist
API credentials configured (API key or service account)
CSE ID configured
Rate limits match API quota
Circuit breaker threshold appropriate
Cache TTL appropriate for use case
Enhanced features enabled as needed
Redis configured (if using intelligent cache)
User agent set appropriately
Timeout appropriate for network conditions
Configuration validated successfully
Test search successful
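Parts of this checklist can be automated before the tool is created. The sketch below checks a plain settings dictionary against the requirements and ranges stated in this reference (credentials, CSE ID, `max_results_per_query` 1-100, `similarity_threshold` 0.0-1.0). The function `check_settings` is hypothetical and independent of the tool's own `validate_credentials()`.

```python
from typing import Any, List, Mapping


def check_settings(settings: Mapping[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems: List[str] = []

    # Either an API key or a service account file must be configured.
    if not (settings.get("google_api_key") or settings.get("google_application_credentials")):
        problems.append("no API key or service account configured")

    if not settings.get("google_cse_id"):
        problems.append("google_cse_id is missing")

    max_results = settings.get("max_results_per_query", 10)
    if not 1 <= max_results <= 100:
        problems.append("max_results_per_query must be between 1 and 100")

    threshold = settings.get("similarity_threshold", 0.85)
    if not 0.0 <= threshold <= 1.0:
        problems.append("similarity_threshold must be between 0.0 and 1.0")

    return problems
```

Running it on the same dictionary that is passed as `config` gives a quick pre-flight check; it does not verify that the credentials are accepted by Google.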
Document Version: 2.0
Last Updated: 2025-10-18
Maintainer: AIECS Tools Team