Search Tool - Complete Technical Documentation
1. Overview
1.1 Purpose
The SearchTool is an enterprise-grade web search tool that integrates Google Custom Search API with advanced AI-agent-optimized features. It provides intelligent search capabilities with quality assessment, intent analysis, context awareness, and comprehensive reliability mechanisms.
1.2 Key Capabilities
Multi-Type Search: Web, image, news, and video search
Quality Assessment: Automatic result quality scoring and credibility analysis
Intent Analysis: Query intent detection with automatic enhancement
Context Awareness: Search history tracking and preference learning
Intelligent Caching: Redis-based caching with intent-aware TTL strategies
Reliability: Rate limiting, circuit breaker, and retry mechanisms
Deduplication: Advanced result deduplication with similarity detection
Metrics & Monitoring: Comprehensive performance tracking and health scoring
1.3 Architecture Overview
SearchTool (BaseTool)
├── Core Components
│ ├── Google Custom Search API Client
│ ├── Rate Limiter (Token Bucket)
│ ├── Circuit Breaker (3-State)
│ └── Retry Handler (Exponential Backoff)
├── Enhanced Features
│ ├── ResultQualityAnalyzer
│ ├── QueryIntentAnalyzer
│ ├── ResultDeduplicator
│ ├── SearchContext
│ ├── IntelligentCache (Redis)
│ ├── ResultSummarizer
│ └── EnhancedMetrics
└── Error Handling
  └── AgentFriendlyErrorHandler
2. Architecture
2.1 Package Structure
aiecs/tools/search_tool/
├── __init__.py # Package entry point with tool registration
├── core.py # Main SearchTool class
├── constants.py # Enums, exceptions, and constants
├── schemas.py # Pydantic schemas for input validation
├── analyzers.py # Quality, intent, and summarization analyzers
├── deduplicator.py # Result deduplication logic
├── context.py # Search context management
├── cache.py # Intelligent Redis caching
├── metrics.py # Enhanced metrics collection
├── error_handler.py # Agent-friendly error formatting
├── rate_limiter.py # Rate limiting and circuit breaker
└── README.md # Package documentation
2.2 Component Interaction Flow
User Request
↓
SearchTool.search_web()
↓
[Rate Limiter Check] → RateLimitError if exceeded
↓
[Circuit Breaker Check] → CircuitBreakerOpenError if open
↓
[Intent Analysis] → Query enhancement
↓
[Cache Check] → Return cached if available
↓
[Google API Call] → With retry logic
↓
[Quality Analysis] → Score each result
↓
[Deduplication] → Remove duplicates
↓
[Context Update] → Track search history
↓
[Cache Store] → Store with intelligent TTL
↓
[Metrics Update] → Record performance
↓
Return Results
2.3 Integration Points
AIECS Base Tool: Inherits from BaseTool for a standardized interface
Redis: Optional integration for intelligent caching
LangChain: Full adapter support for agent integration
Google Custom Search API: Primary search backend
Metrics System: Integration with AIECS metrics infrastructure
3. Core Components
3.1 SearchTool Class
Location: aiecs/tools/search_tool/core.py
Inheritance: BaseTool
Key Attributes:
class SearchTool(BaseTool):
    config: Config                              # Configuration object
    rate_limiter: RateLimiter                   # Rate limiting
    circuit_breaker: CircuitBreaker             # Failure protection
    quality_analyzer: ResultQualityAnalyzer     # Quality assessment
    intent_analyzer: QueryIntentAnalyzer        # Intent detection
    deduplicator: ResultDeduplicator            # Deduplication
    search_context: SearchContext               # Context tracking
    intelligent_cache: IntelligentCache         # Redis caching
    metrics: EnhancedMetrics                    # Performance metrics
    error_handler: AgentFriendlyErrorHandler    # Error formatting
Configuration Schema:
class Config(BaseModel):
    # API Configuration
    google_api_key: Optional[str]
    google_cse_id: Optional[str]
    google_application_credentials: Optional[str]
    # Performance
    max_results_per_query: int = 10
    cache_ttl: int = 3600
    timeout: int = 30
    # Rate Limiting
    rate_limit_requests: int = 100
    rate_limit_window: int = 86400
    # Circuit Breaker
    circuit_breaker_threshold: int = 5
    circuit_breaker_timeout: int = 60
    # Retry Logic
    retry_attempts: int = 3
    retry_backoff: float = 2.0
    # Enhanced Features
    enable_quality_analysis: bool = True
    enable_intent_analysis: bool = True
    enable_deduplication: bool = True
    enable_context_tracking: bool = True
    enable_intelligent_cache: bool = True
    # Tuning
    similarity_threshold: float = 0.85
    max_search_history: int = 10
    user_agent: str = "AIECS-SearchTool/2.0"
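The retry_attempts and retry_backoff settings point to exponential backoff between failed API calls, but the exact schedule used by SearchTool is not shown in this document. The following is a minimal sketch that assumes retry_attempts counts retries after the first attempt and that the delay before retry n is retry_backoff ** n seconds. The names call_with_retry and sleep are hypothetical and exist only for illustration.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    func: Callable[[], T],
    retry_attempts: int = 3,
    retry_backoff: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on any exception with exponentially growing delays."""
    for attempt in range(retry_attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == retry_attempts:
                raise  # Out of retries: surface the last error
            # Assumed schedule: 1s, 2s, 4s, ... when retry_backoff is 2.0
            sleep(retry_backoff ** attempt)
    # Only reachable if retry_attempts is negative
    raise ValueError("retry_attempts must be >= 0")
```

Passing sleep as a parameter keeps the sketch testable without real waiting; production code would normally leave it at time.sleep.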
3.2 Rate Limiter
Location: aiecs/tools/search_tool/rate_limiter.py
Algorithm: Token Bucket
Purpose: Prevents API quota exhaustion by limiting request rate
Implementation:
import time

class RateLimiter:
    def __init__(self, max_requests: int, window_seconds: int):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.tokens = max_requests
        self.last_refill = time.time()

    def acquire(self) -> bool:
        """Attempt to acquire a token for request"""
        self._refill_tokens()
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

    def _refill_tokens(self):
        """Refill tokens based on elapsed time"""
        now = time.time()
        elapsed = now - self.last_refill
        # Tokens accrue continuously: a full window restores max_requests tokens
        refill_amount = (elapsed / self.window_seconds) * self.max_requests
        self.tokens = min(self.max_requests, self.tokens + refill_amount)
        self.last_refill = now
Features:
Automatic token refill based on time window
Thread-safe implementation
Configurable request limits
Real-time quota tracking
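The feature list mentions thread safety, while the excerpt above shows no locking. How the package achieves this is not documented here, so the following is only a sketch of one common approach: guarding the token state with a threading.Lock. The class name LockedRateLimiter is hypothetical, and time.monotonic is used instead of time.time so that wall-clock adjustments cannot produce negative elapsed times.

```python
import threading
import time


class LockedRateLimiter:
    """Token bucket whose state is guarded by a lock (illustrative sketch)."""

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.tokens = float(max_requests)
        self.last_refill = time.monotonic()
        self._lock = threading.Lock()

    def acquire(self) -> bool:
        """Take one token if available; safe to call from several threads."""
        with self._lock:
            # Refill and consume under the same lock so no update is lost
            now = time.monotonic()
            elapsed = now - self.last_refill
            refill = (elapsed / self.window_seconds) * self.max_requests
            self.tokens = min(self.max_requests, self.tokens + refill)
            self.last_refill = now
            if self.tokens >= 1:
                self.tokens -= 1
                return True
            return False
```

With max_requests=3 and a 24-hour window, the first three calls to acquire() succeed and later calls fail until enough time has passed to accrue a new token.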
3.3 Circuit Breaker
Location: aiecs/tools/search_tool/rate_limiter.py
Pattern: Three-State Circuit Breaker
States:
CLOSED: Normal operation, requests pass through
OPEN: Failures exceeded threshold, requests blocked
HALF_OPEN: Testing recovery, limited requests allowed
Implementation:
class CircuitBreaker:
    def __init__(self, threshold: int, timeout: int):
        self.threshold = threshold
        self.timeout = timeout
        self.failure_count = 0
        self.last_failure_time = None
        self.state = CircuitState.CLOSED

    def call(self, func, *args, **kwargs):
        """Execute function with circuit breaker protection"""
        if self.state == CircuitState.OPEN:
            if self._should_attempt_reset():
                self.state = CircuitState.HALF_OPEN
            else:
                raise CircuitBreakerOpenError("Circuit breaker is open")
        try:
            result = func(*args, **kwargs)
            self._on_success()
            return result
        except Exception:
            self._on_failure()
            raise
Features:
Automatic failure detection
Configurable failure threshold
Time-based recovery attempts
Health check mechanism
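The call() excerpt relies on three helpers (_should_attempt_reset, _on_success, _on_failure) whose bodies are not shown. The sketch below fills them in with the conventional three-state behaviour so the state transitions can be traced end to end. The helper bodies, the class name SketchCircuitBreaker, and the injectable clock parameter are assumptions made for illustration, not the package's actual code.

```python
import time
from enum import Enum
from typing import Callable, Optional


class CircuitState(str, Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


class CircuitBreakerOpenError(Exception):
    pass


class SketchCircuitBreaker:
    def __init__(self, threshold: int, timeout: float,
                 clock: Callable[[], float] = time.monotonic):
        self.threshold = threshold
        self.timeout = timeout
        self.clock = clock  # Injectable clock (hypothetical, eases testing)
        self.failure_count = 0
        self.last_failure_time: Optional[float] = None
        self.state = CircuitState.CLOSED

    def _should_attempt_reset(self) -> bool:
        # Allow a trial call once `timeout` seconds have passed since the last failure
        return (self.last_failure_time is not None
                and self.clock() - self.last_failure_time >= self.timeout)

    def _on_success(self) -> None:
        # Any success closes the circuit and clears the failure counter
        self.failure_count = 0
        self.state = CircuitState.CLOSED

    def _on_failure(self) -> None:
        self.failure_count += 1
        self.last_failure_time = self.clock()
        # A failed trial call, or too many failures, (re)opens the circuit
        if (self.state == CircuitState.HALF_OPEN
                or self.failure_count >= self.threshold):
            self.state = CircuitState.OPEN

    def call(self, func, *args, **kwargs):
        if self.state == CircuitState.OPEN:
            if self._should_attempt_reset():
                self.state = CircuitState.HALF_OPEN
            else:
                raise CircuitBreakerOpenError("Circuit breaker is open")
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result
```

With threshold=2 and timeout=60, two consecutive failures open the circuit, calls are rejected for the next 60 seconds, and the first call after that runs as a HALF_OPEN trial that either closes the circuit again or reopens it.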
4. Enhanced Features
4.1 Result Quality Analyzer
Location: aiecs/tools/search_tool/analyzers.py
Purpose: Assess search result quality using multiple factors
Quality Factors:
Domain Authority (0-1 score)
Authoritative domains (.gov, .edu, academic sites)
Major media outlets
Technical documentation sites
Community platforms
Relevance Score (0-1 score)
Query term matching in title
Query term matching in snippet
Position in search results
Freshness Score (0-1 score)
Publication date analysis
Content age assessment
Quality Signals
HTTPS usage
Content length
Metadata presence
Low-quality indicator detection
Authoritative Domains:
AUTHORITATIVE_DOMAINS = {
# Academic and research
'scholar.google.com': 0.95,
'arxiv.org': 0.95,
'ieee.org': 0.95,
'nature.com': 0.95,
# Government and official
'.gov': 0.90,
'.edu': 0.85,
# Major media
'reuters.com': 0.85,
'apnews.com': 0.85,
# Technical documentation
'docs.python.org': 0.90,
'developer.mozilla.org': 0.90,
'stackoverflow.com': 0.75,
}
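The table mixes full hostnames (arxiv.org) with suffix entries (.gov, .edu), so a lookup has to treat the two kinds of key differently. The sketch below shows one way to do that, using a subset of the table. The function name lookup_authority and the fallback score of 0.5 for unknown domains are assumptions, not values taken from the package.

```python
from typing import Dict

AUTHORITATIVE_DOMAINS: Dict[str, float] = {
    "arxiv.org": 0.95,
    ".gov": 0.90,
    ".edu": 0.85,
    "docs.python.org": 0.90,
    "stackoverflow.com": 0.75,
}


def lookup_authority(host: str, table: Dict[str, float], default: float = 0.5) -> float:
    """Return the authority score for a hostname (default is an assumed fallback)."""
    host = host.lower().strip()
    if host.startswith("www."):
        host = host[4:]
    best = None
    for pattern, score in table.items():
        if pattern.startswith("."):
            matched = host.endswith(pattern)  # TLD-style suffix such as ".gov"
        else:
            # Exact host, or a subdomain of the listed host
            matched = host == pattern or host.endswith("." + pattern)
        if matched and (best is None or score > best):
            best = score
    return default if best is None else best
```

Taking the highest matching score means a host that matches both a named entry and a suffix entry is never penalised by the weaker of the two.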
Output Structure:
{
'quality_score': 0.85, # Overall quality (0-1)
'authority_score': 0.90, # Domain authority (0-1)
'relevance_score': 0.80, # Query relevance (0-1)
'freshness_score': 0.85, # Content freshness (0-1)
'credibility_level': 'high', # high/medium/low
'quality_signals': {
'domain_authority': 'high',
'has_https': True,
'has_metadata': True,
'content_length': 'adequate'
},
'warnings': [] # Quality warnings
}
4.2 Query Intent Analyzer
Location: aiecs/tools/search_tool/analyzers.py
Purpose: Detect query intent and enhance queries automatically
Intent Types:
class QueryIntentType(str, Enum):
    DEFINITION = "definition"      # "what is X"
    HOW_TO = "how_to"              # "how to X"
    COMPARISON = "comparison"      # "X vs Y"
    FACTUAL = "factual"            # "when/where/who"
    RECENT_NEWS = "recent_news"    # "latest X"
    ACADEMIC = "academic"          # "research on X"
    PRODUCT = "product"            # "buy X", "X review"
    GENERAL = "general"            # General queries
Intent Detection Patterns:
INTENT_PATTERNS = {
'definition': {
'patterns': [r'\bwhat is\b', r'\bdefine\b', r'\bmeaning of\b'],
'query_enhancement': 'definition explanation',
'suggested_params': {'num_results': 5}
},
'how_to': {
'patterns': [r'\bhow to\b', r'\bhow do\b', r'\bsteps to\b'],
'query_enhancement': 'tutorial guide step-by-step',
'suggested_params': {'num_results': 10}
},
'comparison': {
'patterns': [r'\bvs\b', r'\bversus\b', r'\bcompare\b', r'\bdifference between\b'],
'query_enhancement': 'comparison differences',
'suggested_params': {'num_results': 10}
},
'academic': {
'patterns': [r'\bresearch\b', r'\bstudy\b', r'\bpaper\b', r'\bjournal\b'],
'query_enhancement': 'research paper study',
'suggested_params': {'file_type': 'pdf', 'num_results': 10}
}
}
Query Enhancement:
Automatically adds relevant search operators
Suggests optimal search parameters
Improves result quality for specific intent types
Output Structure:
{
'intent_type': 'how_to',
'confidence': 0.95,
'original_query': 'how to build REST API',
'enhanced_query': 'how to build REST API tutorial guide step-by-step',
'suggested_params': {'num_results': 10},
'query_entities': ['REST API', 'build'],
'query_modifiers': ['how to'],
'suggestions': ['Consider adding programming language', 'Specify framework']
}
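The pattern table and the output structure above can be connected with a few lines of regex matching. The sketch below uses two of the documented patterns and returns the first intent that matches. The function name detect_intent and the first-match rule are assumptions; confidence scoring and entity extraction are left out because the document does not describe how they are computed.

```python
import re
from typing import Any, Dict

INTENT_PATTERNS: Dict[str, Dict[str, Any]] = {
    "how_to": {
        "patterns": [r"\bhow to\b", r"\bhow do\b", r"\bsteps to\b"],
        "query_enhancement": "tutorial guide step-by-step",
    },
    "comparison": {
        "patterns": [r"\bvs\b", r"\bversus\b", r"\bcompare\b", r"\bdifference between\b"],
        "query_enhancement": "comparison differences",
    },
}


def detect_intent(query: str) -> Dict[str, str]:
    """Return the first intent whose pattern matches, else 'general'."""
    for intent, spec in INTENT_PATTERNS.items():
        if any(re.search(p, query, re.IGNORECASE) for p in spec["patterns"]):
            return {
                "intent_type": intent,
                "original_query": query,
                # Enhancement terms are appended to the original query
                "enhanced_query": f"{query} {spec['query_enhancement']}",
            }
    return {"intent_type": "general", "original_query": query, "enhanced_query": query}
```

For the query 'how to build REST API' this yields the same enhanced_query as the documented output: 'how to build REST API tutorial guide step-by-step'.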
4.3 Result Deduplicator
Location: aiecs/tools/search_tool/deduplicator.py
Purpose: Remove duplicate and highly similar results
Deduplication Methods:
URL Normalization
Remove query parameters
Normalize protocols (http/https)
Handle URL variations
Content Similarity
Title similarity comparison
Snippet similarity comparison
Configurable threshold (default: 0.85)
Implementation:
from typing import Dict, List

class ResultDeduplicator:
    def __init__(self, similarity_threshold: float = 0.85):
        self.similarity_threshold = similarity_threshold

    def deduplicate(self, results: List[Dict]) -> List[Dict]:
        """Remove duplicate and similar results"""
        seen_urls = set()
        unique_results = []
        for result in results:
            # Exact duplicates: same URL after normalization
            normalized_url = self._normalize_url(result['link'])
            if normalized_url in seen_urls:
                continue
            # Near duplicates: title/snippet similarity above the threshold
            if self._is_similar_to_existing(result, unique_results):
                continue
            seen_urls.add(normalized_url)
            unique_results.append(result)
        return unique_results
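The two helpers used by deduplicate() are not shown. A minimal sketch of what they might look like follows, written as standalone functions. The normalization rules and the use of difflib.SequenceMatcher are assumptions chosen for illustration, and this version compares titles only, whereas the package is described as comparing snippets as well.

```python
from difflib import SequenceMatcher
from typing import Dict, List
from urllib.parse import urlsplit


def normalize_url(url: str) -> str:
    """Drop scheme, 'www.', query string, fragment, and trailing slash."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def is_similar_to_existing(result: Dict, kept: List[Dict], threshold: float = 0.85) -> bool:
    """True if the title closely matches any already-kept title."""
    title = result.get("title", "").lower()
    return any(
        SequenceMatcher(None, title, other.get("title", "").lower()).ratio() >= threshold
        for other in kept
    )
```

Under these rules http://www.Example.com/page/?utm=1 and https://example.com/page normalize to the same key, so the second one would be dropped as a duplicate.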
4.4 Search Context
Location: aiecs/tools/search_tool/context.py
Purpose: Track search history and learn user preferences
Features:
Search history management (configurable limit)
Topic context tracking
Preference learning from feedback
Related query suggestions
Domain preference tracking
Context Structure:
{
'history': [
{
'query': 'machine learning',
'timestamp': '2025-10-18T10:30:00',
'results_count': 10,
'avg_quality': 0.85
}
],
'preferences': {
'preferred_domains': ['arxiv.org', 'github.com'],
'avoided_domains': ['spam-site.com'],
'preferred_quality_level': 'high'
},
'topic_context': {
'current_topic': 'machine learning',
'related_queries': ['deep learning', 'neural networks']
}
}
4.5 Intelligent Cache
Location: aiecs/tools/search_tool/cache.py
Backend: Redis
Purpose: Reduce API calls with smart caching strategies
Intent-Aware TTL:
TTL_STRATEGIES = {
'definition': 2592000, # 30 days (stable content)
'how_to': 604800, # 7 days (tutorials)
'academic': 2592000, # 30 days (research papers)
'recent_news': 3600, # 1 hour (news)
'product': 86400, # 1 day (products)
'general': 3600 # 1 hour (default)
}
Dynamic TTL Adjustment:
Higher quality results cached longer
Fresh content cached shorter
User feedback influences TTL
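The adjustment rules above are stated qualitatively. As a rough sketch of how the first two could be applied on top of the intent-based TTL, the example below scales the base value by fixed factors. The thresholds (0.8, 0.9), the multipliers (1.5, 0.5), the 60-second floor, and the function name adjusted_ttl are all assumptions; user feedback is not modelled.

```python
TTL_STRATEGIES = {
    "definition": 2592000,   # 30 days
    "recent_news": 3600,     # 1 hour
    "general": 3600,         # 1 hour (default)
}


def adjusted_ttl(intent_type: str, avg_quality: float, avg_freshness: float) -> int:
    """Scale the base TTL: longer for high quality, shorter for fresh content."""
    base = TTL_STRATEGIES.get(intent_type, TTL_STRATEGIES["general"])
    factor = 1.0
    if avg_quality >= 0.8:
        factor *= 1.5   # Assumed boost for high-quality results
    if avg_freshness >= 0.9:
        factor *= 0.5   # Assumed cut for very fresh, fast-changing content
    return max(60, int(base * factor))
```

For a general query with high-quality but not especially fresh results, this gives 5400 seconds instead of the base 3600.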
Cache Key Generation:
from typing import Dict

def _generate_cache_key(query: str, params: Dict) -> str:
    """Generate unique cache key"""
    # Every parameter that changes the result set must be part of the key
    key_parts = [
        query.lower().strip(),
        str(params.get('num_results', 10)),
        params.get('language', 'en'),
        params.get('country', 'us'),
        params.get('date_restrict', ''),
        params.get('file_type', '')
    ]
    return f"search:{':'.join(key_parts)}"
4.6 Enhanced Metrics
Location: aiecs/tools/search_tool/metrics.py
Purpose: Comprehensive performance tracking and health monitoring
Metrics Categories:
Request Metrics
Total requests
Successful requests
Failed requests
Cached requests
Performance Metrics
Response times (P50, P95, P99)
Average response time
Slowest queries
Quality Metrics
Average quality score
High-quality result percentage
Results per query
No-result queries
Cache Metrics
Hit rate
Cache hits/misses
Cache efficiency
Error Metrics
Error rate
Errors by type
Recent errors
Query Pattern Metrics
Top query types
Top domains
Average query length
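The percentile figures listed under Performance Metrics can be derived from the recorded response times with the standard library. The sketch below uses statistics.quantiles; the function name response_time_percentiles is hypothetical, and how the package actually computes percentiles is not stated in this document.

```python
import statistics
from typing import Dict, List


def response_time_percentiles(samples_ms: List[float]) -> Dict[str, float]:
    """Return P50/P95/P99 from recorded response times (needs >= 2 samples)."""
    # n=100 yields 99 cut points; index k-1 is the k-th percentile
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}
```

In a long-running process the sample list would normally be bounded (for example a fixed-size window of recent requests) so that memory use stays constant.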
Health Score Calculation:
def calculate_health_score(self) -> float:
    """Calculate overall system health (0-1)"""
    factors = {
        'success_rate': 0.4,     # 40% weight
        'cache_hit_rate': 0.2,   # 20% weight
        'avg_quality': 0.2,      # 20% weight
        'error_rate': 0.2        # 20% weight (inverted)
    }
    health = (
        factors['success_rate'] * self.success_rate +
        factors['cache_hit_rate'] * self.cache_hit_rate +
        factors['avg_quality'] * self.avg_quality_score +
        factors['error_rate'] * (1 - self.error_rate)
    )
    return max(0.0, min(1.0, health))
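As a worked example, the formula can be applied to the sample values shown for get_metrics() in section 5.2 (142 of 150 requests successful, cache hit rate 0.30, average quality 0.78, error rate 0.053). The standalone function health_score below is a hypothetical restatement of the method above, used only to make the arithmetic checkable.

```python
def health_score(success_rate: float, cache_hit_rate: float,
                 avg_quality: float, error_rate: float) -> float:
    """Weighted health score using the weights listed above, clamped to [0, 1]."""
    health = (
        0.4 * success_rate
        + 0.2 * cache_hit_rate
        + 0.2 * avg_quality
        + 0.2 * (1 - error_rate)
    )
    return max(0.0, min(1.0, health))


# Sample values from the get_metrics() output in section 5.2
score = health_score(
    success_rate=142 / 150,
    cache_hit_rate=0.30,
    avg_quality=0.78,
    error_rate=0.053,
)
print(round(score, 3))  # 0.784
```

The result falls just below the 0.8 threshold that get_health_score() treats as healthy, mostly because of the low cache hit rate. When error_rate is roughly 1 - success_rate, as it is here, request success effectively carries about 60% of the total weight.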
5. API Reference
5.1 Search Operations
search_web()
Purpose: Perform web search with comprehensive filters
Signature:
def search_web(
query: str,
num_results: int = 10,
start_index: int = 1,
language: str = "en",
country: str = "us",
safe_search: str = "medium",
date_restrict: Optional[str] = None,
file_type: Optional[str] = None,
exclude_terms: Optional[str] = None,
auto_enhance: bool = True,
return_summary: bool = False
) -> Union[List[Dict], Dict[str, Any]]
Parameters:
query (str): Search query string
num_results (int): Number of results (1-100)
start_index (int): Pagination start (1-91)
language (str): Language code (e.g., 'en', 'zh-CN')
country (str): Country code (e.g., 'us', 'cn')
safe_search (str): 'off', 'medium', or 'high'
date_restrict (Optional[str]): Date filter (e.g., 'd7', 'm3', 'y1')
file_type (Optional[str]): File type filter (e.g., 'pdf', 'doc')
exclude_terms (Optional[str]): Terms to exclude
auto_enhance (bool): Enable query enhancement
return_summary (bool): Return structured summary
Returns:
If return_summary=False: List[Dict], a list of search results
If return_summary=True: Dict with 'results' and 'summary' keys
Result Structure:
{
'title': 'Result Title',
'link': 'https://example.com',
'snippet': 'Result description...',
'displayLink': 'example.com',
'formattedUrl': 'https://example.com/page',
# Enhanced fields (if quality analysis enabled)
'_quality_summary': {
'score': 0.85,
'level': 'high',
'is_authoritative': True,
'authority_score': 0.90,
'relevance_score': 0.80
},
# Metadata (if intent analysis enabled)
'_search_metadata': {
'original_query': 'machine learning',
'enhanced_query': 'machine learning tutorial guide',
'intent_type': 'how_to',
'intent_confidence': 0.95
}
}
Example:
# Basic search
results = tool.search_web("artificial intelligence", num_results=10)
# Advanced search with filters
results = tool.search_web(
query="climate change research",
num_results=10,
language="en",
date_restrict="m6", # Last 6 months
file_type="pdf",
auto_enhance=True,
return_summary=True
)
# Access results (return_summary=True, so the list is under the 'results' key)
for result in results['results']:
    print(f"Title: {result['title']}")
    print(f"Quality: {result['_quality_summary']['score']}")
search_images()
Purpose: Search for images with size and type filters
Signature:
def search_images(
query: str,
num_results: int = 10,
image_size: Optional[str] = None,
image_type: Optional[str] = None,
image_color_type: Optional[str] = None,
safe_search: str = "medium"
) -> List[Dict[str, Any]]
Parameters:
query (str): Image search query
num_results (int): Number of images (1-100)
image_size (Optional[str]): 'icon', 'small', 'medium', 'large', 'xlarge', 'xxlarge', 'huge'
image_type (Optional[str]): 'clipart', 'face', 'lineart', 'stock', 'photo', 'animated'
image_color_type (Optional[str]): 'color', 'gray', 'mono', 'trans'
safe_search (str): 'off', 'medium', or 'high'
Returns: List of image results with URLs and metadata
Example:
images = tool.search_images(
query="sunset beach",
num_results=10,
image_size="large",
image_type="photo",
image_color_type="color"
)
for img in images:
    print(f"Image: {img['link']}")
    print(f"Thumbnail: {img['image']['thumbnailLink']}")
search_news()
Purpose: Search for news articles
Signature:
def search_news(
query: str,
num_results: int = 10,
start_index: int = 1,
language: str = "en",
date_restrict: Optional[str] = None,
sort_by: str = "date"
) -> List[Dict[str, Any]]
Parameters:
query (str): News search query
num_results (int): Number of articles (1-100)
start_index (int): Pagination start
language (str): Language code
date_restrict (Optional[str]): Date filter (e.g., 'd7' for last 7 days)
sort_by (str): 'date' or 'relevance'
Example:
news = tool.search_news(
query="technology innovation",
num_results=10,
date_restrict="d7", # Last 7 days
sort_by="date"
)
search_videos()
Purpose: Search for videos
Signature:
def search_videos(
query: str,
num_results: int = 10,
safe_search: str = "medium",
language: str = "en"
) -> List[Dict[str, Any]]
search_paginated()
Purpose: Retrieve more than 10 results (up to 100) with automatic pagination
Signature:
def search_paginated(
query: str,
total_results: int = 50,
search_type: str = "web",
**kwargs
) -> List[Dict[str, Any]]
Parameters:
query (str): Search query
total_results (int): Total results to retrieve (1-100)
search_type (str): 'web', 'images', 'news', or 'videos'
**kwargs: Additional parameters for the specific search type
Example:
# Get 50 web results
results = tool.search_paginated(
query="machine learning",
total_results=50,
search_type="web",
language="en"
)
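The Custom Search JSON API returns at most 10 results per request, so retrieving up to 100 results means issuing several requests with increasing start indices. The sketch below shows only that page arithmetic. The function name plan_pages is hypothetical, and the way search_paginated() actually schedules its requests is not documented here.

```python
from typing import List, Tuple

PAGE_SIZE = 10      # Per-request cap of the Custom Search JSON API
MAX_RESULTS = 100   # Upper bound stated for search_paginated


def plan_pages(total_results: int) -> List[Tuple[int, int]]:
    """Return (start_index, num_results) pairs covering total_results."""
    if not 1 <= total_results <= MAX_RESULTS:
        raise ValueError("total_results must be between 1 and 100")
    pages = []
    start = 1
    remaining = total_results
    while remaining > 0:
        count = min(PAGE_SIZE, remaining)
        pages.append((start, count))
        start += count
        remaining -= count
    return pages
```

A request for 25 results becomes three calls starting at indices 1, 11, and 21; a request for 100 results ends with a call at start index 91, which matches the 1-91 range documented for start_index. Each page counts as one request against the API quota.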
search_batch()
Purpose: Execute multiple queries in parallel
Signature:
async def search_batch(
queries: List[str],
search_type: str = "web",
num_results: int = 10,
**kwargs
) -> Dict[str, List[Dict]]
Parameters:
queries (List[str]): List of search queries (max 50)
search_type (str): Type of search
num_results (int): Results per query
**kwargs: Additional search parameters
Returns: Dictionary mapping queries to their results
Example:
import asyncio
queries = ["AI", "ML", "DL", "NLP"]
results = asyncio.run(tool.search_batch(
queries=queries,
search_type="web",
num_results=5
))
for query, query_results in results.items():
    print(f"Results for '{query}': {len(query_results)}")
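Running queries in parallel is usually combined with a cap on concurrency so that a large batch does not exhaust the rate limiter at once. The sketch below shows that pattern with asyncio.gather and a semaphore. The names run_batch, search_one, and max_concurrency are hypothetical, and whether search_batch() applies such a cap is an assumption.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List


async def run_batch(
    queries: List[str],
    search_one: Callable[[str], Awaitable[List[dict]]],
    max_concurrency: int = 5,
) -> Dict[str, List[dict]]:
    """Run search_one for every query, at most max_concurrency at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(query: str) -> List[dict]:
        async with semaphore:
            return await search_one(query)

    # gather preserves input order, so results line up with queries
    results = await asyncio.gather(*(guarded(q) for q in queries))
    return dict(zip(queries, results))
```

Any coroutine function that takes a query string can be passed as search_one, which also makes the pattern easy to exercise with a stub in tests.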
5.2 Monitoring Operations
get_metrics()
Purpose: Get detailed performance metrics
Signature:
def get_metrics(self) -> Dict[str, Any]
Returns:
{
'requests': {
'total': 150,
'successful': 142,
'failed': 8,
'cached': 45
},
'performance': {
'avg_response_time': 234.5,
'p50_response_time': 200.0,
'p95_response_time': 450.0,
'p99_response_time': 800.0
},
'quality': {
'avg_results_per_query': 8.3,
'avg_quality_score': 0.78,
'high_quality_percentage': 62.5,
'no_results_count': 3
},
'cache': {
'hit_rate': 0.30,
'hits': 45,
'misses': 105
},
'errors': {
'error_rate': 0.053,
'errors_by_type': {
'QuotaExceededError': 3,
'NetworkError': 2
}
}
}
get_metrics_report()
Purpose: Get human-readable metrics report
Signature:
def get_metrics_report(self) -> str
Returns: Formatted string report
get_health_score()
Purpose: Get overall system health score
Signature:
def get_health_score(self) -> float
Returns: Health score (0-1), where >0.8 is healthy
get_quota_status()
Purpose: Get current quota and circuit breaker status
Signature:
def get_quota_status(self) -> Dict[str, Any]
Returns:
{
'remaining_quota': 85,
'quota_limit': 100,
'quota_window_seconds': 86400,
'circuit_breaker_state': 'closed',
'circuit_breaker_failures': 0,
'metrics': {
'total_requests': 15,
'successful_requests': 15,
'failed_requests': 0
}
}
validate_credentials()
Purpose: Validate Google API credentials
Signature:
def validate_credentials(self) -> Dict[str, Any]
Returns:
{
'valid': True,
'method': 'api_key', # or 'service_account'
'cse_id_present': True,
'error': None
}
5.3 Context Operations
get_search_context()
Purpose: Get current search context and history
Signature:
def get_search_context(self) -> Dict[str, Any]
Returns: Search context with history and preferences
6. Data Structures
6.1 Enumerations
# Search Types
class SearchType(str, Enum):
    WEB = "web"
    IMAGE = "image"
    NEWS = "news"
    VIDEO = "video"

# Safe Search Levels
class SafeSearch(str, Enum):
    OFF = "off"
    MEDIUM = "medium"
    HIGH = "high"

# Query Intent Types
class QueryIntentType(str, Enum):
    DEFINITION = "definition"
    HOW_TO = "how_to"
    COMPARISON = "comparison"
    FACTUAL = "factual"
    RECENT_NEWS = "recent_news"
    ACADEMIC = "academic"
    PRODUCT = "product"
    GENERAL = "general"

# Credibility Levels
class CredibilityLevel(str, Enum):
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"

# Circuit Breaker States
class CircuitState(str, Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"
6.2 Exception Hierarchy
SearchToolError # Base exception
├── AuthenticationError # Invalid/missing credentials
├── QuotaExceededError # API quota exceeded
├── RateLimitError # Rate limit reached
├── CircuitBreakerOpenError # Circuit breaker open
├── SearchAPIError # Google API errors
├── ValidationError # Input validation errors
└── CacheError # Cache-related errors
7. Error Handling
7.1 Agent-Friendly Error Handler
Location: aiecs/tools/search_tool/error_handler.py
Purpose: Format errors in a way that AI agents can understand and act upon
Error Structure:
{
'error_type': 'QuotaExceededError',
'message': 'API quota exceeded for today',
'severity': 'high',
'is_retryable': False,
'suggested_actions': [
'Wait 24 hours for quota reset',
'Upgrade to paid tier',
'Use cached results if available'
],
'alternative_approaches': [
'Use alternative search source',
'Reduce search frequency'
],
'recovery_time_estimate': '24 hours',
'context': {
'current_quota': 100,
'quota_limit': 100,
'reset_time': '2025-10-19T00:00:00Z'
}
}
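An agent consuming this structure mainly needs two things from it: whether to retry at all, and how long to wait. The sketch below derives both from is_retryable and recovery_time_estimate. The function name recovery_seconds is hypothetical, and the parser assumes the estimate is either a number of seconds or a string of the form '<number> <unit>' as in the example above; other formats return None.

```python
import re
from typing import Optional

_UNIT_SECONDS = {"second": 1, "minute": 60, "hour": 3600, "day": 86400}


def recovery_seconds(error_info: dict) -> Optional[int]:
    """Return seconds to wait before retrying, or None if not retryable/unknown."""
    if not error_info.get("is_retryable", False):
        return None
    estimate = error_info.get("recovery_time_estimate")
    if isinstance(estimate, (int, float)):
        return int(estimate)
    # Assumed string format: "<number> <unit>", e.g. "24 hours"
    match = re.fullmatch(r"\s*(\d+)\s*(second|minute|hour|day)s?\s*", str(estimate))
    if match is None:
        return None
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]
```

For the QuotaExceededError example above the function returns None because is_retryable is False, which signals the agent to fall back to suggested_actions or alternative_approaches instead of waiting.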
7.2 Error Handling Best Practices
import logging
import time

from aiecs.tools.search_tool import (
    SearchTool,
    RateLimitError,
    QuotaExceededError,
    CircuitBreakerOpenError,
    AuthenticationError
)

logger = logging.getLogger(__name__)
tool = SearchTool()
try:
    results = tool.search_web("query")
except RateLimitError as e:
    # Rate limit exceeded - wait and retry
    error_info = tool.error_handler.format_error(e)
    estimate = error_info.get('recovery_time_estimate', 60)
    # The estimate may be a human-readable string (e.g. '24 hours'),
    # so only sleep on numeric values and fall back to 60 seconds otherwise
    wait_time = estimate if isinstance(estimate, (int, float)) else 60
    time.sleep(wait_time)
    # Retry...
except QuotaExceededError as e:
    # Quota exceeded - use fallback
    error_info = tool.error_handler.format_error(e)
    # Use cached results or alternative source
except CircuitBreakerOpenError as e:
    # Circuit breaker open - API is down
    error_info = tool.error_handler.format_error(e)
    # Wait for recovery or use fallback
except AuthenticationError as e:
    # Invalid credentials - fix configuration
    error_info = tool.error_handler.format_error(e)
    # Check API key and CSE ID
except Exception as e:
    # Unexpected error
    logger.error(f"Unexpected error: {e}")
8. Performance & Optimization
8.1 Performance Benchmarks
Average Response Times:
With cache hit: ~50ms
Without cache (API call): ~200-500ms
Quality analysis overhead: ~10-20ms per result
Intent detection: ~5-10ms per query
Cache Performance:
Typical hit rate: 30-50%
API call reduction: 30-50%
Storage overhead: ~5KB per cached query
8.2 Optimization Strategies
Enable All Caching
config = {
'enable_intelligent_cache': True,
'cache_ttl': 3600 # Adjust based on content freshness needs
}
Use Batch Operations
# Instead of multiple individual calls (search_batch is a coroutine, so await it inside an async function)
results = await tool.search_batch(queries=['q1', 'q2', 'q3'])
Optimize Result Count
# Only request what you need
results = tool.search_web(query, num_results=5) # Not 100
Leverage Context
# Context helps avoid redundant searches
tool.search_web("python basics")
tool.search_web("python advanced") # Context aware
Configure Rate Limits Appropriately
config = {
'rate_limit_requests': 100, # Match your API quota
'rate_limit_window': 86400 # 24 hours
}
8.3 Scalability Considerations
Horizontal Scaling:
Redis cache shared across instances
Stateless design (except context)
Thread-safe implementation
Vertical Scaling:
Async batch operations
Connection pooling
Efficient memory usage
Quota Management:
Distributed rate limiting via Redis
Circuit breaker prevents cascading failures
Intelligent caching reduces API calls
9. Testing
9.1 Unit Tests
Location: test/unit_tests/tools/test_search_tool_enhanced.py
Test Coverage:
Result quality analysis
Query intent detection
Deduplication logic
Context management
Metrics collection
Error handling
Cache operations
Example Tests:
def test_quality_analysis():
    analyzer = ResultQualityAnalyzer()
    result = {
        'title': 'Machine Learning Tutorial',
        'snippet': 'Learn machine learning basics',
        'displayLink': 'docs.python.org'
    }
    analysis = analyzer.analyze_result_quality(result, 'machine learning', 1)
    assert analysis['authority_score'] > 0.8
    assert analysis['credibility_level'] == 'high'

def test_intent_detection():
    analyzer = QueryIntentAnalyzer()
    analysis = analyzer.analyze_query_intent('how to build REST API')
    assert analysis['intent_type'] == 'how_to'
    assert analysis['confidence'] > 0.8
9.2 Integration Tests
def test_web_search_integration():
    tool = SearchTool()
    results = tool.search_web("test query", num_results=5)
    assert isinstance(results, list)
    assert len(results) <= 5
    assert all('title' in r for r in results)
    assert all('link' in r for r in results)

def test_cache_integration():
    tool = SearchTool()
    # First call - cache miss
    results1 = tool.search_web("cache test")
    # Second call - cache hit
    results2 = tool.search_web("cache test")
    assert results1 == results2
10. Advanced Topics
10.1 Custom Quality Analyzers
You can extend the quality analyzer with custom domain authorities:
from aiecs.tools.search_tool.analyzers import ResultQualityAnalyzer
class CustomQualityAnalyzer(ResultQualityAnalyzer):
    AUTHORITATIVE_DOMAINS = {
        **ResultQualityAnalyzer.AUTHORITATIVE_DOMAINS,
        'mycompany.com': 0.95,
        'trusted-source.org': 0.90
    }

# Use custom analyzer
tool = SearchTool()
tool.quality_analyzer = CustomQualityAnalyzer()
10.2 Custom Intent Patterns
Add custom intent patterns:
from aiecs.tools.search_tool.analyzers import QueryIntentAnalyzer
class CustomIntentAnalyzer(QueryIntentAnalyzer):
    INTENT_PATTERNS = {
        **QueryIntentAnalyzer.INTENT_PATTERNS,
        'troubleshooting': {
            'patterns': [r'\berror\b', r'\bfix\b', r'\btroubleshoot\b'],
            'query_enhancement': 'solution fix troubleshooting',
            'suggested_params': {'num_results': 10}
        }
    }
10.3 Custom Cache Strategies
Implement custom TTL strategies:
from aiecs.tools.search_tool import SearchTool

def custom_ttl_strategy(result, args, kwargs):
    """Custom TTL based on result quality"""
    quality_score = result.get('_quality_summary', {}).get('score', 0)
    if quality_score > 0.9:
        return 86400  # 24 hours for high quality
    elif quality_score > 0.7:
        return 3600   # 1 hour for medium quality
    else:
        return 1800   # 30 minutes for low quality

# Apply custom strategy
tool = SearchTool()
tool.intelligent_cache.set_ttl_strategy(custom_ttl_strategy)
10.4 Monitoring Integration
Integrate with external monitoring systems:
import time

from aiecs.tools.search_tool import SearchTool

tool = SearchTool()
# Get metrics periodically
# monitoring_system is a placeholder for your monitoring client
while True:
    metrics = tool.get_metrics()
    health = tool.get_health_score()
    # Send to monitoring system
    monitoring_system.send_metric('search_tool.health', health)
    monitoring_system.send_metric('search_tool.requests', metrics['requests']['total'])
    monitoring_system.send_metric('search_tool.cache_hit_rate', metrics['cache']['hit_rate'])
    time.sleep(60)  # Every minute
Document Version: 2.0
Last Updated: 2025-10-18
Maintainer: AIECS Tools Team