# Search Tool - Complete Technical Documentation ## Table of Contents 1. [Overview](#overview) 2. [Architecture](#architecture) 3. [Core Components](#core-components) 4. [Enhanced Features](#enhanced-features) 5. [API Reference](#api-reference) 6. [Data Structures](#data-structures) 7. [Error Handling](#error-handling) 8. [Performance & Optimization](#performance--optimization) 9. [Testing](#testing) 10. [Advanced Topics](#advanced-topics) --- ## 1. Overview ### 1.1 Purpose The **SearchTool** is an enterprise-grade web search tool that integrates Google Custom Search API with advanced AI-agent-optimized features. It provides intelligent search capabilities with quality assessment, intent analysis, context awareness, and comprehensive reliability mechanisms. ### 1.2 Key Capabilities - **Multi-Type Search**: Web, image, news, and video search - **Quality Assessment**: Automatic result quality scoring and credibility analysis - **Intent Analysis**: Query intent detection with automatic enhancement - **Context Awareness**: Search history tracking and preference learning - **Intelligent Caching**: Redis-based caching with intent-aware TTL strategies - **Reliability**: Rate limiting, circuit breaker, and retry mechanisms - **Deduplication**: Advanced result deduplication with similarity detection - **Metrics & Monitoring**: Comprehensive performance tracking and health scoring ### 1.3 Architecture Overview ``` SearchTool (BaseTool) ├── Core Components │ ├── Google Custom Search API Client │ ├── Rate Limiter (Token Bucket) │ ├── Circuit Breaker (3-State) │ └── Retry Handler (Exponential Backoff) ├── Enhanced Features │ ├── ResultQualityAnalyzer │ ├── QueryIntentAnalyzer │ ├── ResultDeduplicator │ ├── SearchContext │ ├── IntelligentCache (Redis) │ ├── ResultSummarizer │ └── EnhancedMetrics └── Error Handling └── AgentFriendlyErrorHandler ``` --- ## 2. Architecture ### 2.1 Package Structure ``` aiecs/tools/search_tool/ ├── __init__.py # Package entry point with tool registration ├── core.py # Main SearchTool class ├── constants.py # Enums, exceptions, and constants ├── schemas.py # Pydantic schemas for input validation ├── analyzers.py # Quality, intent, and summarization analyzers ├── deduplicator.py # Result deduplication logic ├── context.py # Search context management ├── cache.py # Intelligent Redis caching ├── metrics.py # Enhanced metrics collection ├── error_handler.py # Agent-friendly error formatting ├── rate_limiter.py # Rate limiting and circuit breaker └── README.md # Package documentation ``` ### 2.2 Component Interaction Flow ``` User Request ↓ SearchTool.search_web() ↓ [Rate Limiter Check] → RateLimitError if exceeded ↓ [Circuit Breaker Check] → CircuitBreakerOpenError if open ↓ [Intent Analysis] → Query enhancement ↓ [Cache Check] → Return cached if available ↓ [Google API Call] → With retry logic ↓ [Quality Analysis] → Score each result ↓ [Deduplication] → Remove duplicates ↓ [Context Update] → Track search history ↓ [Cache Store] → Store with intelligent TTL ↓ [Metrics Update] → Record performance ↓ Return Results ``` ### 2.3 Integration Points - **AIECS Base Tool**: Inherits from `BaseTool` for standardized interface - **Redis**: Optional integration for intelligent caching - **LangChain**: Full adapter support for agent integration - **Google Custom Search API**: Primary search backend - **Metrics System**: Integration with AIECS metrics infrastructure --- ## 3. Core Components ### 3.1 SearchTool Class **Location**: `aiecs/tools/search_tool/core.py` **Inheritance**: `BaseTool` **Key Attributes**: ```python class SearchTool(BaseTool): config: Config # Configuration object rate_limiter: RateLimiter # Rate limiting circuit_breaker: CircuitBreaker # Failure protection quality_analyzer: ResultQualityAnalyzer # Quality assessment intent_analyzer: QueryIntentAnalyzer # Intent detection deduplicator: ResultDeduplicator # Deduplication search_context: SearchContext # Context tracking intelligent_cache: IntelligentCache # Redis caching metrics: EnhancedMetrics # Performance metrics error_handler: AgentFriendlyErrorHandler # Error formatting ``` **Configuration Schema**: ```python class Config(BaseModel): # API Configuration google_api_key: Optional[str] google_cse_id: Optional[str] google_application_credentials: Optional[str] # Performance max_results_per_query: int = 10 cache_ttl: int = 3600 timeout: int = 30 # Rate Limiting rate_limit_requests: int = 100 rate_limit_window: int = 86400 # Circuit Breaker circuit_breaker_threshold: int = 5 circuit_breaker_timeout: int = 60 # Retry Logic retry_attempts: int = 3 retry_backoff: float = 2.0 # Enhanced Features enable_quality_analysis: bool = True enable_intent_analysis: bool = True enable_deduplication: bool = True enable_context_tracking: bool = True enable_intelligent_cache: bool = True # Tuning similarity_threshold: float = 0.85 max_search_history: int = 10 user_agent: str = "AIECS-SearchTool/2.0" ``` ### 3.2 Rate Limiter **Location**: `aiecs/tools/search_tool/rate_limiter.py` **Algorithm**: Token Bucket **Purpose**: Prevents API quota exhaustion by limiting request rate **Implementation**: ```python class RateLimiter: def __init__(self, max_requests: int, window_seconds: int): self.max_requests = max_requests self.window_seconds = window_seconds self.tokens = max_requests self.last_refill = time.time() def acquire(self) -> bool: """Attempt to acquire a token for request""" self._refill_tokens() if self.tokens >= 1: self.tokens -= 1 return True return False def _refill_tokens(self): """Refill tokens based on elapsed time""" now = time.time() elapsed = now - self.last_refill refill_amount = (elapsed / self.window_seconds) * self.max_requests self.tokens = min(self.max_requests, self.tokens + refill_amount) self.last_refill = now ``` **Features**: - Automatic token refill based on time window - Thread-safe implementation - Configurable request limits - Real-time quota tracking ### 3.3 Circuit Breaker **Location**: `aiecs/tools/search_tool/rate_limiter.py` **Pattern**: Three-State Circuit Breaker **States**: - **CLOSED**: Normal operation, requests pass through - **OPEN**: Failures exceeded threshold, requests blocked - **HALF_OPEN**: Testing recovery, limited requests allowed **Implementation**: ```python class CircuitBreaker: def __init__(self, threshold: int, timeout: int): self.threshold = threshold self.timeout = timeout self.failure_count = 0 self.last_failure_time = None self.state = CircuitState.CLOSED def call(self, func, *args, **kwargs): """Execute function with circuit breaker protection""" if self.state == CircuitState.OPEN: if self._should_attempt_reset(): self.state = CircuitState.HALF_OPEN else: raise CircuitBreakerOpenError("Circuit breaker is open") try: result = func(*args, **kwargs) self._on_success() return result except Exception as e: self._on_failure() raise ``` **Features**: - Automatic failure detection - Configurable failure threshold - Time-based recovery attempts - Health check mechanism --- ## 4. Enhanced Features ### 4.1 Result Quality Analyzer **Location**: `aiecs/tools/search_tool/analyzers.py` **Purpose**: Assess search result quality using multiple factors **Quality Factors**: 1. **Domain Authority** (0-1 score) - Authoritative domains (`.gov`, `.edu`, academic sites) - Major media outlets - Technical documentation sites - Community platforms 2. **Relevance Score** (0-1 score) - Query term matching in title - Query term matching in snippet - Position in search results 3. **Freshness Score** (0-1 score) - Publication date analysis - Content age assessment 4. **Quality Signals** - HTTPS usage - Content length - Metadata presence - Low-quality indicator detection **Authoritative Domains**: ```python AUTHORITATIVE_DOMAINS = { # Academic and research 'scholar.google.com': 0.95, 'arxiv.org': 0.95, 'ieee.org': 0.95, 'nature.com': 0.95, # Government and official '.gov': 0.90, '.edu': 0.85, # Major media 'reuters.com': 0.85, 'apnews.com': 0.85, # Technical documentation 'docs.python.org': 0.90, 'developer.mozilla.org': 0.90, 'stackoverflow.com': 0.75, } ``` **Output Structure**: ```python { 'quality_score': 0.85, # Overall quality (0-1) 'authority_score': 0.90, # Domain authority (0-1) 'relevance_score': 0.80, # Query relevance (0-1) 'freshness_score': 0.85, # Content freshness (0-1) 'credibility_level': 'high', # high/medium/low 'quality_signals': { 'domain_authority': 'high', 'has_https': True, 'has_metadata': True, 'content_length': 'adequate' }, 'warnings': [] # Quality warnings } ``` ### 4.2 Query Intent Analyzer **Location**: `aiecs/tools/search_tool/analyzers.py` **Purpose**: Detect query intent and enhance queries automatically **Intent Types**: ```python class QueryIntentType(str, Enum): DEFINITION = "definition" # "what is X" HOW_TO = "how_to" # "how to X" COMPARISON = "comparison" # "X vs Y" FACTUAL = "factual" # "when/where/who" RECENT_NEWS = "recent_news" # "latest X" ACADEMIC = "academic" # "research on X" PRODUCT = "product" # "buy X", "X review" GENERAL = "general" # General queries ``` **Intent Detection Patterns**: ```python INTENT_PATTERNS = { 'definition': { 'patterns': [r'\bwhat is\b', r'\bdefine\b', r'\bmeaning of\b'], 'query_enhancement': 'definition explanation', 'suggested_params': {'num_results': 5} }, 'how_to': { 'patterns': [r'\bhow to\b', r'\bhow do\b', r'\bsteps to\b'], 'query_enhancement': 'tutorial guide step-by-step', 'suggested_params': {'num_results': 10} }, 'comparison': { 'patterns': [r'\bvs\b', r'\bversus\b', r'\bcompare\b', r'\bdifference between\b'], 'query_enhancement': 'comparison differences', 'suggested_params': {'num_results': 10} }, 'academic': { 'patterns': [r'\bresearch\b', r'\bstudy\b', r'\bpaper\b', r'\bjournal\b'], 'query_enhancement': 'research paper study', 'suggested_params': {'file_type': 'pdf', 'num_results': 10} } } ``` **Query Enhancement**: - Automatically adds relevant search operators - Suggests optimal search parameters - Improves result quality for specific intent types **Output Structure**: ```python { 'intent_type': 'how_to', 'confidence': 0.95, 'original_query': 'how to build REST API', 'enhanced_query': 'how to build REST API tutorial guide step-by-step', 'suggested_params': {'num_results': 10}, 'query_entities': ['REST API', 'build'], 'query_modifiers': ['how to'], 'suggestions': ['Consider adding programming language', 'Specify framework'] } ``` ### 4.3 Result Deduplicator **Location**: `aiecs/tools/search_tool/deduplicator.py` **Purpose**: Remove duplicate and highly similar results **Deduplication Methods**: 1. **URL Normalization** - Remove query parameters - Normalize protocols (http/https) - Handle URL variations 2. **Content Similarity** - Title similarity comparison - Snippet similarity comparison - Configurable threshold (default: 0.85) **Implementation**: ```python class ResultDeduplicator: def __init__(self, similarity_threshold: float = 0.85): self.similarity_threshold = similarity_threshold def deduplicate(self, results: List[Dict]) -> List[Dict]: """Remove duplicate and similar results""" seen_urls = set() unique_results = [] for result in results: normalized_url = self._normalize_url(result['link']) if normalized_url in seen_urls: continue if self._is_similar_to_existing(result, unique_results): continue seen_urls.add(normalized_url) unique_results.append(result) return unique_results ``` ### 4.4 Search Context **Location**: `aiecs/tools/search_tool/context.py` **Purpose**: Track search history and learn user preferences **Features**: - Search history management (configurable limit) - Topic context tracking - Preference learning from feedback - Related query suggestions - Domain preference tracking **Context Structure**: ```python { 'history': [ { 'query': 'machine learning', 'timestamp': '2025-10-18T10:30:00', 'results_count': 10, 'avg_quality': 0.85 } ], 'preferences': { 'preferred_domains': ['arxiv.org', 'github.com'], 'avoided_domains': ['spam-site.com'], 'preferred_quality_level': 'high' }, 'topic_context': { 'current_topic': 'machine learning', 'related_queries': ['deep learning', 'neural networks'] } } ``` ### 4.5 Intelligent Cache **Location**: `aiecs/tools/search_tool/cache.py` **Backend**: Redis **Purpose**: Reduce API calls with smart caching strategies **Intent-Aware TTL**: ```python TTL_STRATEGIES = { 'definition': 2592000, # 30 days (stable content) 'how_to': 604800, # 7 days (tutorials) 'academic': 2592000, # 30 days (research papers) 'recent_news': 3600, # 1 hour (news) 'product': 86400, # 1 day (products) 'general': 3600 # 1 hour (default) } ``` **Dynamic TTL Adjustment**: - Higher quality results cached longer - Fresh content cached shorter - User feedback influences TTL **Cache Key Generation**: ```python def _generate_cache_key(query: str, params: Dict) -> str: """Generate unique cache key""" key_parts = [ query.lower().strip(), str(params.get('num_results', 10)), params.get('language', 'en'), params.get('country', 'us'), params.get('date_restrict', ''), params.get('file_type', '') ] return f"search:{':'.join(key_parts)}" ``` ### 4.6 Enhanced Metrics **Location**: `aiecs/tools/search_tool/metrics.py` **Purpose**: Comprehensive performance tracking and health monitoring **Metrics Categories**: 1. **Request Metrics** - Total requests - Successful requests - Failed requests - Cached requests 2. **Performance Metrics** - Response times (P50, P95, P99) - Average response time - Slowest queries 3. **Quality Metrics** - Average quality score - High-quality result percentage - Results per query - No-result queries 4. **Cache Metrics** - Hit rate - Cache hits/misses - Cache efficiency 5. **Error Metrics** - Error rate - Errors by type - Recent errors 6. **Query Pattern Metrics** - Top query types - Top domains - Average query length **Health Score Calculation**: ```python def calculate_health_score(self) -> float: """Calculate overall system health (0-1)""" factors = { 'success_rate': 0.4, # 40% weight 'cache_hit_rate': 0.2, # 20% weight 'avg_quality': 0.2, # 20% weight 'error_rate': 0.2 # 20% weight (inverted) } health = ( factors['success_rate'] * self.success_rate + factors['cache_hit_rate'] * self.cache_hit_rate + factors['avg_quality'] * self.avg_quality_score + factors['error_rate'] * (1 - self.error_rate) ) return max(0.0, min(1.0, health)) ``` --- ## 5. API Reference ### 5.1 Search Operations #### search_web() **Purpose**: Perform web search with comprehensive filters **Signature**: ```python def search_web( query: str, num_results: int = 10, start_index: int = 1, language: str = "en", country: str = "us", safe_search: str = "medium", date_restrict: Optional[str] = None, file_type: Optional[str] = None, exclude_terms: Optional[str] = None, auto_enhance: bool = True, return_summary: bool = False ) -> Union[List[Dict], Dict[str, Any]] ``` **Parameters**: - `query` (str): Search query string - `num_results` (int): Number of results (1-100) - `start_index` (int): Pagination start (1-91) - `language` (str): Language code (e.g., 'en', 'zh-CN') - `country` (str): Country code (e.g., 'us', 'cn') - `safe_search` (str): 'off', 'medium', or 'high' - `date_restrict` (Optional[str]): Date filter (e.g., 'd7', 'm3', 'y1') - `file_type` (Optional[str]): File type filter (e.g., 'pdf', 'doc') - `exclude_terms` (Optional[str]): Terms to exclude - `auto_enhance` (bool): Enable query enhancement - `return_summary` (bool): Return structured summary **Returns**: - If `return_summary=False`: `List[Dict]` - List of search results - If `return_summary=True`: `Dict` with 'results' and 'summary' keys **Result Structure**: ```python { 'title': 'Result Title', 'link': 'https://example.com', 'snippet': 'Result description...', 'displayLink': 'example.com', 'formattedUrl': 'https://example.com/page', # Enhanced fields (if quality analysis enabled) '_quality_summary': { 'score': 0.85, 'level': 'high', 'is_authoritative': True, 'authority_score': 0.90, 'relevance_score': 0.80 }, # Metadata (if intent analysis enabled) '_search_metadata': { 'original_query': 'machine learning', 'enhanced_query': 'machine learning tutorial guide', 'intent_type': 'how_to', 'intent_confidence': 0.95 } } ``` **Example**: ```python # Basic search results = tool.search_web("artificial intelligence", num_results=10) # Advanced search with filters results = tool.search_web( query="climate change research", num_results=10, language="en", date_restrict="m6", # Last 6 months file_type="pdf", auto_enhance=True, return_summary=True ) # Access results for result in results['results']: print(f"Title: {result['title']}") print(f"Quality: {result['_quality_summary']['score']}") ``` #### search_images() **Purpose**: Search for images with size and type filters **Signature**: ```python def search_images( query: str, num_results: int = 10, image_size: Optional[str] = None, image_type: Optional[str] = None, image_color_type: Optional[str] = None, safe_search: str = "medium" ) -> List[Dict[str, Any]] ``` **Parameters**: - `query` (str): Image search query - `num_results` (int): Number of images (1-100) - `image_size` (Optional[str]): 'icon', 'small', 'medium', 'large', 'xlarge', 'xxlarge', 'huge' - `image_type` (Optional[str]): 'clipart', 'face', 'lineart', 'stock', 'photo', 'animated' - `image_color_type` (Optional[str]): 'color', 'gray', 'mono', 'trans' - `safe_search` (str): 'off', 'medium', or 'high' **Returns**: List of image results with URLs and metadata **Example**: ```python images = tool.search_images( query="sunset beach", num_results=10, image_size="large", image_type="photo", image_color_type="color" ) for img in images: print(f"Image: {img['link']}") print(f"Thumbnail: {img['image']['thumbnailLink']}") ``` #### search_news() **Purpose**: Search for news articles **Signature**: ```python def search_news( query: str, num_results: int = 10, start_index: int = 1, language: str = "en", date_restrict: Optional[str] = None, sort_by: str = "date" ) -> List[Dict[str, Any]] ``` **Parameters**: - `query` (str): News search query - `num_results` (int): Number of articles (1-100) - `start_index` (int): Pagination start - `language` (str): Language code - `date_restrict` (Optional[str]): Date filter (e.g., 'd7' for last 7 days) - `sort_by` (str): 'date' or 'relevance' **Example**: ```python news = tool.search_news( query="technology innovation", num_results=10, date_restrict="d7", # Last 7 days sort_by="date" ) ``` #### search_videos() **Purpose**: Search for videos **Signature**: ```python def search_videos( query: str, num_results: int = 10, safe_search: str = "medium", language: str = "en" ) -> List[Dict[str, Any]] ``` #### search_paginated() **Purpose**: Retrieve more than 10 results (up to 100) with automatic pagination **Signature**: ```python def search_paginated( query: str, total_results: int = 50, search_type: str = "web", **kwargs ) -> List[Dict[str, Any]] ``` **Parameters**: - `query` (str): Search query - `total_results` (int): Total results to retrieve (1-100) - `search_type` (str): 'web', 'images', 'news', or 'videos' - `**kwargs`: Additional parameters for specific search type **Example**: ```python # Get 50 web results results = tool.search_paginated( query="machine learning", total_results=50, search_type="web", language="en" ) ``` #### search_batch() **Purpose**: Execute multiple queries in parallel **Signature**: ```python async def search_batch( queries: List[str], search_type: str = "web", num_results: int = 10, **kwargs ) -> Dict[str, List[Dict]] ``` **Parameters**: - `queries` (List[str]): List of search queries (max 50) - `search_type` (str): Type of search - `num_results` (int): Results per query - `**kwargs`: Additional search parameters **Returns**: Dictionary mapping queries to their results **Example**: ```python import asyncio queries = ["AI", "ML", "DL", "NLP"] results = asyncio.run(tool.search_batch( queries=queries, search_type="web", num_results=5 )) for query, query_results in results.items(): print(f"Results for '{query}': {len(query_results)}") ``` ### 5.2 Monitoring Operations #### get_metrics() **Purpose**: Get detailed performance metrics **Signature**: ```python def get_metrics(self) -> Dict[str, Any] ``` **Returns**: ```python { 'requests': { 'total': 150, 'successful': 142, 'failed': 8, 'cached': 45 }, 'performance': { 'avg_response_time': 234.5, 'p50_response_time': 200.0, 'p95_response_time': 450.0, 'p99_response_time': 800.0 }, 'quality': { 'avg_results_per_query': 8.3, 'avg_quality_score': 0.78, 'high_quality_percentage': 62.5, 'no_results_count': 3 }, 'cache': { 'hit_rate': 0.30, 'hits': 45, 'misses': 105 }, 'errors': { 'error_rate': 0.053, 'errors_by_type': { 'QuotaExceededError': 3, 'NetworkError': 2 } } } ``` #### get_metrics_report() **Purpose**: Get human-readable metrics report **Signature**: ```python def get_metrics_report(self) -> str ``` **Returns**: Formatted string report #### get_health_score() **Purpose**: Get overall system health score **Signature**: ```python def get_health_score(self) -> float ``` **Returns**: Health score (0-1), where >0.8 is healthy #### get_quota_status() **Purpose**: Get current quota and circuit breaker status **Signature**: ```python def get_quota_status(self) -> Dict[str, Any] ``` **Returns**: ```python { 'remaining_quota': 85, 'quota_limit': 100, 'quota_window_seconds': 86400, 'circuit_breaker_state': 'closed', 'circuit_breaker_failures': 0, 'metrics': { 'total_requests': 15, 'successful_requests': 15, 'failed_requests': 0 } } ``` #### validate_credentials() **Purpose**: Validate Google API credentials **Signature**: ```python def validate_credentials(self) -> Dict[str, Any] ``` **Returns**: ```python { 'valid': True, 'method': 'api_key', # or 'service_account' 'cse_id_present': True, 'error': None } ``` ### 5.3 Context Operations #### get_search_context() **Purpose**: Get current search context and history **Signature**: ```python def get_search_context(self) -> Dict[str, Any] ``` **Returns**: Search context with history and preferences --- ## 6. Data Structures ### 6.1 Enumerations ```python # Search Types class SearchType(str, Enum): WEB = "web" IMAGE = "image" NEWS = "news" VIDEO = "video" # Safe Search Levels class SafeSearch(str, Enum): OFF = "off" MEDIUM = "medium" HIGH = "high" # Query Intent Types class QueryIntentType(str, Enum): DEFINITION = "definition" HOW_TO = "how_to" COMPARISON = "comparison" FACTUAL = "factual" RECENT_NEWS = "recent_news" ACADEMIC = "academic" PRODUCT = "product" GENERAL = "general" # Credibility Levels class CredibilityLevel(str, Enum): HIGH = "high" MEDIUM = "medium" LOW = "low" # Circuit Breaker States class CircuitState(str, Enum): CLOSED = "closed" OPEN = "open" HALF_OPEN = "half_open" ``` ### 6.2 Exception Hierarchy ```python SearchToolError # Base exception ├── AuthenticationError # Invalid/missing credentials ├── QuotaExceededError # API quota exceeded ├── RateLimitError # Rate limit reached ├── CircuitBreakerOpenError # Circuit breaker open ├── SearchAPIError # Google API errors ├── ValidationError # Input validation errors └── CacheError # Cache-related errors ``` --- ## 7. Error Handling ### 7.1 Agent-Friendly Error Handler **Location**: `aiecs/tools/search_tool/error_handler.py` **Purpose**: Format errors in a way that AI agents can understand and act upon **Error Structure**: ```python { 'error_type': 'QuotaExceededError', 'message': 'API quota exceeded for today', 'severity': 'high', 'is_retryable': False, 'suggested_actions': [ 'Wait 24 hours for quota reset', 'Upgrade to paid tier', 'Use cached results if available' ], 'alternative_approaches': [ 'Use alternative search source', 'Reduce search frequency' ], 'recovery_time_estimate': '24 hours', 'context': { 'current_quota': 100, 'quota_limit': 100, 'reset_time': '2025-10-19T00:00:00Z' } } ``` ### 7.2 Error Handling Best Practices ```python from aiecs.tools.search_tool import ( SearchTool, RateLimitError, QuotaExceededError, CircuitBreakerOpenError, AuthenticationError ) tool = SearchTool() try: results = tool.search_web("query") except RateLimitError as e: # Rate limit exceeded - wait and retry error_info = tool.error_handler.format_error(e) wait_time = error_info.get('recovery_time_estimate', 60) time.sleep(wait_time) # Retry... except QuotaExceededError as e: # Quota exceeded - use fallback error_info = tool.error_handler.format_error(e) # Use cached results or alternative source except CircuitBreakerOpenError as e: # Circuit breaker open - API is down error_info = tool.error_handler.format_error(e) # Wait for recovery or use fallback except AuthenticationError as e: # Invalid credentials - fix configuration error_info = tool.error_handler.format_error(e) # Check API key and CSE ID except Exception as e: # Unexpected error logger.error(f"Unexpected error: {e}") ``` --- ## 8. Performance & Optimization ### 8.1 Performance Benchmarks **Average Response Times**: - With cache hit: ~50ms - Without cache (API call): ~200-500ms - Quality analysis overhead: ~10-20ms per result - Intent detection: ~5-10ms per query **Cache Performance**: - Typical hit rate: 30-50% - API call reduction: 30-50% - Storage overhead: ~5KB per cached query ### 8.2 Optimization Strategies 1. **Enable All Caching** ```python config = { 'enable_intelligent_cache': True, 'cache_ttl': 3600 # Adjust based on content freshness needs } ``` 2. **Use Batch Operations** ```python # Instead of multiple individual calls results = await tool.search_batch(queries=['q1', 'q2', 'q3']) ``` 3. **Optimize Result Count** ```python # Only request what you need results = tool.search_web(query, num_results=5) # Not 100 ``` 4. **Leverage Context** ```python # Context helps avoid redundant searches tool.search_web("python basics") tool.search_web("python advanced") # Context aware ``` 5. **Configure Rate Limits Appropriately** ```python config = { 'rate_limit_requests': 100, # Match your API quota 'rate_limit_window': 86400 # 24 hours } ``` ### 8.3 Scalability Considerations **Horizontal Scaling**: - Redis cache shared across instances - Stateless design (except context) - Thread-safe implementation **Vertical Scaling**: - Async batch operations - Connection pooling - Efficient memory usage **Quota Management**: - Distributed rate limiting via Redis - Circuit breaker prevents cascading failures - Intelligent caching reduces API calls --- ## 9. Testing ### 9.1 Unit Tests **Location**: `test/unit_tests/tools/test_search_tool_enhanced.py` **Test Coverage**: - Result quality analysis - Query intent detection - Deduplication logic - Context management - Metrics collection - Error handling - Cache operations **Example Tests**: ```python def test_quality_analysis(): analyzer = ResultQualityAnalyzer() result = { 'title': 'Machine Learning Tutorial', 'snippet': 'Learn machine learning basics', 'displayLink': 'docs.python.org' } analysis = analyzer.analyze_result_quality(result, 'machine learning', 1) assert analysis['authority_score'] > 0.8 assert analysis['credibility_level'] == 'high' def test_intent_detection(): analyzer = QueryIntentAnalyzer() analysis = analyzer.analyze_query_intent('how to build REST API') assert analysis['intent_type'] == 'how_to' assert analysis['confidence'] > 0.8 ``` ### 9.2 Integration Tests ```python def test_web_search_integration(): tool = SearchTool() results = tool.search_web("test query", num_results=5) assert isinstance(results, list) assert len(results) <= 5 assert all('title' in r for r in results) assert all('link' in r for r in results) def test_cache_integration(): tool = SearchTool() # First call - cache miss results1 = tool.search_web("cache test") # Second call - cache hit results2 = tool.search_web("cache test") assert results1 == results2 ``` --- ## 10. Advanced Topics ### 10.1 Custom Quality Analyzers You can extend the quality analyzer with custom domain authorities: ```python from aiecs.tools.search_tool.analyzers import ResultQualityAnalyzer class CustomQualityAnalyzer(ResultQualityAnalyzer): AUTHORITATIVE_DOMAINS = { **ResultQualityAnalyzer.AUTHORITATIVE_DOMAINS, 'mycompany.com': 0.95, 'trusted-source.org': 0.90 } # Use custom analyzer tool = SearchTool() tool.quality_analyzer = CustomQualityAnalyzer() ``` ### 10.2 Custom Intent Patterns Add custom intent patterns: ```python from aiecs.tools.search_tool.analyzers import QueryIntentAnalyzer class CustomIntentAnalyzer(QueryIntentAnalyzer): INTENT_PATTERNS = { **QueryIntentAnalyzer.INTENT_PATTERNS, 'troubleshooting': { 'patterns': [r'\berror\b', r'\bfix\b', r'\btroubleshoot\b'], 'query_enhancement': 'solution fix troubleshooting', 'suggested_params': {'num_results': 10} } } ``` ### 10.3 Custom Cache Strategies Implement custom TTL strategies: ```python from aiecs.tools.search_tool.cache import IntelligentCache def custom_ttl_strategy(result, args, kwargs): """Custom TTL based on result quality""" quality_score = result.get('_quality_summary', {}).get('score', 0) if quality_score > 0.9: return 86400 # 24 hours for high quality elif quality_score > 0.7: return 3600 # 1 hour for medium quality else: return 1800 # 30 minutes for low quality # Apply custom strategy tool.intelligent_cache.set_ttl_strategy(custom_ttl_strategy) ``` ### 10.4 Monitoring Integration Integrate with external monitoring systems: ```python from aiecs.tools.search_tool import SearchTool tool = SearchTool() # Get metrics periodically import time while True: metrics = tool.get_metrics() health = tool.get_health_score() # Send to monitoring system monitoring_system.send_metric('search_tool.health', health) monitoring_system.send_metric('search_tool.requests', metrics['requests']['total']) monitoring_system.send_metric('search_tool.cache_hit_rate', metrics['cache']['hit_rate']) time.sleep(60) # Every minute ``` --- **Document Version**: 2.0 **Last Updated**: 2025-10-18 **Maintainer**: AIECS Tools Team