Schema Auto-Generator Technical Documentation
1. Overview
Purpose: schema_generator.py is the Schema generation component of the AIECS tool system. It builds Pydantic Schema classes automatically by analyzing method signatures and type annotations. This gives tool methods parameter validation and documentation support, and reduces the cost of developing and maintaining tools.
Core Value:
Automated Generation: Automatically generates Pydantic Schema from type annotations without manual writing
Intelligent Extraction: Extracts parameter descriptions from docstrings, improving Schema quality
Type Safety: Keeps the generated Schema consistent with the method signature
Reduced Maintenance Costs: Automatically synchronizes when method signatures change, avoiding manual maintenance inconsistencies
Improved Coverage: Increased Tool Calling Agent Schema coverage from 30% to 94.3%
Strong Compatibility: Handles complex types (e.g., pandas.DataFrame) with graceful degradation
2. Problem Background & Design Motivation
2.1 Business Pain Points
Tool development and maintenance face the following key challenges:
High Manual Maintenance Costs: Out of 158 tool methods, only 48 have manually defined Schemas (30% coverage)
Code Redundancy: Methods already carry complete type annotations, yet each still needs a hand-written Schema class
Maintenance Inconsistency: When a method signature changes, it is easy to forget to update the Schema to match
Massive Workload: Manually adding Schemas for the 79 methods without Schemas would take approximately 13 hours
Tool Calling Agent Limitations: Methods without Schemas cannot be used by the Tool Calling Agent
Inconsistent Documentation Quality: Manually written Schema descriptions vary in quality
2.2 Design Motivation
To address these pain points, the generator derives Schemas automatically from type annotations, following these principles:
Type Annotation Priority: Uses Python’s type annotation system as the single source of truth
Docstring Parsing: Intelligently extracts parameter descriptions, improving generation quality
Graceful Degradation: Maps unsupported types (e.g., pandas.DataFrame) to Any instead of failing
Optional Feature: Serves only as a fallback and does not affect manually defined Schemas
Zero Intrusion: No need to modify existing tool code
3. Architecture Positioning & Context
3.1 System Architecture Diagram
graph TB
subgraph "Tool Development Layer"
A["Tool Class Definition"] --> B["Method Signature + Type Annotations"]
B --> C["Docstring"]
end
subgraph "Schema Generation Layer"
D["Schema Generator"] --> E["Type Annotation Extraction"]
D --> F["Docstring Parsing"]
D --> G["Schema Class Generation"]
end
subgraph "LangChain Integration Layer"
H["LangChain Adapter"] --> I["Schema Discovery"]
I --> J["Manual Schema Priority"]
I --> K["Auto-Generated Schema Fallback"]
end
subgraph "AI Agent Layer"
L["ReAct Agent"] --> M["Tool Calling Agent"]
end
subgraph "Validation Tool Layer"
N["Type Annotation Checker"] --> O["Schema Quality Validator"]
end
B --> E
C --> F
E --> G
F --> G
G --> K
J --> M
K --> M
A --> N
G --> O
3.2 Upstream and Downstream Dependencies
Upstream Callers:
LangChain Adapter (discover_operations method)
Tool development validation scripts
Schema quality check tools
Downstream Dependencies:
Python inspect module (method signature analysis)
Python typing module (type annotation extraction)
Pydantic (Schema class generation)
Type annotations and docstrings of tool classes
Peer Components:
LangChain Adapter
BaseTool base class
Tool registry
4. Core Features & Implementation
4.1 Feature List
| Feature | Description | Priority |
|---|---|---|
| Type Annotation Extraction | Extract parameter types from method signatures | P0 |
| Docstring Parsing | Extract parameter descriptions (Google/NumPy style) | P0 |
| Schema Class Generation | Generate using Pydantic create_model | P0 |
| Type Normalization | Handle unsupported types (DataFrame → Any) | P1 |
| Default Value Handling | Properly handle optional parameters and default values | P0 |
| Batch Generation | Generate Schemas for all methods of a tool class | P1 |
4.2 Core Implementation
4.2.1 Type Annotation Extraction
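import inspect
from typing import Any, Callable, Optional, Type, get_type_hints

from pydantic import BaseModel, ConfigDict, Field, create_model


def generate_schema_from_method(
    method: Callable[..., Any],
    method_name: str,
    base_class: Type[BaseModel] = BaseModel,
) -> Optional[Type[BaseModel]]:
    """Automatically generate a Pydantic Schema from a method signature."""
    # 1. Get the method signature
    sig = inspect.signature(method)
    # 2. Get type annotations and the docstring used for descriptions
    type_hints = get_type_hints(method)
    docstring = inspect.getdoc(method) or ""
    # 3. Build field definitions
    field_definitions = {}
    for param_name, param in sig.parameters.items():
        if param_name == 'self':
            continue
        # Get and normalize the type (unannotated parameters fall back to Any)
        param_type = _normalize_type(type_hints.get(param_name, Any))
        # Extract the description, with a generic fallback when none is found
        description = _extract_param_description_from_docstring(
            docstring, param_name
        ) or f"Parameter {param_name}"
        # Required parameters use `...`; optional ones keep their default value
        default = ... if param.default is inspect.Parameter.empty else param.default
        field_definitions[param_name] = (
            param_type,
            Field(default, description=description),
        )
    # Methods with no parameters besides self do not need a Schema
    if not field_definitions:
        return None
    # 4. Generate the Schema class
    model_name = f"{method_name.title().replace('_', '')}Schema"
    if base_class is BaseModel:
        return create_model(
            model_name,
            __config__=ConfigDict(arbitrary_types_allowed=True),
            **field_definitions,
        )
    # Pydantic rejects __config__ combined with __base__, so a custom
    # base class is expected to carry its own model configuration
    return create_model(model_name, __base__=base_class, **field_definitions)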
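The Default Value Handling feature depends on telling required parameters from optional ones. The sketch below shows one way to read that from a signature using only the standard library. The names `collect_fields` and `head` are hypothetical and are not part of schema_generator.py, and skipping `*args`/`**kwargs` is an assumption about the desired behavior.

```python
import inspect
from typing import Any, Dict, List, Optional, Tuple, get_type_hints


def collect_fields(method) -> Dict[str, Tuple[Any, bool, Any]]:
    """Map each parameter name to (type, required, default)."""
    sig = inspect.signature(method)
    hints = get_type_hints(method)
    fields = {}
    for name, param in sig.parameters.items():
        # Assumption: self, *args and **kwargs never become Schema fields
        if name == "self" or param.kind in (
            inspect.Parameter.VAR_POSITIONAL,
            inspect.Parameter.VAR_KEYWORD,
        ):
            continue
        # A parameter is required when the signature gives it no default
        required = param.default is inspect.Parameter.empty
        default = None if required else param.default
        fields[name] = (hints.get(name, Any), required, default)
    return fields


# Hypothetical tool method used only for illustration
def head(self, records: List[dict], n: int = 5, label: Optional[str] = None) -> List[dict]:
    return records[:n]


fields = collect_fields(head)
```

In a Pydantic field definition, a required parameter would then be expressed with `...` as the default, and an optional one with its actual default value.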
4.2.2 Docstring Parsing
Supports Google and NumPy style docstrings:
def _extract_param_description_from_docstring(
    docstring: str,
    param_name: str
) -> Optional[str]:
    """Extract parameter description"""
    # Supported formats:
    #   Google: Args: param_name: description
    #   NumPy:  Parameters: param_name : type description
    # Parsing logic...
    return description
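The parsing logic is elided above. As a rough illustration of what such a parser could look like, here is a minimal sketch that handles simple Google- and NumPy-style docstrings with regular expressions. It is not the module's actual implementation: the function name `extract_param_description` and the style-detection heuristic (a dashed underline means NumPy style) are assumptions made for this example.

```python
import re
from typing import Optional


def extract_param_description(docstring: str, param_name: str) -> Optional[str]:
    """Return the description of param_name, or None if it is not documented."""
    if not docstring:
        return None
    lines = docstring.expandtabs().splitlines()
    # Matches "name: text", "name (type): text" and "name : type"
    entry = re.compile(
        rf"^(\s*){re.escape(param_name)}\s*(?:\([^)]*\))?\s*:\s*(.*)$"
    )
    # Assumption: a dashed underline ("----------") marks a NumPy-style docstring
    numpy_style = any(re.fullmatch(r"\s*-{3,}\s*", line) for line in lines)
    for i, line in enumerate(lines):
        match = entry.match(line)
        if not match:
            continue
        indent, rest = len(match.group(1)), match.group(2).strip()
        # Continuation lines are the following lines indented deeper than the entry
        continuation = []
        for nxt in lines[i + 1:]:
            if not nxt.strip() or len(nxt) - len(nxt.lstrip()) <= indent:
                break
            continuation.append(nxt.strip())
        if numpy_style:
            # In NumPy style the text after the colon is the type, not the description
            description = " ".join(continuation)
        else:
            description = " ".join([rest] + continuation).strip()
        return description or None
    return None


google_doc = """Filter records based on a condition.

Args:
    records: List of records to filter
    condition: Filter condition
        (pandas query syntax)
"""

numpy_doc = """Filter records based on a condition.

Parameters
----------
records : list of dict
    List of records to filter
condition : str
    Filter condition
"""
```

A parser built on a heuristic like this will miss unusual layouts, so returning None and letting the caller fall back to a generic description keeps the failure mode harmless.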
4.2.3 Type Normalization
Handles unsupported types:
def _normalize_type(param_type: Type) -> Type:
    """Normalize type, handle unsupported types"""
    type_name = getattr(param_type, '__name__', str(param_type))
    # pandas.DataFrame / pandas.Series → Any
    if 'DataFrame' in type_name or 'Series' in type_name:
        return Any
    return param_type
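The name check above inspects only the outermost type. One possible extension, sketched below under stated assumptions, also replaces unsupported types nested inside generics such as List[...] or Optional[...]. The function `normalize_type_recursive` and the stand-in `DataFrame` class are hypothetical (the stand-in avoids a pandas dependency), and the sketch assumes Python 3.9+ so that built-in containers can be subscripted.

```python
from typing import Any, Dict, List, Optional, Union, get_args, get_origin

UNSUPPORTED_NAMES = ("DataFrame", "Series")


def normalize_type_recursive(tp):
    """Replace unsupported types with Any, including inside generic containers."""
    origin = get_origin(tp)
    if origin is None:
        # Plain (non-generic) type: apply the same name-based check as above
        name = getattr(tp, "__name__", str(tp))
        return Any if any(n in name for n in UNSUPPORTED_NAMES) else tp
    # Generic type: normalize each type argument, then rebuild the generic
    args = tuple(normalize_type_recursive(arg) for arg in get_args(tp))
    if origin is Union:
        return Union[args]
    try:
        return origin[args]
    except TypeError:
        # The origin cannot be re-subscripted; leave the annotation unchanged
        return tp


class DataFrame:
    """Stand-in for pandas.DataFrame, used only for this illustration."""


normalized_list = normalize_type_recursive(List[DataFrame])
normalized_dict = normalize_type_recursive(Dict[str, DataFrame])
normalized_optional = normalize_type_recursive(Optional[DataFrame])
```

Because the check is name-based, any class whose name contains "DataFrame" or "Series" is treated as unsupported, which is the same trade-off the original function makes.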
5. Usage Guide
5.1 Basic Usage
from typing import Dict, List

from aiecs.tools.schema_generator import generate_schema_from_method

# Generate Schema for a single method
def filter(self, records: List[Dict], condition: str) -> List[Dict]:
    """Filter records based on a condition.

    Args:
        records: List of records to filter
        condition: Filter condition (pandas query syntax)
    """
    pass

schema = generate_schema_from_method(filter, 'filter')
# Generated: FilterSchema
# Fields: records (List[Dict]), condition (str)
# Descriptions: extracted from the docstring
5.2 Batch Generation
from aiecs.tools.schema_generator import generate_schemas_for_tool
# Generate Schemas for all methods of a tool class
schemas = generate_schemas_for_tool(PandasTool)
# Returns: {'filter': FilterSchema, 'groupby': GroupbySchema, ...}
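The class names in the returned mapping follow the naming expression used in section 4.2.1, `method_name.title().replace('_', '')` plus a `Schema` suffix. The small sketch below applies that expression to a few method names. The helper name `schema_class_name` is hypothetical, and `drop_duplicates` is only an illustrative method name.

```python
def schema_class_name(method_name: str) -> str:
    """Build a Schema class name with the same expression as section 4.2.1."""
    # str.title() capitalizes the first letter of every underscore-separated word
    return f"{method_name.title().replace('_', '')}Schema"


names = {
    name: schema_class_name(name)
    for name in ("filter", "groupby", "drop_duplicates")
}
```

Note that `str.title()` also lowercases the remaining letters of each word, so a camelCase method name would lose its inner capitals in the generated class name.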
5.3 LangChain Integration
Automatically integrated into LangChain Adapter:
# The LangChain Adapter uses the generator automatically
operations = tool_registry.discover_operations(PandasTool)
# Priority:
#   1. Manually defined Schema (if one exists)
#   2. Auto-generated Schema (fallback)
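The priority rule can be sketched in a few lines. This is an illustration of the manual-first, auto-generated-fallback order only, not the adapter's actual code: `resolve_schema`, the placeholder Schema classes, and the dictionary-based lookups are all assumptions made for the example.

```python
from typing import Callable, Dict, Optional, Tuple


def resolve_schema(
    method_name: str,
    manual_schemas: Dict[str, type],
    auto_generate: Callable[[str], Optional[type]],
) -> Tuple[Optional[type], str]:
    """Return (schema, source), preferring a manually defined Schema."""
    manual = manual_schemas.get(method_name)
    if manual is not None:
        return manual, "manual"
    # Fall back to auto-generation only when no manual Schema exists
    generated = auto_generate(method_name)
    return generated, ("auto" if generated is not None else "none")


# Placeholder classes standing in for real Schema classes
class ManualFilterSchema: ...
class AutoFilterSchema: ...
class AutoGroupbySchema: ...


manual_schemas = {"filter": ManualFilterSchema}
# A dict lookup stands in for the generator; it returns None for unknown methods
auto_generate = {"filter": AutoFilterSchema, "groupby": AutoGroupbySchema}.get
```

Keeping the source label alongside the Schema makes it easy to report how many operations rely on the fallback.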
5.4 Tool Development Validation
Use validation scripts to check tool quality:
# Check type annotation coverage
aiecs tools check-annotations [tool_name]
# Validate Schema generation quality
aiecs tools validate-schemas [tool_name]
6. Configuration & Extension
6.1 Configuration Options
# Custom base class
schema = generate_schema_from_method(
    method,
    'method_name',
    base_class=CustomBaseModel
)
# Pydantic configuration:
# arbitrary_types_allowed=True is added automatically
6.2 Extension Points
Custom Type Mapping: Extend _normalize_type to handle more types
Docstring Parser: Support more docstring styles
Description Enhancement: Use AI to generate better descriptions
Validation Rules: Add custom field validation
7. Performance & Quality Metrics
7.1 Coverage Statistics
Total Methods: 158
Successfully Generated: 149 (94.3%)
Generation Failed: 9 (5.7%) # Methods without parameters
Total Fields: 669
Type Coverage: 100.0%
Description Quality: 74.1%
Overall Score: 89.5% (B Good)
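The percentages above can be reproduced from the raw counts. The sketch below recomputes them. The overall score happens to equal the unweighted mean of the generation rate, type coverage, and description quality. Whether the scoring tool actually computes it that way is an assumption, not something the statistics confirm.

```python
total_methods = 158
generated = 149
manually_defined = 48  # from section 2.1

# Share of methods with an auto-generated Schema, and the remainder
generation_rate = round(generated / total_methods * 100, 1)
failure_rate = round((total_methods - generated) / total_methods * 100, 1)
# Coverage before auto-generation, from manually defined Schemas only
manual_rate = round(manually_defined / total_methods * 100, 1)

type_coverage = 100.0
description_quality = 74.1
# Assumption: the overall score is the unweighted mean of the three metrics
overall_score = round((generation_rate + type_coverage + description_quality) / 3, 1)
```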
7.2 Performance Metrics
Generation Speed: ~0.1ms/method
Memory Usage: ~1KB/Schema
Batch Generation: 38 methods < 10ms
7.3 Quality Improvements
Improvement Effects:
Tool Calling Agent Availability: 30% → 94.3% (+64.3 percentage points)
Development Time Saved: 13 hours → 0 hours
Maintenance Cost: Continuous → Zero (automatic synchronization)
8. Best Practices
8.1 Tool Development Recommendations
Complete Type Annotations: Add type annotations for all parameters
Detailed Docstrings: Use Google or NumPy style
Meaningful Descriptions: Avoid generic descriptions like “Parameter xxx”
Return Types: Add return type annotations
Example:
def filter(self, records: List[Dict], condition: str) -> List[Dict]:
    """
    Filter records based on a condition.

    Args:
        records: List of records to filter
        condition: Filter condition using pandas query syntax (e.g., 'age > 30')

    Returns:
        Filtered list of records
    """
    pass
8.2 Validation Process
# 1. Check type annotations
aiecs tools check-annotations my_tool
# 2. Validate Schema generation
aiecs tools validate-schemas my_tool
# 3. View generated Schemas
aiecs tools show-schemas my_tool --method filter
8.3 Common Questions
Q: Why do some methods fail to generate?
A: Methods with no parameters other than self do not need Schemas, so this is expected behavior.
Q: How to improve description quality?
A: Add detailed parameter descriptions in the Args section of the docstring.
Q: How to choose between manual Schema and auto-generation?
A: Manual Schema takes priority. Auto-generation is only used when no manual Schema exists.
Q: How to handle complex types?
A: Use _normalize_type to map complex types to Any, or add custom mappings.
9. Troubleshooting
9.1 Common Errors
| Error | Cause | Solution |
|---|---|---|
| Generation Failed | Missing type annotations | Add complete type annotations |
| Description is “Parameter xxx” | Missing docstring | Add Args section |
| Pydantic Validation Error | Unsupported type | Use _normalize_type to map |
| Field Count Mismatch | Signature parsing failed | Check method signature format |
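For the first row of the table, a quick way to locate missing annotations is to inspect the tool class directly. The sketch below uses only the standard library and is not the implementation of the `check-annotations` command. The names `find_unannotated_params` and `DemoTool` are hypothetical.

```python
import inspect
from typing import Dict, List


def find_unannotated_params(cls) -> Dict[str, List[str]]:
    """Map each public method name to its parameters that lack a type annotation."""
    report = {}
    for name, func in inspect.getmembers(cls, predicate=inspect.isfunction):
        if name.startswith("_"):
            continue  # skip private and dunder methods
        missing = [
            param.name
            for param in inspect.signature(func).parameters.values()
            if param.name != "self" and param.annotation is inspect.Parameter.empty
        ]
        if missing:
            report[name] = missing
    return report


# Hypothetical tool class used only for illustration
class DemoTool:
    def filter(self, records: list, condition) -> list:
        return records

    def head(self, records: list, n: int = 5) -> list:
        return records[:n]


report = find_unannotated_params(DemoTool)
```

Each entry in the report names a parameter that would fall back to Any in the generated Schema until an annotation is added.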
9.2 Debugging Tips
import logging
logging.basicConfig(level=logging.DEBUG)
# View detailed logs
schema = generate_schema_from_method(method, 'method_name')
10. Future Plans
10.1 Short-term Plans
Support more docstring styles (Sphinx, reStructuredText)
AI-assisted description generation
Schema quality scoring system
Automatic fix suggestions
10.2 Long-term Vision
IDE integration (real-time Schema preview)
Schema version management
Cross-language Schema generation
Community Schema library
11. References
Document Version: 1.0
Last Updated: 2025-10-02
Maintainer: AIECS Tools Team