Schema Auto-Generator Technical Documentation

1. Overview

Purpose: schema_generator.py is an intelligent Schema generation component in the AIECS tool system. This module automatically generates Pydantic-compliant Schema classes by analyzing method signatures and type annotations, providing parameter validation and documentation support for tool methods, significantly reducing tool development maintenance costs.

Core Value:

  • Automated Generation: Automatically generates Pydantic Schema from type annotations without manual writing

  • Intelligent Extraction: Extracts parameter descriptions from docstrings, improving Schema quality

  • Type Safety: Ensures generated Schema is completely consistent with method signatures

  • Reduced Maintenance Costs: Automatically synchronizes when method signatures change, avoiding manual maintenance inconsistencies

  • Improved Coverage: Increased Tool Calling Agent Schema coverage from 30% to 94.3%

  • Strong Compatibility: Handles complex types (e.g., pandas.DataFrame) with graceful degradation

2. Problem Background & Design Motivation

2.1 Business Pain Points

The following key challenges are faced in tool development and maintenance:

  1. High Manual Maintenance Costs: Out of 158 tool methods, only 48 have manually defined Schemas (30% coverage)

  2. Code Redundancy: Methods already have complete type annotations but still require manual Schema class writing

  3. Maintenance Inconsistency: When method signatures change, Schemas are easily forgotten to update synchronously

  4. Massive Workload: Manually adding Schemas for 79 methods without Schemas requires approximately 13 hours

  5. Tool Calling Agent Limitations: Methods without Schemas cannot use Tool Calling Agent

  6. Inconsistent Documentation Quality: Manually written Schema descriptions vary in quality

2.2 Design Motivation

Based on the above pain points, an automatic generation solution based on type annotations was designed:

  • Type Annotation Priority: Uses Python’s type annotation system as the single source of truth

  • Docstring Parsing: Intelligently extracts parameter descriptions, improving generation quality

  • Graceful Degradation: Handles unsupported types (e.g., pandas.DataFrame)

  • Optional Feature: Serves as a fallback, does not affect manually defined Schemas

  • Zero Intrusion: No need to modify existing tool code

3. Architecture Positioning & Context

3.1 System Architecture Diagram

graph TB
    subgraph "Tool Development Layer"
        A["Tool Class Definition"] --> B["Method Signature + Type Annotations"]
        B --> C["Docstring"]
    end
    
    subgraph "Schema Generation Layer"
        D["Schema Generator"] --> E["Type Annotation Extraction"]
        D --> F["Docstring Parsing"]
        D --> G["Schema Class Generation"]
    end
    
    subgraph "LangChain Integration Layer"
        H["LangChain Adapter"] --> I["Schema Discovery"]
        I --> J["Manual Schema Priority"]
        I --> K["Auto-Generated Schema Fallback"]
    end
    
    subgraph "AI Agent Layer"
        L["ReAct Agent"] --> M["Tool Calling Agent"]
    end
    
    subgraph "Validation Tool Layer"
        N["Type Annotation Checker"] --> O["Schema Quality Validator"]
    end
    
    B --> E
    C --> F
    E --> G
    F --> G
    
    G --> K
    J --> M
    K --> M
    
    A --> N
    G --> O

3.2 Upstream and Downstream Dependencies

Upstream Callers:

  • LangChain Adapter (discover_operations method)

  • Tool development validation scripts

  • Schema quality check tools

Downstream Dependencies:

  • Python inspect module (method signature analysis)

  • Python typing module (type annotation extraction)

  • Pydantic (Schema class generation)

  • Type annotations and docstrings of tool classes

Peer Components:

  • LangChain Adapter

  • BaseTool base class

  • Tool registry center

4. Core Features & Implementation

4.1 Feature List

Feature

Description

Priority

Type Annotation Extraction

Extract parameter types from method signatures

P0

Docstring Parsing

Extract parameter descriptions (Google/NumPy style)

P0

Schema Class Generation

Generate using Pydantic create_model

P0

Type Normalization

Handle unsupported types (DataFrame → Any)

P1

Default Value Handling

Properly handle optional parameters and default values

P0

Batch Generation

Generate Schemas for all methods of a tool class

P1

4.2 Core Implementation

4.2.1 Type Annotation Extraction

def generate_schema_from_method(
    method: callable,
    method_name: str,
    base_class: Type[BaseModel] = BaseModel
) -> Optional[Type[BaseModel]]:
    """Automatically generate Pydantic Schema from method signature"""
    
    # 1. Get method signature
    sig = inspect.signature(method)
    
    # 2. Get type annotations
    type_hints = get_type_hints(method)
    
    # 3. Build field definitions
    field_definitions = {}
    for param_name, param in sig.parameters.items():
        if param_name == 'self':
            continue
        
        # Get and normalize type
        param_type = type_hints.get(param_name, Any)
        param_type = _normalize_type(param_type)
        
        # Extract description
        description = _extract_param_description_from_docstring(
            docstring, param_name
        )
        
        # Create field
        field_definitions[param_name] = (
            param_type,
            Field(description=description)
        )
    
    # 4. Generate Schema class
    return create_model(
        f"{method_name.title().replace('_', '')}Schema",
        __config__=ConfigDict(arbitrary_types_allowed=True),
        **field_definitions
    )

4.2.2 Docstring Parsing

Supports Google and NumPy style docstrings:

def _extract_param_description_from_docstring(
    docstring: str, 
    param_name: str
) -> Optional[str]:
    """Extract parameter description"""
    
    # Supported formats:
    # Google: Args: param_name: description
    # NumPy: Parameters: param_name : type description
    
    # Parsing logic...
    return description

4.2.3 Type Normalization

Handles unsupported types:

def _normalize_type(param_type: Type) -> Type:
    """Normalize type, handle unsupported types"""
    
    type_name = getattr(param_type, '__name__', str(param_type))
    
    # pandas.DataFrame → Any
    if 'DataFrame' in type_name or 'Series' in type_name:
        return Any
    
    return param_type

5. Usage Guide

5.1 Basic Usage

from aiecs.tools.schema_generator import generate_schema_from_method

# Generate Schema for a single method
def filter(self, records: List[Dict], condition: str) -> List[Dict]:
    """Filter DataFrame based on a condition.
    
    Args:
        records: List of records to filter
        condition: Filter condition (pandas query syntax)
    """
    pass

schema = generate_schema_from_method(filter, 'filter')
# Generated: FilterSchema
# Fields: records (List[Dict]), condition (str)
# Description: Extracted from docstring

5.2 Batch Generation

from aiecs.tools.schema_generator import generate_schemas_for_tool

# Generate Schemas for all methods of a tool class
schemas = generate_schemas_for_tool(PandasTool)
# Returns: {'filter': FilterSchema, 'groupby': GroupbySchema, ...}

5.3 LangChain Integration

Automatically integrated into LangChain Adapter:

# LangChain Adapter automatically uses
operations = tool_registry.discover_operations(PandasTool)

# Priority:
# 1. Manually defined Schema (if exists)
# 2. Auto-generated Schema (fallback)

5.4 Tool Development Validation

Use validation scripts to check tool quality:

# Check type annotation coverage
aiecs tools check-annotations [tool_name]

# Validate Schema generation quality
aiecs tools validate-schemas [tool_name]

6. Configuration & Extension

6.1 Configuration Options

# Custom base class
schema = generate_schema_from_method(
    method, 
    'method_name',
    base_class=CustomBaseModel
)

# Configure Pydantic
# Automatically adds arbitrary_types_allowed=True

6.2 Extension Points

  1. Custom Type Mapping: Extend _normalize_type to handle more types

  2. Docstring Parser: Support more docstring styles

  3. Description Enhancement: Use AI to generate better descriptions

  4. Validation Rules: Add custom field validation

7. Performance & Quality Metrics

7.1 Coverage Statistics

Total Methods:       158
Successfully Generated: 149 (94.3%)
Generation Failed:   9 (5.7%)  # Methods without parameters
Total Fields:        669
Type Coverage:       100.0%
Description Quality: 74.1%
Overall Score:       89.5% (B Good)

7.2 Performance Metrics

  • Generation Speed: ~0.1ms/method

  • Memory Usage: ~1KB/Schema

  • Batch Generation: 38 methods < 10ms

7.3 Quality Improvements

Improvement Effects:

  • Tool Calling Agent Availability: 30% → 94.3% (+64.3%)

  • Development Time Saved: 13 hours → 0 hours

  • Maintenance Cost: Continuous → Zero (automatic synchronization)

8. Best Practices

8.1 Tool Development Recommendations

  1. Complete Type Annotations: Add type annotations for all parameters

  2. Detailed Docstrings: Use Google or NumPy style

  3. Meaningful Descriptions: Avoid generic descriptions like “Parameter xxx”

  4. Return Types: Add return type annotations

Example:

def filter(self, records: List[Dict], condition: str) -> List[Dict]:
    """
    Filter DataFrame based on a condition.
    
    Args:
        records: List of records to filter
        condition: Filter condition using pandas query syntax (e.g., 'age > 30')
    
    Returns:
        Filtered list of records
    """
    pass

8.2 Validation Process

# 1. Check type annotations
aiecs tools check-annotations my_tool

# 2. Validate Schema generation
aiecs tools validate-schemas my_tool

# 3. View generated Schemas
aiecs tools show-schemas my_tool --method filter

8.3 Common Questions

Q: Why do some methods fail to generate? A: Methods without parameters (except self) don’t need Schemas, this is expected behavior.

Q: How to improve description quality? A: Add detailed parameter descriptions in the Args section of the docstring.

Q: How to choose between manual Schema and auto-generation? A: Manual Schema takes priority. Auto-generation is only used when no manual Schema exists.

Q: How to handle complex types? A: Use _normalize_type to map complex types to Any, or add custom mappings.

9. Troubleshooting

9.1 Common Errors

Error

Cause

Solution

Generation Failed

Missing type annotations

Add complete type annotations

Description is “Parameter xxx”

Missing docstring

Add Args section

Pydantic Validation Error

Unsupported type

Use _normalize_type to map

Field Count Mismatch

Signature parsing failed

Check method signature format

9.2 Debugging Tips

import logging
logging.basicConfig(level=logging.DEBUG)

# View detailed logs
schema = generate_schema_from_method(method, 'method_name')

10. Future Plans

10.1 Short-term Plans

  • Support more docstring styles (Sphinx, reStructuredText)

  • AI-assisted description generation

  • Schema quality scoring system

  • Automatic fix suggestions

10.2 Long-term Vision

  • IDE integration (real-time Schema preview)

  • Schema version management

  • Cross-language Schema generation

  • Community Schema library

11. References


Document Version: 1.0
Last Updated: 2025-10-02
Maintainer: AIECS Tools Team