# Knowledge Graph Builder Tool

**Tool Name**: `kg_builder`  
**Category**: Knowledge Graph Construction  
**Status**: ✅ Complete

## Overview

The `KnowledgeGraphBuilderTool` builds knowledge graphs for AIECS agents. It accepts three kinds of input: unstructured text, documents, and structured data (CSV/JSON). It can also report statistics on the resulting graph.

## Features

- **Text-to-Graph**: Extract entities and relations from natural language text
- **Document-to-Graph**: Process documents with chunking and batch extraction
- **Structured Data Import**: Import from CSV and JSON files with schema mapping
- **Statistics**: Get knowledge graph statistics and metrics

## Tool Registration

The tool is automatically registered with the AIECS tool registry:

```python
from aiecs.tools.knowledge_graph import KnowledgeGraphBuilderTool

# Tool is registered as "kg_builder"
```

## Input Schema

### KnowledgeGraphBuilderInput

```python
{
    "action": str,  # Required: "build_from_text", "build_from_document", "build_from_structured_data", "get_stats"
    "text": str,  # Optional: Text to process (for build_from_text)
    "document_path": str,  # Optional: Document path (for build_from_document)
    "data_path": str,  # Optional: Data file path (for build_from_structured_data)
    "schema_mapping": dict,  # Optional: Schema mapping (for build_from_structured_data)
    "source": str,  # Optional: Source identifier (default: "unknown")
    "entity_types": List[str],  # Optional: Entity types to extract
    "chunk_size": int,  # Optional: Chunk size for documents (default: 1000)
    "chunk_overlap": int  # Optional: Chunk overlap (default: 200)
}
```
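Since most fields are optional at the schema level but each action needs its own subset, it can help to check parameters before calling the tool. The following is a minimal sketch, not part of the tool itself: `REQUIRED_FIELDS` and `check_kg_input` are hypothetical names, and the per-action requirements are an assumption inferred from the field comments above.

```python
from typing import Any, Dict, List

# Assumption: the fields each action needs, inferred from the schema comments.
REQUIRED_FIELDS: Dict[str, List[str]] = {
    "build_from_text": ["text"],
    "build_from_document": ["document_path"],
    "build_from_structured_data": ["data_path", "schema_mapping"],
    "get_stats": [],
}


def check_kg_input(params: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in params (an empty list means OK)."""
    action = params.get("action")
    if action not in REQUIRED_FIELDS:
        return [f"unknown action: {action!r}"]
    # Report every required field that is missing or empty.
    return [
        f"{action} requires '{field}'"
        for field in REQUIRED_FIELDS[action]
        if not params.get(field)
    ]


print(check_kg_input({"action": "build_from_text"}))
print(check_kg_input({"action": "get_stats"}))
```

Returning a list of problems rather than raising on the first one lets a caller show all missing fields at once.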

## Actions

### 1. Build from Text

**Action**: `"build_from_text"`

Extract entities and relations from natural language text.

**Example**:
```python
result = await tool.run(
    op="kg_builder",
    action="build_from_text",
    text="Alice works at Tech Corp in San Francisco. Bob is her colleague.",
    source="example_doc",
    entity_types=["Person", "Company", "Location"]
)
```

**Response**:
```python
{
    "success": True,
    "source": "example_doc",
    "entities_added": 4,
    "relations_added": 3,
    "entities": [
        {
            "id": "entity_1",
            "type": "Person",
            "properties": {"name": "Alice"}
        },
        {
            "id": "entity_2",
            "type": "Company",
            "properties": {"name": "Tech Corp"}
        },
        {
            "id": "entity_3",
            "type": "Location",
            "properties": {"name": "San Francisco"}
        },
        {
            "id": "entity_4",
            "type": "Person",
            "properties": {"name": "Bob"}
        }
    ],
    "relations": [
        {
            "source_id": "entity_1",
            "target_id": "entity_2",
            "relation_type": "WORKS_FOR"
        },
        {
            "source_id": "entity_2",
            "target_id": "entity_3",
            "relation_type": "LOCATED_IN"
        },
        {
            "source_id": "entity_4",
            "target_id": "entity_1",
            "relation_type": "COLLEAGUE_OF"
        }
    ]
}
```
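The response refers to entities by ID, which makes relations hard to read at a glance. As an illustration, the sketch below resolves IDs to names and renders each relation as a readable triple. `relations_as_triples` is a hypothetical helper, and it assumes the response has the shape shown above.

```python
from typing import Any, Dict, List


def relations_as_triples(result: Dict[str, Any]) -> List[str]:
    """Render each relation as 'Source -[TYPE]-> Target' using entity names."""
    # Map entity ID -> display name, falling back to the ID if no name is set.
    names = {
        e["id"]: e.get("properties", {}).get("name", e["id"])
        for e in result.get("entities", [])
    }
    return [
        f"{names.get(r['source_id'], r['source_id'])} "
        f"-[{r['relation_type']}]-> "
        f"{names.get(r['target_id'], r['target_id'])}"
        for r in result.get("relations", [])
    ]


# Trimmed copy of the response shown above.
sample = {
    "entities": [
        {"id": "entity_1", "type": "Person", "properties": {"name": "Alice"}},
        {"id": "entity_2", "type": "Company", "properties": {"name": "Tech Corp"}},
    ],
    "relations": [
        {"source_id": "entity_1", "target_id": "entity_2", "relation_type": "WORKS_FOR"},
    ],
}
for line in relations_as_triples(sample):
    print(line)
```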

### 2. Build from Document

**Action**: `"build_from_document"`

Process documents with automatic chunking and batch extraction.

**Example**:
```python
result = await tool.run(
    op="kg_builder",
    action="build_from_document",
    document_path="/path/to/document.txt",
    source="research_paper",
    chunk_size=1000,
    chunk_overlap=200,
    entity_types=["Person", "Organization", "Technology"]
)
```

**Response**:
```python
{
    "success": True,
    "document_path": "/path/to/document.txt",
    "source": "research_paper",
    "total_chunks": 15,
    "chunks_processed": 15,
    "total_entities_added": 87,
    "total_relations_added": 124,
    "errors": []
}
```
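To build intuition for how `chunk_size` and `chunk_overlap` interact, here is a minimal sketch of sliding-window chunking. It assumes character-based chunks, which may differ from the strategy the tool actually uses; `chunk_text` is a hypothetical function written for illustration only.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> List[str]:
    """Split text into windows of chunk_size characters that overlap by chunk_overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks: List[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks


text = "".join(chr(97 + i % 26) for i in range(2600))
chunks = chunk_text(text, chunk_size=1000, chunk_overlap=200)
print(len(chunks), [len(c) for c in chunks])
```

With the defaults, each window advances by 800 characters, so the last 200 characters of one chunk are repeated at the start of the next.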

### 3. Build from Structured Data (NEW)

**Action**: `"build_from_structured_data"`

Import entities and relations from CSV or JSON files using schema mapping.

**Example**:
```python
# Define schema mapping
schema_mapping = {
    "entity_mappings": [
        {
            "entity_type": "Person",
            "id_column": "person_id",
            "property_mappings": {
                "name": "full_name",
                "age": "age",
                "role": "job_title"
            }
        },
        {
            "entity_type": "Company",
            "id_column": "company_id",
            "property_mappings": {
                "name": "company_name",
                "industry": "sector"
            }
        }
    ],
    "relation_mappings": [
        {
            "relation_type": "WORKS_FOR",
            "source_column": "person_id",
            "target_column": "company_id",
            "source_type": "Person",
            "target_type": "Company"
        }
    ]
}

result = await tool.run(
    op="kg_builder",
    action="build_from_structured_data",
    data_path="/path/to/employees.csv",
    schema_mapping=schema_mapping
)
```

**Response**:
```python
{
    "success": True,
    "data_path": "/path/to/employees.csv",
    "entities_added": 250,
    "relations_added": 250,
    "rows_processed": 250,
    "rows_failed": 0,
    "duration_seconds": 2.5,
    "errors": [],
    "warnings": []
}
```

**Supported File Formats:**
- **CSV**: Comma-separated values with header row
- **JSON**: Array of objects or newline-delimited JSON
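To preview what the importer will see, the rows of any of these formats can be loaded with the standard library. This is a sketch under assumptions, not the tool's loader: `load_rows` is a hypothetical helper, and the rule for telling a JSON array from newline-delimited JSON (a leading `[`) is a simplification.

```python
import csv
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, List


def load_rows(path: str) -> List[Dict[str, Any]]:
    """Load rows from a CSV file, a JSON array, or newline-delimited JSON."""
    p = Path(path)
    suffix = p.suffix.lower()
    if suffix == ".csv":
        with p.open(newline="", encoding="utf-8") as fh:
            return list(csv.DictReader(fh))  # header row supplies the keys
    if suffix in {".json", ".jsonl", ".ndjson"}:
        text = p.read_text(encoding="utf-8").strip()
        if text.startswith("["):
            return json.loads(text)  # a single array of objects
        # Otherwise treat every non-empty line as one JSON object.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    raise ValueError(f"unsupported file type: {suffix}")


with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "people.csv"
    csv_path.write_text("person_id,full_name\np1,Alice\n", encoding="utf-8")
    ndjson_path = Path(tmp) / "people.json"
    ndjson_path.write_text('{"person_id": "p1"}\n{"person_id": "p2"}\n', encoding="utf-8")
    csv_rows = load_rows(str(csv_path))
    json_rows = load_rows(str(ndjson_path))

print(csv_rows)
print(json_rows)
```

Note that `csv.DictReader` returns every value as a string, so numeric columns such as `age` need explicit conversion if types matter.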

**Schema Mapping Structure:**

```python
{
    "entity_mappings": [
        {
            "entity_type": str,  # Entity type name
            "id_column": str,  # Column containing entity ID
            "property_mappings": {
                "property_name": "column_name",  # Map properties to columns
                ...
            }
        }
    ],
    "relation_mappings": [
        {
            "relation_type": str,  # Relation type name
            "source_column": str,  # Column containing source entity ID
            "target_column": str,  # Column containing target entity ID
            "source_type": str,  # Source entity type
            "target_type": str,  # Target entity type
            "property_mappings": {  # Optional relation properties
                "property_name": "column_name",
                ...
            }
        }
    ]
}
```
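To make the mapping semantics concrete, the sketch below applies a schema mapping to a single row and produces the entities and relations it implies. It illustrates the structure above and is not the tool's implementation: `apply_mapping` is a hypothetical function, and skipping rows with empty ID columns is an assumption.

```python
from typing import Any, Dict, List, Tuple


def apply_mapping(
    row: Dict[str, Any], mapping: Dict[str, Any]
) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
    """Turn one data row into (entities, relations) according to the mapping."""
    entities = []
    for em in mapping.get("entity_mappings", []):
        entity_id = row.get(em["id_column"])
        if entity_id in (None, ""):
            continue  # assumption: a row without an ID yields no entity
        props = {
            prop: row[col]
            for prop, col in em.get("property_mappings", {}).items()
            if col in row
        }
        entities.append({"id": str(entity_id), "type": em["entity_type"], "properties": props})

    relations = []
    for rm in mapping.get("relation_mappings", []):
        src, tgt = row.get(rm["source_column"]), row.get(rm["target_column"])
        if src in (None, "") or tgt in (None, ""):
            continue  # both endpoints are needed to form a relation
        props = {
            prop: row[col]
            for prop, col in rm.get("property_mappings", {}).items()
            if col in row
        }
        relations.append({
            "source_id": str(src),
            "target_id": str(tgt),
            "relation_type": rm["relation_type"],
            "properties": props,
        })
    return entities, relations


mapping = {
    "entity_mappings": [
        {"entity_type": "Person", "id_column": "person_id",
         "property_mappings": {"name": "full_name", "age": "age", "role": "job_title"}},
        {"entity_type": "Company", "id_column": "company_id",
         "property_mappings": {"name": "company_name", "industry": "sector"}},
    ],
    "relation_mappings": [
        {"relation_type": "WORKS_FOR", "source_column": "person_id",
         "target_column": "company_id", "source_type": "Person", "target_type": "Company"},
    ],
}
row = {"person_id": "p1", "full_name": "Alice", "age": "34", "job_title": "Engineer",
       "company_id": "c1", "company_name": "Tech Corp", "sector": "Software"}
entities, relations = apply_mapping(row, mapping)
print(entities)
print(relations)
```

In this reading, one row can yield several entities and one relation per relation mapping, which matches the 250 rows, 250 relations figures in the example response.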

**Use Cases:**
- Importing existing databases into knowledge graphs
- Migrating from relational to graph databases
- Bulk data loading
- ETL pipelines

### 4. Get Statistics

**Action**: `"get_stats"`

Get knowledge graph statistics and metrics.

**Example**:
```python
result = await tool.run(
    op="kg_builder",
    action="get_stats"
)
```

**Response**:
```python
{
    "success": True,
    "stats": {
        "num_entities": 341,
        "num_relations": 478,
        "entity_types": {
            "Person": 150,
            "Company": 75,
            "Location": 50,
            "Technology": 66
        },
        "relation_types": {
            "WORKS_FOR": 150,
            "LOCATED_IN": 125,
            "USES": 100,
            "COLLEAGUE_OF": 103
        }
    }
}
```
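In the example response the per-type counts add up to the totals (150 + 75 + 50 + 66 = 341 entities). A quick sanity check along those lines might look like the sketch below. `check_stats` is a hypothetical helper, and it assumes that every entity and relation is counted under exactly one type.

```python
from typing import Any, Dict, List


def check_stats(stats: Dict[str, Any]) -> List[str]:
    """Return a list of inconsistencies between totals and per-type counts."""
    problems = []
    entity_total = sum(stats.get("entity_types", {}).values())
    if entity_total != stats.get("num_entities"):
        problems.append(
            f"entity_types sum to {entity_total}, expected {stats.get('num_entities')}"
        )
    relation_total = sum(stats.get("relation_types", {}).values())
    if relation_total != stats.get("num_relations"):
        problems.append(
            f"relation_types sum to {relation_total}, expected {stats.get('num_relations')}"
        )
    return problems


# The "stats" object from the response shown above.
stats = {
    "num_entities": 341,
    "num_relations": 478,
    "entity_types": {"Person": 150, "Company": 75, "Location": 50, "Technology": 66},
    "relation_types": {"WORKS_FOR": 150, "LOCATED_IN": 125, "USES": 100, "COLLEAGUE_OF": 103},
}
print(check_stats(stats))
```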

## Advanced Usage

### Combining Actions

Build a comprehensive knowledge graph from multiple sources:

```python
# Step 1: Import structured data
# (employee_schema is a schema mapping dict, as described under "Schema Mapping Structure")
result1 = await tool.run(
    op="kg_builder",
    action="build_from_structured_data",
    data_path="employees.csv",
    schema_mapping=employee_schema
)

# Step 2: Add unstructured text
result2 = await tool.run(
    op="kg_builder",
    action="build_from_text",
    text="Alice recently led the AI initiative at Tech Corp...",
    source="news_article"
)

# Step 3: Process documents
result3 = await tool.run(
    op="kg_builder",
    action="build_from_document",
    document_path="company_report.pdf",
    source="annual_report"
)

# Step 4: Get final statistics
stats = await tool.run(
    op="kg_builder",
    action="get_stats"
)
```
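The actions report their counts under different keys (`entities_added` for text and structured data, `total_entities_added` for documents), so totalling a multi-step build needs a small adapter. The sketch below assumes the response shapes shown earlier; `added_counts` and `summarize` are hypothetical helpers.

```python
from typing import Any, Dict, Iterable, Tuple


def added_counts(result: Dict[str, Any]) -> Tuple[int, int]:
    """Extract (entities, relations) added, whichever key style the action used."""
    entities = result.get("entities_added", result.get("total_entities_added", 0))
    relations = result.get("relations_added", result.get("total_relations_added", 0))
    return entities, relations


def summarize(results: Iterable[Dict[str, Any]]) -> Dict[str, int]:
    """Total the counts over all successful results and count the failures."""
    summary = {"entities": 0, "relations": 0, "failed_steps": 0}
    for result in results:
        if not result.get("success"):
            summary["failed_steps"] += 1
            continue
        entities, relations = added_counts(result)
        summary["entities"] += entities
        summary["relations"] += relations
    return summary


# Stand-ins for result1..result3, using the numbers from the example responses.
results = [
    {"success": True, "entities_added": 250, "relations_added": 250},
    {"success": True, "entities_added": 4, "relations_added": 3},
    {"success": True, "total_entities_added": 87, "total_relations_added": 124},
]
summary = summarize(results)
print(summary)
```

These totals count additions per step. If the graph deduplicates entities across sources, the figures from `get_stats` may be lower.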

### Error Handling

```python
result = await tool.run(
    op="kg_builder",
    action="build_from_structured_data",
    data_path="data.csv",
    schema_mapping=schema
)

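if not result.get("success"):
    # Use .get() so a failure response without an "error" key does not raise KeyError
    print(f"Error: {result.get('error', 'unknown error')}")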
else:
    print(f"Imported {result['entities_added']} entities")
    if result.get("warnings"):
        print(f"Warnings: {result['warnings']}")
```

## Best Practices

### 1. Choose the Right Action

- **`build_from_text`**: For short text snippets, chat messages, or single paragraphs
- **`build_from_document`**: For long documents, articles, or reports
- **`build_from_structured_data`**: For existing databases, CSV exports, or JSON data

### 2. Schema Mapping Tips

- Use meaningful entity and relation type names
- Map all relevant properties for rich graph data
- Include ID columns for entity deduplication
- Test with small datasets first

### 3. Performance Optimization

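- Use `build_from_document` for large documents so the text is chunked and extracted in batches
- Tune `chunk_size` and `chunk_overlap`: larger chunks give each extraction more context, and more overlap lowers the risk of splitting an entity mention across a chunk boundary, at the cost of extra processing
- Enable error skipping for robust imports (`skip_errors=True`, set on the pipeline rather than in the tool input)
- Call `get_stats` periodically to track graph growth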

### 4. Data Quality

- Validate schema mappings before large imports
- Check for duplicate entities using entity IDs
- Review extraction results for accuracy
- Use entity types to filter and organize data
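One inexpensive way to validate a schema mapping before a large import is to confirm that every column it references exists in the file's header. The sketch below shows one approach under stated assumptions: `missing_columns` is a hypothetical helper, and the mapping follows the structure documented above.

```python
import csv
import io
from typing import Any, Dict, Iterable, List


def missing_columns(mapping: Dict[str, Any], header: Iterable[str]) -> List[str]:
    """Return the columns referenced by the mapping that are absent from the header."""
    needed = set()
    for em in mapping.get("entity_mappings", []):
        needed.add(em["id_column"])
        needed.update(em.get("property_mappings", {}).values())
    for rm in mapping.get("relation_mappings", []):
        needed.add(rm["source_column"])
        needed.add(rm["target_column"])
        needed.update(rm.get("property_mappings", {}).values())
    return sorted(needed - set(header))


mapping = {
    "entity_mappings": [
        {"entity_type": "Company", "id_column": "company_id",
         "property_mappings": {"name": "company_name", "industry": "sector"}},
    ],
    "relation_mappings": [],
}
# In-memory stand-in for the first line of a CSV file; "sector" is missing.
sample_csv = io.StringIO("company_id,company_name\nc1,Tech Corp\n")
header = next(csv.reader(sample_csv))
missing = missing_columns(mapping, header)
print(missing)
```

Running this check on the header alone is cheap, so it fits naturally before the "test with small datasets first" step.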

## See Also

- [Structured Data Pipeline Guide](../STRUCTURED_DATA_PIPELINE.md)
- [Schema Mapping Guide](../SCHEMA_MAPPING_GUIDE.md)
- [CSV to Graph Tutorial](../examples/csv_to_graph_tutorial.md)
- [JSON to Graph Tutorial](../examples/json_to_graph_tutorial.md)