Knowledge Graph Builder Tool

Tool Name: kg_builder
Category: Knowledge Graph Construction
Status: ✅ Complete

Overview

The KnowledgeGraphBuilderTool lets AIECS agents construct knowledge graphs from three kinds of input: unstructured text, documents, and structured data (CSV/JSON). It also reports statistics on the resulting graph.

Features

  • Text-to-Graph: Extract entities and relations from natural language text

  • Document-to-Graph: Process documents with chunking and batch extraction

  • Structured Data Import: Import from CSV and JSON files with schema mapping

  • Statistics: Get knowledge graph statistics and metrics

Tool Registration

The tool is automatically registered with the AIECS tool registry:

from aiecs.tools.knowledge_graph import KnowledgeGraphBuilderTool

# Tool is registered as "kg_builder"

Input Schema

KnowledgeGraphBuilderInput

{
    "action": str,  # Required: "build_from_text", "build_from_document", "build_from_structured_data", "get_stats"
    "text": str,  # Optional: Text to process (for build_from_text)
    "document_path": str,  # Optional: Document path (for build_from_document)
    "data_path": str,  # Optional: Data file path (for build_from_structured_data)
    "schema_mapping": dict,  # Optional: Schema mapping (for build_from_structured_data)
    "source": str,  # Optional: Source identifier (default: "unknown")
    "entity_types": List[str],  # Optional: Entity types to extract
    "chunk_size": int,  # Optional: Chunk size for documents (default: 1000)
    "chunk_overlap": int  # Optional: Chunk overlap (default: 200)
}
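The schema marks only "action" as required, but each action's description implies at least one field it needs. The sketch below shows one way a caller might check inputs before invoking the tool. The helper name prepare_input and the action-to-field pairing are assumptions inferred from the "(for ...)" comments above, not part of the tool's API; the defaults are the ones listed in the schema.

```python
from typing import Any, Dict, List

# Assumed pairing, inferred from the "(for ...)" comments in the schema.
REQUIRED_FIELDS: Dict[str, List[str]] = {
    "build_from_text": ["text"],
    "build_from_document": ["document_path"],
    "build_from_structured_data": ["data_path"],
    "get_stats": [],
}

# Defaults as documented in the input schema.
DEFAULTS: Dict[str, Any] = {
    "source": "unknown",
    "chunk_size": 1000,
    "chunk_overlap": 200,
}


def prepare_input(action: str, **fields: Any) -> Dict[str, Any]:
    """Hypothetical helper: reject unknown actions and missing fields, then fill defaults."""
    if action not in REQUIRED_FIELDS:
        raise ValueError(f"Unknown action: {action!r}")
    missing = [name for name in REQUIRED_FIELDS[action] if fields.get(name) is None]
    if missing:
        raise ValueError(f"{action} requires: {', '.join(missing)}")
    # Caller-supplied fields override the documented defaults.
    return {"action": action, **DEFAULTS, **fields}


payload = prepare_input("build_from_text", text="Alice works at Tech Corp.")
print(payload["source"], payload["chunk_size"], payload["chunk_overlap"])
```

Checking on the caller side surfaces a missing field before any extraction work starts.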

Actions

1. Build from Text

Action: "build_from_text"

Extract entities and relations from natural language text.

Example:

result = await tool.run(
    op="kg_builder",
    action="build_from_text",
    text="Alice works at Tech Corp in San Francisco. Bob is her colleague.",
    source="example_doc",
    entity_types=["Person", "Company", "Location"]
)

Response:

{
    "success": True,
    "source": "example_doc",
    "entities_added": 4,
    "relations_added": 3,
    "entities": [
        {
            "id": "entity_1",
            "type": "Person",
            "properties": {"name": "Alice"}
        },
        {
            "id": "entity_2",
            "type": "Company",
            "properties": {"name": "Tech Corp"}
        },
        {
            "id": "entity_3",
            "type": "Location",
            "properties": {"name": "San Francisco"}
        },
        {
            "id": "entity_4",
            "type": "Person",
            "properties": {"name": "Bob"}
        }
    ],
    "relations": [
        {
            "source_id": "entity_1",
            "target_id": "entity_2",
            "relation_type": "WORKS_FOR"
        },
        {
            "source_id": "entity_2",
            "target_id": "entity_3",
            "relation_type": "LOCATED_IN"
        },
        {
            "source_id": "entity_4",
            "target_id": "entity_1",
            "relation_type": "COLLEAGUE_OF"
        }
    ]
}
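Relations in the response refer to entities by ID, so reading them usually means joining back to the entity list. A minimal sketch of that join, using a trimmed copy of the sample response above (the helper name relations_as_triples is hypothetical):

```python
from typing import Any, Dict, List


def relations_as_triples(result: Dict[str, Any]) -> List[str]:
    """Render each relation as 'Source -[TYPE]-> Target' using entity names."""
    # Map entity ID to its display name, falling back to the ID itself.
    names = {e["id"]: e["properties"].get("name", e["id"]) for e in result["entities"]}
    return [
        f'{names.get(r["source_id"], r["source_id"])} -[{r["relation_type"]}]-> '
        f'{names.get(r["target_id"], r["target_id"])}'
        for r in result["relations"]
    ]


# Trimmed copy of the sample response shown above.
sample = {
    "entities": [
        {"id": "entity_1", "type": "Person", "properties": {"name": "Alice"}},
        {"id": "entity_2", "type": "Company", "properties": {"name": "Tech Corp"}},
        {"id": "entity_3", "type": "Location", "properties": {"name": "San Francisco"}},
        {"id": "entity_4", "type": "Person", "properties": {"name": "Bob"}},
    ],
    "relations": [
        {"source_id": "entity_1", "target_id": "entity_2", "relation_type": "WORKS_FOR"},
        {"source_id": "entity_2", "target_id": "entity_3", "relation_type": "LOCATED_IN"},
        {"source_id": "entity_4", "target_id": "entity_1", "relation_type": "COLLEAGUE_OF"},
    ],
}

triples = relations_as_triples(sample)
for line in triples:
    print(line)
# Alice -[WORKS_FOR]-> Tech Corp
# Tech Corp -[LOCATED_IN]-> San Francisco
# Bob -[COLLEAGUE_OF]-> Alice
```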

2. Build from Document

Action: "build_from_document"

Process documents with automatic chunking and batch extraction.

Example:

result = await tool.run(
    op="kg_builder",
    action="build_from_document",
    document_path="/path/to/document.txt",
    source="research_paper",
    chunk_size=1000,
    chunk_overlap=200,
    entity_types=["Person", "Organization", "Technology"]
)

Response:

{
    "success": True,
    "document_path": "/path/to/document.txt",
    "source": "research_paper",
    "total_chunks": 15,
    "chunks_processed": 15,
    "total_entities_added": 87,
    "total_relations_added": 124,
    "errors": []
}
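To make the chunk_size and chunk_overlap parameters concrete, here is a sketch of a sliding-window chunker. It assumes both values are measured in characters and that consecutive chunks share exactly chunk_overlap characters; the tool's actual chunking strategy is not documented here and may differ (for example, it may split on sentence boundaries).

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> List[str]:
    """Illustrative character-based chunker; not the tool's own implementation."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # distance between chunk start positions
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
    return chunks


chunk_lengths = [len(c) for c in chunk_text("x" * 2500)]
print(chunk_lengths)  # [1000, 1000, 900]
```

With the defaults, each new chunk starts 800 characters after the previous one, so a 2500-character text yields three chunks under these assumptions.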

3. Build from Structured Data (NEW)

Action: "build_from_structured_data"

Import entities and relations from CSV or JSON files using schema mapping.

Example:

# Define schema mapping
schema_mapping = {
    "entity_mappings": [
        {
            "entity_type": "Person",
            "id_column": "person_id",
            "property_mappings": {
                "name": "full_name",
                "age": "age",
                "role": "job_title"
            }
        },
        {
            "entity_type": "Company",
            "id_column": "company_id",
            "property_mappings": {
                "name": "company_name",
                "industry": "sector"
            }
        }
    ],
    "relation_mappings": [
        {
            "relation_type": "WORKS_FOR",
            "source_column": "person_id",
            "target_column": "company_id",
            "source_type": "Person",
            "target_type": "Company"
        }
    ]
}

result = await tool.run(
    op="kg_builder",
    action="build_from_structured_data",
    data_path="/path/to/employees.csv",
    schema_mapping=schema_mapping
)

Response:

{
    "success": True,
    "data_path": "/path/to/employees.csv",
    "entities_added": 250,
    "relations_added": 250,
    "rows_processed": 250,
    "rows_failed": 0,
    "duration_seconds": 2.5,
    "errors": [],
    "warnings": []
}

Supported File Formats:

  • CSV: Comma-separated values with header row

  • JSON: Array of objects or newline-delimited JSON
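For previewing a file before import, the three layouts above can be read into a uniform list of row dictionaries with the standard library alone. This is a sketch, not the tool's loader: the function name load_rows and the extension-based format detection are assumptions.

```python
import csv
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, List


def load_rows(path: str) -> List[Dict[str, Any]]:
    """Hypothetical loader: CSV with header row, JSON array, or newline-delimited JSON."""
    p = Path(path)
    suffix = p.suffix.lower()
    if suffix == ".csv":
        with p.open(newline="", encoding="utf-8") as handle:
            return list(csv.DictReader(handle))  # values are always strings
    if suffix in {".json", ".jsonl", ".ndjson"}:
        content = p.read_text(encoding="utf-8").strip()
        if content.startswith("["):
            return json.loads(content)  # array of objects
        # Otherwise treat each non-empty line as one JSON object.
        return [json.loads(line) for line in content.splitlines() if line.strip()]
    raise ValueError(f"Unsupported file extension: {suffix!r}")


# Small demonstration using temporary files.
with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "people.csv"
    csv_path.write_text("person_id,full_name\np1,Alice\np2,Bob\n", encoding="utf-8")
    ndjson_path = Path(tmp) / "people.jsonl"
    ndjson_path.write_text('{"person_id": "p1"}\n{"person_id": "p2"}\n', encoding="utf-8")
    array_path = Path(tmp) / "people.json"
    array_path.write_text('[{"person_id": "p1"}]', encoding="utf-8")

    csv_rows = load_rows(str(csv_path))
    ndjson_rows = load_rows(str(ndjson_path))
    array_rows = load_rows(str(array_path))

print(len(csv_rows), len(ndjson_rows), len(array_rows))  # 2 2 1
```

Note that CSV values arrive as strings, so numeric properties such as an age column would need explicit conversion if typed values matter.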

Schema Mapping Structure:

{
    "entity_mappings": [
        {
            "entity_type": str,  # Entity type name
            "id_column": str,  # Column containing entity ID
            "property_mappings": {
                "property_name": "column_name",  # Map properties to columns
                ...
            }
        }
    ],
    "relation_mappings": [
        {
            "relation_type": str,  # Relation type name
            "source_column": str,  # Column containing source entity ID
            "target_column": str,  # Column containing target entity ID
            "source_type": str,  # Source entity type
            "target_type": str,  # Target entity type
            "property_mappings": {  # Optional relation properties
                "property_name": "column_name",
                ...
            }
        }
    ]
}
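The mapping structure can be read as a recipe applied to each row: every entity mapping yields one entity, and every relation mapping yields one relation between the IDs found in its source and target columns. The sketch below illustrates that reading for a single row. The function map_row and the output shapes are illustrative assumptions, not the tool's internal pipeline; skipping rows with empty IDs is likewise an assumed policy.

```python
from typing import Any, Dict, List, Tuple


def map_row(row: Dict[str, Any], schema: Dict[str, Any]) -> Tuple[List[dict], List[dict]]:
    """Illustrative: apply a schema mapping to one row, returning (entities, relations)."""
    entities: List[dict] = []
    relations: List[dict] = []
    for em in schema.get("entity_mappings", []):
        entity_id = row.get(em["id_column"])
        if entity_id in (None, ""):
            continue  # assumed policy: no ID, no entity
        props = {prop: row[col] for prop, col in em.get("property_mappings", {}).items() if col in row}
        entities.append({"id": entity_id, "type": em["entity_type"], "properties": props})
    for rm in schema.get("relation_mappings", []):
        source_id = row.get(rm["source_column"])
        target_id = row.get(rm["target_column"])
        if source_id in (None, "") or target_id in (None, ""):
            continue  # both endpoints are needed to form a relation
        props = {prop: row[col] for prop, col in rm.get("property_mappings", {}).items() if col in row}
        relations.append({
            "source_id": source_id,
            "target_id": target_id,
            "relation_type": rm["relation_type"],
            "properties": props,
        })
    return entities, relations


schema = {
    "entity_mappings": [
        {"entity_type": "Person", "id_column": "person_id",
         "property_mappings": {"name": "full_name", "role": "job_title"}},
        {"entity_type": "Company", "id_column": "company_id",
         "property_mappings": {"name": "company_name"}},
    ],
    "relation_mappings": [
        {"relation_type": "WORKS_FOR", "source_column": "person_id", "target_column": "company_id",
         "source_type": "Person", "target_type": "Company"},
    ],
}
row = {"person_id": "p1", "full_name": "Alice", "job_title": "Engineer",
       "company_id": "c1", "company_name": "Tech Corp"}

entities, relations = map_row(row, schema)
print(len(entities), len(relations))  # 2 1
```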

Use Cases:

  • Importing existing databases into knowledge graphs

  • Migrating from relational to graph databases

  • Bulk data loading

  • ETL pipelines

4. Get Statistics

Action: "get_stats"

Get knowledge graph statistics and metrics.

Example:

result = await tool.run(
    op="kg_builder",
    action="get_stats"
)

Response:

{
    "success": True,
    "stats": {
        "num_entities": 341,
        "num_relations": 478,
        "entity_types": {
            "Person": 150,
            "Company": 75,
            "Location": 50,
            "Technology": 66
        },
        "relation_types": {
            "WORKS_FOR": 150,
            "LOCATED_IN": 125,
            "USES": 100,
            "COLLEAGUE_OF": 103
        }
    }
}
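Because the per-type counts are plain dictionaries, two get_stats responses taken at different times can be compared directly to see where the graph grew. A small sketch, in which the earlier snapshot is made up for illustration and the helper name type_growth is hypothetical:

```python
from typing import Dict


def type_growth(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, int]:
    """Return the per-type count change between two snapshots, omitting unchanged types."""
    changes: Dict[str, int] = {}
    for key in sorted(set(before) | set(after)):
        delta = after.get(key, 0) - before.get(key, 0)
        if delta != 0:
            changes[key] = delta
    return changes


# Hypothetical earlier snapshot; the later one matches the sample response above.
earlier = {"Person": 120, "Company": 75}
later = {"Person": 150, "Company": 75, "Location": 50, "Technology": 66}

growth = type_growth(earlier, later)
print(growth)  # {'Location': 50, 'Person': 30, 'Technology': 66}
```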

Advanced Usage

Combining Actions

Build a comprehensive knowledge graph from multiple sources:

# Step 1: Import structured data
result1 = await tool.run(
    op="kg_builder",
    action="build_from_structured_data",
    data_path="employees.csv",
    schema_mapping=employee_schema
)

# Step 2: Add unstructured text
result2 = await tool.run(
    op="kg_builder",
    action="build_from_text",
    text="Alice recently led the AI initiative at Tech Corp...",
    source="news_article"
)

# Step 3: Process documents
result3 = await tool.run(
    op="kg_builder",
    action="build_from_document",
    document_path="company_report.pdf",
    source="annual_report"
)

# Step 4: Get final statistics
stats = await tool.run(
    op="kg_builder",
    action="get_stats"
)
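When tallying what a multi-step build added, note that the sample responses use different keys: build_from_document reports total_entities_added and total_relations_added, while the other build actions report entities_added and relations_added. The sketch below sums across both shapes; the helper name total_added is hypothetical, and the input dictionaries are trimmed copies of the sample responses from the earlier sections.

```python
from typing import Any, Dict, Iterable, Tuple


def total_added(results: Iterable[Dict[str, Any]]) -> Tuple[int, int]:
    """Sum entities and relations added across successful build results."""
    entities = 0
    relations = 0
    for result in results:
        if not result.get("success"):
            continue  # failed steps contribute nothing
        entities += result.get("entities_added", result.get("total_entities_added", 0))
        relations += result.get("relations_added", result.get("total_relations_added", 0))
    return entities, relations


# Trimmed copies of the sample responses shown in the Actions section.
results = [
    {"success": True, "entities_added": 250, "relations_added": 250},          # structured data
    {"success": True, "entities_added": 4, "relations_added": 3},              # text
    {"success": True, "total_entities_added": 87, "total_relations_added": 124},  # document
]

totals = total_added(results)
print(totals)  # (341, 377)
```

These sums count additions reported per step; the totals returned by get_stats may differ, for instance if the graph already held data before the build.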

Error Handling

result = await tool.run(
    op="kg_builder",
    action="build_from_structured_data",
    data_path="data.csv",
    schema_mapping=schema
)

if not result["success"]:
    # Fall back to a generic message in case no "error" key is present
    print(f"Error: {result.get('error', 'unknown error')}")
else:
    print(f"Imported {result['entities_added']} entities")
    if result.get("warnings"):
        print(f"Warnings: {result['warnings']}")

Best Practices

1. Choose the Right Action

  • build_from_text: For short text snippets, chat messages, or single paragraphs

  • build_from_document: For long documents, articles, or reports

  • build_from_structured_data: For existing databases, CSV exports, or JSON data

2. Schema Mapping Tips

  • Use meaningful entity and relation type names

  • Map all relevant properties for rich graph data

  • Include ID columns for entity deduplication

  • Test with small datasets first

3. Performance Optimization

  • Use batch processing for large documents (build_from_document)

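  • Adjust chunk_size and chunk_overlap to balance context against cost: larger chunks give the extractor more context per pass, and more overlap lowers the chance of splitting an entity or relation across a chunk boundary at the price of reprocessing repeated text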

  • Enable error skipping for robust imports (skip_errors=True in pipeline)

  • Call get_stats periodically to track graph growth

4. Data Quality

  • Validate schema mappings before large imports

  • Check for duplicate entities using entity IDs

  • Review extraction results for accuracy

  • Use entity types to filter and organize data
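One inexpensive way to validate a schema mapping before a large import is to confirm that every column it references exists in the data file's header. The sketch below does that check for a list of column names; the helper missing_columns is hypothetical and independent of any validation the tool itself performs.

```python
from typing import Any, Dict, Iterable, List


def missing_columns(schema: Dict[str, Any], columns: Iterable[str]) -> List[str]:
    """Return the columns referenced by the schema mapping that are absent from the data."""
    available = set(columns)
    referenced = set()
    for em in schema.get("entity_mappings", []):
        referenced.add(em["id_column"])
        referenced.update(em.get("property_mappings", {}).values())
    for rm in schema.get("relation_mappings", []):
        referenced.add(rm["source_column"])
        referenced.add(rm["target_column"])
        referenced.update(rm.get("property_mappings", {}).values())
    return sorted(referenced - available)


schema = {
    "entity_mappings": [
        {"entity_type": "Person", "id_column": "person_id",
         "property_mappings": {"name": "full_name", "role": "job_title"}},
    ],
    "relation_mappings": [
        {"relation_type": "WORKS_FOR", "source_column": "person_id", "target_column": "company_id",
         "source_type": "Person", "target_type": "Company"},
    ],
}

# Header of a hypothetical CSV that lacks two of the mapped columns.
header = ["person_id", "full_name", "age"]
problems = missing_columns(schema, header)
print(problems)  # ['company_id', 'job_title']
```

An empty result means every mapped column is present; it does not guarantee that the values in those columns are well-formed.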

See Also