Schema Mapping Configuration Guide

This guide explains how to configure schema mappings for importing structured data (CSV, JSON) into knowledge graphs.

Table of Contents

Overview
Basic Concepts
Entity Mapping
Relation Mapping
Property Transformations
Complete Examples
Best Practices

Overview

Schema mapping allows you to declaratively map structured data columns to knowledge graph entities and relations. This eliminates the need for custom code for each data source.

Key Benefits:

Declarative: Define mappings in configuration, not code
Flexible: Support complex transformations (rename, type cast, compute)
Reusable: Same mapping works across multiple data sources
Type-safe: Validation ensures data consistency

Basic Concepts

SchemaMapping

The SchemaMapping class is the container for all mappings:

from aiecs.application.knowledge_graph.builder.schema_mapping import SchemaMapping

mapping = SchemaMapping(
    entity_mappings=[...],  # How to create entities
    relation_mappings=[...],  # How to create relations
    description="My data mapping"
)

EntityMapping

Maps source columns to entity types:

from aiecs.application.knowledge_graph.builder.schema_mapping import EntityMapping

entity_mapping = EntityMapping(
    source_columns=["id", "name", "age"],
    entity_type="Person",
    property_mapping={"id": "id", "name": "name", "age": "age"},
    id_column="id"
)

RelationMapping

Maps source columns to relations between entities:

from aiecs.application.knowledge_graph.builder.schema_mapping import RelationMapping

relation_mapping = RelationMapping(
    source_columns=["emp_id", "dept_id"],
    relation_type="WORKS_IN",
    source_entity_column="emp_id",
    target_entity_column="dept_id"
)

Entity Mapping

Simple Entity Mapping

Map columns directly to entity properties:

EntityMapping(
    source_columns=["id", "name", "email"],
    entity_type="Person",
    property_mapping={
        "id": "id",
        "name": "name",
        "email": "email"
    },
    id_column="id"
)

Entity Mapping with ID Column

Specify which column to use as entity ID:

EntityMapping(
    source_columns=["employee_id", "full_name", "department"],
    entity_type="Employee",
    property_mapping={
        "employee_id": "id",
        "full_name": "name",
        "department": "dept"
    },
    id_column="employee_id"  # Use employee_id as entity ID
)

Multiple Entity Types from Same Row

You can create multiple entities from a single row:

mapping = SchemaMapping(
    entity_mappings=[
        # Create Employee entity
        EntityMapping(
            source_columns=["emp_id", "emp_name"],
            entity_type="Employee",
            property_mapping={"emp_id": "id", "emp_name": "name"},
            id_column="emp_id"
        ),
        # Create Department entity from same row
        EntityMapping(
            source_columns=["dept_id", "dept_name"],
            entity_type="Department",
            property_mapping={"dept_id": "id", "dept_name": "name"},
            id_column="dept_id"
        )
    ]
)

Relation Mapping

Basic Relation Mapping

Create relations between entities:

RelationMapping(
    source_columns=["person_id", "company_id"],
    relation_type="WORKS_FOR",
    source_entity_column="person_id",
    target_entity_column="company_id"
)

Relation with Properties

Add properties to relations:

RelationMapping(
    source_columns=["person_id", "company_id", "role", "since"],
    relation_type="WORKS_FOR",
    source_entity_column="person_id",
    target_entity_column="company_id",
    property_mapping={
        "role": "position",
        "since": "start_date"
    }
)

Property Transformations

Transformations allow you to modify values during import.

Transformation Types

RENAME: Rename a column to a property
TYPE_CAST: Convert value to different type
COMPUTE: Compute value from multiple columns
CONSTANT: Use a constant value
SKIP: Skip this column

RENAME Transformation

Simply rename a column:

from aiecs.application.knowledge_graph.builder.schema_mapping import (
    PropertyTransformation,
    TransformationType
)

transformation = PropertyTransformation(
    transformation_type=TransformationType.RENAME,
    source_column="full_name",
    target_property="name"
)

TYPE_CAST Transformation

Convert string to integer, float, boolean, etc.:

from aiecs.domain.knowledge_graph.schema.property_schema import PropertyType

transformation = PropertyTransformation(
    transformation_type=TransformationType.TYPE_CAST,
    source_column="age_str",
    target_property="age",
    target_type=PropertyType.INTEGER
)

Supported Types:

PropertyType.STRING
PropertyType.INTEGER
PropertyType.FLOAT
PropertyType.BOOLEAN
PropertyType.LIST (from JSON string or comma-separated)
PropertyType.DICT (from JSON string)
PropertyType.ANY

COMPUTE Transformation

Compute values from multiple columns:

# Concatenate first and last name
transformation = PropertyTransformation(
    transformation_type=TransformationType.COMPUTE,
    source_column="first_name",
    target_property="full_name",
    compute_function="concat_space",
    compute_args=["last_name"]
)

# Sum multiple columns
transformation = PropertyTransformation(
    transformation_type=TransformationType.COMPUTE,
    source_column="price1",
    target_property="total_price",
    compute_function="sum",
    compute_args=["price2", "price3"]
)

Available Compute Functions:

concat: Concatenate strings
concat_space: Concatenate with space separator
concat_comma: Concatenate with comma separator
sum: Sum numeric values
avg / average: Average numeric values
max: Maximum value
min: Minimum value

CONSTANT Transformation

Use a constant value:

transformation = PropertyTransformation(
    transformation_type=TransformationType.CONSTANT,
    target_property="status",
    constant_value="active"
)

SKIP Transformation

Skip a column (don’t import it):

transformation = PropertyTransformation(
    transformation_type=TransformationType.SKIP,
    target_property="internal_id"
)

Complete Examples

Example 1: Employee Data

CSV Structure:

emp_id,name,email,dept_id,dept_name,role,salary
E001,Alice Smith,alice@example.com,D001,Engineering,Engineer,100000
E002,Bob Jones,bob@example.com,D001,Engineering,Manager,120000

Mapping:

from aiecs.application.knowledge_graph.builder.schema_mapping import (
    SchemaMapping,
    EntityMapping,
    RelationMapping,
    PropertyTransformation,
    TransformationType
)
from aiecs.domain.knowledge_graph.schema.property_schema import PropertyType

mapping = SchemaMapping(
    entity_mappings=[
        # Employee entity
        EntityMapping(
            source_columns=["emp_id", "name", "email", "salary"],
            entity_type="Employee",
            property_mapping={
                "emp_id": "id",
                "name": "name",
                "email": "email"
            },
            transformations=[
                PropertyTransformation(
                    transformation_type=TransformationType.TYPE_CAST,
                    source_column="salary",
                    target_property="salary",
                    target_type=PropertyType.INTEGER
                )
            ],
            id_column="emp_id"
        ),
        # Department entity
        EntityMapping(
            source_columns=["dept_id", "dept_name"],
            entity_type="Department",
            property_mapping={"dept_id": "id", "dept_name": "name"},
            id_column="dept_id"
        )
    ],
    relation_mappings=[
        RelationMapping(
            source_columns=["emp_id", "dept_id", "role"],
            relation_type="WORKS_IN",
            source_entity_column="emp_id",
            target_entity_column="dept_id",
            property_mapping={"role": "position"}
        )
    ]
)

Example 2: Product Catalog

JSON Structure:

[
  {
    "product_id": "P001",
    "product_name": "Laptop",
    "category": "Electronics",
    "price": "999.99",
    "in_stock": "true"
  }
]

Mapping:

mapping = SchemaMapping(
    entity_mappings=[
        EntityMapping(
            source_columns=["product_id", "product_name", "category", "price", "in_stock"],
            entity_type="Product",
            property_mapping={"product_id": "id", "product_name": "name"},
            transformations=[
                PropertyTransformation(
                    transformation_type=TransformationType.TYPE_CAST,
                    source_column="price",
                    target_property="price",
                    target_type=PropertyType.FLOAT
                ),
                PropertyTransformation(
                    transformation_type=TransformationType.TYPE_CAST,
                    source_column="in_stock",
                    target_property="available",
                    target_type=PropertyType.BOOLEAN
                ),
                PropertyTransformation(
                    transformation_type=TransformationType.RENAME,
                    source_column="category",
                    target_property="category"
                )
            ],
            id_column="product_id"
        )
    ]
)

Example 3: Complex Transformations

CSV with computed fields:

first_name,last_name,birth_year,score1,score2,score3
John,Doe,1990,85,90,88
Jane,Smith,1985,92,88,95

Mapping with computed full name and average score:

mapping = SchemaMapping(
    entity_mappings=[
        EntityMapping(
            source_columns=["first_name", "last_name", "birth_year", "score1", "score2", "score3"],
            entity_type="Student",
            transformations=[
                # Compute full name
                PropertyTransformation(
                    transformation_type=TransformationType.COMPUTE,
                    source_column="first_name",
                    target_property="full_name",
                    compute_function="concat_space",
                    compute_args=["last_name"]
                ),
                # Compute average score
                PropertyTransformation(
                    transformation_type=TransformationType.COMPUTE,
                    source_column="score1",
                    target_property="avg_score",
                    compute_function="avg",
                    compute_args=["score2", "score3"]
                ),
                # Calculate age from birth year
                PropertyTransformation(
                    transformation_type=TransformationType.COMPUTE,
                    source_column="birth_year",
                    target_property="age",
                    compute_function="subtract",  # Would need to implement
                    compute_args=["2024"]  # Current year
                )
            ],
            id_column="first_name"  # Use first_name as ID (not recommended for production)
        )
    ]
)

Best Practices

1. Always Specify ID Columns

# ✅ Good
EntityMapping(
    source_columns=["id", "name"],
    entity_type="Person",
    id_column="id"  # Explicit ID column
)

# ❌ Avoid (uses first column as ID, less clear)
EntityMapping(
    source_columns=["id", "name"],
    entity_type="Person"
)

2. Use Type Casting for Numeric Data

# ✅ Good - CSV reads as string, cast to integer
PropertyTransformation(
    transformation_type=TransformationType.TYPE_CAST,
    source_column="age_str",
    target_property="age",
    target_type=PropertyType.INTEGER
)

# ❌ Avoid - Leaves as string
property_mapping={"age_str": "age"}

3. Validate Mappings Before Use

mapping = SchemaMapping(...)

# Validate before importing
errors = mapping.validate()
if errors:
    print(f"Mapping errors: {errors}")
    # Fix errors before proceeding
else:
    # Safe to use
    pipeline = StructuredDataPipeline(mapping=mapping, graph_store=store)

4. Handle Missing Columns Gracefully

The pipeline will skip missing columns, but you can add validation:

# Check required columns exist
required_columns = set()
for entity_mapping in mapping.entity_mappings:
    required_columns.update(entity_mapping.source_columns)
for relation_mapping in mapping.relation_mappings:
    required_columns.update(relation_mapping.source_columns)

# Validate CSV has all required columns
csv_columns = set(df.columns)
missing = required_columns - csv_columns
if missing:
    raise ValueError(f"Missing required columns: {missing}")

5. Use Transformations for Data Cleaning

# Clean phone numbers
PropertyTransformation(
    transformation_type=TransformationType.COMPUTE,
    source_column="phone_raw",
    target_property="phone",
    compute_function="clean_phone"  # Custom function
)

# Normalize text
PropertyTransformation(
    transformation_type=TransformationType.TYPE_CAST,
    source_column="name_raw",
    target_property="name",
    target_type=PropertyType.STRING
)
# Then apply lowercase normalization in post-processing

6. Document Your Mappings

mapping = SchemaMapping(
    entity_mappings=[...],
    relation_mappings=[...],
    description="Employee and department mapping for HR system import"
)

Common Patterns

Pattern 1: One Entity Per Row

# Simple 1:1 mapping
EntityMapping(
    source_columns=["id", "name"],
    entity_type="Person",
    property_mapping={"id": "id", "name": "name"},
    id_column="id"
)

Pattern 2: Multiple Entities Per Row

# Create both Employee and Department from same row
EntityMapping(
    source_columns=["emp_id", "emp_name", "dept_id", "dept_name"],
    entity_type="Employee",
    ...
),
EntityMapping(
    source_columns=["emp_id", "emp_name", "dept_id", "dept_name"],
    entity_type="Department",
    ...
)

Pattern 3: Relations from Same Row

# Create relation between entities created in same row
RelationMapping(
    source_columns=["emp_id", "dept_id"],
    relation_type="WORKS_IN",
    source_entity_column="emp_id",
    target_entity_column="dept_id"
)

Pattern 4: Nested JSON

For nested JSON structures, flatten first or use multiple mappings:

{
  "employee": {
    "id": "E001",
    "name": "Alice"
  },
  "department": {
    "id": "D001",
    "name": "Engineering"
  }
}

Flatten to:

# Flatten in preprocessing or use JSON path extraction
EntityMapping(
    source_columns=["employee_id", "employee_name", "dept_id", "dept_name"],
    ...
)

Troubleshooting

Issue: Entities Not Created

Check:

Are source columns present in data?
Is id_column specified and present?
Are transformations failing silently? (Check warnings in ImportResult)

Issue: Relations Not Created

Check:

Are source and target entity columns present?
Do the entity IDs exist in the graph?
Are entity mappings creating entities before relations?

Issue: Type Casting Fails

Check:

Are values in correct format? (e.g., “123” not “abc” for INTEGER)
Use skip_errors=False to see detailed errors
Add data validation before import

Issue: Computed Values Wrong

Check:

Are all source columns present?
Are values numeric for sum/avg/max/min?
Check compute function name spelling

Next Steps

See StructuredDataPipeline Usage Examples for how to use mappings
See CSV-to-Graph Tutorial for complete CSV example
See JSON-to-Graph Tutorial for complete JSON example