Schema Mapping Configuration Guide

This guide explains how to configure schema mappings for importing structured data (CSV, JSON) into knowledge graphs.

Table of Contents

  1. Overview

  2. Basic Concepts

  3. Entity Mapping

  4. Relation Mapping

  5. Property Transformations

  6. Complete Examples

  7. Best Practices

  8. Common Patterns

  9. Troubleshooting

  10. Next Steps

Overview

Schema mapping allows you to declaratively map structured data columns to knowledge graph entities and relations. This eliminates the need for custom code for each data source.

Key Benefits:

  • Declarative: Define mappings in configuration, not code

  • Flexible: Support complex transformations (rename, type cast, compute)

  • Reusable: Same mapping works across multiple data sources

  • Type-safe: Type casting and mapping validation keep property values consistent

Basic Concepts

SchemaMapping

The SchemaMapping class is the container for all mappings:

from aiecs.application.knowledge_graph.builder.schema_mapping import SchemaMapping

mapping = SchemaMapping(
    entity_mappings=[...],  # How to create entities
    relation_mappings=[...],  # How to create relations
    description="My data mapping"
)

EntityMapping

Maps source columns to entity types:

from aiecs.application.knowledge_graph.builder.schema_mapping import EntityMapping

entity_mapping = EntityMapping(
    source_columns=["id", "name", "age"],
    entity_type="Person",
    property_mapping={"id": "id", "name": "name", "age": "age"},
    id_column="id"
)
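As a mental model only, the sketch below shows what an entity mapping does to one source row: property_mapping renames columns into properties, and id_column picks the value used as the entity ID. The helper map_entity and its dict output shape are hypothetical illustrations, not the aiecs implementation.

```python
from typing import Any, Dict


def map_entity(row: Dict[str, Any], entity_type: str,
               property_mapping: Dict[str, str], id_column: str) -> Dict[str, Any]:
    """Hypothetical sketch: turn one source row into an entity dict."""
    properties = {
        target: row[source]
        for source, target in property_mapping.items()
        if source in row  # columns missing from the row are skipped
    }
    return {"type": entity_type, "id": row[id_column], "properties": properties}


row = {"id": "P42", "name": "Alice", "age": "30"}
person = map_entity(row, "Person", {"id": "id", "name": "name", "age": "age"}, "id")
print(person)
# {'type': 'Person', 'id': 'P42', 'properties': {'id': 'P42', 'name': 'Alice', 'age': '30'}}
```

Note that age is still the string "30" here; converting it is the job of the TYPE_CAST transformation described later.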

RelationMapping

Maps source columns to relations between entities:

from aiecs.application.knowledge_graph.builder.schema_mapping import RelationMapping

relation_mapping = RelationMapping(
    source_columns=["emp_id", "dept_id"],
    relation_type="WORKS_IN",
    source_entity_column="emp_id",
    target_entity_column="dept_id"
)

Entity Mapping

Simple Entity Mapping

Map columns directly to entity properties:

EntityMapping(
    source_columns=["id", "name", "email"],
    entity_type="Person",
    property_mapping={
        "id": "id",
        "name": "name",
        "email": "email"
    },
    id_column="id"
)

Entity Mapping with ID Column

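Use id_column to choose which source column supplies the entity ID. Here property_mapping also renames the columns, so employee_id is stored as the id property: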

EntityMapping(
    source_columns=["employee_id", "full_name", "department"],
    entity_type="Employee",
    property_mapping={
        "employee_id": "id",
        "full_name": "name",
        "department": "dept"
    },
    id_column="employee_id"  # Use employee_id as entity ID
)

Multiple Entity Types from Same Row

You can create multiple entities from a single row:

mapping = SchemaMapping(
    entity_mappings=[
        # Create Employee entity
        EntityMapping(
            source_columns=["emp_id", "emp_name"],
            entity_type="Employee",
            property_mapping={"emp_id": "id", "emp_name": "name"},
            id_column="emp_id"
        ),
        # Create Department entity from same row
        EntityMapping(
            source_columns=["dept_id", "dept_name"],
            entity_type="Department",
            property_mapping={"dept_id": "id", "dept_name": "name"},
            id_column="dept_id"
        )
    ]
)
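When several rows share a dept_id, each of them emits the same Department. Whether a graph store merges these or keeps duplicates depends on the store; the sketch below simply assumes entities are keyed by (entity_type, id), which is one reason a stable id_column matters. It uses plain dicts and is not the aiecs implementation.

```python
from typing import Dict, List, Tuple

rows: List[Dict[str, str]] = [
    {"emp_id": "E001", "emp_name": "Alice", "dept_id": "D001", "dept_name": "Engineering"},
    {"emp_id": "E002", "emp_name": "Bob", "dept_id": "D001", "dept_name": "Engineering"},
]

# Hypothetical store keyed by (entity_type, entity_id): a repeated key
# overwrites the existing entry instead of creating a second entity.
entities: Dict[Tuple[str, str], Dict[str, str]] = {}
for row in rows:
    entities[("Employee", row["emp_id"])] = {"id": row["emp_id"], "name": row["emp_name"]}
    entities[("Department", row["dept_id"])] = {"id": row["dept_id"], "name": row["dept_name"]}

print(sorted(entities))
# [('Department', 'D001'), ('Employee', 'E001'), ('Employee', 'E002')]
```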

Relation Mapping

Basic Relation Mapping

Create relations between entities:

RelationMapping(
    source_columns=["person_id", "company_id"],
    relation_type="WORKS_FOR",
    source_entity_column="person_id",
    target_entity_column="company_id"
)

Relation with Properties

Add properties to relations:

RelationMapping(
    source_columns=["person_id", "company_id", "role", "since"],
    relation_type="WORKS_FOR",
    source_entity_column="person_id",
    target_entity_column="company_id",
    property_mapping={
        "role": "position",
        "since": "start_date"
    }
)
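Conceptually, a relation mapping reads two entity IDs from the row and renames any extra columns into relation properties. The plain-Python sketch below illustrates that; map_relation and its output shape are hypothetical, not the aiecs API.

```python
from typing import Any, Dict, Optional


def map_relation(row: Dict[str, Any], relation_type: str, source_entity_column: str,
                 target_entity_column: str,
                 property_mapping: Optional[Dict[str, str]] = None) -> Dict[str, Any]:
    """Hypothetical sketch: one row -> one relation between two entity IDs."""
    properties = {
        target: row[source]
        for source, target in (property_mapping or {}).items()
        if source in row
    }
    return {
        "type": relation_type,
        "source_id": row[source_entity_column],
        "target_id": row[target_entity_column],
        "properties": properties,
    }


row = {"person_id": "P1", "company_id": "C9", "role": "Engineer", "since": "2020-01-15"}
rel = map_relation(row, "WORKS_FOR", "person_id", "company_id",
                   {"role": "position", "since": "start_date"})
print(rel)
# {'type': 'WORKS_FOR', 'source_id': 'P1', 'target_id': 'C9',
#  'properties': {'position': 'Engineer', 'start_date': '2020-01-15'}}
```

The start_date value stays a plain string unless you add a transformation for it.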

Property Transformations

Transformations allow you to modify values during import.

Transformation Types

  1. RENAME: Rename a column to a property

  2. TYPE_CAST: Convert value to different type

  3. COMPUTE: Compute value from multiple columns

  4. CONSTANT: Use a constant value

  5. SKIP: Skip this column

RENAME Transformation

Simply rename a column:

from aiecs.application.knowledge_graph.builder.schema_mapping import (
    PropertyTransformation,
    TransformationType
)

transformation = PropertyTransformation(
    transformation_type=TransformationType.RENAME,
    source_column="full_name",
    target_property="name"
)

TYPE_CAST Transformation

Convert string to integer, float, boolean, etc.:

from aiecs.domain.knowledge_graph.schema.property_schema import PropertyType

transformation = PropertyTransformation(
    transformation_type=TransformationType.TYPE_CAST,
    source_column="age_str",
    target_property="age",
    target_type=PropertyType.INTEGER
)

Supported Types:

  • PropertyType.STRING

  • PropertyType.INTEGER

  • PropertyType.FLOAT

  • PropertyType.BOOLEAN

  • PropertyType.LIST (from JSON string or comma-separated)

  • PropertyType.DICT (from JSON string)

  • PropertyType.ANY
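The exact parsing rules aiecs applies for each type are not spelled out here. As a rough mental model, string-to-type casting could look like the sketch below; the function cast_value, the accepted boolean spellings, and the JSON-then-comma fallback for LIST are all assumptions, with type names mirroring the PropertyType members above.

```python
import json
from typing import Any


def cast_value(value: str, target_type: str) -> Any:
    """Hypothetical sketch of string-to-type casting (not the aiecs implementation)."""
    if target_type == "STRING":
        return str(value)
    if target_type == "INTEGER":
        return int(value)
    if target_type == "FLOAT":
        return float(value)
    if target_type == "BOOLEAN":
        # bool("false") is True in Python, so the text has to be compared explicitly
        text = value.strip().lower()
        if text in ("true", "1", "yes"):
            return True
        if text in ("false", "0", "no"):
            return False
        raise ValueError(f"Cannot cast {value!r} to BOOLEAN")
    if target_type == "LIST":
        # Try a JSON array first, then fall back to comma-separated text
        try:
            parsed = json.loads(value)
            if isinstance(parsed, list):
                return parsed
        except json.JSONDecodeError:
            pass
        return [item.strip() for item in value.split(",")]
    if target_type == "DICT":
        return json.loads(value)
    if target_type == "ANY":
        return value
    raise ValueError(f"Unknown target type: {target_type}")


print(cast_value("42", "INTEGER"))     # 42
print(cast_value("a, b,c", "LIST"))    # ['a', 'b', 'c']
```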

COMPUTE Transformation

Compute values from multiple columns:

# Concatenate first and last name
transformation = PropertyTransformation(
    transformation_type=TransformationType.COMPUTE,
    source_column="first_name",
    target_property="full_name",
    compute_function="concat_space",
    compute_args=["last_name"]
)

# Sum multiple columns
transformation = PropertyTransformation(
    transformation_type=TransformationType.COMPUTE,
    source_column="price1",
    target_property="total_price",
    compute_function="sum",
    compute_args=["price2", "price3"]
)

Available Compute Functions:

  • concat: Concatenate strings

  • concat_space: Concatenate with space separator

  • concat_comma: Concatenate with comma separator

  • sum: Sum numeric values

  • avg / average: Average numeric values

  • max: Maximum value

  • min: Minimum value
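One way to picture how these names are resolved is a dispatch table that receives the source_column value followed by the compute_args values. This is a hypothetical sketch, not the aiecs code; in particular the ", " separator for concat_comma and the float conversion for the numeric functions are assumptions.

```python
from statistics import mean
from typing import Any, Callable, Dict, List

# Hypothetical dispatch table keyed by the function names listed above.
# Numeric functions convert to float so "90" and "100" compare as numbers, not text.
COMPUTE_FUNCTIONS: Dict[str, Callable[[List[Any]], Any]] = {
    "concat": lambda vals: "".join(str(v) for v in vals),
    "concat_space": lambda vals: " ".join(str(v) for v in vals),
    "concat_comma": lambda vals: ", ".join(str(v) for v in vals),  # separator is an assumption
    "sum": lambda vals: sum(float(v) for v in vals),
    "avg": lambda vals: mean(float(v) for v in vals),
    "average": lambda vals: mean(float(v) for v in vals),
    "max": lambda vals: max(float(v) for v in vals),
    "min": lambda vals: min(float(v) for v in vals),
}


def compute(row: Dict[str, Any], source_column: str,
            function: str, compute_args: List[str]) -> Any:
    """Gather source_column followed by each compute_args column, then apply the function."""
    values = [row[source_column]] + [row[col] for col in compute_args]
    return COMPUTE_FUNCTIONS[function](values)


row = {"first_name": "John", "last_name": "Doe",
       "score1": "85", "score2": "90", "score3": "88"}
print(compute(row, "first_name", "concat_space", ["last_name"]))       # John Doe
print(round(compute(row, "score1", "avg", ["score2", "score3"]), 2))  # 87.67
```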

CONSTANT Transformation

Use a constant value:

transformation = PropertyTransformation(
    transformation_type=TransformationType.CONSTANT,
    target_property="status",
    constant_value="active"
)

SKIP Transformation

Skip a column (don’t import it):

transformation = PropertyTransformation(
    transformation_type=TransformationType.SKIP,
    source_column="internal_id",
    target_property="internal_id"
)
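To contrast RENAME, CONSTANT and SKIP, the sketch below applies a list of steps to one row. Step is a hypothetical stand-in for PropertyTransformation, and apply_steps is illustrative only: a constant is written regardless of the row, and a skipped column simply produces no property.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class Step:  # hypothetical stand-in for PropertyTransformation
    kind: str                          # "RENAME", "CONSTANT" or "SKIP"
    target_property: str
    source_column: Optional[str] = None
    constant_value: Any = None


def apply_steps(row: Dict[str, Any], steps: List[Step]) -> Dict[str, Any]:
    props: Dict[str, Any] = {}
    for step in steps:
        if step.kind == "RENAME":
            props[step.target_property] = row[step.source_column]
        elif step.kind == "CONSTANT":
            props[step.target_property] = step.constant_value  # ignores the row
        elif step.kind == "SKIP":
            continue  # nothing is written for a skipped column
    return props


row = {"full_name": "Alice Smith", "internal_id": "x-17"}
steps = [
    Step("RENAME", "name", source_column="full_name"),
    Step("CONSTANT", "status", constant_value="active"),
    Step("SKIP", "internal_id", source_column="internal_id"),
]
print(apply_steps(row, steps))  # {'name': 'Alice Smith', 'status': 'active'}
```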

Complete Examples

Example 1: Employee Data

CSV Structure:

emp_id,name,email,dept_id,dept_name,role,salary
E001,Alice Smith,alice@example.com,D001,Engineering,Engineer,100000
E002,Bob Jones,bob@example.com,D001,Engineering,Manager,120000

Mapping:

from aiecs.application.knowledge_graph.builder.schema_mapping import (
    SchemaMapping,
    EntityMapping,
    RelationMapping,
    PropertyTransformation,
    TransformationType
)
from aiecs.domain.knowledge_graph.schema.property_schema import PropertyType

mapping = SchemaMapping(
    entity_mappings=[
        # Employee entity
        EntityMapping(
            source_columns=["emp_id", "name", "email", "salary"],
            entity_type="Employee",
            property_mapping={
                "emp_id": "id",
                "name": "name",
                "email": "email"
            },
            transformations=[
                PropertyTransformation(
                    transformation_type=TransformationType.TYPE_CAST,
                    source_column="salary",
                    target_property="salary",
                    target_type=PropertyType.INTEGER
                )
            ],
            id_column="emp_id"
        ),
        # Department entity
        EntityMapping(
            source_columns=["dept_id", "dept_name"],
            entity_type="Department",
            property_mapping={"dept_id": "id", "dept_name": "name"},
            id_column="dept_id"
        )
    ],
    relation_mappings=[
        RelationMapping(
            source_columns=["emp_id", "dept_id", "role"],
            relation_type="WORKS_IN",
            source_entity_column="emp_id",
            target_entity_column="dept_id",
            property_mapping={"role": "position"}
        )
    ]
)
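The salary cast matters because Python's standard csv module returns every field as a string. A quick check on the sample data with csv.DictReader (independent of aiecs) also shows that both rows share dept_id D001, so the Department mapping sees a single department ID:

```python
import csv
import io

csv_text = """emp_id,name,email,dept_id,dept_name,role,salary
E001,Alice Smith,alice@example.com,D001,Engineering,Engineer,100000
E002,Bob Jones,bob@example.com,D001,Engineering,Manager,120000
"""

rows = list(csv.DictReader(io.StringIO(csv_text)))
print(type(rows[0]["salary"]).__name__)                   # str
print(int(rows[0]["salary"]) + int(rows[1]["salary"]))    # 220000
print({row["dept_id"] for row in rows})                   # {'D001'}
```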

Example 2: Product Catalog

JSON Structure:

[
  {
    "product_id": "P001",
    "product_name": "Laptop",
    "category": "Electronics",
    "price": "999.99",
    "in_stock": "true"
  }
]

Mapping:

mapping = SchemaMapping(
    entity_mappings=[
        EntityMapping(
            source_columns=["product_id", "product_name", "category", "price", "in_stock"],
            entity_type="Product",
            property_mapping={"product_id": "id", "product_name": "name"},
            transformations=[
                PropertyTransformation(
                    transformation_type=TransformationType.TYPE_CAST,
                    source_column="price",
                    target_property="price",
                    target_type=PropertyType.FLOAT
                ),
                PropertyTransformation(
                    transformation_type=TransformationType.TYPE_CAST,
                    source_column="in_stock",
                    target_property="available",
                    target_type=PropertyType.BOOLEAN
                ),
                PropertyTransformation(
                    transformation_type=TransformationType.RENAME,
                    source_column="category",
                    target_property="category"
                )
            ],
            id_column="product_id"
        )
    ]
)
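In the sample JSON, price and in_stock are quoted, so json.loads returns them as strings; that is why both need a TYPE_CAST. It also exposes a common pitfall: in Python, bool("false") is True, so a boolean cast has to parse the text. Whether the BOOLEAN cast handles "false" as you expect is worth checking on a sample record. The snippet below uses only the standard library:

```python
import json

record = json.loads('{"product_id": "P001", "price": "999.99", "in_stock": "true"}')

print(type(record["price"]).__name__)               # str: quoted numbers stay strings
print(float(record["price"]))                       # 999.99
print(bool("false"))                                # True: any non-empty string is truthy
print(record["in_stock"].strip().lower() == "true")  # True: compare the text instead
```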

Example 3: Complex Transformations

CSV with computed fields:

first_name,last_name,birth_year,score1,score2,score3
John,Doe,1990,85,90,88
Jane,Smith,1985,92,88,95

Mapping with computed full name and average score:

mapping = SchemaMapping(
    entity_mappings=[
        EntityMapping(
            source_columns=["first_name", "last_name", "birth_year", "score1", "score2", "score3"],
            entity_type="Student",
            transformations=[
                # Compute full name
                PropertyTransformation(
                    transformation_type=TransformationType.COMPUTE,
                    source_column="first_name",
                    target_property="full_name",
                    compute_function="concat_space",
                    compute_args=["last_name"]
                ),
                # Compute average score
                PropertyTransformation(
                    transformation_type=TransformationType.COMPUTE,
                    source_column="score1",
                    target_property="avg_score",
                    compute_function="avg",
                    compute_args=["score2", "score3"]
                ),
                # Cast birth year to an integer. "subtract" is not a built-in
                # compute function, so derive age before import if you need it.
                PropertyTransformation(
                    transformation_type=TransformationType.TYPE_CAST,
                    source_column="birth_year",
                    target_property="birth_year",
                    target_type=PropertyType.INTEGER
                )
            ],
            id_column="first_name"  # Illustration only: first names are not unique IDs
        )
    ]
)
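Age is a natural property for this data, but "subtract" is not among the available compute functions listed earlier. In the other examples compute_args lists column names, so passing a literal year there is unlikely to work either. One option is to derive the values in a preprocessing step before import, sketched here in plain Python with the reference year 2024 from the original example:

```python
import csv
import io

csv_text = """first_name,last_name,birth_year,score1,score2,score3
John,Doe,1990,85,90,88
Jane,Smith,1985,92,88,95
"""

REFERENCE_YEAR = 2024  # fixed year, as in the original example

students = []
for row in csv.DictReader(io.StringIO(csv_text)):
    scores = [float(row[col]) for col in ("score1", "score2", "score3")]
    students.append({
        "full_name": f"{row['first_name']} {row['last_name']}",
        "avg_score": round(sum(scores) / len(scores), 2),
        "age": REFERENCE_YEAR - int(row["birth_year"]),  # reference year minus birth year
    })

print(students[0])  # {'full_name': 'John Doe', 'avg_score': 87.67, 'age': 34}
```

The derived columns can then be imported with a plain property_mapping.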

Best Practices

1. Always Specify ID Columns

# ✅ Good
EntityMapping(
    source_columns=["id", "name"],
    entity_type="Person",
    id_column="id"  # Explicit ID column
)

# ❌ Avoid (uses first column as ID, less clear)
EntityMapping(
    source_columns=["id", "name"],
    entity_type="Person"
)

2. Use Type Casting for Numeric Data

# ✅ Good - CSV reads as string, cast to integer
PropertyTransformation(
    transformation_type=TransformationType.TYPE_CAST,
    source_column="age_str",
    target_property="age",
    target_type=PropertyType.INTEGER
)

# ❌ Avoid - Leaves as string
property_mapping={"age_str": "age"}

3. Validate Mappings Before Use

mapping = SchemaMapping(...)

# Validate before importing
errors = mapping.validate()
if errors:
    print(f"Mapping errors: {errors}")
    # Fix errors before proceeding
else:
    # Safe to use (assumes StructuredDataPipeline is imported and store is your graph store)
    pipeline = StructuredDataPipeline(mapping=mapping, graph_store=store)

4. Handle Missing Columns Gracefully

The pipeline skips missing columns instead of failing. To catch a schema mismatch before importing, check the columns up front:

# Check required columns exist
required_columns = set()
for entity_mapping in mapping.entity_mappings:
    required_columns.update(entity_mapping.source_columns)
for relation_mapping in mapping.relation_mappings:
    required_columns.update(relation_mapping.source_columns)

# Validate that the CSV has all required columns
# (df is a pandas DataFrame, e.g. df = pd.read_csv(path))
csv_columns = set(df.columns)
missing = required_columns - csv_columns
if missing:
    raise ValueError(f"Missing required columns: {missing}")

5. Use Transformations for Data Cleaning

# Clean phone numbers
PropertyTransformation(
    transformation_type=TransformationType.COMPUTE,
    source_column="phone_raw",
    target_property="phone",
    compute_function="clean_phone"  # Custom function: not in the built-in list above
)

# Normalize text
PropertyTransformation(
    transformation_type=TransformationType.TYPE_CAST,
    source_column="name_raw",
    target_property="name",
    target_type=PropertyType.STRING
)
# TYPE_CAST only guarantees a string; apply lowercase normalization in post-processing
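Since clean_phone is not among the compute functions listed under COMPUTE Transformation, and this guide does not describe how to register custom ones, a safe route is to clean values before import. A plain-Python sketch of both clean-ups; the functions are hypothetical and the phone rule (keep digits plus a leading "+") is an assumption to adapt to your data:

```python
import re


def clean_phone(raw: str) -> str:
    """Hypothetical cleaner: keep digits and a leading '+' only."""
    raw = raw.strip()
    digits = re.sub(r"\D", "", raw)
    return f"+{digits}" if raw.startswith("+") else digits


def normalize_name(raw: str) -> str:
    """Collapse runs of whitespace and lowercase the result."""
    return " ".join(raw.split()).lower()


print(clean_phone("(555) 123-4567"))      # 5551234567
print(clean_phone("+1 555.123.4567"))     # +15551234567
print(normalize_name("  Alice   SMITH "))  # alice smith
```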

6. Document Your Mappings

mapping = SchemaMapping(
    entity_mappings=[...],
    relation_mappings=[...],
    description="Employee and department mapping for HR system import"
)

Common Patterns

Pattern 1: One Entity Per Row

# Simple 1:1 mapping
EntityMapping(
    source_columns=["id", "name"],
    entity_type="Person",
    property_mapping={"id": "id", "name": "name"},
    id_column="id"
)

Pattern 2: Multiple Entities Per Row

# Create both Employee and Department from same row
EntityMapping(
    source_columns=["emp_id", "emp_name"],
    entity_type="Employee",
    ...
),
EntityMapping(
    source_columns=["dept_id", "dept_name"],
    entity_type="Department",
    ...
)

Pattern 3: Relations from Same Row

# Create relation between entities created in same row
RelationMapping(
    source_columns=["emp_id", "dept_id"],
    relation_type="WORKS_IN",
    source_entity_column="emp_id",
    target_entity_column="dept_id"
)

Pattern 4: Nested JSON

For nested JSON structures, flatten first or use multiple mappings:

{
  "employee": {
    "id": "E001",
    "name": "Alice"
  },
  "department": {
    "id": "D001",
    "name": "Engineering"
  }
}

Flatten to:

# Flatten in preprocessing or use JSON path extraction;
# "department" fields become department_id and department_name
EntityMapping(
    source_columns=["employee_id", "employee_name", "department_id", "department_name"],
    ...
)
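Flattening can be done with a few lines of standard-library Python before import. This sketch joins nested keys with "_", so employee.id becomes employee_id; the function name flatten is only illustrative:

```python
from typing import Any, Dict


def flatten(obj: Dict[str, Any], parent: str = "", sep: str = "_") -> Dict[str, Any]:
    """Flatten nested dicts: {"employee": {"id": 1}} -> {"employee_id": 1}."""
    flat: Dict[str, Any] = {}
    for key, value in obj.items():
        name = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))  # recurse into nested objects
        else:
            flat[name] = value
    return flat


record = {
    "employee": {"id": "E001", "name": "Alice"},
    "department": {"id": "D001", "name": "Engineering"},
}
print(flatten(record))
# {'employee_id': 'E001', 'employee_name': 'Alice', 'department_id': 'D001', 'department_name': 'Engineering'}
```

If you already use pandas, pandas.json_normalize(records, sep="_") produces the same column names for nested objects.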

Troubleshooting

Issue: Entities Not Created

Check:

  1. Are source columns present in data?

  2. Is id_column specified and present?

  3. Are transformations failing silently? (Check warnings in ImportResult)

Issue: Relations Not Created

Check:

  1. Are source and target entity columns present?

  2. Do the entity IDs exist in the graph?

  3. Are entity mappings creating entities before relations?
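For the second check, a quick pre-import scan can list relation rows whose endpoints do not match any entity ID your entity mappings will produce. A plain-Python sketch; the rows and the known_ids set are illustrative data:

```python
rows = [
    {"emp_id": "E001", "dept_id": "D001"},
    {"emp_id": "E002", "dept_id": "D999"},
]
known_ids = {"E001", "E002", "D001"}  # illustrative: IDs produced by your entity mappings

# Relation rows whose source or target ID has no matching entity
dangling = [
    (row["emp_id"], row["dept_id"])
    for row in rows
    if row["emp_id"] not in known_ids or row["dept_id"] not in known_ids
]
print(dangling)  # [('E002', 'D999')]
```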

Issue: Type Casting Fails

Check:

  1. Are values in correct format? (e.g., “123” not “abc” for INTEGER)

  2. Use skip_errors=False to see detailed errors

  3. Add data validation before import
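A minimal validation sketch for the INTEGER case: it reports the row index and value of every entry that Python's int() cannot parse, including rows where the column is missing. The function name find_bad_integers is hypothetical:

```python
from typing import Dict, List, Tuple


def find_bad_integers(rows: List[Dict[str, str]], column: str) -> List[Tuple[int, str]]:
    """Return (row_index, value) for values that int() cannot parse."""
    bad = []
    for index, row in enumerate(rows):
        try:
            int(row[column])  # int() tolerates surrounding whitespace
        except (KeyError, ValueError):
            bad.append((index, row.get(column, "<missing>")))
    return bad


rows = [{"age": "34"}, {"age": "abc"}, {"age": " 41 "}, {}]
print(find_bad_integers(rows, "age"))  # [(1, 'abc'), (3, '<missing>')]
```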

Issue: Computed Values Wrong

Check:

  1. Are all source columns present?

  2. Are values numeric for sum/avg/max/min?

  3. Check compute function name spelling

Next Steps