# Custom LLM Client Integration

This document describes how to integrate custom LLM providers into the AIECS framework using the `LLMClientProtocol` interface.

## Overview

AIECS supports custom LLM providers through a protocol-based interface that allows you to:
- Integrate local LLM models (Ollama, LM Studio, etc.)
- Use proprietary or specialized LLM services
- Implement custom embedding providers
- Configure different LLMs for different operations (entity extraction, RAG strategy selection, embeddings)

## LLMClientProtocol Interface

Custom LLM clients must implement the `LLMClientProtocol` interface defined in `aiecs/llm/protocols.py`:

```python
from typing import Protocol, List, Optional, AsyncGenerator, Dict, Any, runtime_checkable

@runtime_checkable
class LLMClientProtocol(Protocol):
    """Protocol that all LLM clients must implement"""

    @property
    def provider_name(self) -> str:
        """Name of the LLM provider"""
        ...

    @property
    def model_name(self) -> str:
        """Name of the model being used"""
        ...

    async def generate_text(
        self,
        prompt: str,
        max_tokens: Optional[int] = None,
        temperature: Optional[float] = None,
        **kwargs
    ) -> str:
        """Generate text from a prompt"""
        ...

    async def stream_text(
        self,
        prompt: str,
        max_tokens: Optional[int] = None,
        temperature: Optional[float] = None,
        **kwargs
    ) -> AsyncGenerator[str, None]:
        """Stream text generation from a prompt"""
        ...

    async def get_embeddings(
        self,
        texts: List[str],
        **kwargs
    ) -> List[List[float]]:
        """Generate embeddings for a list of texts"""
        ...
```
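
Because the protocol is decorated with `@runtime_checkable`, a client can be checked with `isinstance()` before it is registered. The sketch below uses `EmbeddingCapable`, a hypothetical, trimmed stand-in for `LLMClientProtocol`, so that it runs without AIECS installed; the two client classes are also illustrative. Note that a runtime protocol check only confirms that the members exist. It does not validate signatures or return types.

```python
from typing import List, Protocol, runtime_checkable


@runtime_checkable
class EmbeddingCapable(Protocol):
    """Hypothetical stand-in for LLMClientProtocol, trimmed for illustration."""

    @property
    def provider_name(self) -> str: ...

    async def get_embeddings(self, texts: List[str], **kwargs) -> List[List[float]]: ...


class GoodClient:
    """Provides every member the protocol names."""

    @property
    def provider_name(self) -> str:
        return "good"

    async def get_embeddings(self, texts: List[str], **kwargs) -> List[List[float]]:
        return [[0.0] for _ in texts]


class IncompleteClient:
    """Lacks get_embeddings, so the structural check fails."""

    provider_name = "incomplete"


good_ok = isinstance(GoodClient(), EmbeddingCapable)
incomplete_ok = isinstance(IncompleteClient(), EmbeddingCapable)
print(good_ok, incomplete_ok)  # True False
```

Running the same check against the real `LLMClientProtocol` at startup is a cheap way to catch a missing method before the first request is made.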

## Implementing a Custom LLM Client

### Example 1: Local Model Client (Ollama)

```python
import json
from typing import Any, Dict, List, Optional, AsyncGenerator
import httpx

class OllamaClient:
    """Custom client for Ollama local models"""

    def __init__(
        self,
        model: str = "llama2",
        base_url: str = "http://localhost:11434",
        timeout: float = 120.0,
    ):
        self._model = model
        self._base_url = base_url
        # httpx defaults to a 5-second timeout, which local generation often exceeds
        self._client = httpx.AsyncClient(timeout=timeout)

    @property
    def provider_name(self) -> str:
        return "ollama"

    @property
    def model_name(self) -> str:
        return self._model

    def _build_payload(
        self,
        prompt: str,
        max_tokens: Optional[int],
        temperature: Optional[float],
        stream: bool,
    ) -> Dict[str, Any]:
        """Build the request body; Ollama reads sampling parameters from "options"."""
        payload: Dict[str, Any] = {"model": self._model, "prompt": prompt, "stream": stream}
        options: Dict[str, Any] = {}
        if max_tokens is not None:
            options["num_predict"] = max_tokens  # Ollama's name for the token limit
        if temperature is not None:
            options["temperature"] = temperature
        if options:
            payload["options"] = options
        return payload

    async def generate_text(
        self,
        prompt: str,
        max_tokens: Optional[int] = None,
        temperature: Optional[float] = None,
        **kwargs
    ) -> str:
        """Generate text using Ollama API"""
        payload = self._build_payload(prompt, max_tokens, temperature, stream=False)

        response = await self._client.post(
            f"{self._base_url}/api/generate",
            json=payload
        )
        response.raise_for_status()
        return response.json()["response"]

    async def stream_text(
        self,
        prompt: str,
        max_tokens: Optional[int] = None,
        temperature: Optional[float] = None,
        **kwargs
    ) -> AsyncGenerator[str, None]:
        """Stream text generation using Ollama API"""
        payload = self._build_payload(prompt, max_tokens, temperature, stream=True)

        async with self._client.stream(
            "POST",
            f"{self._base_url}/api/generate",
            json=payload
        ) as response:
            response.raise_for_status()
            # Each non-empty line is one JSON object carrying a text fragment
            async for line in response.aiter_lines():
                if line:
                    data = json.loads(line)
                    if "response" in data:
                        yield data["response"]

    async def get_embeddings(
        self,
        texts: List[str],
        **kwargs
    ) -> List[List[float]]:
        """Generate embeddings using Ollama API"""
        embeddings = []
        for text in texts:
            response = await self._client.post(
                f"{self._base_url}/api/embeddings",
                json={"model": self._model, "prompt": text}
            )
            response.raise_for_status()
            embeddings.append(response.json()["embedding"])
        return embeddings

    async def close(self):
        """Clean up resources"""
        await self._client.aclose()
```
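
The streaming method above consumes newline-delimited JSON, one object per line. The parsing step can be exercised without a running server, as in the minimal sketch below. The helper name `iter_response_chunks` and the sample lines are illustrative assumptions: the lines are hand-written to resemble the `response` and `done` fields of a streamed reply, not captured from a real Ollama instance.

```python
import json
from typing import Iterable, Iterator


def iter_response_chunks(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from newline-delimited JSON lines."""
    for line in lines:
        if not line.strip():
            continue  # skip blank lines between objects
        data = json.loads(line)
        chunk = data.get("response")
        if chunk:
            yield chunk
        if data.get("done"):
            break  # the final object marks the end of the stream


# Hand-written sample input (an assumption, not real server output)
sample_lines = [
    '{"response": "Par", "done": false}',
    "",
    '{"response": "is", "done": false}',
    '{"response": "", "done": true}',
]

streamed = "".join(iter_response_chunks(sample_lines))
print(streamed)  # Paris
```

Keeping the parsing in a plain function like this makes it easy to unit test separately from the HTTP layer.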

### Example 2: Custom Embedding Client

```python
import asyncio
from typing import List

class CustomEmbeddingClient:
    """Custom embedding client for specialized embeddings"""

    def __init__(self, model: str = "all-MiniLM-L6-v2"):
        self._model = model
        self._encoder = None  # Loaded lazily on first use

    @property
    def provider_name(self) -> str:
        return "custom-embeddings"

    @property
    def model_name(self) -> str:
        return self._model

    async def generate_text(self, prompt: str, **kwargs) -> str:
        """Not used for embedding-only clients"""
        raise NotImplementedError("This client only supports embeddings")

    async def stream_text(self, prompt: str, **kwargs):
        """Not used for embedding-only clients"""
        raise NotImplementedError("This client only supports embeddings")

    def _encode(self, texts: List[str]) -> List[List[float]]:
        """Blocking helper: load the model once, then encode"""
        # Example: Using a local sentence transformer model
        if self._encoder is None:
            from sentence_transformers import SentenceTransformer
            self._encoder = SentenceTransformer(self._model)
        return self._encoder.encode(texts).tolist()

    async def get_embeddings(self, texts: List[str], **kwargs) -> List[List[float]]:
        """Generate custom embeddings"""
        # encode() is CPU-bound and blocking, so run it in a worker thread
        # (asyncio.to_thread requires Python 3.9+)
        return await asyncio.to_thread(self._encode, texts)
```

## Registering Custom Providers

Once you've implemented a custom client, register it with the `LLMClientFactory`:

```python
import asyncio

from aiecs.llm import LLMClientFactory, resolve_llm_client

# Create your custom client instance
ollama_client = OllamaClient(model="llama2")

# Register it with a unique provider name
LLMClientFactory.register_custom_provider("ollama", ollama_client)

async def main() -> None:
    # Now you can use it anywhere in AIECS
    client = resolve_llm_client("ollama")
    # `await` is only valid inside a coroutine, hence the main() wrapper
    response = await client.generate_text("What is the capital of France?")
    print(response)

asyncio.run(main())
```

## Configuration-Driven Usage

You can configure AIECS to use custom providers through environment variables:

### Entity Extraction Configuration

```bash
# Use custom LLM for entity extraction
export KG_ENTITY_EXTRACTION_PROVIDER="ollama"
export KG_ENTITY_EXTRACTION_MODEL="llama2"
```

```python
from aiecs.llm import LLMClientFactory
from aiecs.application.knowledge_graph.extraction import LLMEntityExtractor

# Register your custom provider first
LLMClientFactory.register_custom_provider("ollama", OllamaClient(model="llama2"))

# Create extractor using configuration
extractor = LLMEntityExtractor.from_config()
# Will automatically use the ollama provider from environment variables
```

### RAG Strategy Selection Configuration

```bash
# Use custom LLM for query intent classification
export KG_STRATEGY_SELECTION_PROVIDER="ollama"
export KG_STRATEGY_SELECTION_MODEL="llama2"
```

```python
from aiecs.llm import LLMClientFactory
from aiecs.application.knowledge_graph.retrieval import QueryIntentClassifier

# Register your custom provider first
LLMClientFactory.register_custom_provider("ollama", OllamaClient(model="llama2"))

# Create classifier using configuration
classifier = QueryIntentClassifier.from_config()
# Will automatically use the ollama provider from environment variables
```

### Embedding Configuration

```bash
# Use custom embedding provider
export KG_EMBEDDING_PROVIDER="custom-embeddings"
export KG_EMBEDDING_MODEL="all-MiniLM-L6-v2"
export KG_EMBEDDING_DIMENSION=384
```

```python
from aiecs.llm import LLMClientFactory
from aiecs.application.knowledge_graph.builder import GraphBuilder

# Register your custom embedding provider first
embedding_client = CustomEmbeddingClient(model="all-MiniLM-L6-v2")
LLMClientFactory.register_custom_provider("custom-embeddings", embedding_client)

# Create graph builder using configuration
builder = GraphBuilder.from_config(
    graph_store=my_graph_store,
    entity_extractor=my_entity_extractor,
    relation_extractor=my_relation_extractor
)
# Will automatically use the custom embedding provider from environment variables
```
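
`KG_EMBEDDING_DIMENSION` has to agree with the length of the vectors the client actually returns (384 for `all-MiniLM-L6-v2`). A quick startup check can catch a mismatch early. The sketch below is an illustration only: `FixedSizeEmbedder` is a hypothetical stand-in client and `check_embedding_dimension` is a hypothetical helper, neither of which is part of AIECS.

```python
import asyncio
import os
from typing import List


class FixedSizeEmbedder:
    """Hypothetical stand-in client that returns vectors of a fixed size."""

    def __init__(self, dimension: int):
        self._dimension = dimension

    async def get_embeddings(self, texts: List[str], **kwargs) -> List[List[float]]:
        return [[0.0] * self._dimension for _ in texts]


async def check_embedding_dimension(client, expected: int) -> int:
    """Embed one probe string and raise if the vector length differs."""
    vectors = await client.get_embeddings(["dimension probe"])
    actual = len(vectors[0])
    if actual != expected:
        raise ValueError(f"expected embedding dimension {expected}, got {actual}")
    return actual


# Fall back to 384 when the variable is not set
expected_dim = int(os.environ.get("KG_EMBEDDING_DIMENSION", "384"))
checked_dim = asyncio.run(
    check_embedding_dimension(FixedSizeEmbedder(expected_dim), expected_dim)
)
print(checked_dim == expected_dim)  # True
```

In practice the same helper could be called with the real embedding client right after registration.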

## Cost Optimization Example

Use different LLMs for different operations to optimize costs:

```python
from aiecs.llm import LLMClientFactory
from aiecs.application.knowledge_graph.extraction import LLMEntityExtractor
from aiecs.application.knowledge_graph.retrieval import QueryIntentClassifier
from aiecs.application.knowledge_graph.builder import GraphBuilder
import os

# Register custom providers
ollama_client = OllamaClient(model="llama2")  # Free local model
LLMClientFactory.register_custom_provider("ollama", ollama_client)

# Use free local model for classification (lightweight task)
os.environ["KG_STRATEGY_SELECTION_PROVIDER"] = "ollama"
os.environ["KG_STRATEGY_SELECTION_MODEL"] = "llama2"

# Use powerful cloud model for entity extraction (complex task)
os.environ["KG_ENTITY_EXTRACTION_PROVIDER"] = "OpenAI"
os.environ["KG_ENTITY_EXTRACTION_MODEL"] = "gpt-4"

# Use local embeddings to avoid API costs
embedding_client = CustomEmbeddingClient(model="all-MiniLM-L6-v2")
LLMClientFactory.register_custom_provider("local-embeddings", embedding_client)
os.environ["KG_EMBEDDING_PROVIDER"] = "local-embeddings"
os.environ["KG_EMBEDDING_MODEL"] = "all-MiniLM-L6-v2"
os.environ["KG_EMBEDDING_DIMENSION"] = "384"

# Create components - they'll use the configured providers
classifier = QueryIntentClassifier.from_config()  # Uses ollama
extractor = LLMEntityExtractor.from_config()      # Uses OpenAI GPT-4
builder = GraphBuilder.from_config(...)           # Uses local embeddings
```

## Best Practices

1. **Protocol Compliance**: Ensure your client implements all required methods of `LLMClientProtocol`
2. **Error Handling**: Implement proper error handling and retries in your client
3. **Resource Cleanup**: Implement a `close()` method to release resources (HTTP connections, loaded models, etc.)
4. **Async Support**: Use async/await for all I/O operations
5. **Type Hints**: Use proper type hints for better IDE support and type checking
6. **Testing**: Test your custom client thoroughly before using it in production
7. **Configuration**: Use environment variables for configuration to avoid hardcoding values
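
As an illustration of the error-handling practice, the following is a minimal sketch of a retry helper with exponential backoff. The helper `with_retries`, its parameters, and the `flaky_generate` demo are all hypothetical names chosen for this example; AIECS may already provide its own retry mechanism, so treat this as one possible approach rather than the recommended one.

```python
import asyncio
from typing import Awaitable, Callable, Tuple, Type, TypeVar

T = TypeVar("T")


async def with_retries(
    call: Callable[[], Awaitable[T]],
    attempts: int = 3,
    base_delay: float = 0.5,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
) -> T:
    """Await call(), retrying on the given exceptions with exponential backoff."""
    for attempt in range(attempts):
        try:
            return await call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            await asyncio.sleep(base_delay * (2 ** attempt))
    raise ValueError("attempts must be at least 1")


# Demo: a coroutine that fails twice before succeeding
calls = {"count": 0}


async def flaky_generate() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated outage")
    return "ok"


result = asyncio.run(with_retries(flaky_generate, base_delay=0.01))
print(result, calls["count"])  # ok 3
```

When wrapping an `httpx`-based client such as the Ollama example, pass `retry_on=(httpx.TransportError,)`, since `httpx` exceptions do not derive from the built-in `ConnectionError`.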

## Troubleshooting

### Client Not Found

If you get "Unknown provider" errors:
- Ensure you've registered the provider before using it
- Check that the provider name matches exactly (case-sensitive)
- Register providers at application startup, before creating components
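
To see why ordering and exact spelling matter, consider the sketch below of a name-keyed registry. `ProviderRegistry` is purely hypothetical and is not how `LLMClientFactory` is implemented; it only assumes that lookup is an exact string match, as the case-sensitivity note above implies.

```python
from typing import Any, Dict


class ProviderRegistry:
    """Hypothetical registry; not the AIECS LLMClientFactory implementation."""

    def __init__(self) -> None:
        self._providers: Dict[str, Any] = {}

    def register(self, name: str, client: Any) -> None:
        self._providers[name] = client

    def resolve(self, name: str) -> Any:
        try:
            return self._providers[name]  # exact, case-sensitive match
        except KeyError:
            known = ", ".join(sorted(self._providers)) or "none"
            raise ValueError(
                f"Unknown provider: {name!r} (registered: {known})"
            ) from None


registry = ProviderRegistry()
registry.register("ollama", object())

try:
    registry.resolve("Ollama")  # wrong case, so the lookup fails
    lookup_error = ""
except ValueError as exc:
    lookup_error = str(exc)

print(lookup_error)
```

Resolving a name before `register()` has run fails in the same way, which is why registration belongs at application startup.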

### Settings Cache Issues

If environment variable changes aren't picked up, clear the settings cache:

```python
from aiecs.config.config import get_settings
get_settings.cache_clear()  # Clear settings cache
```
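
The `cache_clear()` call suggests that `get_settings` is wrapped in `functools.lru_cache`, which reads the environment once and then returns the cached result. The sketch below reproduces that behavior with a hypothetical `get_demo_settings` function; it is an assumption about the mechanism, not the AIECS implementation.

```python
import os
from functools import lru_cache


@lru_cache(maxsize=None)
def get_demo_settings() -> dict:
    """Hypothetical settings loader that reads the environment once."""
    return {"provider": os.environ.get("KG_EMBEDDING_PROVIDER", "unset")}


os.environ["KG_EMBEDDING_PROVIDER"] = "first"
before = get_demo_settings()["provider"]

os.environ["KG_EMBEDDING_PROVIDER"] = "second"
stale = get_demo_settings()["provider"]  # cached result, still "first"

get_demo_settings.cache_clear()
fresh = get_demo_settings()["provider"]  # re-read after clearing

print(before, stale, fresh)  # first first second
```

This is why setting `os.environ` after the first settings access has no effect until the cache is cleared.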

### Client Cache Issues

If you need to force re-resolution of clients, clear the client cache:

```python
from aiecs.llm import clear_client_cache
clear_client_cache()  # Clear all cached clients
clear_client_cache("ollama")  # Clear specific provider
```

## See Also

- [LLM Configuration](LLM_CONFIGURATION.md) - General LLM configuration
- [Base LLM Client](BASE_LLM_CLIENT.md) - Built-in LLM clients
- [LLM AI Clients](LLM_AI_CLIENTS.md) - Standard provider clients