AIECS Project Migration Summary
Completed Tasks
1. Project Renaming ✓
Successfully renamed “app” directory to “aiecs” (AI Execute Services)
Updated all internal references from app. to aiecs.
Ensured all import paths are correct
2. Main.py Entry File ✓
Created complete aiecs/main.py file, including:
FastAPI application setup
WebSocket integration
Health check endpoints
Task execution API
Tool list API
Service and provider information API
Complete lifecycle management
3. README Documentation ✓
Created professional README.md, including:
Project introduction and features
Installation instructions
Quick start guide
Configuration instructions
API documentation
Architecture description
Development guide
4. PyProject.toml Configuration ✓
Updated pyproject.toml:
Changed project name to “aiecs”
Added complete metadata
Configured correct dependencies
Added classifiers and keywords
Configured build system
5. Scripts Dependency Patches ✓
Moved scripts directory into aiecs package
Updated fix_weasel_validator.py to adapt to new structure
Created setup.py file with post-install hooks
Configured automatic weasel patch execution mechanism
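The post-install mechanism described above could be organized roughly as follows. This is a minimal sketch, not the project's actual setup.py: the function name run_post_install_hooks and the logger name are hypothetical, and the hooks passed in stand for whatever the weasel patch and data download entry points really are.

```python
import logging

logger = logging.getLogger("aiecs.post_install")  # hypothetical logger name


def run_post_install_hooks(hooks):
    """Run each (name, callable) pair and log failures instead of raising.

    Returns a dict mapping each hook name to True (succeeded) or False (failed).
    """
    results = {}
    for name, hook in hooks:
        try:
            hook()
        except Exception as exc:  # a failed hook must not abort installation
            logger.warning("post-install hook %s failed: %s", name, exc)
            results[name] = False
        else:
            results[name] = True
    return results


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    # Placeholder hooks for illustration only.
    print(run_post_install_hooks([("noop", lambda: None)]))
```

In a setup.py, a function like this would typically be called from a subclass of setuptools.command.install.install after the base run() completes. Note that pip does not execute setup.py when installing from a prebuilt wheel, which is one reason the manual commands listed later in this summary are useful.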
6. NLP Data Package Auto-Download ✓
Created comprehensive download_nlp_data.py script to automatically download NLP data packages required by classfire_tool
Automatically downloads NLTK stopwords, punkt, and other data packages (required by rake-nltk and text processing)
Automatically downloads spaCy English model en_core_web_sm (required)
Automatically downloads spaCy Chinese model zh_core_web_sm (optional)
Integrated into post-install hooks, automatically executed during installation
Provides multiple manual execution methods:
aiecs-download-nlp-data: Python script command
./aiecs/scripts/setup_nlp_data.sh: Convenient shell script
Includes complete error handling, logging, and installation verification
Supports automatic virtual environment detection and activation
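The virtual environment detection mentioned above can be approximated with the standard library alone. The following is a minimal sketch under that assumption; the helper name detect_virtualenv is hypothetical and the real download_nlp_data.py may detect environments differently.

```python
import os
import sys


def detect_virtualenv(environ=None, prefix=None, base_prefix=None):
    """Return the path of the active virtual environment, or None.

    The optional arguments exist only to make the function easy to test;
    by default the values of the running interpreter are used.
    """
    environ = os.environ if environ is None else environ
    prefix = sys.prefix if prefix is None else prefix
    base_prefix = sys.base_prefix if base_prefix is None else base_prefix

    # Inside a venv or virtualenv, sys.prefix differs from sys.base_prefix.
    if prefix != base_prefix:
        return prefix
    # Otherwise fall back to the variable exported by the activate scripts.
    return environ.get("VIRTUAL_ENV") or None


if __name__ == "__main__":
    print("Virtual environment:", detect_virtualenv() or "none detected")
```

Comparing sys.prefix with sys.base_prefix works even when the environment was not activated in the shell, while VIRTUAL_ENV covers the case where a script is launched by a different interpreter from an activated shell.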
Additional Completed Work
Created __main__.py
Allows running service via python -m aiecs
Created LICENSE file
MIT License
Created MANIFEST.in
Ensures all necessary files are included in distribution package
Created .gitignore
Prevents unnecessary files from entering version control
Created PUBLISH.md
Detailed PyPI publishing guide
Created test scripts
test_import.py for verifying package structure
Project Structure
python-middleware-dev/
├── aiecs/ # Main package directory (formerly app)
│ ├── __init__.py
│ ├── __main__.py # CLI entry point
│ ├── main.py # FastAPI application
│ ├── scripts/ # Automation scripts
│ │ ├── __init__.py
│ │ ├── fix_weasel_validator.py # weasel library patch
│ │ ├── download_nlp_data.py # NLP data package download
│ │ └── ...
│ └── ... (other modules)
├── setup.py # Installation configuration (with post-install)
├── pyproject.toml # Project metadata
├── README.md # Project documentation
├── LICENSE # MIT License
├── MANIFEST.in # Include file manifest
├── PUBLISH.md # Publishing guide
└── .gitignore # Git ignore file
Publishing Preparation
The project is now ready to publish to PyPI. Publishing steps:
Install build tools
pip install build twine
Build package
python -m build
Test installation
pip install dist/aiecs-1.0.0-py3-none-any.whl
Upload to TestPyPI (recommended to test first)
python -m twine upload --repository testpypi dist/*
Upload to PyPI
python -m twine upload dist/*
Usage Instructions
After installation, users can:
Use as a library
from aiecs import AIECS
from aiecs.domain.task.task_context import TaskContext
Run service
aiecs
# or
python -m aiecs
Run weasel patch (if automatic patch fails)
aiecs-patch-weasel
Download NLP data packages (if automatic download fails)
# Use Python script command (recommended)
aiecs-download-nlp-data
# Or use shell script
./aiecs/scripts/setup_nlp_data.sh
# Only verify installed data packages
./aiecs/scripts/setup_nlp_data.sh --verify
Important Notes
Users must configure environment variables (via a .env file) for the service to run correctly
PostgreSQL and Redis services are required for full operation
Weasel patch will automatically attempt to execute during installation
NLP data packages (NLTK stopwords and spaCy en_core_web_sm) will automatically download during installation
Image Tool requires system-level Tesseract OCR to use OCR functionality
Java Environment and Apache Tika (Optional Dependency):
Office Tool’s text extraction functionality uses Apache Tika as a universal fallback solution
Tika supports text extraction from 1000+ document formats (including legacy Office formats)
Requires Java Runtime Environment (JRE) 8+ to use
If Java environment is not available, Tika-related tests will be automatically skipped, not affecting other functionality
Recommended to install Java in enterprise environments or when processing multiple document formats
Project supports Python 3.10-3.12
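The supported interpreter range can be checked at startup before anything else is imported. The snippet below is a small sketch of such a guard; the function name is_supported_python is hypothetical and the bounds simply restate the 3.10-3.12 range given above.

```python
import sys

# Supported range as stated in this summary: 3.10 through 3.12 inclusive.
MIN_VERSION = (3, 10)
MAX_VERSION = (3, 12)


def is_supported_python(version_info=None):
    """Return True if the (major, minor) version lies in the supported range."""
    if version_info is None:
        version_info = sys.version_info
    major_minor = tuple(version_info[:2])
    return MIN_VERSION <= major_minor <= MAX_VERSION


if __name__ == "__main__":
    if not is_supported_python():
        found = ".".join(map(str, sys.version_info[:3]))
        print(f"Warning: Python {found} is outside the supported range 3.10-3.12")
```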
Automation Features
NLP Data Package Management
Auto-Download: Automatically downloads NLP data packages required by classfire_tool during installation
NLTK stopwords, punkt, and other data packages
spaCy English model en_core_web_sm (required)
spaCy Chinese model zh_core_web_sm (optional)
Multiple Execution Methods:
Python script: aiecs-download-nlp-data
Shell script: ./aiecs/scripts/setup_nlp_data.sh
Verification mode: ./aiecs/scripts/setup_nlp_data.sh --verify
Advanced Features:
Automatic virtual environment detection and activation
Dependency integrity checking
Download progress and status logging
Post-installation verification tests
Intelligent detection of existing data packages
Timeout protection (prevents long hangs)
Error Handling: Download failures do not block the installation process; detailed logs are generated instead
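Three of the behaviors listed above (skipping packages that already exist, timeout protection, and failures that do not block installation) can be combined in one small helper. This is a sketch under stated assumptions, not the contents of download_nlp_data.py: the name ensure_data, the marker-path convention, and the default timeout are all hypothetical.

```python
import logging
import subprocess
import sys
from pathlib import Path

logger = logging.getLogger("nlp_data_setup")  # hypothetical logger name


def ensure_data(name, marker, command, timeout=300):
    """Make sure one data package is present without ever raising.

    name    -- label used in log messages
    marker  -- path whose existence means the package is already installed
    command -- argument list that downloads the package
    Returns "present", "downloaded", or "failed".
    """
    if Path(marker).exists():
        logger.info("%s already present, skipping download", name)
        return "present"
    try:
        # check=True turns a non-zero exit code into CalledProcessError;
        # timeout raises TimeoutExpired instead of hanging indefinitely.
        subprocess.run(command, check=True, timeout=timeout,
                       capture_output=True, text=True)
    except (subprocess.SubprocessError, OSError) as exc:
        logger.warning("%s download failed: %s", name, exc)
        return "failed"
    logger.info("%s downloaded", name)
    return "downloaded"


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    # Harmless demo command; the marker path here is purely illustrative.
    marker = Path.home() / ".aiecs_demo_marker"
    print(ensure_data("demo", marker, [sys.executable, "-c", "print('ok')"]))
```

For real packages the command would be something like [sys.executable, "-m", "nltk.downloader", "stopwords"] or [sys.executable, "-m", "spacy", "download", "en_core_web_sm"]; how the project decides that a package is already present is not specified in this summary.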
Java/Tika Integration Management
Role: Apache Tika serves as Office Tool’s universal text extraction fallback
Supported Formats:
Dedicated library processing: DOCX, PPTX, XLSX (using python-docx/python-pptx/pandas)
PDF documents (using pdfplumber)
Image OCR (using pytesseract)
Tika-processed formats: Legacy Office (.doc/.xls/.ppt), RTF, ODF, e-books, and 1000+ formats
Environment Detection:
Automatically detects Java runtime environment
Gracefully skips during testing (if Java unavailable)
Degrades gracefully at runtime when Tika is unavailable
Deployment Recommendations:
Development Environment: Java optional, convenient for complete testing
Production Environment: Decide based on document processing requirements
Docker Deployment: Provides both Java-enabled and pure Python image options
Error Handling: Tika unavailability does not affect other document processing functionality; a warning is logged instead
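The division of labor between the dedicated libraries and the Tika fallback can be pictured as a lookup by file extension. The sketch below only illustrates that routing idea; choose_backend is a hypothetical helper, the image extensions are an assumption, and the real Office Tool may select backends differently.

```python
from pathlib import Path

# Extension-to-backend table based on the format list above.
# The image extensions are assumed for illustration.
DEDICATED_BACKENDS = {
    ".docx": "python-docx",
    ".pptx": "python-pptx",
    ".xlsx": "pandas",
    ".pdf": "pdfplumber",
    ".png": "pytesseract",
    ".jpg": "pytesseract",
    ".jpeg": "pytesseract",
}


def choose_backend(path, java_available):
    """Return the name of the extraction backend for path, or None."""
    suffix = Path(path).suffix.lower()
    backend = DEDICATED_BACKENDS.get(suffix)
    if backend is not None:
        return backend
    # Everything else (.doc, .xls, .ppt, .rtf, .odt, ...) falls back to Tika,
    # which is only usable when a Java runtime is present.
    return "tika" if java_available else None


if __name__ == "__main__":
    for name in ("report.docx", "legacy.doc", "scan.png"):
        print(name, "->", choose_backend(name, java_available=False))
```

Returning None rather than raising mirrors the documented behavior that a missing Java runtime disables only the Tika path and leaves the dedicated backends untouched.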
Java Environment Configuration Guide
Installing Java Runtime Environment
Linux (Ubuntu/Debian)
# Install OpenJDK 11 (recommended)
sudo apt update
sudo apt install openjdk-11-jre-headless
# Or install OpenJDK 8 (minimum requirement)
sudo apt install openjdk-8-jre-headless
# Verify installation
java -version
Linux (CentOS/RHEL/Fedora)
# CentOS/RHEL
sudo yum install java-11-openjdk-headless
# Fedora
sudo dnf install java-11-openjdk-headless
# Verify installation
java -version
macOS
# Using Homebrew
brew install openjdk@11
# Or download Oracle JDK
# Visit https://www.oracle.com/java/technologies/downloads/
# Verify installation
java -version
Windows
# Using Chocolatey
choco install openjdk11
# Or using Scoop
scoop install openjdk
# Or manually download and install
# Visit https://adoptium.net/ to download Eclipse Temurin
# Verify installation
java -version
Verifying Tika Functionality
After installing Java, you can verify if Tika functionality works correctly:
from aiecs.tools.task_tools.office_tool import OfficeTool

# Create tool instance
tool = OfficeTool()

# Test Tika text extraction (using any document file)
try:
    text = tool.extract_text("path/to/your/document.doc")
    print("Tika functionality working correctly")
except Exception as e:
    print(f"Tika unavailable: {e}")
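Before exercising Tika through OfficeTool, it can help to confirm that a Java runtime is actually reachable from Python, since a failure in the check above could also come from a missing file. The following sketch uses only the standard library; java_version_banner is a hypothetical helper and is not part of aiecs.

```python
import shutil
import subprocess


def java_version_banner(executable="java", timeout=10):
    """Return the first line printed by `java -version`, or None if unavailable."""
    path = shutil.which(executable)
    if path is None:
        return None
    try:
        result = subprocess.run([path, "-version"], capture_output=True,
                                text=True, timeout=timeout)
    except (subprocess.SubprocessError, OSError):
        return None
    if result.returncode != 0:
        return None
    # `java -version` conventionally writes its banner to stderr.
    output = (result.stderr or result.stdout).strip()
    return output.splitlines()[0] if output else None


if __name__ == "__main__":
    banner = java_version_banner()
    print("Java runtime:", banner or "not found (Tika fallback disabled)")
```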
Docker Configuration Guide
Basic Python Image (Without Java)
# Dockerfile.python-only
FROM python:3.11-slim
# Install system dependencies (Tesseract OCR)
RUN apt-get update && apt-get install -y \
    tesseract-ocr \
    tesseract-ocr-chi-sim \
    && rm -rf /var/lib/apt/lists/*
# Set working directory
WORKDIR /app
# Copy project files
COPY . .
# Install Python dependencies
RUN pip install -e .
# Start command
CMD ["python", "-m", "aiecs"]
Complete Image with Java
# Dockerfile.with-java
FROM python:3.11-slim
# Install system dependencies (including Java and Tesseract)
RUN apt-get update && apt-get install -y \
    openjdk-11-jre-headless \
    tesseract-ocr \
    tesseract-ocr-chi-sim \
    && rm -rf /var/lib/apt/lists/*
# Set JAVA_HOME environment variable
ENV JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
# Set working directory
WORKDIR /app
# Copy project files
COPY . .
# Install Python dependencies
RUN pip install -e .
# Verify Java installation
RUN java -version
# Start command
CMD ["python", "-m", "aiecs"]
Docker Compose Configuration
# docker-compose.yml
version: '3.8'
services:
  aiecs-python-only:
    build:
      context: .
      dockerfile: Dockerfile.python-only
    environment:
      - PYTHONPATH=/app
    volumes:
      - ./data:/app/data
    ports:
      - "8000:8000"
    depends_on:
      - postgres
      - redis
  aiecs-with-java:
    build:
      context: .
      dockerfile: Dockerfile.with-java
    environment:
      - PYTHONPATH=/app
      - JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
    volumes:
      - ./data:/app/data
    ports:
      - "8000:8000"
    depends_on:
      - postgres
      - redis
  postgres:
    image: postgres:15
    environment:
      POSTGRES_DB: aiecs
      POSTGRES_USER: aiecs
      POSTGRES_PASSWORD: aiecs_password
    volumes:
      - postgres_data:/var/lib/postgresql/data
    ports:
      - "5432:5432"
  redis:
    image: redis:7-alpine
    volumes:
      - redis_data:/data
    ports:
      - "6379:6379"
volumes:
  postgres_data:
  redis_data:
Multi-Stage Build (Recommended for Production)
# Dockerfile.multi-stage
# Build stage
FROM python:3.11 AS builder
WORKDIR /app
COPY pyproject.toml setup.py ./
COPY aiecs/ ./aiecs/
# Install build dependencies
RUN pip install build
RUN python -m build
# Runtime stage - Pure Python
FROM python:3.11-slim AS python-runtime
RUN apt-get update && apt-get install -y \
    tesseract-ocr \
    tesseract-ocr-chi-sim \
    && rm -rf /var/lib/apt/lists/*
WORKDIR /app
COPY --from=builder /app/dist/*.whl /tmp/
RUN pip install /tmp/*.whl
CMD ["python", "-m", "aiecs"]
# Runtime stage - With Java
FROM python:3.11-slim AS java-runtime
RUN apt-get update && apt-get install -y \
    openjdk-11-jre-headless \
    tesseract-ocr \
    tesseract-ocr-chi-sim \
    && rm -rf /var/lib/apt/lists/*
ENV JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
WORKDIR /app
COPY --from=builder /app/dist/*.whl /tmp/
RUN pip install /tmp/*.whl
CMD ["python", "-m", "aiecs"]
Build and Run Commands
# Build pure Python image
docker build -f Dockerfile.python-only -t aiecs:python-only .
# Build image with Java
docker build -f Dockerfile.with-java -t aiecs:with-java .
# Use multi-stage build
docker build -f Dockerfile.multi-stage --target python-runtime -t aiecs:python-runtime .
docker build -f Dockerfile.multi-stage --target java-runtime -t aiecs:java-runtime .
# Run container
docker run -p 8000:8000 aiecs:with-java
# Use Docker Compose
docker-compose up aiecs-with-java
Environment Variable Configuration
Create .env file for Docker environment:
# .env
# Database configuration
DATABASE_URL=postgresql://aiecs:aiecs_password@postgres:5432/aiecs
# Redis configuration
REDIS_URL=redis://redis:6379/0
# Java configuration (optional)
JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
TIKA_SERVER_JAR=/usr/share/java/tika-server.jar
# Other configuration
PYTHONPATH=/app
LOG_LEVEL=INFO
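A quick sanity check of the two service URLs can catch configuration mistakes before the application starts. The sketch below assumes the variable names shown in the .env example (DATABASE_URL and REDIS_URL); check_service_urls is a hypothetical helper, and the accepted URL schemes are an assumption rather than something this summary specifies.

```python
import os
from urllib.parse import urlsplit

# Variable names come from the .env example; accepted schemes are assumed.
REQUIRED = {
    "DATABASE_URL": {"postgresql", "postgres"},
    "REDIS_URL": {"redis", "rediss"},
}


def check_service_urls(env):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for name, schemes in REQUIRED.items():
        value = env.get(name, "")
        if not value:
            problems.append(f"{name} is not set")
            continue
        parts = urlsplit(value)
        if parts.scheme not in schemes:
            problems.append(f"{name} has unexpected scheme {parts.scheme!r}")
        elif not parts.hostname:
            problems.append(f"{name} has no host")
    return problems


if __name__ == "__main__":
    for problem in check_service_urls(os.environ):
        print("Configuration problem:", problem)
```

With the values from the example file this returns an empty list, because both URLs carry an expected scheme and a host name (postgres and redis, the Compose service names).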
Verify Docker Deployment
# Enter container to verify environment
docker exec -it <container_id> bash
# Verify Python environment
python -c "from aiecs import AIECS; print('AIECS OK')"
# Verify Java environment (if installed)
java -version
# Verify Tika functionality
python -c "
from aiecs.tools.task_tools.office_tool import OfficeTool
tool = OfficeTool()
print('Tika available:', hasattr(tool, '_extract_tika_text'))
"
# Verify OCR functionality
tesseract --version
Image Size Comparison
Pure Python Image: ~800MB
Image with Java: ~1.2GB
Full Feature Image: ~1.5GB (includes all dependencies)
Choose the appropriate image configuration based on actual requirements!
Project migration completed! 🎉