First Steps¶

文档版本: 1.0.0
最后更新: 2025-08-19
Git 提交: c1aa5b0f
作者: Lincoln

After completing the Quick Start, this guide will help you understand JAiRouter's core concepts and configure it for your specific needs.

Core Concepts¶

1. Service Types¶

JAiRouter supports multiple AI service types:

Service Type	Description	API Endpoint
chat	Chat completions	`/v1/chat/completions`
embedding	Text embeddings	`/v1/embeddings`
rerank	Text reranking	`/v1/rerank`
tts	Text-to-speech	`/v1/audio/speech`
stt	Speech-to-text	`/v1/audio/transcriptions`
image	Image generation	`/v1/images/generations`

2. Service Instances¶

Each service type can have multiple instances for load balancing:

model:
  services:
    chat:
      instances:
        - name: "qwen2.5:7b"
          baseUrl: "http://server1:11434"
          path: "/v1/chat/completions"
          weight: 2
        - name: "qwen2.5:14b"
          baseUrl: "http://server2:11434"
          path: "/v1/chat/completions"
          weight: 1

3. Load Balancing Strategies¶

Choose how requests are distributed:

random: Random selection
round-robin: Sequential rotation
least-connections: Route to least busy instance
ip-hash: Consistent routing based on client IP

4. Rate Limiting¶

Control request rates per client or globally:

token-bucket: Allow bursts up to bucket capacity
leaky-bucket: Smooth, constant rate
sliding-window: Rate limit over time windows

Basic Configuration¶

Minimal Configuration¶

Start with a simple configuration:

server:
  port: 8080

model:
  services:
    chat:
      instances:
        - name: "default-model"
          baseUrl: "http://localhost:11434"
          path: "/v1/chat/completions"

Adding Load Balancing¶

Configure multiple instances with load balancing:

model:
  services:
    chat:
      load-balance:
        type: round-robin
      instances:
        - name: "fast-model"
          baseUrl: "http://fast-server:11434"
          weight: 3
        - name: "accurate-model"
          baseUrl: "http://accurate-server:11434"
          weight: 1

Adding Rate Limiting¶

Protect your services with rate limiting:

model:
  services:
    chat:
      rate-limit:
        type: token-bucket
        capacity: 100
        refill-rate: 10
        client-ip-enable: true
      instances:
        - name: "protected-model"
          baseUrl: "http://localhost:11434"

Adding Circuit Breaking¶

Prevent cascading failures:

model:
  services:
    chat:
      circuit-breaker:
        enabled: true
        failure-threshold: 5
        recovery-timeout: 30000
        success-threshold: 3
      fallback:
        type: default
        message: "Service temporarily unavailable"
      instances:
        - name: "reliable-model"
          baseUrl: "http://localhost:11434"

Configuration Management¶

Static Configuration¶

Define services in application.yml:

model:
  services:
    chat:
      # Configuration here
    embedding:
      # Configuration here

Dynamic Configuration¶

Add, update, or remove instances at runtime:

# Add a new instance
curl -X POST http://localhost:8080/api/config/instance/add/chat \
  -H "Content-Type: application/json" \
  -d '{
    "name": "new-model",
    "baseUrl": "http://new-server:11434",
    "path": "/v1/chat/completions",
    "weight": 1
  }'

# Update an instance
curl -X PUT http://localhost:8080/api/config/instance/update/chat \
  -H "Content-Type: application/json" \
  -d '{
    "instanceId": "new-model@http://new-server:11434",
    "instance": {
      "name": "new-model",
      "baseUrl": "http://updated-server:11434",
      "path": "/v1/chat/completions",
      "weight": 2
    }
  }'

# Remove an instance
curl -X DELETE "http://localhost:8080/api/config/instance/del/chat?modelName=new-model&baseUrl=http://updated-server:11434"

Health Monitoring¶

Automatic Health Checks¶

JAiRouter automatically monitors service health:

model:
  services:
    chat:
      health-check:
        enabled: true
        interval: 30000  # 30 seconds
        timeout: 5000    # 5 seconds
        path: "/health"  # Health check endpoint

Manual Health Check¶

Check service status manually:

# Check overall health
curl http://localhost:8080/actuator/health

# Check specific service instances
curl http://localhost:8080/api/config/instance/type/chat

Monitoring and Observability¶

Metrics¶

JAiRouter exposes metrics for monitoring:

# View all metrics
curl http://localhost:8080/actuator/metrics

# View specific metrics
curl http://localhost:8080/actuator/metrics/http.server.requests
curl http://localhost:8080/actuator/metrics/jairouter.requests.total

Logging¶

Configure logging levels:

logging:
  level:
    org.unreal.modelrouter: DEBUG
    org.springframework.web: INFO
  pattern:
    console: "%d{yyyy-MM-dd HH:mm:ss} - %msg%n"
  file:
    name: logs/jairouter.log

Testing Your Configuration¶

1. Validate Configuration¶

Test your configuration before deploying:

# Check configuration syntax
java -jar jairouter.jar --spring.config.location=file:./application.yml --spring.profiles.active=validate

2. Load Testing¶

Use tools like Apache Bench or curl to test load balancing:

# Simple load test
for i in {1..10}; do
  curl -X POST http://localhost:8080/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "test", "messages": [{"role": "user", "content": "test"}]}' &
done
wait

3. Circuit Breaker Testing¶

Test circuit breaker behavior:

# Stop backend service to trigger circuit breaker
# Then make requests to see fallback responses
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "test", "messages": [{"role": "user", "content": "test"}]}'

Common Patterns¶

1. Multi-Model Setup¶

Configure different models for different use cases:

model:
  services:
    chat:
      instances:
        - name: "fast-chat"
          baseUrl: "http://fast-server:11434"
          weight: 3
        - name: "smart-chat"
          baseUrl: "http://smart-server:11434"
          weight: 1
    embedding:
      instances:
        - name: "embedding-model"
          baseUrl: "http://embedding-server:11434"

2. Environment-Specific Configuration¶

Use Spring profiles for different environments:

# application-dev.yml
model:
  services:
    chat:
      instances:
        - name: "dev-model"
          baseUrl: "http://localhost:11434"

---
# application-prod.yml
model:
  services:
    chat:
      load-balance:
        type: least-connections
      rate-limit:
        type: token-bucket
        capacity: 1000
        refill-rate: 100
      instances:
        - name: "prod-model-1"
          baseUrl: "http://prod-server-1:11434"
        - name: "prod-model-2"
          baseUrl: "http://prod-server-2:11434"

3. Gradual Rollout¶

Use weights for gradual model rollouts:

model:
  services:
    chat:
      instances:
        - name: "stable-model"
          baseUrl: "http://stable-server:11434"
          weight: 9  # 90% of traffic
        - name: "new-model"
          baseUrl: "http://new-server:11434"
          weight: 1  # 10% of traffic

Next Steps¶

Now that you understand the basics, explore more advanced topics:

Configuration Guide - Detailed configuration options
API Reference - Complete API documentation
Deployment Guide - Production deployment strategies
Monitoring Guide - Set up comprehensive monitoring

Troubleshooting¶

Common Issues¶

No Available Instances: - Check if backend services are running - Verify network connectivity - Check health check configuration

Rate Limit Exceeded: - Adjust rate limit settings - Check if client IP rate limiting is appropriate - Monitor request patterns

Circuit Breaker Open: - Check backend service health - Review failure threshold settings - Monitor error rates

For more detailed troubleshooting, see the Troubleshooting Guide.