Configuration Guide¶
Document version: 1.0.0
Last updated: 2025-08-28 Git commit: c1aa5b0f
Author: Lincoln
JAiRouter provides flexible configuration options to meet various deployment scenarios. This guide covers all configuration aspects from basic setup to advanced features.
Modular Configuration Overview¶
Starting with v1.0.0, JAiRouter uses a modular configuration structure: each functional area has its own configuration file, which improves maintainability and readability.
Configuration File Structure¶
src/main/resources/
├── application.yml  # Main configuration file, imports other modules
├── config/
│   ├── base/
│   │   ├── server-base.yml  # Server base configuration
│   │   ├── model-services-base.yml  # Model services configuration
│   │   └── monitoring-base.yml  # Monitoring base configuration
│   ├── security/
│   │   └── security-base.yml  # Security feature configuration
│   ├── tracing/
│   │   └── tracing-base.yml  # Tracing feature configuration
│   └── monitoring/
│       ├── slow-query-alerts.yml  # Slow query alert configuration
│       └── error-tracking.yml  # Error tracking configuration
├── application-dev.yml  # Development environment configuration
├── application-staging.yml  # Staging environment configuration
├── application-prod.yml  # Production environment configuration
├── application-legacy.yml  # Backward compatibility configuration
└── application-security-example.yml  # Security configuration example
Configuration Import Mechanism¶
The main configuration file application.yml imports the module configurations through the spring.config.import mechanism:
# application.yml
spring:
  config:
    import:
      - classpath:config/base/server-base.yml
      - classpath:config/base/model-services-base.yml
      - classpath:config/base/monitoring-base.yml
      - classpath:config/tracing/tracing-base.yml
      - classpath:config/security/security-base.yml
      - classpath:config/monitoring/slow-query-alerts.yml
      - classpath:config/monitoring/error-tracking.yml
Because each module lives in its own file, the settings for one feature can be changed or extended without touching the others.
Configuration Overview¶
JAiRouter supports two main configuration approaches:
- Static Configuration: Defined in YAML files, loaded at startup
- Dynamic Configuration: Updated at runtime via REST APIs
Configuration Hierarchy¶
Configuration is loaded in the following order (later sources override earlier ones):
- Default configuration (embedded in JAR)
- application.yml (classpath)
- ./application.yml (current directory)
- ./config/application.yml (config directory)
- Environment variables
- Command line arguments
- Dynamic configuration (runtime updates)
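The override order above can be illustrated with a small layered merge. This is a minimal sketch, not JAiRouter's actual loader: the `deep_merge` and `resolve` helpers and the three sample layers are hypothetical, and only the "later sources override earlier ones" rule comes from this guide.

```python
from copy import deepcopy


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict in which values from `override` win over `base`."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Merge nested sections key by key instead of replacing them whole.
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def resolve(sources: list) -> dict:
    """Apply sources in order, so later sources override earlier ones."""
    result: dict = {}
    for source in sources:
        result = deep_merge(result, source)
    return result


# Hypothetical layers, listed from lowest to highest precedence.
defaults = {"server": {"port": 8080}, "store": {"type": "memory"}}
classpath_yml = {"store": {"type": "file", "path": "./config"}}
env_vars = {"server": {"port": 9090}}

effective = resolve([defaults, classpath_yml, env_vars])
```

Here `effective` keeps `store.type: file` from the YAML layer and takes `server.port` from the highest-precedence layer that sets it.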
Basic Structure¶
server:
  port: 8080
model:
  services:
    <service-type>:
      load-balance:
        type: <strategy>
      rate-limit:
        type: <algorithm>
        # ... rate limit settings
      circuit-breaker:
        enabled: true
        # ... circuit breaker settings
      fallback:
        type: <fallback-type>
        # ... fallback settings
      instances:
        - name: <model-name>
          baseUrl: <service-url>
          path: <api-path>
          weight: <load-balance-weight>
          # ... instance-specific settings
store:
  type: <storage-backend>
  # ... storage settings
Service Types¶
JAiRouter supports the following service types:
Service Type | Description | Default Path |
---|---|---|
chat | Chat completions | /v1/chat/completions |
embedding | Text embeddings | /v1/embeddings |
rerank | Text reranking | /v1/rerank |
tts | Text-to-speech | /v1/audio/speech |
stt | Speech-to-text | /v1/audio/transcriptions |
imgGen | Image generation | /v1/images/generations |
imgEdit | Image editing | /v1/images/edits |
Configuration Sections¶
1. Load Balancing¶
Configure how requests are distributed across service instances:
model:
  services:
    chat:
      load-balance:
        type: round-robin # random, round-robin, least-connections, ip-hash
Available Strategies:
- random: Random selection of instances
- round-robin: Sequential rotation through instances
- least-connections: Route to instance with fewest active connections
- ip-hash: Consistent routing based on client IP hash
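To make two of these strategies concrete, the sketch below shows one plausible way round-robin and ip-hash selection could work. It is an illustration only: the `RoundRobin` class, the `ip_hash_pick` function, and the choice of SHA-256 with a modulo are assumptions, not JAiRouter's implementation.

```python
import hashlib
from itertools import count


class RoundRobin:
    """Rotate through instances in order, wrapping around at the end."""

    def __init__(self, instances):
        self._instances = list(instances)
        self._counter = count()

    def pick(self):
        return self._instances[next(self._counter) % len(self._instances)]


def ip_hash_pick(instances, client_ip: str):
    """Map a client IP to an instance so the same IP always gets the same one."""
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(instances)
    return instances[index]


instances = ["server1", "server2", "server3"]  # hypothetical instance names
rr = RoundRobin(instances)
rotation = [rr.pick() for _ in range(4)]
sticky = ip_hash_pick(instances, "10.0.0.7")
```

Note that a plain hash-and-modulo mapping stays stable only while the instance list is unchanged; adding or removing an instance remaps many clients.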
2. Rate Limiting¶
Control request rates to prevent service overload:
model:
  services:
    chat:
      rate-limit:
        type: token-bucket
        capacity: 100 # Maximum tokens in bucket
        refill-rate: 10 # Tokens added per second
        client-ip-enable: true # Enable per-client-IP rate limiting
Available Algorithms:
- token-bucket: Allow bursts up to bucket capacity
- leaky-bucket: Smooth, constant rate limiting
- sliding-window: Rate limit over time windows
- warm-up: Gradually increase rate limit
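The token-bucket behavior can be sketched as follows, using the `capacity` and `refill-rate` values from the example above. This is a generic textbook token bucket with an injected clock, written under the assumption that each request costs one token; it is not taken from JAiRouter's source.

```python
class TokenBucket:
    """Generic token bucket: bursts up to `capacity`, refills at `refill_rate`/s."""

    def __init__(self, capacity: float, refill_rate: float, now: float = 0.0):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = capacity  # start full, so an initial burst is allowed
        self.last_refill = now

    def try_acquire(self, now: float, cost: float = 1.0) -> bool:
        # Add the tokens earned since the last call, capped at capacity.
        elapsed = max(0.0, now - self.last_refill)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = max(self.last_refill, now)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


bucket = TokenBucket(capacity=100, refill_rate=10)
# 120 requests arrive at t=0: only the first 100 fit in the bucket.
burst_allowed = sum(bucket.try_acquire(now=0.0) for _ in range(120))
# One second later the bucket has refilled by 10 tokens.
later_allowed = sum(bucket.try_acquire(now=1.0) for _ in range(120))
```

With these settings a client can burst up to 100 requests, then sustain roughly 10 requests per second.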
3. Circuit Breaking¶
Prevent cascading failures with the circuit breaker pattern:
model:
  services:
    chat:
      circuit-breaker:
        enabled: true
        failure-threshold: 5 # Failures before opening circuit
        recovery-timeout: 30000 # Time before attempting recovery (ms)
        success-threshold: 3 # Successes needed to close circuit
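How the three settings interact is easiest to see as a state machine. The sketch below is a generic closed / open / half-open breaker, not JAiRouter's code. It assumes that failures are counted consecutively and that any failure during recovery reopens the circuit; the guide does not confirm either detail.

```python
class CircuitBreaker:
    CLOSED, OPEN, HALF_OPEN = "closed", "open", "half-open"

    def __init__(self, failure_threshold=5, recovery_timeout_ms=30000,
                 success_threshold=3):
        self.failure_threshold = failure_threshold
        self.recovery_timeout_ms = recovery_timeout_ms
        self.success_threshold = success_threshold
        self.state = self.CLOSED
        self.failures = 0
        self.successes = 0
        self.opened_at_ms = 0

    def allow_request(self, now_ms: int) -> bool:
        if self.state == self.OPEN:
            if now_ms - self.opened_at_ms >= self.recovery_timeout_ms:
                # Recovery timeout elapsed: let trial requests through.
                self.state = self.HALF_OPEN
                self.successes = 0
                return True
            return False
        return True

    def record_success(self) -> None:
        if self.state == self.HALF_OPEN:
            self.successes += 1
            if self.successes >= self.success_threshold:
                self.state = self.CLOSED
                self.failures = 0
        else:
            self.failures = 0  # assumption: only consecutive failures count

    def record_failure(self, now_ms: int) -> None:
        if self.state == self.HALF_OPEN:
            self._open(now_ms)  # assumption: a failed trial reopens the circuit
            return
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self._open(now_ms)

    def _open(self, now_ms: int) -> None:
        self.state = self.OPEN
        self.opened_at_ms = now_ms
        self.failures = 0


breaker = CircuitBreaker()
for _ in range(5):
    breaker.record_failure(now_ms=0)
state_after_failures = breaker.state
blocked = breaker.allow_request(now_ms=10_000)
probe = breaker.allow_request(now_ms=30_000)
for _ in range(3):
    breaker.record_success()
state_after_recovery = breaker.state
```

In this walk-through five failures open the circuit, requests are rejected for 30 seconds, and three successful trial requests close it again.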
4. Fallback Strategies¶
Define fallback behavior when services are unavailable:
model:
  services:
    chat:
      fallback:
        type: default
        message: "Service temporarily unavailable"
        # OR
        # type: cache
        # ttl: 300000 # Cache TTL in milliseconds
Available Types:
- default: Return predefined message
- cache: Return cached responses
- none: No fallback (return error)
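The three fallback types can be pictured as a dispatch on failure. The sketch below is hypothetical: the `FallbackHandler` class, its method names, and the rule that a cache miss or an expired entry re-raises the original error are assumptions made for illustration.

```python
class FallbackHandler:
    def __init__(self, fallback_type: str, message: str = "", ttl_ms: int = 0):
        self.fallback_type = fallback_type
        self.message = message
        self.ttl_ms = ttl_ms
        self._cache = {}  # request key -> (response, stored_at_ms)

    def remember(self, key, response, now_ms: int) -> None:
        """Store a successful response so the cache fallback can reuse it."""
        self._cache[key] = (response, now_ms)

    def on_failure(self, key, error: Exception, now_ms: int):
        if self.fallback_type == "default":
            return self.message
        if self.fallback_type == "cache":
            entry = self._cache.get(key)
            if entry is not None and now_ms - entry[1] <= self.ttl_ms:
                return entry[0]
        # "none", a cache miss, or an expired entry: surface the error.
        raise error


default_handler = FallbackHandler("default", message="Service temporarily unavailable")
cache_handler = FallbackHandler("cache", ttl_ms=300_000)
cache_handler.remember("prompt-1", "cached answer", now_ms=0)

default_reply = default_handler.on_failure("prompt-1", RuntimeError("down"), now_ms=0)
cached_reply = cache_handler.on_failure("prompt-1", RuntimeError("down"), now_ms=1_000)
```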
5. Service Instances¶
Define the actual service endpoints:
model:
  services:
    chat:
      instances:
        - name: "qwen2.5:7b"
          baseUrl: "http://server1:11434"
          path: "/v1/chat/completions"
          weight: 2
          timeout: 30000
          headers:
            Authorization: "Bearer token"
            Custom-Header: "value"
        - name: "qwen2.5:14b"
          baseUrl: "http://server2:11434"
          path: "/v1/chat/completions"
          weight: 1
Instance Properties:
- name: Model name identifier
- baseUrl: Service base URL
- path: API endpoint path
- weight: Load balancing weight (higher = more traffic)
- timeout: Request timeout in milliseconds
- headers: Custom headers to include in requests
6. Storage Configuration¶
Configure how dynamic configuration is persisted:
store:
  type: file # memory or file
  path: ./config # File storage directory
  auto-backup: true # Enable automatic backups
  backup-interval: 3600 # Backup interval in seconds
Storage Types:
- memory: In-memory storage (lost on restart)
- file: File-based storage (persisted across restarts)
Environment-Specific Configuration¶
JAiRouter supports multiple environment configuration files:
- Development Environment: application-dev.yml
- Staging Environment: application-staging.yml
- Production Environment: application-prod.yml
- Compatibility Mode: application-legacy.yml
- Security Example: application-security-example.yml
Environment configuration files contain only the differences from the base configuration, following Spring Boot's configuration override mechanism.
Environment Variables¶
Override configuration using environment variables:
# Server configuration
export SERVER_PORT=8080
# Model service configuration
export MODEL_LOAD_BALANCE_TYPE=round-robin
export MODEL_RATE_LIMIT_TYPE=token-bucket
Command Line Arguments¶
Override configuration via command line:
java -jar jairouter.jar \
--server.port=8080 \
--model.load-balance.type=round-robin \
--spring.profiles.active=prod
Configuration Validation¶
JAiRouter validates configuration at startup. Common validation errors:
Missing Required Fields¶
# ❌ Invalid - missing baseUrl
model:
  services:
    chat:
      instances:
        - name: "model"
          path: "/v1/chat/completions"

# ✅ Valid
model:
  services:
    chat:
      instances:
        - name: "model"
          baseUrl: "http://localhost:11434"
          path: "/v1/chat/completions"
Invalid Configuration Values¶
# ❌ Invalid - unsupported load balance type
model:
  services:
    chat:
      load-balance:
        type: invalid-type

# ✅ Valid
model:
  services:
    chat:
      load-balance:
        type: round-robin
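Both kinds of error can be caught before deployment with a small pre-flight check on the parsed configuration. This is a sketch under stated assumptions: the `validate` function and its error messages are hypothetical, the configuration is assumed to be already parsed into a dict, and only `baseUrl` is treated as required because it is the only field this guide shows as mandatory.

```python
SUPPORTED_STRATEGIES = {"random", "round-robin", "least-connections", "ip-hash"}
REQUIRED_INSTANCE_FIELDS = ("baseUrl",)  # assumption: other fields may be optional


def validate(config: dict) -> list:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    services = config.get("model", {}).get("services", {})
    for service_type, service in services.items():
        strategy = service.get("load-balance", {}).get("type")
        if strategy is not None and strategy not in SUPPORTED_STRATEGIES:
            errors.append(
                f"{service_type}: unsupported load-balance type '{strategy}'"
            )
        for index, instance in enumerate(service.get("instances", [])):
            for field in REQUIRED_INSTANCE_FIELDS:
                if not instance.get(field):
                    errors.append(
                        f"{service_type}.instances[{index}]: missing {field}"
                    )
    return errors


invalid = {"model": {"services": {"chat": {
    "load-balance": {"type": "invalid-type"},
    "instances": [{"name": "model", "path": "/v1/chat/completions"}],
}}}}
valid = {"model": {"services": {"chat": {
    "load-balance": {"type": "round-robin"},
    "instances": [{"name": "model", "baseUrl": "http://localhost:11434",
                   "path": "/v1/chat/completions"}],
}}}}

invalid_errors = validate(invalid)
valid_errors = validate(valid)
```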
Configuration Best Practices¶
1. Use Meaningful Names¶
# ✅ Good - descriptive names
model:
  services:
    chat:
      instances:
        - name: "qwen2.5-7b-fast"
          baseUrl: "http://fast-gpu-server:11434"
        - name: "qwen2.5-14b-accurate"
          baseUrl: "http://high-memory-server:11434"
2. Set Appropriate Timeouts¶
# ✅ Good - reasonable timeouts
model:
  services:
    chat:
      instances:
        - name: "model"
          baseUrl: "http://server:11434"
          timeout: 30000 # 30 seconds for chat
    embedding:
      instances:
        - name: "embedding"
          baseUrl: "http://server:11434"
          timeout: 10000 # 10 seconds for embeddings
3. Configure Health Checks¶
# ✅ Good - enable health monitoring
model:
  load-balance:
    health-check:
      enabled: true
      interval: 30000
      timeout: 5000
4. Use Weights for Gradual Rollouts¶
# ✅ Good - gradual rollout with weights
model:
  services:
    chat:
      instances:
        - name: "stable-model-v1"
          baseUrl: "http://stable-server:11434"
          weight: 9 # 90% traffic
        - name: "new-model-v2"
          baseUrl: "http://new-server:11434"
          weight: 1 # 10% traffic
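The 90% / 10% split follows from dividing each weight by the sum of all weights. The sketch below checks that arithmetic and simulates a weighted random pick; the helper names are hypothetical, and the exact distribution JAiRouter produces may depend on the configured load-balancing strategy.

```python
import random


def traffic_shares(weights: dict) -> dict:
    """Expected share of traffic per instance: weight / sum of weights."""
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


def pick_weighted(weights: dict, rng: random.Random) -> str:
    """Pick one instance at random, proportionally to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


weights = {"stable-model-v1": 9, "new-model-v2": 1}
shares = traffic_shares(weights)

# Simulate 10,000 requests with a fixed seed to observe the split in practice.
rng = random.Random(0)
picks = [pick_weighted(weights, rng) for _ in range(10_000)]
observed_new_share = picks.count("new-model-v2") / len(picks)
```

To continue the rollout, raise the weight of the new instance step by step (for example 9:1, then 7:3, then 5:5) while watching error rates.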
Next Steps¶
- Application Configuration - Detailed application settings
- Dynamic Configuration - Runtime configuration management
- Load Balancing - Load balancing strategies
- Rate Limiting - Rate limiting algorithms
- Circuit Breaker - Circuit breaker configuration