Architecture Description¶
文档版本: 1.0.0
最后更新: 2025-08-19
Git 提交: c1aa5b0f
作者: Lincoln
Overview¶
JAiRouter is a reactive AI model service routing gateway built on Spring Boot 3.5.x and Spring WebFlux. It adopts a modular design, supporting multiple load balancing strategies, rate limiting algorithms, circuit breaker mechanisms, and dynamic configuration management.
Overall Architecture¶
graph TB
subgraph "Client Layer"
A[Web Client]
B[Mobile Application]
C[Third-party Services]
end
subgraph "Gateway Layer"
D[Unified API Gateway]
E[Load Balancer]
F[Rate Limiter]
G[Circuit Breaker]
end
subgraph "Adapter Layer"
H[GPUStack Adapter]
I[Ollama Adapter]
J[VLLM Adapter]
K[OpenAI Adapter]
end
subgraph "Backend Services"
L[GPUStack Instance]
M[Ollama Instance]
N[VLLM Instance]
O[OpenAI Service]
end
A --> D
B --> D
C --> D
D --> E
E --> F
F --> G
G --> H
G --> I
G --> J
G --> K
H --> L
I --> M
J --> N
K --> O
Core Module Architecture¶
1. Controller Layer¶
graph LR
A[UniversalController] --> B[Chat API]
A --> C[Embedding API]
A --> D[Rerank API]
A --> E[TTS API]
A --> F[STT API]
A --> G[Image API]
H[ModelManagerController] --> I[Instance Management]
H --> J[Configuration Update]
K[AutoMergeController] --> L[Configuration Merge]
K --> M[File Management]
Responsibilities: - Unified API entry point, providing OpenAI-compatible interfaces - Dynamic configuration management interfaces - Automatic configuration file merging functionality
2. Service Layer¶
graph TB
subgraph "Core Services"
A[ModelServiceRegistry]
B[LoadBalancerFactory]
C[RateLimiterFactory]
D[CircuitBreakerFactory]
end
subgraph "Management Services"
E[ConfigurationService]
F[HealthCheckService]
G[AutoMergeService]
end
A --> B
A --> C
A --> D
E --> A
F --> A
Responsibilities: - Service registration and discovery - Component factory management - Dynamic configuration updates - Health check monitoring
3. Adapter Layer¶
graph TB
A[BaseAdapter] --> B[GPUStackAdapter]
A --> C[OllamaAdapter]
A --> D[VLLMAdapter]
A --> E[XinferenceAdapter]
A --> F[LocalAIAdapter]
A --> G[OpenAIAdapter]
subgraph "Adapter Functions"
H[Request Transformation]
I[Response Mapping]
J[Error Handling]
K[Streaming Processing]
end
B --> H
B --> I
B --> J
B --> K
Responsibilities: - Unifying invocation methods for different backend services - Request/response format conversion - Protocol adaptation and error handling
4. Load Balancer Layer¶
graph TB
A[LoadBalancer Interface] --> B[RandomLoadBalancer]
A --> C[RoundRobinLoadBalancer]
A --> D[LeastConnectionsLoadBalancer]
A --> E[IPHashLoadBalancer]
subgraph "Load Balancing Strategies"
F[Random Selection]
G[Round Robin]
H[Least Connections]
I[IP Hash]
end
B --> F
C --> G
D --> H
E --> I
Responsibilities: - Implementation of multiple load balancing algorithms - Support for weight configuration - Dynamic instance management
5. Rate Limiting Layer¶
graph TB
A[RateLimiter Interface] --> B[TokenBucketRateLimiter]
A --> C[LeakyBucketRateLimiter]
A --> D[SlidingWindowRateLimiter]
A --> E[WarmUpRateLimiter]
subgraph "Rate Limiting Algorithms"
F[Token Bucket]
G[Leaky Bucket]
H[Sliding Window]
I[Warm-up Rate Limiting]
end
B --> F
C --> G
D --> H
E --> I
Responsibilities: - Implementation of multiple rate limiting algorithms - Independent rate limiting per client IP - Dynamic rate limiting parameter adjustment
6. Circuit Breaker Layer¶
stateDiagram-v2
[*] --> CLOSED
CLOSED --> OPEN : Failure rate exceeds threshold
OPEN --> HALF_OPEN : Wait time reached
HALF_OPEN --> CLOSED : Success count reaches threshold
HALF_OPEN --> OPEN : Failure count reaches threshold
Responsibilities: - Circuit breaker state management - Failure rate statistics and threshold detection - Automatic recovery mechanism
7. Storage Layer¶
graph TB
A[ConfigStore Interface] --> B[MemoryConfigStore]
A --> C[FileConfigStore]
subgraph "Storage Functions"
D[Configuration Persistence]
E[Configuration Loading]
F[Configuration Merging]
G[Version Management]
end
B --> D
B --> E
C --> D
C --> E
C --> F
C --> G
Responsibilities: - Configuration data persistence - Support for memory and file storage - Configuration version management and merging
Technology Stack¶
Core Frameworks¶
- Java 17+: Modern Java feature support
- Spring Boot 3.5.x: Application framework and auto-configuration
- Spring WebFlux: Reactive web framework
- Reactor Core: Reactive programming support
Build Tools¶
- Maven 3.8+: Project building and dependency management
- Maven Wrapper: Ensuring build environment consistency
Monitoring and Documentation¶
- SpringDoc OpenAPI: Automatic API documentation generation
- Micrometer: Metrics collection and monitoring
- Spring Boot Actuator: Health checks and management endpoints
Code Quality¶
- Checkstyle: Code style checking
- SpotBugs: Static code analysis
- JaCoCo: Code coverage analysis
Design Principles¶
1. Reactive Programming¶
- Using Reactor for non-blocking I/O
- Supporting high-concurrency request processing
- Backpressure handling and flow control
2. Modular Design¶
- Clear module boundaries and responsibility separation
- Pluggable component architecture
- Easy to extend and maintain
3. Configuration-driven¶
- Supporting static and dynamic configurations
- Hot configuration updates without restart
- Configuration version management and rollback
4. Fault Tolerance Design¶
- Multi-layered fault tolerance mechanisms
- Graceful degradation and failure recovery
- Comprehensive error handling and logging
5. Observability¶
- Comprehensive metrics monitoring
- Structured log output
- Health checks and status reporting
Extension Points¶
1. Adapter Extension¶
Implement the BaseAdapter interface to support new backend services:
@Component
public class CustomAdapter extends BaseAdapter {
@Override
public Mono<String> processRequest(String serviceType, String requestBody, ServiceInstance instance) {
// Implement custom adapter logic
}
}
2. Load Balancing Strategy Extension¶
Implement the LoadBalancer interface to add new load balancing algorithms:
@Component
public class CustomLoadBalancer implements LoadBalancer {
@Override
public ServiceInstance selectInstance(List<ServiceInstance> instances, String clientInfo) {
// Implement custom load balancing logic
}
}
3. Rate Limiting Algorithm Extension¶
Implement the RateLimiter interface to add new rate limiting algorithms:
@Component
public class CustomRateLimiter implements RateLimiter {
@Override
public boolean tryAcquire(String key, int permits) {
// Implement custom rate limiting logic
}
}
Performance Considerations¶
1. Memory Management¶
- Periodic cleanup of inactive rate limiters
- Reasonable caching strategies and expiration mechanisms
- Avoiding memory leaks
2. Concurrent Processing¶
- Using reactive programming model
- Proper thread pool configuration
- Avoiding blocking operations
3. Network Optimization¶
- Connection pool reuse
- Request timeout control
- Backpressure handling
4. Monitoring and Tuning¶
- Key metrics monitoring
- Performance bottleneck identification
- Dynamic parameter adjustment
Security Considerations¶
1. Input Validation¶
- Request parameter validation
- Prevention of injection attacks
- Data format validation
2. Access Control¶
- API key authentication
- Request frequency limiting
- IP whitelist mechanism
3. Data Protection¶
- Sensitive information masking
- Transmission encryption
- Log security
This architecture design ensures JAiRouter's scalability, maintainability, and high performance, providing a stable and reliable foundation platform for AI model service routing.