Glossary¶
文档版本: 1.0.0
最后更新: 2025-08-19
Git 提交: c1aa5b0f
作者: Lincoln
This document defines the professional terms and concepts used in the JAiRouter project.
A¶
Adapter¶
A component that converts API formats of different backend AI services into a unified interface. Each adapter is responsible for handling request/response format conversion, error handling, and protocol adaptation for specific services.
Example: GPUStackAdapter converts JAiRouter requests into a format that GPUStack service can understand.
API Gateway¶
A service that acts as a unified entry point for all client requests, providing functions such as routing, authentication, rate limiting, and monitoring. JAiRouter is an API gateway specifically designed for AI model services.
Actuator¶
Production-ready features provided by Spring Boot, including health checks, metrics collection, configuration information, and other monitoring and management endpoints.
B¶
Backpressure¶
A flow control mechanism in reactive programming when the data production speed exceeds the consumption speed. JAiRouter uses the Reactor framework to handle backpressure.
Base URL¶
The root address of a backend service, used to construct complete request URLs.
Example: http://localhost:11434
is the base URL for the Ollama service.
C¶
Circuit Breaker¶
A fault protection mechanism that automatically cuts off requests when service failures are detected, preventing fault propagation. It has three states: CLOSED, OPEN, and HALF_OPEN.
Client IP¶
The IP address of the client initiating the request, used for IP-based load balancing and rate limiting.
Configuration Store¶
A storage backend for persisting configuration data, supporting both in-memory storage and file storage.
D¶
Dynamic Configuration¶
Configuration that can be modified at runtime without restarting the service. JAiRouter supports dynamically updating service instance configurations via REST API.
E¶
Endpoint¶
The specific access address of an API, including the URL path and HTTP method.
Example: POST /v1/chat/completions
is the chat completion endpoint.
F¶
Fallback¶
A mechanism that returns preset responses or uses backup services when the primary service is unavailable. It supports two strategies: default response and cache fallback.
Flux¶
A reactive type in the Reactor framework representing an asynchronous sequence of 0 to N elements.
G¶
Grafana¶
An open-source monitoring and visualization platform used to display metric data collected by Prometheus.
H¶
Health Check¶
A mechanism that periodically checks whether service instances are running normally. Unhealthy instances are automatically removed from the load balancing pool.
HTTP Client Pool¶
A mechanism that reuses HTTP connections to avoid frequent creation and destruction of connections, thereby improving performance.
I¶
Instance¶
A specific deployment unit of a backend AI service, containing information such as name, address, path, and weight.
Example:
IP Hash¶
A load balancing strategy that uses hash calculations based on the client's IP address to select backend instances, ensuring that requests from the same client are always routed to the same instance.
J¶
JVM (Java Virtual Machine)¶
The runtime environment for running Java applications. JAiRouter runs on the JVM and requires Java 17 or higher.
L¶
Load Balancer¶
A component that distributes requests among multiple backend instances. JAiRouter supports strategies such as random, round-robin, least connections, and IP hash.
Least Connections¶
A load balancing strategy that selects the instance with the fewest active connections, suitable for requests with significant processing time differences.
Leaky Bucket¶
A rate-limiting algorithm that processes requests at a fixed rate. Requests exceeding the rate are discarded or delayed to achieve smooth rate limiting.
M¶
Metrics¶
Quantitative data used to monitor system performance and status, such as request count, response time, and error rate.
Mono¶
A reactive type in the Reactor framework representing an asynchronous sequence of 0 to 1 elements.
Model¶
A specific model in AI services, such as GPT-3.5, LLaMA-2, etc. Each model may have different capabilities and performance characteristics.
O¶
OpenAI Compatible¶
An interface design that follows the OpenAI API format and specifications, allowing clients to seamlessly switch to JAiRouter.
P¶
Path¶
The URL path part of an API endpoint, combined with the base URL to form the complete request address.
Example: /v1/chat/completions
is the path for the chat completion API.
Prometheus¶
An open-source monitoring system and time-series database used to collect and store JAiRouter's operational metrics.
R¶
Rate Limiter¶
A component that controls request frequency to prevent system overload. It supports algorithms such as token bucket, leaky bucket, and sliding window.
Reactive Programming¶
A programming paradigm based on asynchronous data streams. JAiRouter implements reactive programming using Spring WebFlux and Reactor.
Round Robin¶
A load balancing strategy that selects backend instances in a cyclic order to ensure even distribution of requests.
S¶
Service Instance¶
See Instance.
Service Registry¶
A component that manages information about all backend service instances, responsible for instance registration, discovery, and health status maintenance.
Sliding Window¶
A rate-limiting algorithm that limits the number of requests within a fixed time window, with the window sliding over time.
Spring Boot¶
A Java application framework. JAiRouter is built on Spring Boot 3.5.x.
Spring WebFlux¶
Spring framework's reactive web module that supports non-blocking I/O and high-concurrency processing.
T¶
Token Bucket¶
A rate-limiting algorithm that adds tokens to a bucket at a fixed rate. Requests need to consume tokens to pass through, supporting burst traffic.
Timeout¶
The maximum time limit for a request to wait for a response. Exceeding this time will return a timeout error.
W¶
WebClient¶
A non-blocking, reactive HTTP client provided by Spring WebFlux. JAiRouter uses it to communicate with backend services.
Weight¶
The weight value of a service instance used for weighted load balancing. Instances with higher weights receive more requests.
Warm Up¶
A rate-limiting strategy that gradually increases the allowed request volume during system startup to avoid performance issues during cold starts.
Abbreviation List¶
Abbreviation | Full Name | Chinese | Description |
---|---|---|---|
AI | Artificial Intelligence | 人工智能 | Computer simulation of human intelligence |
API | Application Programming Interface | 应用程序编程接口 | Communication protocol between software components |
CPU | Central Processing Unit | 中央处理器 | Main processing unit of a computer |
DNS | Domain Name System | 域名系统 | System that converts domain names to IP addresses |
GC | Garbage Collection | 垃圾回收 | JVM automatic memory management mechanism |
HTTP | HyperText Transfer Protocol | 超文本传输协议 | Basic protocol for web communication |
HTTPS | HTTP Secure | 安全超文本传输协议 | Encrypted HTTP protocol |
IP | Internet Protocol | 网际协议 | Basic protocol for network communication |
JSON | JavaScript Object Notation | JavaScript 对象表示法 | Lightweight data interchange format |
JVM | Java Virtual Machine | Java 虚拟机 | Virtual environment for running Java programs |
LLM | Large Language Model | 大语言模型 | Large-scale pre-trained language model |
REST | Representational State Transfer | 表述性状态转移 | Web service architectural style |
RPS | Requests Per Second | 每秒请求数 | Metric for measuring system throughput |
SSL | Secure Sockets Layer | 安全套接字层 | Network communication encryption protocol |
TLS | Transport Layer Security | 传输层安全 | Successor version of SSL |
TTL | Time To Live | 生存时间 | Validity period of data |
URL | Uniform Resource Locator | 统一资源定位符 | Address of a network resource |
YAML | YAML Ain't Markup Language | YAML 不是标记语言 | Human-readable data serialization format |
Concept Relationship Diagram¶
graph TB
subgraph "Core Concepts"
A[API Gateway]
B[Load Balancer]
C[Rate Limiter]
D[Circuit Breaker]
end
subgraph "Service Management"
E[Service Registry]
F[Service Instance]
G[Health Check]
H[Adapter]
end
subgraph "Configuration Management"
I[Configuration Store]
J[Dynamic Configuration]
end
subgraph "Monitoring System"
K[Metrics]
L[Prometheus]
M[Grafana]
end
A --> B
A --> C
A --> D
B --> E
E --> F
E --> G
F --> H
A --> I
I --> J
A --> K
K --> L
L --> M
Usage Recommendations¶
Getting Started¶
It is recommended to learn the following concepts in order: 1. Basic Concepts: API Gateway, Load Balancer, Service Instance 2. Core Functions: Rate Limiter, Circuit Breaker, Health Check 3. Advanced Features: Dynamic Configuration, Metrics, Adapter 4. Operations and Monitoring: Prometheus, Grafana, Actuator
In-depth Understanding¶
- Reactive Programming: Learn Mono, Flux, and backpressure handling
- Performance Optimization: Understand concepts such as JVM, GC, and connection pools
- Monitoring and Operations: Master metric collection, alert configuration, and troubleshooting
Practical Application¶
- Start with simple load balancing configurations
- Gradually add rate limiting and circuit breaker functions
- Configure monitoring and alert systems
- Optimize performance and stability
Note: This glossary will be continuously updated as the project develops. If you find missing terms or concepts that need to be added, please provide feedback through GitHub Issues.
Last Updated: January 15, 2025