Quick Start¶
Document version: 1.0.0
Last updated: 2025-08-19
Git commit: c1aa5b0f
Author: Lincoln
This guide will help you make your first API call to JAiRouter and understand the basic concepts.
Prerequisites¶
- JAiRouter is installed and running (see Installation Guide)
- At least one AI model service is configured and accessible
Your First API Call¶
1. Chat Completion¶
Make a chat completion request using the OpenAI-compatible API:
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen2.5:7b",
    "messages": [
      {
        "role": "user",
        "content": "Hello, how are you?"
      }
    ],
    "max_tokens": 100
  }'
2. Text Embeddings¶
Generate text embeddings:
curl -X POST http://localhost:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding-ada-002",
    "input": "Hello world"
  }'
3. Text-to-Speech¶
Generate speech from text:
curl -X POST http://localhost:8080/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-1",
    "input": "Hello, this is a test.",
    "voice": "alloy"
  }' \
  --output speech.mp3
Understanding the Response¶
A typical chat completion response looks like:
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "qwen2.5:7b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! I'm doing well, thank you for asking. How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 20,
    "total_tokens": 29
  }
}
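As a quick sanity check, the reply text and token accounting can be pulled out of a response like the one above. A minimal sketch using only Python's standard library:

```python
import json

# The example chat completion response shown above.
raw = '''{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "qwen2.5:7b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! I'm doing well, thank you for asking. How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 9, "completion_tokens": 20, "total_tokens": 29}
}'''

data = json.loads(raw)
reply = data["choices"][0]["message"]["content"]
usage = data["usage"]

print(reply)
# total_tokens is the sum of prompt and completion tokens
assert usage["total_tokens"] == usage["prompt_tokens"] + usage["completion_tokens"]
```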
Load Balancing in Action¶
If you have multiple instances configured, JAiRouter will automatically distribute requests:
model:
  services:
    chat:
      load-balance:
        type: round-robin
      instances:
        - name: "qwen2.5:7b"
          baseUrl: "http://server1:11434"
          weight: 2
        - name: "qwen2.5:7b"
          baseUrl: "http://server2:11434"
          weight: 1
With this configuration:
- Server1 will receive ~67% of requests (weight 2)
- Server2 will receive ~33% of requests (weight 1)
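The split follows directly from the weights: out of every 3 requests (the weight sum), 2 go to server1 and 1 to server2. A toy sketch of such a weighted rotation, illustrative only and not JAiRouter's actual scheduler:

```python
from collections import Counter

# Instances from the config above: (baseUrl, weight)
instances = [("http://server1:11434", 2), ("http://server2:11434", 1)]

# Naive weighted round-robin: repeat each instance `weight` times, then rotate.
rotation = [url for url, weight in instances for _ in range(weight)]

counts = Counter(rotation[i % len(rotation)] for i in range(3000))
print(counts)  # server1 serves 2000/3000 (~67%), server2 serves 1000/3000 (~33%)
```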
Monitoring Your Requests¶
1. Check Service Health¶
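Assuming the default port and Spring Boot Actuator health endpoint (both are assumptions; adjust to your deployment):

```shell
# Liveness/readiness summary (Actuator path is an assumption)
curl http://localhost:8080/actuator/health
```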
2. View Metrics¶
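Assuming Actuator metrics are exposed on the default port (an assumption; a Prometheus-format endpoint may also be available if configured):

```shell
# List available metrics (Actuator path is an assumption)
curl http://localhost:8080/actuator/metrics
```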
3. Check Instance Status¶
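JAiRouter's management API path for instance status is not shown in this guide; the call below uses a hypothetical path purely for illustration. Check the API Reference for the real endpoint:

```shell
# Hypothetical management endpoint -- replace with your deployment's actual path
curl http://localhost:8080/api/config/instance/type/chat
```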
Rate Limiting¶
JAiRouter includes built-in rate limiting. If you exceed the configured limits, you'll receive a 429 Too Many Requests response:
{
  "error": {
    "message": "Rate limit exceeded",
    "type": "rate_limit_exceeded",
    "code": "rate_limit_exceeded"
  }
}
Error Handling¶
JAiRouter provides consistent error responses:
Service Unavailable (503)¶
{
  "error": {
    "message": "All service instances are unavailable",
    "type": "service_unavailable",
    "code": "no_available_instances"
  }
}
Circuit Breaker Open (503)¶
{
  "error": {
    "message": "Circuit breaker is open",
    "type": "circuit_breaker_open",
    "code": "circuit_breaker_open"
  }
}
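Both the 429 and the 503 responses above are transient, so a client can treat them as retryable and back off exponentially. A minimal sketch, where `send_request` is a stand-in for your actual HTTP call:

```python
import time

RETRYABLE = {429, 503}  # rate limited, service unavailable / circuit breaker open

def call_with_retry(send_request, max_attempts=4, base_delay=0.1):
    """Retry transient JAiRouter errors with exponential backoff."""
    for attempt in range(max_attempts):
        status, body = send_request()
        if status not in RETRYABLE:
            return status, body
        if attempt < max_attempts - 1:
            time.sleep(base_delay * (2 ** attempt))
    return status, body

# Stub that fails twice with 429, then succeeds.
attempts = iter([(429, "rate limited"), (429, "rate limited"), (200, "ok")])
status, body = call_with_retry(lambda: next(attempts), base_delay=0.001)
print(status, body)  # 200 ok
```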
Using with OpenAI SDK¶
JAiRouter is compatible with OpenAI SDKs. Simply change the base URL:
Python¶
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="not-needed"  # JAiRouter doesn't require API keys by default
)

response = client.chat.completions.create(
    model="qwen2.5:7b",
    messages=[
        {"role": "user", "content": "Hello!"}
    ]
)
Node.js¶
import OpenAI from 'openai';

const openai = new OpenAI({
  baseURL: 'http://localhost:8080/v1',
  apiKey: 'not-needed'
});

const response = await openai.chat.completions.create({
  model: 'qwen2.5:7b',
  messages: [{ role: 'user', content: 'Hello!' }],
});
Next Steps¶
Now that you've made your first API calls, learn more about:
- Configuration - Detailed configuration options
- API Reference - Complete API documentation
- Deployment - Production deployment guides
Common Use Cases¶
1. A/B Testing Models¶
Configure multiple models and use weights to split traffic:
model:
  services:
    chat:
      instances:
        - name: "model-a"
          baseUrl: "http://server1:11434"
          weight: 1
        - name: "model-b"
          baseUrl: "http://server2:11434"
          weight: 1
2. Fallback Strategy¶
Configure primary and backup services:
model:
  services:
    chat:
      circuit-breaker:
        enabled: true
        failure-threshold: 5
      fallback:
        type: default
        message: "Service temporarily unavailable"
3. Geographic Distribution¶
Route requests to the nearest server based on IP: