Performance Troubleshooting¶
文档版本: 1.0.0
最后更新: 2025-08-19
Git 提交: c1aa5b0f
作者: Lincoln
This document provides diagnostic methods and optimization strategies for JAiRouter performance issues.
Performance Monitoring Metrics¶
Key Performance Indicators (KPI)¶
Metric Category | Metric Name | Normal Range | Alert Threshold |
---|---|---|---|
Response Time | P95 Response Time | < 2s | > 5s |
Throughput | Requests Per Second (RPS) | > 100 | < 50 |
Error Rate | 4xx/5xx Error Rate | < 1% | > 5% |
Resource Usage | CPU Usage | < 70% | > 85% |
Resource Usage | Memory Usage | < 80% | > 90% |
Connections | Active Connections | < 1000 | > 2000 |
Monitoring Metric Collection¶
# Get response time metrics
curl -s http://localhost:8080/actuator/metrics/jairouter.request.duration | jq
# Get request statistics
curl -s http://localhost:8080/actuator/metrics/jairouter.requests.total | jq
# Get JVM metrics
curl -s http://localhost:8080/actuator/metrics/jvm.memory.used | jq
curl -s http://localhost:8080/actuator/metrics/jvm.gc.pause | jq
# Get system metrics
curl -s http://localhost:8080/actuator/metrics/system.cpu.usage | jq
Performance Issue Classification¶
1. Slow Response Time¶
Symptom Identification¶
- API response time exceeds 5 seconds
- User feedback about slow system response
- P95 response time continuously increasing
Diagnostic Steps¶
1. Identify Bottleneck Location
# Check component response times
curl -w "@curl-format.txt" -o /dev/null -s http://localhost:8080/v1/chat/completions
# curl-format.txt content:
# time_namelookup: %{time_namelookup}\n
# time_connect: %{time_connect}\n
# time_appconnect: %{time_appconnect}\n
# time_pretransfer: %{time_pretransfer}\n
# time_redirect: %{time_redirect}\n
# time_starttransfer: %{time_starttransfer}\n
# ----------\n
# time_total: %{time_total}\n
2. Analyze Request Chain
# Enable request tracing
java -Dlogging.level.org.unreal.modelrouter=DEBUG \
-jar target/model-router-*.jar
# Analyze timestamps in logs
grep "Processing request" logs/jairouter-debug.log | tail -10
3. Check Backend Service Performance
# Directly test backend service
time curl -X POST http://backend:9090/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model":"test","messages":[{"role":"user","content":"hello"}]}'
Optimization Strategies¶
Connection Pool Optimization
spring:
webflux:
httpclient:
pool:
max-connections: 200 # Increase connection pool size
max-idle-time: 60s # Extend idle time
max-life-time: 300s # Extend connection lifecycle
connect-timeout: 5s # Connection timeout
response-timeout: 30s # Response timeout
Load Balancer Optimization
model:
services:
chat:
load-balance:
type: least-connections # Use least connections strategy
timeout: 30s # Set reasonable timeout
Cache Strategy
2. Insufficient Throughput¶
Symptom Identification¶
- RPS below expectations
- System unable to handle high concurrent requests
- Request queue backlog
Diagnostic Steps¶
1. Stress Testing
# Use Apache Bench for stress testing
ab -n 1000 -c 50 -H "Content-Type: application/json" \
-p request.json http://localhost:8080/v1/chat/completions
# Use wrk for stress testing
wrk -t12 -c400 -d30s --script=post.lua http://localhost:8080/v1/chat/completions
2. Thread Pool Analysis
# Get thread pool status
curl -s http://localhost:8080/actuator/metrics/executor.active | jq
curl -s http://localhost:8080/actuator/metrics/executor.queue.remaining | jq
# Generate thread dump
jstack <pid> > threads.dump
3. Resource Usage Analysis
Optimization Strategies¶
Thread Pool Tuning
spring:
task:
execution:
pool:
core-size: 8 # Core thread count
max-size: 32 # Maximum thread count
queue-capacity: 200 # Queue capacity
keep-alive: 60s # Thread keep-alive time
Reactor Tuning
spring:
webflux:
netty:
worker-threads: 16 # Worker thread count
initial-buffer-size: 128 # Initial buffer size
max-buffer-size: 1024 # Maximum buffer size
Asynchronous Processing
3. High Memory Usage¶
Symptom Identification¶
- JVM heap memory continuously growing
- Frequent Full GC
- OutOfMemoryError exceptions
Diagnostic Steps¶
1. Memory Usage Analysis
# Check memory usage
curl -s http://localhost:8080/actuator/metrics/jvm.memory.used | jq
curl -s http://localhost:8080/actuator/metrics/jvm.memory.max | jq
# Check GC status
curl -s http://localhost:8080/actuator/metrics/jvm.gc.pause | jq
curl -s http://localhost:8080/actuator/metrics/jvm.gc.memory.allocated | jq
2. Heap Dump Analysis
# Generate heap dump
jcmd <pid> GC.run_finalization
jmap -dump:format=b,file=heap.hprof <pid>
# Analyze heap dump using Eclipse MAT or VisualVM
3. Memory Leak Detection
# Enable memory leak detection
java -XX:+HeapDumpOnOutOfMemoryError \
-XX:HeapDumpPath=/tmp/heapdump.hprof \
-jar target/model-router-*.jar
Optimization Strategies¶
JVM Parameter Tuning
# G1GC configuration
java -Xms2g -Xmx4g \
-XX:+UseG1GC \
-XX:MaxGCPauseMillis=200 \
-XX:G1HeapRegionSize=16m \
-XX:G1NewSizePercent=20 \
-XX:G1MaxNewSizePercent=30 \
-XX:InitiatingHeapOccupancyPercent=45 \
-jar target/model-router-*.jar
Memory Management Optimization
model:
memory:
# Rate limiter cleanup
rate-limiter-cleanup:
enabled: true
interval: 5m
inactive-threshold: 30m
# Cache management
cache:
max-size: 1000
expire-after-write: 10m
expire-after-access: 5m
Object Pooling
4. High CPU Usage¶
Symptom Identification¶
- CPU usage consistently above 85%
- High system load
- Increased response time
Diagnostic Steps¶
1. CPU Hotspot Analysis
# Check CPU usage
top -H -p <pid>
# Generate CPU performance profile
java -XX:+FlightRecorder \
-XX:StartFlightRecording=duration=60s,filename=cpu-profile.jfr \
-jar target/model-router-*.jar
# Analyze performance data
jfr print --events CPULoad cpu-profile.jfr
2. Thread Analysis
# Generate thread dump
jstack <pid> > threads.dump
# Analyze high CPU threads
top -H -p <pid> # Find high CPU thread ID
printf "%x\n" <thread-id> # Convert to hexadecimal
grep <hex-thread-id> threads.dump # Search in dump
3. Code Hotspot Analysis
# Enable compilation logging
java -XX:+PrintCompilation \
-XX:+UnlockDiagnosticVMOptions \
-XX:+PrintInlining \
-jar target/model-router-*.jar
Optimization Strategies¶
Algorithm Optimization
// Optimized load balancing algorithm
@Component
public class OptimizedRoundRobinLoadBalancer implements LoadBalancer {
private final AtomicInteger counter = new AtomicInteger(0);
@Override
public ServiceInstance selectInstance(List<ServiceInstance> instances, String clientInfo) {
if (instances.isEmpty()) {
return null;
}
// Optimize modulo operation using bitwise operations
int index = counter.getAndIncrement() & (instances.size() - 1);
return instances.get(index);
}
}
Concurrency Optimization
model:
concurrency:
# Limit concurrent processing
max-concurrent-requests: 1000
# Use lock-free data structures
lock-free-structures: true
# Batch processing optimization
batch-processing:
enabled: true
batch-size: 100
timeout: 100ms
Cache Optimization
// Use Caffeine high-performance cache
@Configuration
public class CacheConfiguration {
@Bean
public Cache<String, Object> responseCache() {
return Caffeine.newBuilder()
.maximumSize(10000)
.expireAfterWrite(Duration.ofMinutes(5))
.recordStats()
.build();
}
}
Performance Tuning Best Practices¶
1. JVM Tuning¶
Heap Memory Configuration¶
# Production environment recommended configuration
java -Xms4g -Xmx4g \ # Fixed heap size to avoid dynamic adjustment
-XX:+UseG1GC \ # Use G1 garbage collector
-XX:MaxGCPauseMillis=100 \ # Maximum GC pause time
-XX:G1HeapRegionSize=32m \ # G1 region size
-XX:G1NewSizePercent=20 \ # Young generation ratio
-XX:G1MaxNewSizePercent=30 \ # Maximum young generation ratio
-XX:InitiatingHeapOccupancyPercent=45 \ # GC trigger threshold
-XX:+UnlockExperimentalVMOptions \ # Enable experimental options
-XX:+UseStringDeduplication \ # String deduplication
-jar target/model-router-*.jar
GC Log Configuration¶
# GC log configuration
-Xlog:gc*:gc.log:time,tags \
-XX:+UseGCLogFileRotation \
-XX:NumberOfGCLogFiles=5 \
-XX:GCLogFileSize=10M
2. Application Layer Tuning¶
Connection Pool Configuration¶
spring:
webflux:
httpclient:
pool:
max-connections: 500 # Adjust based on backend service count
max-idle-time: 30s # Idle connection keep time
max-life-time: 300s # Connection maximum lifecycle
pending-acquire-timeout: 10s # Connection acquisition timeout
pending-acquire-max-count: 1000 # Waiting queue size
Reactive Configuration¶
spring:
webflux:
netty:
worker-threads: 16 # Worker thread count = CPU cores * 2
initial-buffer-size: 128 # Initial buffer
max-buffer-size: 1024 # Maximum buffer
connection-timeout: 5s # Connection timeout
3. Monitoring and Alerting¶
Performance Monitoring Configuration¶
management:
metrics:
export:
prometheus:
enabled: true
step: 30s # Metric collection interval
distribution:
percentiles-histogram:
http.server.requests: true # Enable response time histogram
percentiles:
http.server.requests: 0.5,0.95,0.99 # Percentiles
Alert Rules¶
# prometheus-alerts.yml
groups:
- name: jairouter.performance
rules:
- alert: HighResponseTime
expr: histogram_quantile(0.95, jairouter_request_duration_seconds_bucket) > 5
for: 2m
labels:
severity: warning
annotations:
summary: "JAiRouter response time too high"
description: "P95 response time: {{ $value }}s"
- alert: HighCPUUsage
expr: system_cpu_usage > 0.85
for: 5m
labels:
severity: warning
annotations:
summary: "JAiRouter CPU usage too high"
description: "CPU usage: {{ $value | humanizePercentage }}"
- alert: HighMemoryUsage
expr: jvm_memory_used_bytes / jvm_memory_max_bytes > 0.9
for: 3m
labels:
severity: critical
annotations:
summary: "JAiRouter memory usage too high"
description: "Memory usage: {{ $value | humanizePercentage }}"
4. Capacity Planning¶
Performance Benchmark Testing¶
#!/bin/bash
# performance-benchmark.sh
echo "=== JAiRouter Performance Benchmark ==="
# Warm-up
echo "Warm-up phase..."
ab -n 100 -c 10 http://localhost:8080/v1/chat/completions
# Benchmark test
echo "Benchmark test..."
ab -n 1000 -c 50 -g benchmark.dat http://localhost:8080/v1/chat/completions
# Stress test
echo "Stress test..."
ab -n 5000 -c 200 -g stress.dat http://localhost:8080/v1/chat/completions
# Generate report
echo "Generating performance report..."
gnuplot -e "
set terminal png;
set output 'performance-report.png';
set title 'JAiRouter Performance Test';
set xlabel 'Request Number';
set ylabel 'Response Time (ms)';
plot 'benchmark.dat' using 9 with lines title 'Benchmark',
'stress.dat' using 9 with lines title 'Stress Test'
"
echo "=== Performance testing completed ==="
Capacity Assessment¶
# Calculate theoretical maximum RPS
# RPS = 1000ms / average response time(ms) * concurrent connections
# Example calculation:
# Average response time: 100ms
# Concurrent connections: 200
# Theoretical maximum RPS = 1000 / 100 * 200 = 2000
# Consider safety factor (70%)
# Recommended actual RPS = 2000 * 0.7 = 1400
Performance Optimization Checklist¶
Pre-deployment Checklist¶
- [ ] JVM parameters optimized
- [ ] Connection pool configuration reasonable
- [ ] Cache strategy configured
- [ ] Monitoring metrics enabled
- [ ] Alert rules set
Runtime Monitoring¶
- [ ] Response time within normal range
- [ ] CPU usage < 80%
- [ ] Memory usage < 85%
- [ ] GC pause time < 200ms
- [ ] Error rate < 1%
Regular Optimization¶
- [ ] Analyze performance trends
- [ ] Identify performance bottlenecks
- [ ] Adjust configuration parameters
- [ ] Update optimization strategies
- [ ] Verify optimization effectiveness
Performance Testing Tools¶
1. Apache Bench (ab)¶
# Basic test
ab -n 1000 -c 50 http://localhost:8080/v1/chat/completions
# POST request test
ab -n 1000 -c 50 -p request.json -T application/json http://localhost:8080/v1/chat/completions
2. wrk¶
# Install wrk
# Ubuntu: sudo apt-get install wrk
# macOS: brew install wrk
# Basic test
wrk -t12 -c400 -d30s http://localhost:8080/v1/chat/completions
# Using script
wrk -t12 -c400 -d30s --script=post.lua http://localhost:8080/v1/chat/completions
3. JMeter¶
<!-- JMeter test plan example -->
<?xml version="1.0" encoding="UTF-8"?>
<jmeterTestPlan version="1.2">
<hashTree>
<TestPlan testname="JAiRouter Performance Test">
<elementProp name="TestPlan.arguments" elementType="Arguments" guiclass="ArgumentsPanel">
<collectionProp name="Arguments.arguments"/>
</elementProp>
<stringProp name="TestPlan.user_define_classpath"></stringProp>
<boolProp name="TestPlan.serialize_threadgroups">false</boolProp>
<boolProp name="TestPlan.functional_mode">false</boolProp>
</TestPlan>
</hashTree>
</jmeterTestPlan>
By following these performance optimization guidelines, you can ensure that JAiRouter provides stable and efficient services in production environments.