Performance Troubleshooting

Document version: 1.0.0
Last updated: 2025-08-19
Git commit: c1aa5b0f
Author: Lincoln

This document provides diagnostic methods and optimization strategies for JAiRouter performance issues.

Performance Monitoring Metrics

Key Performance Indicators (KPI)

| Metric Category | Metric Name               | Normal Range | Alert Threshold |
|-----------------|---------------------------|--------------|-----------------|
| Response Time   | P95 Response Time         | < 2s         | > 5s            |
| Throughput      | Requests Per Second (RPS) | > 100        | < 50            |
| Error Rate      | 4xx/5xx Error Rate        | < 1%         | > 5%            |
| Resource Usage  | CPU Usage                 | < 70%        | > 85%           |
| Resource Usage  | Memory Usage              | < 80%        | > 90%           |
| Connections     | Active Connections        | < 1000       | > 2000          |

Monitoring Metric Collection

# Get response time metrics
curl -s http://localhost:8080/actuator/metrics/jairouter.request.duration | jq

# Get request statistics
curl -s http://localhost:8080/actuator/metrics/jairouter.requests.total | jq

# Get JVM metrics
curl -s http://localhost:8080/actuator/metrics/jvm.memory.used | jq
curl -s http://localhost:8080/actuator/metrics/jvm.gc.pause | jq

# Get system metrics
curl -s http://localhost:8080/actuator/metrics/system.cpu.usage | jq

Performance Issue Classification

1. Slow Response Time

Symptom Identification

  • API response time exceeds 5 seconds
  • User feedback about slow system response
  • P95 response time continuously increasing

Diagnostic Steps

1. Identify Bottleneck Location

# Check component response times
curl -w "@curl-format.txt" -o /dev/null -s http://localhost:8080/v1/chat/completions

# curl-format.txt content:
#     time_namelookup:  %{time_namelookup}\n
#        time_connect:  %{time_connect}\n
#     time_appconnect:  %{time_appconnect}\n
#    time_pretransfer:  %{time_pretransfer}\n
#       time_redirect:  %{time_redirect}\n
#  time_starttransfer:  %{time_starttransfer}\n
#                     ----------\n
#          time_total:  %{time_total}\n

2. Analyze Request Chain

# Enable request tracing
java -Dlogging.level.org.unreal.modelrouter=DEBUG \
     -jar target/model-router-*.jar

# Analyze timestamps in logs
grep "Processing request" logs/jairouter-debug.log | tail -10

3. Check Backend Service Performance

# Directly test backend service
time curl -X POST http://backend:9090/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"test","messages":[{"role":"user","content":"hello"}]}'

Optimization Strategies

Connection Pool Optimization

spring:
  webflux:
    httpclient:
      pool:
        max-connections: 200        # Increase connection pool size
        max-idle-time: 60s         # Extend idle time
        max-life-time: 300s        # Extend connection lifecycle
      connect-timeout: 5s          # Connection timeout
      response-timeout: 30s        # Response timeout
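
These pool properties are JAiRouter-level settings. If the keys differ in your version, or you need the same pool wired programmatically, Reactor Netty exposes it directly. A minimal sketch, assuming the WebClient is assembled in your own factory class (the class name here is illustrative):

import io.netty.channel.ChannelOption;
import java.time.Duration;
import org.springframework.http.client.reactive.ReactorClientHttpConnector;
import org.springframework.web.reactive.function.client.WebClient;
import reactor.netty.http.client.HttpClient;
import reactor.netty.resources.ConnectionProvider;

public class BackendClientFactory {

    public static WebClient create() {
        ConnectionProvider pool = ConnectionProvider.builder("jairouter-backend")
                .maxConnections(200)                  // pool size
                .maxIdleTime(Duration.ofSeconds(60))  // drop idle connections
                .maxLifeTime(Duration.ofSeconds(300)) // recycle long-lived connections
                .build();

        HttpClient httpClient = HttpClient.create(pool)
                .option(ChannelOption.CONNECT_TIMEOUT_MILLIS, 5_000) // connection timeout
                .responseTimeout(Duration.ofSeconds(30));            // response timeout

        return WebClient.builder()
                .clientConnector(new ReactorClientHttpConnector(httpClient))
                .build();
    }
}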

Load Balancer Optimization

model:
  services:
    chat:
      load-balance:
        type: least-connections    # Use least connections strategy
      timeout: 30s                 # Set reasonable timeout

Cache Strategy

model:
  cache:
    enabled: true
    ttl: 300s                     # 5-minute cache
    max-size: 1000               # Maximum cache entries

2. Insufficient Throughput

Symptom Identification

  • RPS below expectations
  • System unable to handle high concurrent requests
  • Request queue backlog

Diagnostic Steps

1. Stress Testing

# Use Apache Bench for stress testing
ab -n 1000 -c 50 -H "Content-Type: application/json" \
   -p request.json http://localhost:8080/v1/chat/completions

# Use wrk for stress testing
wrk -t12 -c400 -d30s --script=post.lua http://localhost:8080/v1/chat/completions

2. Thread Pool Analysis

# Get thread pool status
curl -s http://localhost:8080/actuator/metrics/executor.active | jq
curl -s http://localhost:8080/actuator/metrics/executor.queue.remaining | jq

# Generate thread dump
jstack <pid> > threads.dump

3. Resource Usage Analysis

# CPU usage
top -p <pid>

# Memory usage
jstat -gc <pid> 5s

# I/O usage
iotop -p <pid>

Optimization Strategies

Thread Pool Tuning

spring:
  task:
    execution:
      pool:
        core-size: 8              # Core thread count
        max-size: 32              # Maximum thread count
        queue-capacity: 200       # Queue capacity
        keep-alive: 60s           # Thread keep-alive time
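
These properties tune Spring Boot's auto-configured task executor. The behavior worth remembering: tasks queue up to queue-capacity before the pool grows beyond core-size toward max-size. A hedged programmatic equivalent (bean and class names are illustrative):

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;

@Configuration
public class ExecutorConfiguration {

    @Bean
    public ThreadPoolTaskExecutor applicationTaskExecutor() {
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        executor.setCorePoolSize(8);        // threads kept alive at all times
        executor.setMaxPoolSize(32);        // grown only once the queue is full
        executor.setQueueCapacity(200);     // tasks buffered before the pool grows
        executor.setKeepAliveSeconds(60);   // idle non-core threads reclaimed after this
        executor.setThreadNamePrefix("jairouter-task-");
        return executor;                    // initialized by the Spring bean lifecycle
    }
}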

Reactor Tuning

spring:
  webflux:
    netty:
      worker-threads: 16          # Worker thread count
      initial-buffer-size: 128    # Initial buffer size
      max-buffer-size: 1024       # Maximum buffer size

Asynchronous Processing

model:
  async:
    enabled: true
    thread-pool-size: 16
    queue-capacity: 1000
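
model.async.* is JAiRouter-specific configuration. Inside the reactive pipeline itself, the complementary pattern is to push any blocking work off the Netty event loop onto a bounded scheduler; a minimal sketch (class and method names are illustrative):

import reactor.core.publisher.Mono;
import reactor.core.scheduler.Schedulers;

public class AsyncOffloadExample {

    // Wraps a blocking call so it runs on the bounded-elastic pool
    // instead of a Netty event-loop thread.
    static Mono<String> callBlockingBackend() {
        return Mono.fromCallable(AsyncOffloadExample::blockingCall)
                .subscribeOn(Schedulers.boundedElastic());
    }

    private static String blockingCall() {
        return "response"; // placeholder for blocking I/O
    }
}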

3. High Memory Usage

Symptom Identification

  • JVM heap memory continuously growing
  • Frequent Full GC
  • OutOfMemoryError exceptions

Diagnostic Steps

1. Memory Usage Analysis

# Check memory usage
curl -s http://localhost:8080/actuator/metrics/jvm.memory.used | jq
curl -s http://localhost:8080/actuator/metrics/jvm.memory.max | jq

# Check GC status
curl -s http://localhost:8080/actuator/metrics/jvm.gc.pause | jq
curl -s http://localhost:8080/actuator/metrics/jvm.gc.memory.allocated | jq

2. Heap Dump Analysis

# Force a GC first so the dump reflects live objects
jcmd <pid> GC.run

# Generate heap dump (live objects only)
jmap -dump:live,format=b,file=heap.hprof <pid>

# Analyze heap dump using Eclipse MAT or VisualVM

3. Memory Leak Detection

# Enable memory leak detection
java -XX:+HeapDumpOnOutOfMemoryError \
     -XX:HeapDumpPath=/tmp/heapdump.hprof \
     -jar target/model-router-*.jar

Optimization Strategies

JVM Parameter Tuning

# G1GC configuration (G1NewSizePercent/G1MaxNewSizePercent are experimental
# flags, so -XX:+UnlockExperimentalVMOptions must come before them)
java -Xms2g -Xmx4g \
     -XX:+UseG1GC \
     -XX:MaxGCPauseMillis=200 \
     -XX:G1HeapRegionSize=16m \
     -XX:+UnlockExperimentalVMOptions \
     -XX:G1NewSizePercent=20 \
     -XX:G1MaxNewSizePercent=30 \
     -XX:InitiatingHeapOccupancyPercent=45 \
     -jar target/model-router-*.jar

Memory Management Optimization

model:
  memory:
    # Rate limiter cleanup
    rate-limiter-cleanup:
      enabled: true
      interval: 5m
      inactive-threshold: 30m

    # Cache management
    cache:
      max-size: 1000
      expire-after-write: 10m
      expire-after-access: 5m

Object Pooling

spring:
  webflux:
    httpclient:
      pool:
        # Enable object pooling
        use-object-pooling: true
        pool-size: 100

4. High CPU Usage

Symptom Identification

  • CPU usage consistently above 85%
  • High system load
  • Increased response time

Diagnostic Steps

1. CPU Hotspot Analysis

# Check CPU usage
top -H -p <pid>

# Generate CPU performance profile with Java Flight Recorder
# (JDK 11+; the old -XX:+FlightRecorder flag is no longer required)
java -XX:StartFlightRecording=duration=60s,filename=cpu-profile.jfr \
     -jar target/model-router-*.jar

# Or attach to an already-running process
jcmd <pid> JFR.start duration=60s filename=cpu-profile.jfr

# Analyze performance data
jfr print --events CPULoad cpu-profile.jfr

2. Thread Analysis

# Generate thread dump
jstack <pid> > threads.dump

# Analyze high CPU threads
top -H -p <pid>  # Find high CPU thread ID
printf "%x\n" <thread-id>  # Convert to hexadecimal
grep <hex-thread-id> threads.dump  # Search in dump

3. Code Hotspot Analysis

# Enable compilation logging
java -XX:+PrintCompilation \
     -XX:+UnlockDiagnosticVMOptions \
     -XX:+PrintInlining \
     -jar target/model-router-*.jar

Optimization Strategies

Algorithm Optimization

// Optimized round-robin load balancing (imports shown for completeness)
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

@Component
public class OptimizedRoundRobinLoadBalancer implements LoadBalancer {
    private final AtomicInteger counter = new AtomicInteger(0);

    @Override
    public ServiceInstance selectInstance(List<ServiceInstance> instances, String clientInfo) {
        if (instances.isEmpty()) {
            return null;
        }

        int size = instances.size();
        int next = counter.getAndIncrement();
        // The bitwise modulo is only valid when size is a power of two;
        // fall back to floorMod otherwise (also correct if the counter overflows).
        int index = (size & (size - 1)) == 0
                ? next & (size - 1)
                : Math.floorMod(next, size);
        return instances.get(index);
    }
}

Concurrency Optimization

model:
  concurrency:
    # Limit concurrent processing
    max-concurrent-requests: 1000

    # Use lock-free data structures
    lock-free-structures: true

    # Batch processing optimization
    batch-processing:
      enabled: true
      batch-size: 100
      timeout: 100ms
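
model.concurrency.* is JAiRouter-specific configuration. For reference, a minimal sketch of what a global concurrent-request cap can look like in WebFlux (the filter and its wiring are illustrative assumptions, not JAiRouter internals):

import java.util.concurrent.atomic.AtomicInteger;
import org.springframework.core.annotation.Order;
import org.springframework.http.HttpStatus;
import org.springframework.stereotype.Component;
import org.springframework.web.server.ServerWebExchange;
import org.springframework.web.server.WebFilter;
import org.springframework.web.server.WebFilterChain;
import reactor.core.publisher.Mono;

@Component
@Order(0)
public class ConcurrencyLimitFilter implements WebFilter {

    private static final int MAX_CONCURRENT_REQUESTS = 1000; // mirrors max-concurrent-requests
    private final AtomicInteger inFlight = new AtomicInteger();

    @Override
    public Mono<Void> filter(ServerWebExchange exchange, WebFilterChain chain) {
        if (inFlight.incrementAndGet() > MAX_CONCURRENT_REQUESTS) {
            inFlight.decrementAndGet();
            exchange.getResponse().setStatusCode(HttpStatus.TOO_MANY_REQUESTS);
            return exchange.getResponse().setComplete();
        }
        return chain.filter(exchange)
                .doFinally(signal -> inFlight.decrementAndGet()); // release on any outcome
    }
}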

Cache Optimization

// Use Caffeine as a high-performance in-process cache
import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;
import java.time.Duration;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class CacheConfiguration {

    @Bean
    public Cache<String, Object> responseCache() {
        return Caffeine.newBuilder()
            .maximumSize(10_000)                      // bound memory usage
            .expireAfterWrite(Duration.ofMinutes(5))  // evict stale entries
            .recordStats()                            // expose hit/miss statistics
            .build();
    }
}
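
A minimal usage sketch (the service class and cache key are illustrative assumptions): Caffeine's get(key, mappingFunction) computes, stores, and returns the value atomically on a miss.

import com.github.benmanes.caffeine.cache.Cache;
import java.util.function.Function;
import org.springframework.stereotype.Service;

@Service
public class CachedResponseService {

    private final Cache<String, Object> responseCache;

    public CachedResponseService(Cache<String, Object> responseCache) {
        this.responseCache = responseCache;
    }

    // Returns the cached value on a hit; computes and stores it on a miss.
    public Object getOrCompute(String cacheKey, Function<String, Object> loader) {
        return responseCache.get(cacheKey, loader);
    }
}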

Performance Tuning Best Practices

1. JVM Tuning

Heap Memory Configuration

# Production environment recommended configuration:
#  - fixed heap size (-Xms = -Xmx) to avoid runtime resizing
#  - G1 collector with a 100 ms pause-time target
#  - experimental options unlocked before the young-generation flags that need them
#  - string deduplication to reduce heap footprint
java -Xms4g -Xmx4g \
     -XX:+UseG1GC \
     -XX:MaxGCPauseMillis=100 \
     -XX:G1HeapRegionSize=32m \
     -XX:+UnlockExperimentalVMOptions \
     -XX:G1NewSizePercent=20 \
     -XX:G1MaxNewSizePercent=30 \
     -XX:InitiatingHeapOccupancyPercent=45 \
     -XX:+UseStringDeduplication \
     -jar target/model-router-*.jar

GC Log Configuration

# GC log configuration (JDK 9+ unified logging; rotation is built into -Xlog,
# replacing the removed -XX:+UseGCLogFileRotation family of flags)
-Xlog:gc*:file=gc.log:time,tags:filecount=5,filesize=10m

2. Application Layer Tuning

Connection Pool Configuration

spring:
  webflux:
    httpclient:
      pool:
        max-connections: 500           # Adjust based on backend service count
        max-idle-time: 30s            # Idle connection keep time
        max-life-time: 300s           # Connection maximum lifecycle
        pending-acquire-timeout: 10s  # Connection acquisition timeout
        pending-acquire-max-count: 1000  # Waiting queue size

Reactive Configuration

spring:
  webflux:
    netty:
      worker-threads: 16              # Worker thread count = CPU cores * 2
      initial-buffer-size: 128        # Initial buffer
      max-buffer-size: 1024          # Maximum buffer
      connection-timeout: 5s          # Connection timeout

3. Monitoring and Alerting

Performance Monitoring Configuration

management:
  metrics:
    export:
      prometheus:
        enabled: true
        step: 30s                     # Metric collection interval
    distribution:
      percentiles-histogram:
        http.server.requests: true    # Enable response time histogram
      percentiles:
        http.server.requests: 0.5,0.95,0.99  # Percentiles
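
The same distribution settings can be applied in code through a Micrometer MeterFilter; a sketch, where the metric-name match is an assumption to adjust for the timers you care about:

import io.micrometer.core.instrument.Meter;
import io.micrometer.core.instrument.config.MeterFilter;
import io.micrometer.core.instrument.distribution.DistributionStatisticConfig;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class MetricsConfiguration {

    @Bean
    public MeterFilter requestPercentiles() {
        return new MeterFilter() {
            @Override
            public DistributionStatisticConfig configure(Meter.Id id, DistributionStatisticConfig config) {
                if (id.getName().startsWith("http.server.requests")) {
                    return DistributionStatisticConfig.builder()
                            .percentiles(0.5, 0.95, 0.99) // publish p50/p95/p99
                            .percentilesHistogram(true)   // histogram buckets for PromQL quantiles
                            .build()
                            .merge(config);
                }
                return config;
            }
        };
    }
}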

Alert Rules

# prometheus-alerts.yml
groups:
  - name: jairouter.performance
    rules:
      - alert: HighResponseTime
        expr: histogram_quantile(0.95, sum(rate(jairouter_request_duration_seconds_bucket[5m])) by (le)) > 5
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "JAiRouter response time too high"
          description: "P95 response time: {{ $value }}s"

      - alert: HighCPUUsage
        expr: system_cpu_usage > 0.85
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "JAiRouter CPU usage too high"
          description: "CPU usage: {{ $value | humanizePercentage }}"

      - alert: HighMemoryUsage
        expr: sum(jvm_memory_used_bytes{area="heap"}) / sum(jvm_memory_max_bytes{area="heap"}) > 0.9
        for: 3m
        labels:
          severity: critical
        annotations:
          summary: "JAiRouter memory usage too high"
          description: "Memory usage: {{ $value | humanizePercentage }}"

4. Capacity Planning

Performance Benchmark Testing

#!/bin/bash
# performance-benchmark.sh
# Assumes request.json contains the chat-completions payload shown earlier.

echo "=== JAiRouter Performance Benchmark ==="

# Warm-up
echo "Warm-up phase..."
ab -n 100 -c 10 -p request.json -T application/json http://localhost:8080/v1/chat/completions

# Benchmark test
echo "Benchmark test..."
ab -n 1000 -c 50 -g benchmark.dat -p request.json -T application/json http://localhost:8080/v1/chat/completions

# Stress test
echo "Stress test..."
ab -n 5000 -c 200 -g stress.dat -p request.json -T application/json http://localhost:8080/v1/chat/completions

# Generate report (field 9 is ttime: ab's date field splits into five whitespace tokens)
echo "Generating performance report..."
gnuplot -e "
set terminal png;
set output 'performance-report.png';
set title 'JAiRouter Performance Test';
set xlabel 'Request Number';
set ylabel 'Response Time (ms)';
plot 'benchmark.dat' using 9 with lines title 'Benchmark', 'stress.dat' using 9 with lines title 'Stress Test'
"

echo "=== Performance testing completed ==="

Capacity Assessment

# Calculate theoretical maximum RPS
# RPS = 1000ms / average response time (ms) * concurrent connections

# Example calculation:
# Average response time: 100ms
# Concurrent connections: 200
# Theoretical maximum RPS = 1000 / 100 * 200 = 2000

# Apply a 70% safety factor
# Recommended sustained RPS = 2000 * 0.7 = 1400
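
The same arithmetic as a tiny runnable helper (class and method names are illustrative):

public final class CapacityEstimator {

    private static final double SAFETY_FACTOR = 0.7; // keep 30% headroom

    // Theoretical max RPS = concurrent connections / average response time in seconds.
    static double theoreticalMaxRps(double avgResponseMillis, int concurrentConnections) {
        return 1000.0 / avgResponseMillis * concurrentConnections;
    }

    public static void main(String[] args) {
        double max = theoreticalMaxRps(100, 200); // 2000
        System.out.printf("Theoretical max RPS: %.0f%n", max);
        System.out.printf("Recommended RPS:     %.0f%n", max * SAFETY_FACTOR); // 1400
    }
}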

Performance Optimization Checklist

Pre-deployment Checklist

  • [ ] JVM parameters optimized
  • [ ] Connection pool configuration reasonable
  • [ ] Cache strategy configured
  • [ ] Monitoring metrics enabled
  • [ ] Alert rules set

Runtime Monitoring

  • [ ] Response time within normal range
  • [ ] CPU usage < 80%
  • [ ] Memory usage < 85%
  • [ ] GC pause time < 200ms
  • [ ] Error rate < 1%

Regular Optimization

  • [ ] Analyze performance trends
  • [ ] Identify performance bottlenecks
  • [ ] Adjust configuration parameters
  • [ ] Update optimization strategies
  • [ ] Verify optimization effectiveness

Performance Testing Tools

1. Apache Bench (ab)

# Basic test (GET; the chat endpoint expects POST, so prefer the form below)
ab -n 1000 -c 50 http://localhost:8080/v1/chat/completions

# POST request test
ab -n 1000 -c 50 -p request.json -T application/json http://localhost:8080/v1/chat/completions

2. wrk

# Install wrk
# Ubuntu: sudo apt-get install wrk
# macOS: brew install wrk

# Basic test
wrk -t12 -c400 -d30s http://localhost:8080/v1/chat/completions

# Using script
wrk -t12 -c400 -d30s --script=post.lua http://localhost:8080/v1/chat/completions

3. JMeter

<?xml version="1.0" encoding="UTF-8"?>
<!-- JMeter test plan example -->
<jmeterTestPlan version="1.2">
  <hashTree>
    <TestPlan testname="JAiRouter Performance Test">
      <elementProp name="TestPlan.arguments" elementType="Arguments" guiclass="ArgumentsPanel">
        <collectionProp name="Arguments.arguments"/>
      </elementProp>
      <stringProp name="TestPlan.user_define_classpath"></stringProp>
      <boolProp name="TestPlan.serialize_threadgroups">false</boolProp>
      <boolProp name="TestPlan.functional_mode">false</boolProp>
    </TestPlan>
  </hashTree>
</jmeterTestPlan>

By following these performance optimization guidelines, you can ensure that JAiRouter provides stable and efficient services in production environments.