Usage Guide

This guide explains in detail how to use JAiRouter's distributed tracing effectively in a variety of scenarios.

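Basic Usage

Viewing Trace Data

1. Trace information in logs

Once tracing is enabled, the log entries for every request include trace information:

# Follow the log and show only lines that carry trace information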
tail -f logs/application.log | grep traceId

# Find all log lines for a specific request by its traceId
grep "4bf92f3577b34da6a3ce929d0e0e4736" logs/application.log
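
The same lookup can be done programmatically. The following is a minimal sketch in plain Java, not part of JAiRouter: the class `TraceLogFilter` and its methods are hypothetical names. It assumes that trace IDs follow the W3C Trace Context format (32 lowercase hexadecimal characters, not all zeros) and that the ID appears verbatim in every matching log line.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/** Hypothetical helper: selects the log lines that belong to one trace. */
final class TraceLogFilter {

    // W3C Trace Context trace-id: 32 lowercase hexadecimal characters
    private static final Pattern TRACE_ID = Pattern.compile("[0-9a-f]{32}");

    /** Returns true if the ID is well-formed and not the invalid all-zero ID. */
    static boolean isValidTraceId(String traceId) {
        return traceId != null
            && TRACE_ID.matcher(traceId).matches()
            && !traceId.equals("0".repeat(32));
    }

    /** Returns every line that contains the given trace ID, in original order. */
    static List<String> linesForTrace(List<String> logLines, String traceId) {
        if (!isValidTraceId(traceId)) {
            throw new IllegalArgumentException("Malformed trace ID: " + traceId);
        }
        return logLines.stream()
            .filter(line -> line.contains(traceId))
            .collect(Collectors.toList());
    }
}
```

Validating the ID first catches copy-and-paste mistakes (a truncated or upper-cased ID) that would otherwise silently return no lines.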

2. Structured log queries

# Parse JSON-formatted trace logs with jq
tail -f logs/application.log | jq 'select(.traceId != null)'

# Query trace data for a specific service
tail -f logs/application.log | jq 'select(.service == "jairouter" and .traceId != null)'

3. Actuator endpoint queries

# Check tracing health
curl http://localhost:8080/actuator/health/tracing

# View the tracing configuration
curl http://localhost:8080/actuator/info | jq '.tracing'

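# List tracing-related metric names (the endpoint returns one JSON document, so filter with jq rather than grep)
curl -s http://localhost:8080/actuator/metrics | jq -r '.names[] | select(contains("tracing"))'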

Custom Trace Tags

1. Adding tags in business code

@RestController
public class CustomController {

    @Autowired
    private TracingContextHolder tracingContextHolder;

    @PostMapping("/api/custom")
    public ResponseEntity<?> customEndpoint(@RequestBody CustomRequest request) {
        TracingContext context = tracingContextHolder.getCurrentContext();
        if (context != null && context.isActive()) {
            Span span = context.getCurrentSpan();
            // Add business tags
            span.setAttribute("user.id", request.getUserId());
            span.setAttribute("business.type", request.getType());
            span.setAttribute("custom.operation", "data-processing");

            // Record a business event
            span.addEvent("business.started");
        }

        // Business logic
        CustomResponse response = processRequest(request);

        // Record the outcome
        if (context != null && context.isActive()) {
            Span span = context.getCurrentSpan();
            span.setAttribute("result.status", response.getStatus());
            span.addEvent("business.completed");
        }

        return ResponseEntity.ok(response);
    }
}

2. Creating spans manually

import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.SpanKind;

@Component
public class BusinessService {

    @Autowired
    private TracingContextHolder tracingContextHolder;

    public Mono<ProcessResult> processUserData(UserData data) {
        TracingContext context = tracingContextHolder.getCurrentContext();
        if (context == null || !context.isActive()) {
            // No active trace: run the business logic without a span
            return Mono.fromCallable(() -> {
                // Business logic
                return new ProcessResult();
            });
        }
        Span parentSpan = context.getCurrentSpan();
        Span span = context.createChildSpan("user-data-processing", SpanKind.INTERNAL, parentSpan);
        // Exceptions thrown inside the callable surface as reactive error signals,
        // so record them with doOnError; a surrounding try/catch would never see them
        return Mono.fromCallable(() -> {
                // Business logic
                return new ProcessResult();
            })
            .doOnError(span::recordException)
            .doFinally(signal -> span.end());
    }

    public List<Entity> queryDatabase(String condition) {
        TracingContext context = tracingContextHolder.getCurrentContext();
        if (context != null && context.isActive()) {
            Span parentSpan = context.getCurrentSpan();
            Span span = context.createChildSpan("database-query", SpanKind.CLIENT, parentSpan);
            span.setAttribute("operation", "query");
            try {
                // Database query
                return entityRepository.findByCondition(condition);
            } finally {
                span.end();
            }
        }
        // Database query (no active trace)
        return entityRepository.findByCondition(condition);
    }
}

Advanced Scenarios

Distributed Service Chain Tracing

1. Tracing service-to-service calls

import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.SpanKind;

@Service
public class ExternalServiceClient {

    @Autowired
    private WebClient webClient;

    @Autowired
    private TracingContextHolder tracingContextHolder;

    public Mono<ExternalResponse> callExternalService(ExternalRequest request) {
        TracingContext context = tracingContextHolder.getCurrentContext();
        if (context == null || !context.isActive()) {
            return callExternalServiceInternal(request, null);
        }

        // Create a child span for the external service call
        Span parentSpan = context.getCurrentSpan();
        Span span = context.createChildSpan("external-service-call", SpanKind.CLIENT, parentSpan);
        // Add service information
        span.setAttribute("external.service", "ai-model-service");
        span.setAttribute("external.endpoint", "/v1/chat/completions");
        span.setAttribute("request.model", request.getModel());

        return callExternalServiceInternal(request, span)
            .doFinally(signal -> span.end());
    }

    private Mono<ExternalResponse> callExternalServiceInternal(ExternalRequest request, Span span) {
        return webClient.post()
            .uri("/v1/chat/completions")
            .body(BodyInserters.fromValue(request))
            .retrieve()
            .bodyToMono(ExternalResponse.class)
            .doOnSuccess(response -> {
                if (span != null) {
                    span.setAttribute("response.status", "success");
                    span.setAttribute("response.tokens", response.getUsage().getTotalTokens());
                }
            })
            .doOnError(error -> {
                if (span != null) {
                    span.setAttribute("error.type", error.getClass().getSimpleName());
                    span.setAttribute("error.message", error.getMessage());
                    span.recordException(error);
                }
            });
    }
}

2. Context propagation in reactive streams

@Component
public class ReactiveProcessor {

    @Autowired
    private TracingContextHolder tracingContextHolder;

    public Mono<ProcessedData> processDataPipeline(InputData input) {
        return Mono.just(input)
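            // Stage 1: validation
            .flatMap(this::validateInput)
            // contextWrite is visible only to the operators above it and runs at subscription time
            .contextWrite(ctx -> {
                // getCurrentContext() returns null when no trace is active
                TracingContext tracing = tracingContextHolder.getCurrentContext();
                return tracing != null ? ctx.put("tracing", tracing) : ctx;
            })

            // Stage 2: transformation
            .flatMap(this::transformData)
            .contextWrite(ctx -> {
                // Update the trace information at each stage
                TracingContext context = tracingContextHolder.getCurrentContext();
                if (context != null && context.isActive()) {
                    Span span = context.getCurrentSpan();
                    span.addEvent("pipeline.stage.transform");
                }
                return ctx;
            })

            // Stage 3: storage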
            .flatMap(this::saveData)
            .doOnSuccess(result -> {
                TracingContext context = tracingContextHolder.getCurrentContext();
                if (context != null && context.isActive()) {
                    Span span = context.getCurrentSpan();
                    span.setAttribute("pipeline.result", "success");
                }
            });
    }
}

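Slow Query Detection

1. Automatic slow query detection

The system automatically detects requests whose duration exceeds the configured threshold: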

jairouter:
  tracing:
    sampling:
      strategy: "adaptive"
      adaptive:
        slow-request-threshold: 3000  # 3-second threshold (in milliseconds)
        slow-request-sample-rate: 0.8 # sample 80% of slow requests
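
How the adaptive strategy combines these two settings internally is not described here. The sketch below shows one plausible reading, under stated assumptions: requests slower than the threshold are sampled at the slow-request rate and all others at a base rate. The class `AdaptiveSamplerSketch`, the `baseRate` parameter, and the injected random source are hypothetical and are not JAiRouter APIs.

```java
import java.util.function.DoubleSupplier;

/** Hypothetical sketch of a threshold-based sampling decision. */
final class AdaptiveSamplerSketch {
    private final long slowThresholdMs;   // plays the role of slow-request-threshold
    private final double slowSampleRate;  // plays the role of slow-request-sample-rate
    private final double baseRate;        // assumed rate for requests under the threshold
    private final DoubleSupplier random;  // supplies values in [0, 1)

    AdaptiveSamplerSketch(long slowThresholdMs, double slowSampleRate,
                          double baseRate, DoubleSupplier random) {
        this.slowThresholdMs = slowThresholdMs;
        this.slowSampleRate = slowSampleRate;
        this.baseRate = baseRate;
        this.random = random;
    }

    /** Decides whether to keep the trace of a request that took durationMs. */
    boolean shouldSample(long durationMs) {
        // Only requests strictly above the threshold count as slow
        double rate = durationMs > slowThresholdMs ? slowSampleRate : baseRate;
        return random.getAsDouble() < rate;
    }
}
```

In production code the random source could be `Math::random`; injecting it keeps the decision reproducible in tests.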

2. Manual slow query analysis

@Component
public class SlowQueryAnalyzer {

    @Autowired
    private TracingContextHolder tracingContextHolder;

    public void analyzeSlowOperation() {
        long startTime = System.currentTimeMillis();

        try {
            // Run the potentially slow operation
            performComplexOperation();
        } finally {
            long duration = System.currentTimeMillis() - startTime;

            if (duration > 5000) { // 5-second threshold
                TracingContext context = tracingContextHolder.getCurrentContext();
                if (context != null && context.isActive()) {
                    Span span = context.getCurrentSpan();
                    // Mark as a slow query
                    span.setAttribute("performance.slow", "true");
                    span.setAttribute("performance.duration", String.valueOf(duration));
                    span.addEvent("slow.query.detected");
                }

                // Record detailed performance information
                recordPerformanceDetails();
            }
        }
    }
}

Error Tracking and Analysis

1. Automatic exception tracking

@Component
public class ErrorHandlingService {

    @Autowired
    private TracingContextHolder tracingContextHolder;

    public Mono<Result> processWithErrorHandling(Request request) {
        return Mono.fromCallable(() -> processRequest(request))
            .onErrorResume(BusinessException.class, ex -> {
                // Handle business exceptions
                TracingContext context = tracingContextHolder.getCurrentContext();
                if (context != null && context.isActive()) {
                    Span span = context.getCurrentSpan();
                    span.setAttribute("error.type", "business");
                    span.setAttribute("error.code", ex.getErrorCode());
                    span.recordException(ex);
                }
                return Mono.just(createErrorResult(ex));
            })
            .onErrorResume(Exception.class, ex -> {
                // Handle system exceptions
                TracingContext context = tracingContextHolder.getCurrentContext();
                if (context != null && context.isActive()) {
                    Span span = context.getCurrentSpan();
                    span.setAttribute("error.type", "system");
                    span.setAttribute("error.severity", "high");
                    span.recordException(ex);
                }
                return Mono.error(new SystemException("System processing failed", ex));
            });
    }
}

2. Custom error tracking

import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.SpanKind;

@Component
public class CustomErrorTracker {

    @Autowired
    private TracingContextHolder tracingContextHolder;

    public void trackCustomError(String operation, Throwable error, Map<String, Object> context) {
        TracingContext tracingContext = tracingContextHolder.getCurrentContext();
        if (tracingContext == null || !tracingContext.isActive()) {
            return;
        }

        Span parentSpan = tracingContext.getCurrentSpan();
        Span span = tracingContext.createChildSpan("error-analysis", SpanKind.INTERNAL, parentSpan);
        try {
            // Record basic error information
            span.setAttribute("error.operation", operation);
            span.setAttribute("error.class", error.getClass().getSimpleName());
            span.setAttribute("error.message", error.getMessage());

            // Record context information
            context.forEach((key, value) ->
                span.setAttribute("context." + key, String.valueOf(value)));

            // Record the sanitized stack trace on the span
            String sanitizedStackTrace = sanitizeStackTrace(error);
            span.setAttribute("error.stacktrace", sanitizedStackTrace);
            span.addEvent("error.stacktrace");

            // Assess the error severity
            String severity = analyzeSeverity(error);
            span.setAttribute("error.severity", severity);
        } finally {
            span.end();
        }
    }

    private String sanitizeStackTrace(Throwable error) {
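        // Implement stack trace sanitization here; this placeholder keeps only the top frame
        StackTraceElement[] frames = error.getStackTrace();
        return frames.length > 0 ? frames[0].toString() : "";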
    }
}

Performance Monitoring

Real-Time Performance Metrics

1. Viewing key metrics

# View tracing-related Prometheus metrics
curl -s http://localhost:8080/actuator/prometheus | grep jairouter_tracing

# View the sampling rate metric
curl -s http://localhost:8080/actuator/metrics/jairouter.tracing.sampling.rate

# View span creation and export statistics
curl -s http://localhost:8080/actuator/metrics/jairouter.tracing.spans.created
curl -s http://localhost:8080/actuator/metrics/jairouter.tracing.spans.exported

2. Performance monitoring API

@RestController
@RequestMapping("/api/tracing/performance")
public class TracingPerformanceController {

    @Autowired
    private TracingPerformanceMonitor performanceMonitor;

    @GetMapping("/stats")
    public ResponseEntity<PerformanceStats> getPerformanceStats() {
        PerformanceStats stats = performanceMonitor.getStats();
        return ResponseEntity.ok(stats);
    }

    @GetMapping("/slow-requests")
    public ResponseEntity<List<SlowRequest>> getSlowRequests(
            @RequestParam(defaultValue = "5000") long thresholdMs) {
        List<SlowRequest> slowRequests = performanceMonitor
            .getSlowRequests(thresholdMs);
        return ResponseEntity.ok(slowRequests);
    }
}

Memory Usage Monitoring

1. Monitoring tracing memory usage

@Component
public class TracingMemoryMonitor {

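    @Autowired
    private TracingContextHolder tracingContextHolder;

    // Type name assumed; this collaborator supplies getMemoryUsage() and triggerCleanup()
    @Autowired
    private TracingMemoryManager tracingMemoryManager;

    @Scheduled(fixedRate = 30000) // check every 30 seconds
    public void monitorMemoryUsage() {
        MemoryUsage memoryUsage = tracingMemoryManager.getMemoryUsage();

        if (memoryUsage.getUsedRatio() > 0.8) {
            // Memory usage is above 80%: trigger a cleanup
            tracingMemoryManager.triggerCleanup();

            // Record a memory pressure event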
            TracingContext context = tracingContextHolder.getCurrentContext();
            if (context != null && context.isActive()) {
                Span span = context.getCurrentSpan();
                span.addEvent("memory.pressure.detected");
            }
        }
    }
}

2. Memory usage tuning

jairouter:
  tracing:
    memory:
      max-spans: 50000              # adjust to the memory actually available
      cleanup-interval: 30s         # clean up more frequently
      span-ttl: 180s               # shorter TTL
      memory-threshold: 0.7        # lower memory threshold
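
To make the interplay of `max-spans` and `span-ttl` concrete, here is a minimal sketch of a span registry bounded by both count and age. It is an illustration under assumptions, not JAiRouter's implementation: the class `BoundedSpanStore` is hypothetical, and evicting the oldest span when the store is full is only one possible policy.

```java
import java.time.Duration;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical in-memory span registry bounded by count and age. */
final class BoundedSpanStore {
    private final int maxSpans;    // plays the role of max-spans
    private final long ttlMillis;  // plays the role of span-ttl
    // Insertion-ordered: the earliest-registered span comes first
    private final Map<String, Long> createdAt = new LinkedHashMap<>();

    BoundedSpanStore(int maxSpans, Duration ttl) {
        if (maxSpans <= 0) {
            throw new IllegalArgumentException("maxSpans must be positive");
        }
        this.maxSpans = maxSpans;
        this.ttlMillis = ttl.toMillis();
    }

    /** Registers a span, evicting the oldest one when the store is full. */
    synchronized void add(String spanId, long nowMillis) {
        if (!createdAt.containsKey(spanId) && createdAt.size() >= maxSpans) {
            Iterator<String> oldest = createdAt.keySet().iterator();
            oldest.next();
            oldest.remove();
        }
        createdAt.put(spanId, nowMillis);
    }

    /** Drops spans older than the TTL and returns how many were dropped. */
    synchronized int cleanup(long nowMillis) {
        int before = createdAt.size();
        createdAt.values().removeIf(created -> nowMillis - created > ttlMillis);
        return before - createdAt.size();
    }

    synchronized int size() {
        return createdAt.size();
    }
}
```

In this reading, `cleanup` would be called on a schedule matching `cleanup-interval`, so a shorter interval or TTL trades retained detail for lower memory use.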

Security and Privacy

Sensitive Data Masking

1. Automatic sensitive data detection

@Component
public class TracingSanitizer {

    public void sanitizeSpanAttributes(Span span, Map<String, Object> attributes) {
        attributes.forEach((key, value) -> {
            if (isSensitiveAttribute(key)) {
                // Mask the value
                String sanitizedValue = sanitizeValue(String.valueOf(value));
                span.setAttribute(key, sanitizedValue);
            } else {
                span.setAttribute(key, String.valueOf(value));
            }
        });
    }

    private boolean isSensitiveAttribute(String key) {
        // Check whether the key names a sensitive attribute;
        // both the "api-key" and "api_key" spellings are matched
        String lower = key.toLowerCase();
        return lower.contains("password") ||
               lower.contains("token") ||
               lower.contains("secret") ||
               lower.contains("api-key") ||
               lower.contains("api_key");
    }

    private String sanitizeValue(String value) {
        if (value.length() <= 4) {
            return "***";
        }
        // Keep the first 2 and last 2 characters and replace everything in between with *
        return value.substring(0, 2) +
               "*".repeat(value.length() - 4) +
               value.substring(value.length() - 2);
    }
}

2. Configuring sensitive data filtering rules

jairouter:
  tracing:
    security:
      enabled: true
      sensitive-headers:
        - "Authorization"
        - "Cookie"
        - "X-API-Key"
        - "X-Auth-Token"
      sensitive-params:
        - "password"
        - "token"
        - "secret"
        - "api_key"
        - "access_token"
      mask-pattern: "***"
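
As a rough illustration of what such rules imply, the sketch below replaces the values of listed headers with the mask pattern before they would be recorded. The class `HeaderMasker` is hypothetical and is not how JAiRouter necessarily applies the configuration. Header names are compared case-insensitively because HTTP header names are not case-sensitive.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical filter that masks sensitive header values before they are recorded. */
final class HeaderMasker {
    // Case-insensitive set, so "authorization" matches "Authorization"
    private final Set<String> sensitiveHeaders = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
    private final String maskPattern;

    HeaderMasker(Iterable<String> names, String maskPattern) {
        names.forEach(sensitiveHeaders::add);
        this.maskPattern = maskPattern;
    }

    /** Returns a copy of the headers with sensitive values replaced by the mask. */
    Map<String, String> mask(Map<String, String> headers) {
        Map<String, String> result = new LinkedHashMap<>();
        headers.forEach((name, value) ->
            result.put(name, sensitiveHeaders.contains(name) ? maskPattern : value));
        return result;
    }
}
```

Returning a masked copy rather than mutating the input keeps the original request headers intact for downstream handling.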

Access Control

1. Role-based access to trace data

@Component
public class TracingSecurityManager {

    @PreAuthorize("hasRole('ADMIN')")
    public List<TraceData> getAllTraces() {
        return tracingQueryService.findAllTraces();
    }

    @PreAuthorize("hasRole('USER')")
    public List<TraceData> getUserTraces(String userId) {
        return tracingQueryService.findTracesByUser(userId);
    }

    @PreAuthorize("hasRole('VIEWER')")
    public TraceData getFilteredTrace(String traceId) {
        TraceData trace = tracingQueryService.findById(traceId);
        return filterSensitiveData(trace);
    }
}

Troubleshooting

Diagnosing Common Problems

1. Missing trace data

Diagnostic steps:

# 1. Check whether tracing is enabled
curl http://localhost:8080/actuator/health/tracing

# 2. Check the sampling configuration
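# configprops nests properties under .contexts.<name>.beans; the "jairouter.tracing" prefix is assumed
curl -s http://localhost:8080/actuator/configprops | jq '.contexts[].beans[] | select(.prefix == "jairouter.tracing") | .properties.sampling'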

# 3. Check the exporter status
curl http://localhost:8080/actuator/metrics/jairouter.tracing.export.errors

Solution:

# Temporarily raise the sampling rate for debugging
jairouter:
  tracing:
    sampling:
      strategy: "ratio"
      ratio: 1.0  # 100% sampling
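
For intuition about what a ratio means, ratio sampling is commonly implemented as a deterministic function of the trace ID, so that every service reaches the same decision for a given trace. The sketch below follows that general idea; whether JAiRouter's `ratio` strategy works this way is an assumption, and `RatioSamplerSketch` is a hypothetical name. It expects a 32-character hexadecimal trace ID.

```java
/** Hypothetical deterministic ratio sampler keyed on the trace ID. */
final class RatioSamplerSketch {
    private final long threshold;

    RatioSamplerSketch(double ratio) {
        if (ratio < 0.0 || ratio > 1.0) {
            throw new IllegalArgumentException("ratio must be within [0, 1]");
        }
        // Scale the ratio into the non-negative long range; 1.0 maps to Long.MAX_VALUE
        this.threshold = ratio == 1.0 ? Long.MAX_VALUE : (long) (ratio * Long.MAX_VALUE);
    }

    /** Returns the same decision every time for the same trace ID. */
    boolean shouldSample(String traceId) {
        // Use the low 64 bits (the last 16 hex characters) of the 128-bit trace ID
        long low = Long.parseUnsignedLong(traceId.substring(16), 16);
        // Clear the sign bit so the value lies in [0, Long.MAX_VALUE]
        long value = low & Long.MAX_VALUE;
        return threshold == Long.MAX_VALUE || value < threshold;
    }
}
```

With a ratio of 1.0 every trace is kept, which is why it is useful for debugging missing data; with 0.0 none is kept.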

2. Excessive performance overhead

Diagnostic metrics:

# View trace processing latency
curl http://localhost:8080/actuator/metrics/jairouter.tracing.processing.duration

# View memory usage
curl http://localhost:8080/actuator/metrics/jvm.memory.used

Optimization suggestions:

jairouter:
  tracing:
    # Lower the sampling rate
    sampling:
      ratio: 0.1

    # Enable asynchronous processing
    async:
      enabled: true
      core-pool-size: 4

    # Tune the memory settings
    memory:
      max-spans: 5000
      cleanup-interval: 15s

3. Context propagation problems

Inspecting the reactive stream context:

@Component
public class ContextDiagnostic {

    @Autowired
    private TracingContextHolder tracingContextHolder;

    public Mono<String> diagnoseContext() {
        return Mono.deferContextual(ctx -> {
            boolean hasTracing = ctx.hasKey("tracing");
            TracingContext context = tracingContextHolder.getCurrentContext();
            String traceId = context != null ? context.getTraceId() : "null";

            return Mono.just(String.format(
                "Context has tracing: %s, current TraceId: %s",
                hasTracing, traceId));
        });
    }
}

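Best Practices

1. Choosing a sampling strategy

  • Development: sample 100% of requests to make debugging easier
  • Testing: use rule-based sampling and focus on critical endpoints
  • Production: use adaptive sampling to balance performance against observability

2. Tag conventions

// ✅ Good tag names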
span.setAttribute("http.method", "POST");
span.setAttribute("user.id", userId);
span.setAttribute("business.operation", "payment");

// ❌ Tag usage to avoid
span.setAttribute("tag1", "value");  // unclear name
span.setAttribute("user_data", largeObject.toString());  // oversized value
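
One way to guard against oversized values is to bound them before they reach the span. The helper below is a minimal sketch: `AttributeValues` and its 256-character limit are assumptions chosen for illustration, not a JAiRouter or OpenTelemetry default.

```java
/** Hypothetical helper that keeps attribute values small. */
final class AttributeValues {
    static final int MAX_LENGTH = 256; // assumed limit, tune as needed

    /** Converts a value to text, keeping at most MAX_LENGTH characters of it. */
    static String bounded(Object value) {
        String text = String.valueOf(value);
        if (text.length() <= MAX_LENGTH) {
            return text;
        }
        // Keep a prefix and note how many characters were dropped
        return text.substring(0, MAX_LENGTH)
            + "...(+" + (text.length() - MAX_LENGTH) + " chars)";
    }
}
```

A call site would then read `span.setAttribute("user.profile", AttributeValues.bounded(largeObject))`, where the key name is only an example.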

3. Error handling

// ✅ Recording errors correctly
span.setAttribute("error", "true");
span.setAttribute("error.type", "validation");
span.recordException(exception);

// ❌ Avoid recording sensitive error details
span.setAttribute("error.details", exception.getMessage());  // may contain sensitive information
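
A coarse error type can be derived from the exception class alone, which avoids copying free-form messages into the trace. The mapping below is a hypothetical sketch: `ErrorTypeClassifier` and its categories are illustrative, and a real service would map its own exception hierarchy.

```java
import java.io.IOException;
import java.util.concurrent.TimeoutException;

/** Hypothetical mapper from exceptions to coarse, non-sensitive error types. */
final class ErrorTypeClassifier {

    /** Returns a category based only on the exception class, never on its message. */
    static String classify(Throwable error) {
        if (error instanceof IllegalArgumentException) {
            return "validation";
        }
        if (error instanceof TimeoutException) {
            return "timeout";
        }
        if (error instanceof IOException) {
            return "io";
        }
        return "system";
    }
}
```

The result can be passed to `span.setAttribute("error.type", ...)` in place of a hard-coded string.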

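4. Performance considerations

  • Avoid adding too many tags on high-frequency call paths
  • Use asynchronous export so that tracing does not slow down request handling
  • Clean up expired trace data regularly
  • Monitor the resource usage of the tracing system itself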

Next Steps