Skip to content

Monitoring Configuration Reference

文档版本: 1.0.0
最后更新: 2025-08-19
Git 提交: c1aa5b0f
作者: Lincoln

This document provides a complete configuration reference for the JAiRouter monitoring system, including detailed descriptions, default values, and usage examples for all configuration options.

Configuration File Structure

Main Configuration File

JAiRouter monitoring configuration is primarily defined in application.yml:

# Monitoring configuration
monitoring:
  metrics:
    # Basic configuration
    enabled: true
    prefix: "jairouter"
    collection-interval: 10s

    # Metric categories
    enabled-categories:
      - system
      - business
      - infrastructure

    # Custom tags
    custom-tags:
      environment: "${spring.profiles.active:default}"
      version: "@project.version@"

    # Sampling configuration
    sampling:
      request-metrics: 1.0
      backend-metrics: 1.0
      infrastructure-metrics: 1.0

    # Performance configuration
    performance:
      async-processing: true
      batch-size: 500
      buffer-size: 2000

    # Memory configuration
    memory:
      cache-size: 10000
      cache-expiry: 5m

    # Security configuration
    security:
      data-masking: false
      mask-labels: []

# Spring Actuator configuration
management:
  endpoints:
    web:
      exposure:
        include: health,info,metrics,prometheus
      base-path: /actuator

  endpoint:
    health:
      show-details: always
    prometheus:
      cache:
        time-to-live: 10s

  metrics:
    export:
      prometheus:
        enabled: true
        descriptions: true
        step: 10s

Basic Configuration

monitoring.metrics.enabled

Type: Boolean
Default Value: true
Description: Whether to enable monitoring metric collection

monitoring:
  metrics:
    enabled: true  # Enable monitoring
    # enabled: false  # Disable monitoring

Environment Variable: MONITORING_METRICS_ENABLED

monitoring.metrics.prefix

Type: String
Default Value: "jairouter"
Description: Metric name prefix used to distinguish metrics from different applications

monitoring:
  metrics:
    prefix: "jairouter"        # Default prefix
    # prefix: "my-app"         # Custom prefix
    # prefix: ""               # No prefix

monitoring.metrics.collection-interval

Type: Duration
Default Value: 10s
Description: Metric collection interval

monitoring:
  metrics:
    collection-interval: 10s   # 10 seconds
    # collection-interval: 5s  # 5 seconds (more frequent)
    # collection-interval: 30s # 30 seconds (less frequent)

Metric Category Configuration

monitoring.metrics.enabled-categories

Type: List
Default Value: ["system", "business", "infrastructure"]
Description: Enabled metric categories

monitoring:
  metrics:
    enabled-categories:
      - system          # System metrics (JVM, HTTP, etc.)
      - business        # Business metrics (model calls, user sessions, etc.)
      - infrastructure  # Infrastructure metrics (load balancing, rate limiting, circuit breaking, etc.)

Available Values: - system: System metrics such as JVM memory, GC, HTTP requests - business: Business metrics such as model calls, user sessions, business processes - infrastructure: Infrastructure metrics such as load balancing, rate limiting, circuit breaking, health checks

Custom Tags Configuration

monitoring.metrics.custom-tags

Type: Map
Default Value: {}
Description: Custom tags added to all metrics

monitoring:
  metrics:
    custom-tags:
      environment: "${spring.profiles.active:default}"
      version: "@project.version@"
      region: "us-west-1"
      datacenter: "dc1"
      team: "platform"

Notes: - Tag values support Spring expressions and placeholders - Avoid using high-cardinality tags (such as user ID, IP address) - It is recommended that the number of tags does not exceed 10

Sampling Configuration

monitoring.metrics.sampling

Type: Object
Description: Metric sampling rate configuration to control the frequency of metric collection

monitoring:
  metrics:
    sampling:
      request-metrics: 1.0        # Request metric sampling rate (100%)
      backend-metrics: 1.0        # Backend call metric sampling rate
      infrastructure-metrics: 1.0 # Infrastructure metric sampling rate
      system-metrics: 1.0         # System metric sampling rate
      debug-metrics: 0.1          # Debug metric sampling rate (10%)

Sampling Rate Explanation: - 1.0: 100% sampling, collect all metrics - 0.5: 50% sampling, randomly collect half of the metrics - 0.1: 10% sampling, randomly collect one-tenth of the metrics - 0.0: 0% sampling, do not collect metrics

Environment-Specific Configuration:

# Development environment - full sampling for debugging
monitoring:
  metrics:
    sampling:
      request-metrics: 1.0
      backend-metrics: 1.0

# Production environment - reduce sampling rate to reduce overhead
monitoring:
  metrics:
    sampling:
      request-metrics: 0.1
      backend-metrics: 0.5

Performance Configuration

monitoring.metrics.performance

Type: Object
Description: Performance-related configuration

monitoring:
  metrics:
    performance:
      # Asynchronous processing configuration
      async-processing: true
      async-thread-pool-size: 4
      async-thread-pool-max-size: 8
      async-queue-capacity: 1000

      # Batch processing configuration
      batch-size: 500
      batch-timeout: 1s

      # Buffer configuration
      buffer-size: 2000
      buffer-flush-interval: 5s

      # Processing timeout configuration
      processing-timeout: 5s

async-processing

Type: Boolean
Default Value: true
Description: Whether to enable asynchronous metric processing

monitoring:
  metrics:
    performance:
      async-processing: true   # Enable asynchronous processing (recommended)
      # async-processing: false # Synchronous processing (for debugging)

batch-size

Type: Integer
Default Value: 500
Description: Batch processing size, the number of metric events processed at once

monitoring:
  metrics:
    performance:
      batch-size: 500    # Default batch size
      # batch-size: 100  # Small batch, low latency
      # batch-size: 1000 # Large batch, high throughput

buffer-size

Type: Integer
Default Value: 2000
Description: Buffer size, the queue capacity for pending metric events

monitoring:
  metrics:
    performance:
      buffer-size: 2000   # Default buffer size
      # buffer-size: 5000 # Large buffer, handle burst traffic
      # buffer-size: 1000 # Small buffer, save memory

Memory Configuration

monitoring.metrics.memory

Type: Object
Description: Memory usage related configuration

monitoring:
  metrics:
    memory:
      # Cache configuration
      cache-size: 10000
      cache-expiry: 5m
      cache-cleanup-interval: 1m

      # Memory threshold configuration
      memory-threshold: 80
      low-memory-sampling-rate: 0.1

      # Object pool configuration
      object-pool-enabled: true
      object-pool-size: 1000

cache-size

Type: Integer
Default Value: 10000
Description: Metric cache size

cache-expiry

Type: Duration
Default Value: 5m
Description: Cache expiration time

memory-threshold

Type: Integer
Default Value: 80
Description: Memory usage threshold (percentage), low memory mode is enabled when exceeded

Security Configuration

monitoring.metrics.security

Type: Object
Description: Security-related configuration

monitoring:
  metrics:
    security:
      # Data masking
      data-masking: true
      mask-labels:
        - user_id
        - client_ip
        - api_key
        - session_id

      # IP address masking
      ip-masking: true
      ip-mask-pattern: "xxx.xxx.xxx.xxx"

      # Sensitive metric filtering
      sensitive-metrics-filter: true
      filtered-metrics:
        - "*.password.*"
        - "*.secret.*"
        - "*.token.*"

data-masking

Type: Boolean
Default Value: false
Description: Whether to enable data masking

mask-labels

Type: List
Default Value: []
Description: List of tag names that need to be masked

Spring Actuator Configuration

management.endpoints.web.exposure.include

Type: String
Default Value: "health,info"
Description: List of exposed endpoints

management:
  endpoints:
    web:
      exposure:
        include: health,info,metrics,prometheus
        # include: "*"  # Expose all endpoints (development environment only)

management.endpoint.prometheus.cache.time-to-live

Type: Duration
Default Value: 10s
Description: Prometheus endpoint cache time

management:
  endpoint:
    prometheus:
      cache:
        time-to-live: 10s  # 10 second cache
        # time-to-live: 0s # Disable cache
        # time-to-live: 60s # 1 minute cache

management.metrics.export.prometheus

Type: Object
Description: Prometheus export configuration

management:
  metrics:
    export:
      prometheus:
        enabled: true
        descriptions: true
        step: 10s
        pushgateway:
          enabled: false
          base-url: http://localhost:9091

Environment-Specific Configuration

Development Environment Configuration

# application-dev.yml
monitoring:
  metrics:
    enabled: true
    sampling:
      request-metrics: 1.0
      backend-metrics: 1.0
      infrastructure-metrics: 1.0
    performance:
      async-processing: false  # For easier debugging
      batch-size: 100
    security:
      data-masking: false

management:
  endpoints:
    web:
      exposure:
        include: "*"  # Expose all endpoints in development environment
  endpoint:
    prometheus:
      cache:
        time-to-live: 1s  # Reduce cache time for easier testing

Test Environment Configuration

# application-test.yml
monitoring:
  metrics:
    enabled: true
    prefix: "test_jairouter"
    sampling:
      request-metrics: 0.1  # Reduce sampling rate to minimize test interference
      backend-metrics: 0.5
    performance:
      async-processing: true
      batch-size: 50
    memory:
      cache-size: 1000

management:
  endpoints:
    web:
      exposure:
        include: health,metrics,prometheus

Production Environment Configuration

# application-prod.yml
monitoring:
  metrics:
    enabled: true
    sampling:
      request-metrics: 0.1
      backend-metrics: 0.5
      infrastructure-metrics: 0.1
      system-metrics: 0.5
    performance:
      async-processing: true
      batch-size: 1000
      buffer-size: 5000
    memory:
      cache-size: 20000
      memory-threshold: 85
      low-memory-sampling-rate: 0.01
    security:
      data-masking: true
      mask-labels:
        - user_id
        - client_ip
        - api_key
      ip-masking: true

management:
  endpoints:
    web:
      exposure:
        include: health,info,metrics,prometheus
  endpoint:
    prometheus:
      cache:
        time-to-live: 30s
  security:
    enabled: true

Dynamic Configuration

Runtime Configuration Updates

JAiRouter supports runtime dynamic updates of monitoring configurations:

# Update sampling rate
curl -X POST http://localhost:8080/actuator/monitoring/config \
  -H "Content-Type: application/json" \
  -d '{
    "sampling": {
      "request-metrics": 0.5,
      "backend-metrics": 0.8
    }
  }'

# Enable/disable metric categories
curl -X POST http://localhost:8080/actuator/monitoring/categories \
  -H "Content-Type: application/json" \
  -d '{
    "enabled-categories": ["system", "business"]
  }'

# Update performance configuration
curl -X POST http://localhost:8080/actuator/monitoring/performance \
  -H "Content-Type: application/json" \
  -d '{
    "batch-size": 200,
    "buffer-size": 1000
  }'

Configuration File Hot Reload

Supports updating monitoring configurations through configuration files:

# config/monitoring-override.yml
monitoring:
  metrics:
    sampling:
      request-metrics: 0.3
    performance:
      batch-size: 200

The system will automatically detect configuration file changes and apply the new configuration.

Configuration Validation

Configuration Syntax Validation

# Validate YAML syntax
./mvnw spring-boot:run -Dspring-boot.run.arguments="--spring.config.location=classpath:/application.yml --spring.profiles.active=test"

Configuration Validity Check

# Check current configuration
curl http://localhost:8080/actuator/monitoring/config

# Check metric collection status
curl http://localhost:8080/actuator/monitoring/status

# Verify endpoint accessibility
curl http://localhost:8080/actuator/prometheus

Configuration Best Practices

1. Environment-Specific Configuration

  • Development Environment: Enable all metrics for easier debugging
  • Test Environment: Reduce sampling rate to minimize test interference
  • Production Environment: Balance performance and monitoring accuracy

2. Performance Optimization Configuration

# High-performance configuration
monitoring:
  metrics:
    sampling:
      request-metrics: 0.1
    performance:
      async-processing: true
      batch-size: 1000
      buffer-size: 5000
    memory:
      cache-size: 20000

3. Security Configuration

# Security configuration
monitoring:
  metrics:
    security:
      data-masking: true
      mask-labels:
        - user_id
        - client_ip
        - api_key

management:
  security:
    enabled: true
  server:
    port: 8081
    address: 127.0.0.1

4. Monitoring Configuration

# Monitoring system configuration
monitoring:
  metrics:
    custom-tags:
      monitoring_version: "1.0"
    enabled-categories:
      - system
      - monitoring  # Metrics of the monitoring system itself

Troubleshooting Configuration

Debug Configuration

# Enable debug mode
logging:
  level:
    org.unreal.modelrouter.monitoring: DEBUG
    io.micrometer: DEBUG

monitoring:
  metrics:
    debug:
      enabled: true
      log-metrics: true
      log-interval: 30s

Problem Diagnosis Configuration

# Diagnosis configuration
monitoring:
  metrics:
    diagnostics:
      enabled: true
      collect-jvm-metrics: true
      collect-system-metrics: true
      health-check-interval: 10s

Configuration Templates

Basic Template

# Basic monitoring configuration template
monitoring:
  metrics:
    enabled: true
    prefix: "jairouter"
    enabled-categories:
      - system
      - business
      - infrastructure
    sampling:
      request-metrics: 1.0
      backend-metrics: 1.0
    performance:
      async-processing: true
      batch-size: 500

management:
  endpoints:
    web:
      exposure:
        include: health,info,metrics,prometheus
  endpoint:
    prometheus:
      cache:
        time-to-live: 10s

High-Performance Template

# High-performance monitoring configuration template
monitoring:
  metrics:
    enabled: true
    sampling:
      request-metrics: 0.1
      backend-metrics: 0.5
      infrastructure-metrics: 0.1
    performance:
      async-processing: true
      batch-size: 1000
      buffer-size: 5000
    memory:
      cache-size: 20000
      memory-threshold: 85

Security Template

# Security monitoring configuration template
monitoring:
  metrics:
    enabled: true
    security:
      data-masking: true
      mask-labels:
        - user_id
        - client_ip
        - api_key
      ip-masking: true

management:
  endpoints:
    web:
      exposure:
        include: health,metrics,prometheus
  security:
    enabled: true
  server:
    port: 8081
    address: 127.0.0.1

Tip: It is recommended to choose an appropriate configuration template based on the actual environment and requirements, and continuously optimize configuration parameters based on system operation conditions.