JAiRouter Slow Query Alert Feature¶
Overview¶
JAiRouter's slow query alert feature is a performance monitoring and alerting system that automatically detects slow operations and sends alert notifications according to configured strategies. It integrates with distributed tracing, structured logging, and Prometheus metric export.
Configuration File Structure¶
JAiRouter uses a modular configuration management approach, with slow query alert configuration located in a separate configuration file:
- Main Configuration File: src/main/resources/application.yml
- Slow Query Alert Configuration File: src/main/resources/config/monitoring/slow-query-alerts.yml
- Environment Configuration Files: src/main/resources/application-{profile}.yml
Modular Configuration Explanation¶
Slow query alert configuration is kept separate from the main configuration file and is imported through Spring Boot's spring.config.import mechanism.
Features¶
🔍 Automatic Slow Query Detection¶
- Automatic slow query detection based on configurable thresholds
- Support for setting different detection thresholds by operation type
- Real-time performance metric collection and analysis
📊 Intelligent Alert Strategy¶
- Frequency-based alert suppression to avoid alert flooding
- Support for severity-level alert strategies
- Configurable alert triggering conditions (minimum occurrences, time intervals, etc.)
📈 Performance Analysis and Statistics¶
- Detailed slow query statistics (count, average time, maximum time, etc.)
- Performance trend analysis and hotspot identification
- Historical data tracking of operation performance
🔗 Complete Integration Support¶
- Integration with distributed tracing systems for complete request chains
- Structured log output for easy log aggregation and analysis
- Prometheus metric export for visualization and alerting
Quick Start¶
1. Enable Slow Query Alerts¶
Configure in slow-query-alerts.yml:
```yaml
jairouter:
  monitoring:
    slow-query-alert:
      enabled: true
      global:
        min-interval-ms: 300000   # 5-minute minimum alert interval
        min-occurrences: 3        # Trigger alert after 3 slow queries
        enabled-severities:
          - critical
          - warning
```
2. Configure Operation-Specific Thresholds¶
```yaml
jairouter:
  monitoring:
    slow-query-alert:
      operations:
        chat_request:
          enabled: true
          min-interval-ms: 180000   # 3 minutes
          min-occurrences: 2
          enabled-severities:
            - critical
            - warning
            - info
        backend_adapter_call:
          enabled: true
          min-interval-ms: 120000   # 2 minutes
          min-occurrences: 3
```
3. View Alert Status¶
Check alert statistics via REST API:
```bash
# Get slow query statistics
curl http://localhost:8080/api/monitoring/slow-queries/stats

# Get alert statistics
curl http://localhost:8080/api/monitoring/slow-queries/alerts/stats

# Get alert system status
curl http://localhost:8080/api/monitoring/slow-queries/alerts/status
```
Configuration Details¶
Global Configuration¶
| Configuration Item | Type | Default Value | Description |
|---|---|---|---|
| enabled | boolean | true | Whether to enable slow query alerts |
| min-interval-ms | long | 300000 | Minimum interval between alerts (milliseconds) |
| min-occurrences | long | 3 | Minimum number of slow query occurrences required to trigger an alert |
| enabled-severities | Set | [critical, warning] | Enabled alert severity levels |
| suppression-window-ms | long | 3600000 | Alert suppression time window (milliseconds) |
| max-alerts-per-hour | int | 10 | Maximum number of alerts per hour |
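To make the interaction of min-occurrences, min-interval-ms, and enabled-severities concrete, the following is a minimal sketch of one plausible gating rule. The class `AlertGate`, its method names, and the choice to track state per operation and severity are assumptions made for illustration, not JAiRouter's actual implementation; suppression-window-ms and max-alerts-per-hour are deliberately left out.

```java
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical gate illustrating the documented settings; not JAiRouter's real class. */
final class AlertGate {

    enum Severity { INFO, WARNING, CRITICAL }

    private final long minIntervalMs;
    private final long minOccurrences;
    private final Set<Severity> enabledSeverities;
    // State is keyed by "operation:severity" (an assumption of this sketch).
    private final Map<String, Long> pendingCounts = new HashMap<>();
    private final Map<String, Long> lastAlertAtMs = new HashMap<>();

    AlertGate(long minIntervalMs, long minOccurrences, Set<Severity> enabledSeverities) {
        this.minIntervalMs = minIntervalMs;
        this.minOccurrences = minOccurrences;
        this.enabledSeverities = Set.copyOf(enabledSeverities);
    }

    /** Records one slow query and returns true if an alert should fire now. */
    boolean recordSlowQuery(String operation, Severity severity, long nowMs) {
        if (!enabledSeverities.contains(severity)) {
            return false; // severity not enabled, never alerts
        }
        String key = operation + ":" + severity;
        long count = pendingCounts.merge(key, 1L, Long::sum);
        if (count < minOccurrences) {
            return false; // not enough occurrences yet
        }
        Long last = lastAlertAtMs.get(key);
        if (last != null && nowMs - last < minIntervalMs) {
            return false; // still inside the minimum alert interval
        }
        lastAlertAtMs.put(key, nowMs);
        pendingCounts.put(key, 0L); // start counting again after an alert
        return true;
    }

    public static void main(String[] args) {
        // Values taken from the documented global defaults.
        AlertGate gate = new AlertGate(300_000, 3,
                EnumSet.of(Severity.CRITICAL, Severity.WARNING));
        for (long t : new long[] {0, 1_000, 2_000, 3_000}) {
            boolean fired = gate.recordSlowQuery("chat_request", Severity.WARNING, t);
            System.out.println("t=" + t + " ms, alert=" + fired);
        }
    }
}
```

Under these assumptions, the third slow query of an enabled severity fires an alert, and further slow queries for the same key stay silent until min-interval-ms has elapsed since that alert.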
Operation-Specific Configuration¶
Different alert strategies can be configured for different operation types:
```yaml
operations:
  chat_request:            # Chat request
    min-interval-ms: 180000
    min-occurrences: 2
    enabled-severities: [critical, warning, info]
  embedding_request:       # Embedding request
    min-interval-ms: 300000
    min-occurrences: 5
    enabled-severities: [critical, warning]
  backend_adapter_call:    # Backend adapter call
    min-interval-ms: 120000
    min-occurrences: 3
    enabled-severities: [critical, warning]
```
Severity Levels¶
The system automatically determines severity from the ratio of operation duration to the configured threshold. Because the ranges below overlap, the highest matching level applies:
- critical: Duration ≥ threshold × 5
- warning: Duration ≥ threshold × 3
- info: Duration ≥ threshold × 1
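The multipliers above can be read as a simple cascade that checks the highest level first. The sketch below illustrates that reading; `SeverityClassifier` and its method are hypothetical names, and the 1000 ms threshold in `main` is only an example value.

```java
import java.util.Optional;

/** Hypothetical helper illustrating the documented multipliers. */
final class SeverityClassifier {

    enum Severity { INFO, WARNING, CRITICAL }

    /**
     * Maps a duration to a severity, checking the highest level first.
     * Returns empty when the duration is below the threshold, meaning
     * the operation is not a slow query at all.
     */
    static Optional<Severity> classify(long durationMs, long thresholdMs) {
        if (thresholdMs <= 0) {
            throw new IllegalArgumentException("thresholdMs must be positive");
        }
        if (durationMs >= thresholdMs * 5) {
            return Optional.of(Severity.CRITICAL);
        }
        if (durationMs >= thresholdMs * 3) {
            return Optional.of(Severity.WARNING);
        }
        if (durationMs >= thresholdMs) {
            return Optional.of(Severity.INFO);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        long thresholdMs = 1000; // example threshold, not a JAiRouter default
        for (long d : new long[] {800, 1000, 3500, 5000}) {
            System.out.println(d + " ms -> " + classify(d, thresholdMs));
        }
    }
}
```

With a 1000 ms threshold, a 3500 ms operation is classified as warning and a 5000 ms operation as critical, while an 800 ms operation is not reported.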
Monitoring Metrics¶
Prometheus Metrics¶
The slow query alert system exports the following Prometheus metrics:
```
# Slow query total counter
slow_query_total{operation="chat_request", severity="warning"}

# Slow query response time distribution
slow_query_duration_seconds{operation="chat_request"}

# Slow query threshold multiplier
slow_query_threshold_multiplier{operation="chat_request"}

# Slow query alert trigger counter
slow_query_alert_triggered{operation="chat_request", severity="warning"}

# Active slow query alerts
slow_query_alert_active{operation="chat_request", severity="warning"}
```
Alert Rule Example¶
Configure alert rules in Prometheus:
```yaml
groups:
  - name: jairouter.slow-query-alerts
    rules:
      - alert: JAiRouterSlowQueryDetected
        expr: increase(slow_query_total[5m]) > 0
        for: 0m
        labels:
          severity: warning
        annotations:
          summary: "Slow query operation detected"
          description: "Operation {{ $labels.operation }} detected slow query"
```
API Endpoints¶
Slow Query Statistics API¶
GET /api/monitoring/slow-queries/stats returns slow query statistics for all operations.
Alert Statistics API¶
GET /api/monitoring/slow-queries/alerts/stats returns statistics for the alert system:
```json
{
  "totalAlertsTriggered": 42,
  "totalAlertsSuppressed": 8,
  "activeAlertKeys": 3,
  "activeOperations": ["chat_request", "embedding_request"],
  "alertTriggerRate": 0.84,
  "alertSuppressionRate": 0.16,
  "averageAlertsPerOperation": 14.0
}
```
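The derived fields in this sample are consistent with simple ratios of the raw counters: 42 / (42 + 8) = 0.84, 8 / (42 + 8) = 0.16, and 42 / 3 = 14.0, where the divisor 3 matches activeAlertKeys. The sketch below reproduces that arithmetic; that JAiRouter computes the fields exactly this way is an assumption, and `AlertStatsMath` is a hypothetical name.

```java
/** Hypothetical helper reproducing the arithmetic implied by the sample response. */
final class AlertStatsMath {

    /** Share of alert decisions that resulted in a triggered alert. */
    static double triggerRate(long triggered, long suppressed) {
        long total = triggered + suppressed;
        return total == 0 ? 0.0 : (double) triggered / total;
    }

    /** Share of alert decisions that were suppressed. */
    static double suppressionRate(long triggered, long suppressed) {
        long total = triggered + suppressed;
        return total == 0 ? 0.0 : (double) suppressed / total;
    }

    /** Triggered alerts divided by the number of active alert keys. */
    static double averageAlertsPerKey(long triggered, int activeAlertKeys) {
        return activeAlertKeys == 0 ? 0.0 : (double) triggered / activeAlertKeys;
    }

    public static void main(String[] args) {
        // Raw counters taken from the sample response above.
        long triggered = 42;
        long suppressed = 8;
        int activeAlertKeys = 3;
        System.out.println("alertTriggerRate          = " + triggerRate(triggered, suppressed));
        System.out.println("alertSuppressionRate      = " + suppressionRate(triggered, suppressed));
        System.out.println("averageAlertsPerOperation = " + averageAlertsPerKey(triggered, activeAlertKeys));
    }
}
```

Guarding against a zero denominator keeps the rates defined before any alert decision has been made.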
Alert System Status API¶
GET /api/monitoring/slow-queries/alerts/status returns the current status of the alert system:
```json
{
  "enabled": true,
  "totalOperations": 5,
  "activeAlerts": 3,
  "suppressedAlerts": 1,
  "lastAlertTime": "2025-08-28T10:30:45Z",
  "systemHealth": "HEALTHY"
}
```
Environment Configuration Overrides¶
Different environments can override slow query alert configuration through corresponding environment configuration files:
Development Environment (application-dev.yml)¶
```yaml
jairouter:
  monitoring:
    slow-query-alert:
      enabled: true
      global:
        min-interval-ms: 60000   # Shorter alert interval in development environment
        min-occurrences: 1       # Fewer occurrences to trigger in development environment
```
Production Environment (application-prod.yml)¶
```yaml
jairouter:
  monitoring:
    slow-query-alert:
      enabled: true
      global:
        min-interval-ms: 600000   # Longer alert interval in production environment
        max-alerts-per-hour: 50   # Higher alert frequency limit in production environment
```
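An override replaces only the keys it names; every other key keeps its base value. The sketch below models this as a map overlay using the documented global defaults and the production overrides. It is a simplified illustration: Spring resolves properties through ordered property sources rather than by merging maps, and `ProfileOverlay` is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Simplified model of key-level overrides; not how Spring is implemented. */
final class ProfileOverlay {

    /** Returns the base settings with every key present in the profile replaced. */
    static Map<String, Long> effective(Map<String, Long> base, Map<String, Long> profile) {
        Map<String, Long> merged = new LinkedHashMap<>(base);
        merged.putAll(profile); // profile values win, untouched keys keep base values
        return merged;
    }

    public static void main(String[] args) {
        // Keys are relative to jairouter.monitoring.slow-query-alert.global.
        Map<String, Long> base = new LinkedHashMap<>();
        base.put("min-interval-ms", 300_000L);   // documented default
        base.put("min-occurrences", 3L);         // documented default
        base.put("max-alerts-per-hour", 10L);    // documented default

        Map<String, Long> prod = new LinkedHashMap<>();
        prod.put("min-interval-ms", 600_000L);   // from application-prod.yml
        prod.put("max-alerts-per-hour", 50L);    // from application-prod.yml

        effective(base, prod).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

In this model the production profile ends up with min-interval-ms 600000 and max-alerts-per-hour 50, while min-occurrences stays at the base value of 3.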
Best Practices¶
Configuration Management¶
- Base Configuration: Define common configurations in slow-query-alerts.yml
- Environment Differences: Override specific configurations in corresponding environment configuration files
- Threshold Setting: Set reasonable thresholds based on actual business needs and performance test results
Alert Strategy¶
- Severity Levels: Enable only the severity levels that warrant action for each operation
- Suppression Strategy: Configure alert suppression (minimum interval, suppression window, hourly limit) to avoid alert flooding
- Notification Channels: Route alerts to different notification channels based on severity
Performance Optimization¶
- Sampling Rate: Adjust the sampling rate to the system load so that monitoring overhead stays acceptable
- Batch Processing: Tune batch processing parameters to match the expected volume of slow query events
- Resource Monitoring: Monitor the resource usage of the slow query alert system itself