29 changes: 24 additions & 5 deletions README.md
@@ -945,6 +945,16 @@ Configuration for Redis-backed query result caching to improve performance and r

*Required if `REDIS_ENABLED=true`

#### Health Check Configuration

Configuration for automatic datasource health checks and auto-stopping unhealthy routes.

| Variable | Default | Description | Required |
|-----------------------------------|-------------------|--------------------------------------------------|----------|
| `HEALTH_CHECK_ENABLED` | `true` | Enable/disable health checks globally | Optional |
| `HEALTH_CHECK_INTERVAL_CRON` | `0 0/2 * * * ?` | Cron expression for health check interval | Optional |
| `HEALTH_CHECK_FAILURE_THRESHOLD` | `-1` | Consecutive failures before auto-stopping routes | Optional |

**Important Notes:**

- **Redis is Optional**: Routes work without Redis using `NoOpCacheService` as a fallback. Enable only if you have Redis infrastructure.
@@ -1014,11 +1024,15 @@ minutesToLive=30

**`com.inovexcorp.queryservice.scheduler.DatasourceHealthCheck.cfg`** - Schedule for datasource health checks
```properties
# Check datasource health every 60 seconds
scheduler.expression=0/60 * * * * ?
# Enable/disable health checks globally
enabled=$[env:HEALTH_CHECK_ENABLED;default=true]

# Cron expression for health check interval (default: every 2 minutes)
scheduler.expression=$[env:HEALTH_CHECK_INTERVAL_CRON;default=0 0/2 * * * ?]
scheduler.concurrent=false

# Auto-stop routes after N consecutive failures (-1 = disabled)
consecutiveFailureThreshold=-1
consecutiveFailureThreshold=$[env:HEALTH_CHECK_FAILURE_THRESHOLD;default=-1]
```

**`com.inovexcorp.queryservice.scheduler.CleanHealthRecords.cfg`** - Schedule for health record cleanup
@@ -3230,8 +3244,10 @@ curl "http://localhost:8080/queryrest/api/routes/cache/info" -k
#### DataSource Health Checks

**Automatic Checks:**
- Default interval: Every 60 seconds
- Default interval: Every 2 minutes (configurable via `HEALTH_CHECK_INTERVAL_CRON`)
- Enabled by default (controllable via `HEALTH_CHECK_ENABLED`)
- Configured in: `com.inovexcorp.queryservice.scheduler.DatasourceHealthCheck.cfg`
- Auto-stop routes after N consecutive failures (configured via `HEALTH_CHECK_FAILURE_THRESHOLD`; disabled by default with `-1`)

**Health States:**
- **UP**: Connection successful
@@ -4975,6 +4991,9 @@ Cons:
| `CACHE_FAIL_OPEN` | `true` | Boolean | Continue on cache errors (vs fail closed) | Cache |
| `CACHE_STATS_ENABLED` | `true` | Boolean | Track cache hit/miss statistics | Cache |
| `CACHE_STATS_TTL` | `5` | Integer | Cache statistics refresh interval (seconds) | Cache |
| `HEALTH_CHECK_ENABLED` | `true` | Boolean | Enable/disable health checks globally | Health Check |
| `HEALTH_CHECK_INTERVAL_CRON` | `0 0/2 * * * ?` | String | Cron expression for health check interval | Health Check |
| `HEALTH_CHECK_FAILURE_THRESHOLD` | `-1` | Integer | Consecutive failures before auto-stopping routes | Health Check |
| `KEYSTORE` | None | String | Custom SSL keystore path | Security |
| `PASSWORD` | None | String | SSL keystore password | Security |

@@ -4984,7 +5003,7 @@ Cons:
|-----------|-------------------|-------------|--------------|
| Query Metrics | `0 0/1 * * * ?` | Every 1 minute | Yes |
| Clean Metrics | `0 0/1 * * * ?` | Every 1 minute | Yes |
| Datasource Health Check | `0/60 * * * * ?` | Every 60 seconds | Yes |
| Datasource Health Check | `0 0/2 * * * ?` | Every 2 minutes (configurable) | Yes |
| Clean Health Records | `0 0 0 * * ?` | Daily at midnight | Yes |

**Cron Format:** `second minute hour day month weekday`
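
To make the field order concrete, here is a minimal sketch of how a `start/step` field expands in this six-field format. `CronStepSketch` and `expandStep` are hypothetical names used only for illustration; they are not part of the project or of any cron library, and real schedulers handle many more cases (ranges, lists, `?`, `*`).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not project code): expands a "start/step" cron field. */
final class CronStepSketch {

    /** Returns every value matched by "start/step" within [0, max]. */
    static List<Integer> expandStep(String field, int max) {
        String[] parts = field.split("/");
        int start = Integer.parseInt(parts[0]);
        int step = Integer.parseInt(parts[1]);
        if (step <= 0) {
            throw new IllegalArgumentException("step must be positive: " + field);
        }
        List<Integer> values = new ArrayList<>();
        for (int v = start; v <= max; v += step) {
            values.add(v);
        }
        return values;
    }

    public static void main(String[] args) {
        // Field order: second minute hour day month weekday
        String[] fields = "0 0/2 * * * ?".split(" ");
        // The step sits in the MINUTE field, so the job fires every 2 minutes.
        List<Integer> minutes = expandStep(fields[1], 59);
        System.out.println("second field:    " + fields[0]);
        System.out.println("minutes matched: " + minutes.size()); // 30 values: 0, 2, ..., 58
        // The older "0/60" in the SECOND field matches only second 0, i.e. once per minute.
        System.out.println("0/60 seconds:    " + expandStep("0/60", 59));
    }
}
```

Because the step is in the second position (minutes) and the hour field is `*`, `0 0/2 * * * ?` fires at second 0 of every even minute.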
@@ -42,6 +42,7 @@ public class SimpleAnzoClient implements AnzoClient {
private final String user;
private final String password;
private final HttpClient httpClient;
private final int requestTimeoutSeconds;

public SimpleAnzoClient(String server, String user, String password,
int connectTimeoutSeconds) {
@@ -53,6 +54,8 @@ public SimpleAnzoClient(String server, String user, String password,
this.server = server;
this.user = user;
this.password = password;
// Use same timeout for both connection and request for health checks
this.requestTimeoutSeconds = connectTimeoutSeconds;
this.httpClient = createHttpClient(connectTimeoutSeconds, validateCertificate);
}

@@ -159,7 +162,8 @@ public QueryResponse getGraphmarts() throws IOException, InterruptedException {
final long start = System.currentTimeMillis();

try {
HttpResponse<InputStream> resp = makeLdsRequest(GRAPHMARTS_DS_CAT, GM_LOOKUP);
// Use configured request timeout instead of hard-coded 30 seconds
HttpResponse<InputStream> resp = makeLdsRequest(GRAPHMARTS_DS_CAT, GM_LOOKUP, requestTimeoutSeconds);
long duration = System.currentTimeMillis() - start;

if (resp.statusCode() == 200) {
@@ -214,8 +218,14 @@ public QueryResponse getLayersForGraphmart(String graphmart) throws IOException,

private HttpResponse<InputStream> makeLdsRequest(String dataset, String query)
throws IOException, InterruptedException {
// Use default 30-second timeout for backward compatibility
return makeLdsRequest(dataset, query, 30);
}

private HttpResponse<InputStream> makeLdsRequest(String dataset, String query, int timeoutSeconds)
throws IOException, InterruptedException {
return httpClient.send(
buildRequest(createLdsSparqlUri(dataset), query, RESPONSE_FORMAT.JSON, 30, false),
buildRequest(createLdsSparqlUri(dataset), query, RESPONSE_FORMAT.JSON, timeoutSeconds, false),
HttpResponse.BodyHandlers.ofInputStream());
}
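
The change above relies on two separate timeouts in `java.net.http`: the connect timeout set on the `HttpClient` and the per-request timeout set on each `HttpRequest`. The sketch below shows that distinction in isolation. `TimeoutSketch`, its method names, and the placeholder URL are assumptions made for illustration and do not reproduce `SimpleAnzoClient`'s actual `buildRequest` or `createHttpClient`.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.time.Duration;

/** Hypothetical illustration (not project code) of connect vs. request timeouts. */
final class TimeoutSketch {

    /** Connect timeout: bounds how long establishing the connection may take. */
    static HttpClient clientWithConnectTimeout(int seconds) {
        return HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(seconds))
                .build();
    }

    /** Request timeout: bounds how long the client waits for this request's response. */
    static HttpRequest requestWithTimeout(String url, int seconds) {
        return HttpRequest.newBuilder(URI.create(url))
                .timeout(Duration.ofSeconds(seconds))
                .GET()
                .build();
    }

    public static void main(String[] args) {
        int timeoutSeconds = 5; // same value for both, as the constructor change above does
        HttpClient client = clientWithConnectTimeout(timeoutSeconds);
        // Placeholder URL; no request is actually sent in this sketch.
        HttpRequest request = requestWithTimeout("http://localhost:8080/placeholder", timeoutSeconds);
        System.out.println("connect timeout: " + client.connectTimeout());
        System.out.println("request timeout: " + request.timeout());
    }
}
```

Setting only the connect timeout leaves a slow or stalled response unbounded, which is presumably why a hard-coded 30-second request timeout could stall a health check that was configured with a shorter connect timeout.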

@@ -1,9 +1,18 @@
# This configuration controls the timing for the datasource health check service
# that monitors the availability of backend Anzo datasources.
# Health checks are performed every 60 seconds by default.
scheduler.expression=0/30 * * * * ?
# This configuration controls the datasource health check service that monitors
# the availability of backend Anzo datasources.

# Enable or disable health checks globally
# Set HEALTH_CHECK_ENABLED=false to disable all health checking
enabled=$[env:HEALTH_CHECK_ENABLED;default=true]

# Health check interval (cron expression)
# Default: every 2 minutes (0 0/2 * * * ?)
# Previous default was 30 seconds (0/30 * * * * ?)
scheduler.expression=$[env:HEALTH_CHECK_INTERVAL_CRON;default=0 0/2 * * * ?]

# Consecutive failure threshold: Number of consecutive health check failures before
# automatically stopping all routes for a datasource. Set to 0 to disable automatic
# route stopping. Default is -1 failures, indicating not to shut off routes.
consecutiveFailureThreshold=-1
# automatically stopping all routes for a datasource. Set to 0 or -1 to disable automatic
# route stopping. Default is -1, so routes are never stopped automatically.
# Recommended: Set to 3 (about 6 minutes of downtime with 2-minute checks) to auto-disable
# routes for persistently failing datasources.
consecutiveFailureThreshold=$[env:HEALTH_CHECK_FAILURE_THRESHOLD;default=-1]
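
The threshold semantics described in the comments above can be sketched as a small counter. This is an assumption about how such a threshold is typically applied, not the project's actual implementation; `FailureThresholdSketch` and `record` are hypothetical names.

```java
/** Hypothetical illustration (not project code) of a consecutive-failure threshold. */
final class FailureThresholdSketch {
    private final int threshold;
    private int consecutiveFailures = 0;

    FailureThresholdSketch(int threshold) {
        this.threshold = threshold;
    }

    /**
     * Records one health check result.
     * Returns true when the failure streak has reached the threshold.
     * A threshold of 0 or less disables auto-stopping, so this never returns true.
     */
    boolean record(boolean healthy) {
        if (healthy) {
            consecutiveFailures = 0; // any success resets the streak
            return false;
        }
        consecutiveFailures++;
        return threshold > 0 && consecutiveFailures >= threshold;
    }

    public static void main(String[] args) {
        FailureThresholdSketch tracker = new FailureThresholdSketch(3);
        boolean[] results = {false, false, true, false, false, false};
        for (int i = 0; i < results.length; i++) {
            boolean stop = tracker.record(results[i]);
            System.out.println("check " + (i + 1) + " healthy=" + results[i] + " stopRoutes=" + stop);
        }
    }
}
```

With a threshold of 3, only the final check in `main` reports `stopRoutes=true`, because the success on the third check resets the streak.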
@@ -0,0 +1,15 @@
package com.inovexcorp.queryservice.health;

/**
* Service interface for health check configuration management.
* Implementations provide access to global health check settings.
*/
public interface HealthCheckConfigService {

/**
* Returns whether health checks are currently enabled globally.
*
* @return true if health checks are enabled, false otherwise
*/
boolean isEnabled();
}
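
For illustration, a minimal implementation of this interface might look like the sketch below. It is an assumption, not the PR's implementation: the real service presumably receives `enabled` from the `.cfg` file rather than reading the environment directly. `EnvHealthCheckConfigService` is a hypothetical name, and the interface is redeclared here only so the block compiles on its own.

```java
import java.util.Map;

/** Redeclared from the diff above so that this sketch is self-contained. */
interface HealthCheckConfigService {
    boolean isEnabled();
}

/** Hypothetical implementation (not project code) backed by an environment map. */
final class EnvHealthCheckConfigService implements HealthCheckConfigService {
    private final boolean enabled;

    EnvHealthCheckConfigService(Map<String, String> env) {
        // Mirrors the documented default: enabled unless explicitly set otherwise.
        String raw = env.getOrDefault("HEALTH_CHECK_ENABLED", "true");
        // Boolean.parseBoolean treats anything other than "true" (case-insensitive) as false.
        this.enabled = Boolean.parseBoolean(raw.trim());
    }

    @Override
    public boolean isEnabled() {
        return enabled;
    }

    public static void main(String[] args) {
        HealthCheckConfigService config = new EnvHealthCheckConfigService(System.getenv());
        System.out.println("health checks enabled: " + config.isEnabled());
    }
}
```

Passing the environment as a `Map` keeps the sketch easy to exercise with fixed values instead of the real process environment.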