Resource Health Deployment Guide
The Resource Health Building Block (BB) provides a flexible framework for monitoring the health and status of resources within the EOEPCA platform. This includes core platform services as well as derived or user-provided resources such as datasets, workflows, or user applications.
Introduction
The Resource Health BB allows you to:
- Define and schedule automated health checks (e.g. daily, hourly).
- Observe and visualise check outcomes via a web dashboard.
- Integrate with external services (e.g. IAM for OIDC authentication, Data Access, Resource Catalogue).
- Store results in OpenSearch, optionally visualising them using OpenSearch Dashboards.
- Collect telemetry via OpenTelemetry, enabling advanced monitoring and alerting.
Components Overview
- Resource Health Web
  - Dashboard and front-end for viewing health checks and results.
  - By default, can be secured with OIDC authentication (e.g. via Keycloak).
- Resource Health API(s)
  - Telemetry API for gathering check results and metrics.
  - Health Checks API (Check Manager) for listing, scheduling, and managing checks.
- Health Check Runner
  - A flexible engine that executes your custom health checks at scheduled intervals.
- Mock API (optional sample)
  - An example test resource used in demonstration checks (e.g. an hourly check to a mock endpoint).
- OpenSearch & OpenSearch Dashboards
  - Stores logs, results, and trace data from your checks.
  - Provides advanced visualisation and analytics features.
- OpenTelemetry Collector
  - Receives telemetry from health checks and forwards it to OpenSearch.
Prerequisites
Before deploying the Resource Health Building Block, ensure you have the following:
| Component | Requirement | Documentation Link |
|---|---|---|
| Kubernetes | Cluster (tested on v1.28) | Installation Guide |
| Git | Properly installed | Installation Guide |
| Helm | Version 3.5 or newer | Installation Guide |
| Helm plugins | helm-git: version 1.3.0 tested | Installation Guide |
| kubectl | Configured for cluster access | Installation Guide |
| Ingress Controller | Properly installed (e.g., NGINX) | Installation Guide |
| Internal TLS Certificates | ClusterIssuer for internal certificates | Internal TLS Setup |
Clone the Deployment Guide Repository:
Validate your environment:
This script checks common prerequisites, including your Kubernetes/Helm installation, Git, and any required Helm plugins.
Deployment Steps
1. Run the Configuration Script
The configure-resource-health.sh script gathers basic configuration inputs (such as your internal ClusterIssuer for TLS and your storage class) and generates a generated-values.yaml file that tailors the Resource Health deployment to your environment.
During execution, you will be prompted for:
- INGRESS_HOST: Hostname.
- INTERNAL_CLUSTER_ISSUER: Name of the cert-manager ClusterIssuer for internal TLS. (Default: eoepca-ca-clusterissuer)
- STORAGE_CLASS: Storage class for persistent volumes. (Default: standard)
2. Deploy the Resource Health BB (Helm)
- Install or upgrade Resource Health
  Note: While the Resource Health BB is not yet in the official EOEPCA Helm charts, you can install it directly from the GitHub repository.
  - Clone the Resource Health repository and update dependencies:
  - Install or upgrade the Resource Health Helm chart:

As part of this deployment, you will have a preconfigured health check that runs every minute.
3. Configure Ingress
Resource Health is designed to work with a range of Ingress and OIDC configurations.
For the purpose of this guide, the configuration script created a sample APISIX Ingress resource in generated-ingress.yaml that you can apply or adapt to your environment.
4. Monitor the Deployment
After the Helm installation finishes, check that all pods are running in the resource-health namespace:
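One way to automate this check is to parse the JSON printed by `kubectl get pods -n resource-health -o json`. The sketch below is illustrative only: the helper name `pods_not_ready` is hypothetical, and it assumes the standard Kubernetes pod list layout (`items[].status.phase` and `items[].status.containerStatuses[].ready`). Pods in the `Succeeded` phase, such as pods left behind by finished jobs, are treated as healthy.

```python
import json


def pods_not_ready(pod_list_json: str) -> list[str]:
    """Return the names of pods that are neither ready nor completed.

    `pod_list_json` is the text printed by
    `kubectl get pods -n resource-health -o json`.
    """
    not_ready = []
    for pod in json.loads(pod_list_json).get("items", []):
        name = pod["metadata"]["name"]
        status = pod.get("status", {})
        phase = status.get("phase")
        if phase == "Succeeded":
            # Completed pods (e.g. from finished jobs) count as healthy.
            continue
        containers = status.get("containerStatuses", [])
        # A pod with no reported containers yet is not considered ready.
        all_ready = bool(containers) and all(c.get("ready") for c in containers)
        if phase != "Running" or not all_ready:
            not_ready.append(name)
    return not_ready
```

An empty list means every pod in the namespace is either running with all containers ready or has completed.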
Validation
- Run the validation script:
- Access the Resource Health Web:
Access the Resource Health Web dashboard at:
Access the Health Checks at:
Check the Telemetry service status at:
Usage
1. Defining Health Checks
Health checks are typically defined in the Helm chart’s values under resource-health.healthchecks.checks. Each check has:
- name
- schedule (a cron expression like "@hourly" or "0 8 * * *")
- requirements (optional Python packages)
- script (the actual test logic)
- env (environment variables, e.g. references to external services)
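The fields above can be modelled and sanity-checked before they go into a values file. The following is a minimal sketch, not part of the Resource Health tooling: `HealthCheck`, `looks_like_cron` and `to_values` are hypothetical names, the list of accepted `@` shortcuts is an assumption, and the schedule test only checks the shape of the expression (five simple fields), not the value ranges. The `env` entries are rendered as name/value pairs, mirroring the Helm example in this guide.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

# Shortcuts commonly accepted by cron implementations; which ones the
# Check Manager accepts is an assumption.
CRON_SHORTCUTS = {"@yearly", "@annually", "@monthly", "@weekly",
                  "@daily", "@midnight", "@hourly"}
# Digits, '*', '/', ',' and '-' only; names such as MON or JAN are rejected.
CRON_FIELD = re.compile(r"[\d*/,\-]+")


def looks_like_cron(schedule: str) -> bool:
    """Shape check only: five simple fields or a known @shortcut."""
    schedule = schedule.strip()
    if schedule.startswith("@"):
        return schedule in CRON_SHORTCUTS
    fields = schedule.split()
    return len(fields) == 5 and all(CRON_FIELD.fullmatch(f) for f in fields)


@dataclass
class HealthCheck:
    name: str
    schedule: str
    script: str
    requirements: Optional[str] = None  # optional, as in the list above
    env: dict[str, str] = field(default_factory=dict)

    def to_values(self) -> dict:
        """Render one entry for the `checks` list in the Helm values."""
        if not looks_like_cron(self.schedule):
            raise ValueError(f"unexpected schedule: {self.schedule!r}")
        entry = {"name": self.name, "schedule": self.schedule,
                 "script": self.script}
        if self.requirements:
            entry["requirements"] = self.requirements
        if self.env:
            entry["env"] = [{"name": k, "value": v}
                            for k, v in self.env.items()]
        return entry
```

Rejecting a malformed schedule at this stage surfaces the mistake before the chart is installed or upgraded.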
Helm-based (preferred for GitOps or static config):
resource-health:
  healthchecks:
    checks:
      - name: daily-trivial-check
        schedule: "0 8 * * *"
        requirements: "https://example.com/requirements.txt"
        script: "https://example.com/trivial_check.py"
        env:
          - name: SOME_HOST
            value: "https://some-endpoint.example.com"
Apply with:
helm upgrade -i resource-health reference-repo/resource-health-reference-deployment -f generated-values.yaml -n resource-health
UI-based (via the Resource Health Web):
Visit the Resource Health Web dashboard and select the Create new check dropdown to define a new health check.
Fill in the form similarly to the Helm-based approach, including the template, schedule, name, script and requirements.
Uninstallation
To remove all Resource Health components and the namespace: