Resource Registration Deployment Guide
The Resource Registration Building Block enables data and metadata ingestion into platform services. It handles:
- Metadata registration into Resource Discovery
- Data registration into Data Access services
- Resource visualisation configuration
Introduction
The Resource Registration Building Block manages resource ingestion into the platform for discovery, access and collaboration. It supports:
- Datasets (EO data, auxiliary data)
- Processing workflows
- Jupyter Notebooks
- Web services and applications
- Documentation and metadata
The BB integrates with other platform services to enable:
- Automated metadata extraction
- Resource discovery indexing
- Access control configuration
- Usage tracking
Components Overview
The Resource Registration BB comprises three main components:
- Registration API: An OGC API Processes interface for registering, updating, or deleting resources on the local platform.
- Harvester: Automates workflows (via Flowable BPMN) to harvest data from external sources and register them in the platform.
- Common Registration Library: A Python library consolidating upstream packages (e.g. STAC tools, eometa tools) for business logic in workflows and resource handling.
Prerequisites
Before deploying the Resource Registration Building Block, ensure you have the following:
Component | Requirement | Documentation Link |
---|---|---|
Kubernetes | Cluster (tested on v1.28) | Installation Guide |
Helm | Version 3.7 or newer | Installation Guide |
kubectl | Configured for cluster access | Installation Guide |
TLS Certificates | Managed via cert-manager or manually | TLS Certificate Management Guide |
Ingress Controller | Properly installed (e.g., NGINX) | Installation Guide |
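The version requirements in the table can also be spot-checked by hand. The following is a minimal sketch, assuming GNU `sort -V` is available. The helper names `version_ge` and `check_helm` are hypothetical and are not part of the deployment guide's own scripts.

```shell
# Succeed when version $1 is greater than or equal to version $2.
# Relies on `sort -V` (version sort), available in GNU coreutils.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Compare the installed Helm client against the 3.7 minimum from the table.
check_helm() {
  local found
  found="$(helm version --template '{{.Version}}')" || return 1
  found="${found#v}"   # strip the leading "v" from e.g. "v3.12.0"
  if version_ge "$found" "3.7.0"; then
    echo "Helm $found is new enough"
  else
    echo "Helm $found is older than 3.7" >&2
    return 1
  fi
}
```

Calling `check_helm` runs the comparison against the local Helm client. A plain string comparison would wrongly rank 3.12 below 3.7, which is why version sort is used.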
Clone the Deployment Guide Repository:
git clone https://github.com/EOEPCA/deployment-guide
cd deployment-guide/scripts/resource-registration
Validate your environment:
Run the validation script to ensure all prerequisites are met:
Deployment Steps
1. Run the Configuration Script
Generate configuration files and prepare deployment:
Configuration Parameters
During the script execution, you will be prompted to provide:
- INGRESS_HOST: Base domain for ingress hosts.
  - Example: example.com
- CLUSTER_ISSUER: Cert-Manager ClusterIssuer for TLS certificates.
  - Example: letsencrypt-http01-apisix
- FLOWABLE_ADMIN_USER: Admin username for Flowable.
  - Default: eoepca
- FLOWABLE_ADMIN_PASSWORD: Admin password for Flowable.
  - Default: eoepca
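The commands later in this guide build service URLs by prefixing a service name to INGRESS_HOST (for example registration-api.${INGRESS_HOST}). The following is a small sketch of that convention. The helper name `service_url` and the `https` fallback for HTTP_SCHEME are assumptions made for illustration.

```shell
# Build the URL of a platform service from its name and the configured ingress host.
# HTTP_SCHEME falls back to https when unset (an assumption of this sketch).
service_url() {
  local service="$1"
  printf '%s://%s.%s' \
    "${HTTP_SCHEME:-https}" \
    "$service" \
    "${INGRESS_HOST:?INGRESS_HOST must be set}"
}
```

With INGRESS_HOST set to example.com, `service_url registration-api` prints the Registration API base URL on that domain.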
2. Apply Kubernetes Secrets
Create required secrets:
Secrets Created:
- flowable-admin-credentials: Contains the Flowable admin username and password.
3. Deploy the Registration API Using Helm
Deploy the Registration API using the generated values file.
helm repo add eoepca-dev https://eoepca.github.io/helm-charts-dev
helm repo update eoepca-dev
helm upgrade -i registration-api eoepca-dev/registration-api \
--version 2.0.0-rc2 \
--namespace resource-registration \
--create-namespace \
--values registration-api/generated-values.yaml
Deploy the ingress for the Registration API:
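The ingress manifest is not named at this point in the guide. The Uninstallation section deletes registration-api/generated-ingress.yaml, so applying that same file is presumably the intended step. The following is a minimal sketch under that assumption; the helper name `apply_generated_ingress` is hypothetical.

```shell
# Apply a generated ingress manifest, failing early if it has not been generated yet.
apply_generated_ingress() {
  local manifest="$1"
  if [ ! -f "$manifest" ]; then
    echo "Missing $manifest; run the configuration script first" >&2
    return 1
  fi
  kubectl apply -f "$manifest"
}
```

For the Registration API this would be called as `apply_generated_ingress registration-api/generated-ingress.yaml`. The same helper would apply to registration-harvester/generated-ingress.yaml for the Flowable Engine.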
4. Deploy the Registration Harvester Using Helm
Deploy Flowable Engine:
helm repo add flowable https://flowable.github.io/helm/
helm repo update flowable
helm upgrade -i registration-harvester-api-engine flowable/flowable \
--version 7.0.0 \
--namespace resource-registration \
--create-namespace \
--values registration-harvester/generated-values.yaml
Deploy the ingress for the Flowable Engine:
Deploy Registration Harvester Worker:
By way of example, a worker is deployed that harvests Landsat data from USGS.
helm repo add eoepca-dev https://eoepca.github.io/helm-charts-dev
helm repo update eoepca-dev
# Version 2.0.0-rc2-pr-63 pending PR #63 - https://github.com/EOEPCA/helm-charts-dev/pull/63
helm upgrade -i landsat-harvester-worker eoepca-dev/registration-harvester \
--version 2.0.0-rc2-pr-63 \
--namespace resource-registration \
--create-namespace \
--values registration-harvester/generated-values.yaml
The Landsat harvester relies upon credentials for the USGS service. These can be obtained via free sign-up at the USGS Machine-to-Machine (M2M) API.
The Landsat harvester worker expects a Kubernetes secret that provides these (and other) credentials.
Use the Generate Application Token page to create a token with the M2M API scope, then export the resulting credentials as the M2M_USER and M2M_PASSWORD environment variables for inclusion in the secret.
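Before creating the secret it can help to confirm that the variables are actually set, since an empty value would otherwise be stored silently. The following is a minimal sketch. The helper name `require_vars` is hypothetical, the variable names are those read by the `kubectl create secret` command in this section, and the placeholder values in the comments are illustrative only.

```shell
# Example exports (placeholders, replace with your own USGS credentials):
#   export M2M_USER="<usgs-username>"
#   export M2M_PASSWORD="<application-token>"

# Report every named variable that is unset or empty.
# Returns non-zero if at least one is missing.
require_vars() {
  local missing=0 name value
  for name in "$@"; do
    # Indirect lookup of the variable called $name (names must be plain identifiers).
    eval "value=\${$name-}"
    if [ -z "$value" ]; then
      echo "Missing variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

A typical call would be `require_vars FLOWABLE_ADMIN_USER FLOWABLE_ADMIN_PASSWORD M2M_USER M2M_PASSWORD` after sourcing ~/.eoepca/state.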
Now we can create the landsat-harvester-secret Kubernetes secret that is expected by the Landsat harvester worker.
source ~/.eoepca/state
kubectl create secret generic landsat-harvester-secret \
--from-literal=FLOWABLE_USER="$FLOWABLE_ADMIN_USER" \
--from-literal=FLOWABLE_PASSWORD="$FLOWABLE_ADMIN_PASSWORD" \
--from-literal=M2M_USER="${M2M_USER}" \
--from-literal=M2M_PASSWORD="${M2M_PASSWORD}" \
--namespace resource-registration \
--dry-run=client -o yaml | kubectl apply -f -
5. Monitor the Deployment
Check the status of the deployments:
Validation and Usage
Check Kubernetes Resources:
Ensure that all Kubernetes resources are running correctly.
- All pods should be in the Running state.
- No pods should be in CrashLoopBackOff or Error states.
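The pod checks above can be scripted by filtering the output of `kubectl get pods`. The following is a minimal sketch, assuming the default column layout (NAME, READY, STATUS, ...) and treating Completed as healthy. The helper name `unhealthy_pods` is hypothetical.

```shell
# Read `kubectl get pods --no-headers` output on stdin and print each pod
# whose STATUS column is neither Running nor Completed.
unhealthy_pods() {
  awk '$3 != "Running" && $3 != "Completed" { print $1 ": " $3 }'
}
```

It would be used as `kubectl get pods -n resource-registration --no-headers | unhealthy_pods`, where empty output means no pod is in a failing state. The sketch does not inspect the READY column, so a pod that is Running but not yet ready is not reported.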
Automated Validation:
This script performs a series of automated tests to validate the deployment.
Registration API Home:
This page provides basic information about the Registration API.
Swagger UI Documentation:
Interactive API documentation allowing you to explore and test the Registration API endpoints.
Flowable REST API Swagger UI:
Provides Swagger UI documentation for the Flowable REST API.
source ~/.eoepca/state
xdg-open "${HTTP_SCHEME}://registration-harvester-api.${INGRESS_HOST}/flowable-rest/docs/"
Registering Resources
Resource Registration relies on an OGC API Processes interface, through which it provides the Registration API interfaces:
- Registration: POST /processes/register/execution
- De-registration: POST /processes/deregister/execution
These interfaces are illustrated below.
Example - Registering a Collection
This example registers a Collection resource into the EOEPCA Resource Catalogue instance.
This assumes that the Resource Discovery Building Block has been deployed, offering a STAC endpoint.
Use the following command to register a STAC Collection landsat-ot-c2-l2, representing the Landsat 8-9 OLI/TIRS Collection 2 Level-2.
This collection is used in later steps as a target for harvesting some example Landsat data.
source ~/.eoepca/state
curl -X POST "https://registration-api.${INGRESS_HOST}/processes/register/execution" \
-H "Content-Type: application/json" \
-d @- <<EOF
{
"inputs": {
"source": {"rel": "collection", "href": "https://raw.githubusercontent.com/EOEPCA/registration-harvester/refs/heads/main/etc/collections/landsat/landsat-ot-c2-l2.json"},
"target": {"rel": "https://api.stacspec.org/v1.0.0/core", "href": "https://resource-catalogue.${INGRESS_HOST}/stac"}
}
}
EOF
- source: A valid STAC Collection URL (in this example, hosted on GitHub). Adjust this path according to your input.
- target: Your STAC server endpoint where the resource is to be registered.
Validating the Registration
You should see a new job with the status COMPLETED.
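The guide does not show here where the job is listed. OGC API Processes defines a job list at the /jobs path, so the following is a sketch that assumes the Registration API exposes it at that standard location. The helper names are hypothetical and the shape of the response is not assumed, so it is only pretty-printed.

```shell
# URL of the job list, assuming the standard OGC API Processes /jobs path.
registration_jobs_url() {
  printf 'https://registration-api.%s/jobs' "${INGRESS_HOST:?INGRESS_HOST must be set}"
}

# Fetch the job list and pretty-print it with jq.
list_registration_jobs() {
  curl -s "$(registration_jobs_url)" | jq .
}
```

After `source ~/.eoepca/state`, running `list_registration_jobs` should show the job created by the registration request.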
If you have deployed the Resource Discovery Building Block, then the registered Collection will also be available at:
source ~/.eoepca/state
xdg-open "${HTTP_SCHEME}://resource-catalogue.${INGRESS_HOST}/collections/landsat-ot-c2-l2"
Collection De-registration
Demonstrates use of the API for resource deregistration…
Skip this step if you are intending to perform the example harvesting of Landsat data - as is illustrated in later steps.
source ~/.eoepca/state
curl -X POST "https://registration-api.${INGRESS_HOST}/processes/deregister/execution" \
-H "Content-Type: application/json" \
-d @- <<EOF
{
"inputs": {
"id": "landsat-ot-c2-l2",
"rel": "collection",
"target": {"rel": "https://api.stacspec.org/v1.0.0/core", "href": "https://resource-catalogue.${INGRESS_HOST}/stac"}
}
}
EOF
Using the Registration Harvester
The Registration Harvester leverages Flowable to automate resource harvesting workflows.
Access the Flowable REST API Swagger UI:
source ~/.eoepca/state
xdg-open "${HTTP_SCHEME}://registration-harvester-api.${INGRESS_HOST}/flowable-rest/docs/"
List Deployed Workflows
Initially only the built-in Demo processes workflow is deployed.
source ~/.eoepca/state
curl -s "https://registration-harvester-api.${INGRESS_HOST}/flowable-rest/service/repository/deployments" \
-u "${FLOWABLE_ADMIN_USER}:${FLOWABLE_ADMIN_PASSWORD}" \
| jq -r '.data[] | "\(.deploymentTime): \(.name)" '
The Demo processes workflow provides a number of example processes.
source ~/.eoepca/state
curl -s "https://registration-harvester-api.${INGRESS_HOST}/flowable-rest/service/repository/process-definitions" \
-u "${FLOWABLE_ADMIN_USER}:${FLOWABLE_ADMIN_PASSWORD}" \
| jq -r '.data[] | "\(.key): \(.name)" '
Example - Deploy Workflow for Landsat harvesting
Earlier in this page we deployed the Landsat harvester worker, which is implemented to respond to a specific set of workflow topics - as described by the values deployed with the helm chart:
- landsat_discover_data (LandsatDiscoverHandler)
- landsat_continuous_data_discovery (LandsatContinuousDiscoveryHandler)
- landsat_get_download_urls (LandsatGetDownloadUrlHandler)
- landsat_download_data (LandsatDownloadHandler)
- landsat_untar (LandsatUntarHandler)
- landsat_extract_metadata (LandsatExtractMetadataHandler)
- landsat_register_metadata (LandsatRegisterMetadataHandler)
To exploit this we deploy the Landsat workflow, comprising two BPMN processes. The main process (Landsat Registration) searches for new data at USGS. For each new scene found, the workflow executes another process (Landsat Scene Ingestion) which performs the individual steps for harvesting and registering the data.
Workflow - Landsat Registration (main)
Deploy the BPMN workflow landsat.bpmn by POST to the Flowable service…
source ~/.eoepca/state
curl -s https://raw.githubusercontent.com/EOEPCA/registration-harvester/refs/heads/main/workflows/landsat.bpmn | \
curl -s -X POST "https://registration-harvester-api.${INGRESS_HOST}/flowable-rest/service/repository/deployments" \
-u "${FLOWABLE_ADMIN_USER}:${FLOWABLE_ADMIN_PASSWORD}" \
-F "landsat.bpmn=@-;filename=landsat.bpmn;type=text/xml" | jq
Sub-Workflow Landsat Scene Ingestion
Deploy the BPMN sub-workflow landsat-scene-ingestion.bpmn by POST to the Flowable service…
source ~/.eoepca/state
curl -s https://raw.githubusercontent.com/EOEPCA/registration-harvester/refs/heads/main/workflows/landsat-scene-ingestion.bpmn | \
curl -s -X POST "https://registration-harvester-api.${INGRESS_HOST}/flowable-rest/service/repository/deployments" \
-u "${FLOWABLE_ADMIN_USER}:${FLOWABLE_ADMIN_PASSWORD}" \
-F "landsat-scene-ingestion.bpmn=@-;filename=landsat-scene-ingestion.bpmn;type=text/xml" | jq
List Deployed Workflows
Now the Landsat workflows and associated processes should be listed as deployed.
Workflows…
source ~/.eoepca/state
curl -s "https://registration-harvester-api.${INGRESS_HOST}/flowable-rest/service/repository/deployments" \
-u "${FLOWABLE_ADMIN_USER}:${FLOWABLE_ADMIN_PASSWORD}" \
| jq -r '.data[] | "\(.deploymentTime): \(.name)" '
Processes…
# Retrieve processes
processes="$( \
curl -s "https://registration-harvester-api.${INGRESS_HOST}/flowable-rest/service/repository/process-definitions" \
-u "${FLOWABLE_ADMIN_USER}:${FLOWABLE_ADMIN_PASSWORD}" \
)"
echo -e "\nProcess listing..."
echo "$processes" | jq -r '.data[] | "\(.key): \(.name)"'
# Extract Landsat Workflow process ID
landsat_process_id="$(echo "$processes" | jq -r '[.data[] | select(.name == "Landsat Workflow")][0].id')"
echo -e "\nLandsat process ID: ${landsat_process_id}"
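When no process definition matches the selected name, `jq -r` prints the literal string null, and submitting that value as processDefinitionId would fail. The following is a minimal sketch of a guard to run before invoking the workflow. The helper name `have_process_id` is hypothetical.

```shell
# Succeed only when the argument looks like a real process definition ID.
# An empty value or the string "null" means the jq selection matched nothing.
have_process_id() {
  case "${1:-}" in
    ""|null)
      echo "No matching process definition found" >&2
      return 1
      ;;
    *)
      return 0
      ;;
  esac
}
```

For example, `have_process_id "$landsat_process_id" || echo "check the names in the process listing"` flags the problem before the POST request is made.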
Invoke Landsat Harvesting Workflow
source ~/.eoepca/state
curl -s -X POST "https://registration-harvester-api.${INGRESS_HOST}/flowable-rest/service/runtime/process-instances" \
-u "${FLOWABLE_ADMIN_USER}:${FLOWABLE_ADMIN_PASSWORD}" \
-H "Content-Type: application/json" \
-d @- <<EOF | jq
{
"processDefinitionId": "$landsat_process_id",
"variables": [
{
"name": "datetime_interval",
"type": "string",
"value": "2024-11-13T10:00:00Z/2024-11-13T11:00:00Z"
},
{
"name": "collections",
"type": "string",
"value": "landsat-c2l2-sr"
},
{
"name": "bbox",
"type": "string",
"value": "-7,46,3,52"
}
]
}
EOF
Monitor the job progress
Logs…
Process instances…
Expecting an instance of the main Landsat Workflow process and, for each scene discovered, an instance of the Landsat Scene Ingestion process.
This may take a few minutes to complete.
source ~/.eoepca/state
curl -s "https://registration-harvester-api.${INGRESS_HOST}/flowable-rest/service/runtime/process-instances" \
-u "${FLOWABLE_ADMIN_USER}:${FLOWABLE_ADMIN_PASSWORD}" \
| jq -r '.data[] | "\(.startTime) | \(.id) | \(.processDefinitionName)"'
Check the Catalogue collection
Expecting 5 scenes registered into the Landsat collection.
This may take a few minutes to complete.
source ~/.eoepca/state
xdg-open "https://resource-catalogue.${INGRESS_HOST}/collections/landsat-ot-c2-l2/items"
Delivery of data assets
The default harvesting approach illustrated above stores the harvested assets in an eodata persistent volume. The metadata records registered with the catalogue assume delivery of these assets via the base URL https://eodata.${INGRESS_HOST}/, such that the registered STAC Items include asset hrefs that are rooted under this base URL.
Example - Service for asset access
By way of an example, a simple NGINX service can be deployed to provide access to these assets under the service URL https://eodata.${INGRESS_HOST}/, so that the asset hrefs registered in the STAC Items resolve correctly.
Visualise with STAC Browser
Use STAC Browser to navigate the harvested STAC Collection and the referenced assets.
source ~/.eoepca/state
xdg-open "https://radiantearth.github.io/stac-browser/#/external/resource-catalogue.${INGRESS_HOST}/stac/"
Uninstallation
To uninstall the Resource Registration Building Block and clean up associated resources:
helm uninstall landsat-harvester-worker -n resource-registration
kubectl delete -f registration-harvester/generated-ingress.yaml
helm uninstall registration-harvester-api-engine -n resource-registration
kubectl delete -f registration-api/generated-ingress.yaml
helm uninstall registration-api -n resource-registration
kubectl delete namespace resource-registration