Resource Registration Deployment Guide
The Resource Registration Building Block enables data and metadata ingestion into platform services. It handles:
- Metadata registration into Resource Discovery
- Data registration into Data Access services
- Resource visualisation configuration
Introduction
The Resource Registration Building Block manages resource ingestion into the platform for discovery, access and collaboration. It supports:
- Datasets (EO data, auxiliary data)
- Processing workflows
- Jupyter Notebooks
- Web services and applications
- Documentation and metadata
The BB integrates with other platform services to enable:
- Automated metadata extraction
- Resource discovery indexing
- Access control configuration
- Usage tracking
Components Overview
The Resource Registration BB comprises three main components:
- Registration API: An OGC API Processes interface for registering, updating, or deleting resources on the local platform.
- Harvester: Automates workflows (via Flowable BPMN) to harvest data from external sources. This guide demonstrates harvesting Landsat data from USGS.
- Common Registration Library: A Python library consolidating upstream packages (e.g. STAC tools, eometa tools) for business logic in workflows and resource handling.
Prerequisites
Before deploying the Resource Registration Building Block, ensure you have the following:
| Component | Requirement | Documentation Link |
|---|---|---|
| Kubernetes | Cluster (tested on v1.32) | Installation Guide |
| Helm | Version 3.7 or newer | Installation Guide |
| kubectl | Configured for cluster access | Installation Guide |
| TLS Certificates | Managed via cert-manager or manually | TLS Certificate Management Guide |
| Ingress Controller | Properly installed (e.g., NGINX, APISIX) | Installation Guide |
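The version requirements in the table can be spot-checked by hand. The following is a minimal sketch, not part of the deployment guide scripts: `version_ge` is a hypothetical helper name, and the comparison assumes a `sort` that supports version ordering (`-V`, as in GNU coreutils).

```shell
# Hypothetical helper (not part of the deployment-guide scripts):
# succeeds when version $1 is greater than or equal to version $2.
# Assumes `sort -V` (version sort) is available.
version_ge() {
  local have="${1#v}" want="${2#v}"   # tolerate a leading "v", e.g. v3.14.0
  # The smaller version sorts first; if that is $want, then $have >= $want.
  [ "$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n 1)" = "$want" ]
}

# Example usage (commented out so the snippet has no side effects):
#   helm_version="$(helm version --template '{{.Version}}')"
#   version_ge "$helm_version" 3.7 && echo "Helm OK" || echo "Helm too old"
```

The same function can be applied to the Kubernetes server version if you extract it yourself. The validation script shipped with the repository remains the authoritative check.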
Clone the Deployment Guide Repository:
git clone https://github.com/EOEPCA/deployment-guide
cd deployment-guide/scripts/resource-registration
Validate your environment:
Run the validation script to ensure all prerequisites are met:
Deployment Steps
1. Run the Configuration Script
Generate configuration files and prepare deployment:
Configuration Parameters
During the script execution, you will be prompted to provide:
- INGRESS_HOST: Base domain for ingress hosts.
  - Example: example.com
- CLUSTER_ISSUER: Cert-Manager ClusterIssuer for TLS certificates.
  - Example: letsencrypt-http01-apisix
- FLOWABLE_ADMIN_USER: Admin username for Flowable.
  - Default: eoepca
- FLOWABLE_ADMIN_PASSWORD: Admin password for Flowable.
  - Default: eoepca
- PERSISTENT_STORAGECLASS: Storage Class for persistent volumes (ReadWriteOnce), e.g. for the Flowable database.
  - Default: local-path
- SHARED_STORAGECLASS: Storage Class for shared volumes (ReadWriteMany), e.g. for harvested eodata.
  - Default: standard
  - Note that RWX is specified for the eodata volume to which the harvester downloads harvested assets. An RWX volume is assumed here, in anticipation that other services (pods) will need to use the data assets.
- RESOURCE_REGISTRATION_ENABLE_OIDC: Whether the Resource Registration endpoints should be protected via OIDC authentication.
  - Default: yes
- RESOURCE_REGISTRATION_PROTECTED_TARGETS: Whether the target services for resource registration are protected via OIDC authentication. If so, the Resource Registration components (API and harvester) must act as OIDC clients to authenticate against these services.
  - Default: yes
- RESOURCE_REGISTRATION_IAM_CLIENT_ID: The Client ID used both for ingress protection of Resource Registration services and for Resource Registration to authenticate against protected target services. The associated CLIENT_SECRET will be generated.
  - Default: resource-registration
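Later steps in this guide run `source ~/.eoepca/state` and then use these parameters as shell variables. As a rough sketch, assuming the configuration script records them in that state file under the names listed above, a small guard can report anything that is missing before you continue. `require_vars` is a hypothetical helper, not something provided by the repository.

```shell
# Hypothetical helper: report any of the named variables that are unset or empty.
# Returns non-zero if at least one is missing.
require_vars() {
  local name value missing=0
  for name in "$@"; do
    # Look up the variable whose name is stored in $name
    # (portable form of indirect expansion).
    eval "value=\${${name}:-}"
    if [ -z "$value" ]; then
      echo "missing: ${name}" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example usage (assumes the configuration script has written ~/.eoepca/state):
#   source ~/.eoepca/state
#   require_vars INGRESS_HOST CLUSTER_ISSUER FLOWABLE_ADMIN_USER FLOWABLE_ADMIN_PASSWORD
```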
2. Apply Kubernetes Secrets
Create required secrets for the Registration API and Harvester components:
During the script execution, you’ll be prompted for optional external service credentials:
USGS M2M Credentials (for Landsat harvesting)
To harvest Landsat data, you need credentials for the USGS Machine-to-Machine (M2M) API. We advise creating this account, since this guide uses it to showcase the Landsat harvesting capabilities of the Registration Harvester.
- Register for a free account at USGS
- Use the Generate Application Token page
- Create a token with the M2M API scope
- Enter these credentials when prompted by the script
CDSE Credentials (for Sentinel harvesting)
If you plan to harvest Sentinel data from the Copernicus Data Space Ecosystem (CDSE), you’ll need to provide CDSE credentials:
TBD
3. Deploy the Registration API Using Helm
The Registration API provides a RESTful interface through which resources can be directly registered, updated, or deleted.
Deploy the Registration API using the generated values file:
helm repo add eoepca-dev https://eoepca.github.io/helm-charts-dev
helm repo update eoepca-dev
helm upgrade -i registration-api eoepca-dev/registration-api \
--version 2.0.0-dev12 \
--namespace resource-registration \
--create-namespace \
--values registration-api/generated-values.yaml
Deploy the ingress routes:
4. Deploy the Registration Harvester Components
The Registration Harvester consists of the Flowable engine and worker deployments.
Deploy Flowable Engine
helm repo add flowable https://flowable.github.io/helm/
helm repo update flowable
helm upgrade -i registration-harvester-api-engine flowable/flowable \
--version 7.0.0 \
--namespace resource-registration \
--create-namespace \
--values registration-harvester/generated-values.yaml
Deploy the ingress for the Flowable Engine:
Deploy Landsat Harvester Worker
Deploy the worker that executes Landsat harvesting tasks:
helm upgrade -i registration-harvester-worker-landsat eoepca-dev/registration-harvester \
--version 2.0.0-rc3 \
--namespace resource-registration \
--create-namespace \
--values registration-harvester/harvester-values/values-landsat.yaml
5. Monitor the Deployment
Check the status of all deployments:
Verify all pods are running:
6. Create the Keycloak Client for Resource Registration
A Keycloak client is required for Resource Registration for two purposes:
- To protect the Resource Registration endpoints via OIDC (ref. RESOURCE_REGISTRATION_ENABLE_OIDC)
- To let Resource Registration connect to other services that are protected via OIDC, e.g. resource-catalogue, eoapi (ref. RESOURCE_REGISTRATION_PROTECTED_TARGETS)
If neither of these applies, you can skip this step.
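The skip condition can be expressed as a small check. This is only a sketch: it assumes the two settings are available as shell variables holding yes/no style values (the defaults shown in the configuration step are `yes`), and `is_enabled` and `needs_keycloak_client` are hypothetical names.

```shell
# Hypothetical helper: treat "yes", "y" and "true" (any case) as enabled.
# Assumption: the settings hold yes/no style values; adjust to match your state file.
is_enabled() {
  case "$(printf '%s' "${1:-}" | tr '[:upper:]' '[:lower:]')" in
    yes|y|true) return 0 ;;
    *) return 1 ;;
  esac
}

# Succeeds when at least one of the two purposes applies,
# i.e. when the Keycloak client step should not be skipped.
needs_keycloak_client() {
  is_enabled "${RESOURCE_REGISTRATION_ENABLE_OIDC:-}" ||
    is_enabled "${RESOURCE_REGISTRATION_PROTECTED_TARGETS:-}"
}

# Example usage:
#   needs_keycloak_client && echo "Create the client" || echo "Step can be skipped"
```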
The client can be created using the Crossplane Keycloak provider via the Client CRD.
source ~/.eoepca/state
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Secret
metadata:
  name: ${RESOURCE_REGISTRATION_IAM_CLIENT_ID}-keycloak-client
  namespace: iam-management
stringData:
  client_secret: ${RESOURCE_REGISTRATION_IAM_CLIENT_SECRET}
---
apiVersion: openidclient.keycloak.m.crossplane.io/v1alpha1
kind: Client
metadata:
  name: ${RESOURCE_REGISTRATION_IAM_CLIENT_ID}
  namespace: iam-management
spec:
  forProvider:
    realmId: ${REALM}
    clientId: ${RESOURCE_REGISTRATION_IAM_CLIENT_ID}
    name: Resource Registration
    description: Resource Registration OIDC
    enabled: true
    accessType: CONFIDENTIAL
    rootUrl: ${HTTP_SCHEME}://registration-api.${INGRESS_HOST}
    baseUrl: ${HTTP_SCHEME}://registration-api.${INGRESS_HOST}
    adminUrl: ${HTTP_SCHEME}://registration-api.${INGRESS_HOST}
    serviceAccountsEnabled: true
    directAccessGrantsEnabled: true
    standardFlowEnabled: true
    oauth2DeviceAuthorizationGrantEnabled: true
    useRefreshTokens: true
    authorization:
      - allowRemoteResourceManagement: false
        decisionStrategy: UNANIMOUS
        keepDefaults: true
        policyEnforcementMode: ENFORCING
    validRedirectUris:
      - "/*"
    webOrigins:
      - "/*"
    clientSecretSecretRef:
      name: ${RESOURCE_REGISTRATION_IAM_CLIENT_ID}-keycloak-client
      key: client_secret
  providerConfigRef:
    name: provider-keycloak
    kind: ProviderConfig
EOF
Validation and Usage
Automated Validation
Run the validation script to verify the deployment:
Access Points
Registration API:
Service root:
Swagger / OpenAPI documentation:
Flowable REST API:
source ~/.eoepca/state
xdg-open "${HTTP_SCHEME}://registration-harvester-api.${INGRESS_HOST}/flowable-rest/docs/"
Registering Resources
The Registration API is an OGC API Processes service that exposes the registration interfaces as two processes:
- Registration: POST /processes/register/execution
- De-registration: POST /processes/deregister/execution
(if needed) Obtain an Access Token as eoepcauser
If the Resource Registration endpoints are protected via OIDC, obtain an access token for the eoepcauser:
source ~/.eoepca/state
# Authenticate as test user `eoepcauser`
ACCESS_TOKEN=$( \
curl --silent --show-error \
-X POST \
-d "username=${KEYCLOAK_TEST_USER}" \
--data-urlencode "password=${KEYCLOAK_TEST_PASSWORD}" \
-d "grant_type=password" \
-d "client_id=${RESOURCE_REGISTRATION_IAM_CLIENT_ID}" \
-d "client_secret=${RESOURCE_REGISTRATION_IAM_CLIENT_SECRET}" \
"${HTTP_SCHEME}://auth.${INGRESS_HOST}/realms/${REALM}/protocol/openid-connect/token" | jq -r '.access_token' \
)
echo "Access Token: ${ACCESS_TOKEN:0:20}..."
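One pitfall with the command above: when the token request fails (for example because of wrong credentials), the response has no `access_token` field and `jq -r` prints the literal string `null`, so `ACCESS_TOKEN` is non-empty but unusable. A minimal guard, with `token_ok` as a hypothetical helper name, might look like this:

```shell
# Hypothetical guard: succeed only if the value looks like a usable token,
# i.e. it is non-empty and not the literal "null" that `jq -r` prints
# when the response lacks an access_token field.
token_ok() {
  case "${1:-}" in
    ""|null) return 1 ;;
    *) return 0 ;;
  esac
}

# Example usage, after the token request:
#   token_ok "${ACCESS_TOKEN:-}" || echo "Token request failed; check credentials and client settings" >&2
```

The guard only detects an obviously failed request. It does not validate the token itself.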
Example - Registering a Collection
This example registers the STAC Collection landsat-ot-c2-l2, which represents Landsat 8-9 OLI/TIRS Collection 2 Level-2 data, into the EOEPCA Resource Catalogue instance. This collection is used in later steps as the target for harvesting example Landsat data.
The target of this registration request is the STAC endpoint of the Resource Catalogue service deployed as part of the Resource Discovery Building Block.
source ~/.eoepca/state
curl -X POST "https://registration-api.${INGRESS_HOST}/processes/register/execution" \
${ACCESS_TOKEN:+-H} ${ACCESS_TOKEN:+"Authorization: Bearer ${ACCESS_TOKEN}"} \
-H "Content-Type: application/json" \
-d @- <<EOF
{
"inputs": {
"source": {"rel": "collection", "href": "https://raw.githubusercontent.com/EOEPCA/registration-harvester/refs/heads/main/etc/collections/landsat/landsat-ot-c2-l2.json"},
"target": {"rel": "https://api.stacspec.org/v1.0.0/core", "href": "https://resource-catalogue.${INGRESS_HOST}/stac"}
}
}
EOF
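To register other collections with the same request shape, the body can be produced by a small function instead of an inline heredoc. This is a sketch only: `register_payload` is a hypothetical name, and it assumes the hrefs contain no characters that need JSON escaping (double quotes or backslashes). A JSON-aware tool such as jq is the safer choice for arbitrary input.

```shell
# Hypothetical helper: print the JSON body for a collection registration request.
#   $1 = href of the STAC Collection document (source)
#   $2 = href of the target STAC endpoint
# Assumption: neither href contains double quotes or backslashes.
register_payload() {
  printf '{"inputs":{"source":{"rel":"collection","href":"%s"},"target":{"rel":"https://api.stacspec.org/v1.0.0/core","href":"%s"}}}\n' "$1" "$2"
}

# Example usage (COLLECTION_URL is a placeholder for the collection document URL):
#   register_payload "$COLLECTION_URL" "https://resource-catalogue.${INGRESS_HOST}/stac" |
#     curl -X POST "https://registration-api.${INGRESS_HOST}/processes/register/execution" \
#       -H "Content-Type: application/json" -d @-
```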
Validate Registration
Check job status:
If required, authenticate to the Registration API - e.g. as user eoepcauser.
You should see a new job with the status COMPLETED.
If you have deployed the Resource Discovery Building Block, verify the collection:
source ~/.eoepca/state
xdg-open "${HTTP_SCHEME}://resource-catalogue.${INGRESS_HOST}/collections/landsat-ot-c2-l2"
Using the Registration Harvester
Deploy Workflow for Landsat harvesting
Earlier on this page we deployed the Landsat harvester worker, which responds to a specific set of workflow topics, as defined by the values deployed with the Helm chart:
- landsat_discover_data (LandsatDiscoverHandler)
- landsat_continuous_data_discovery (LandsatContinuousDiscoveryHandler)
- landsat_get_download_urls (LandsatGetDownloadUrlHandler)
- landsat_download_data (LandsatDownloadHandler)
- landsat_untar (LandsatUntarHandler)
- landsat_extract_metadata (LandsatExtractMetadataHandler)
- landsat_register_metadata (LandsatRegisterMetadataHandler)
To exploit this we deploy the Landsat workflow, comprising two BPMN processes. The main process (Landsat Registration) searches for new data at USGS. For each new scene found, the workflow executes another process (Landsat Scene Ingestion) which performs the individual steps for harvesting and registering the data.
Main workflow landsat.bpmn
source ~/.eoepca/state
curl -s https://raw.githubusercontent.com/EOEPCA/registration-harvester/refs/heads/main/workflows/landsat.bpmn | \
curl -s -X POST "https://registration-harvester-api.${INGRESS_HOST}/flowable-rest/service/repository/deployments" \
-u "${FLOWABLE_ADMIN_USER}:${FLOWABLE_ADMIN_PASSWORD}" \
-F "landsat.bpmn=@-;filename=landsat.bpmn;type=text/xml" | jq
Sub-workflow landsat-scene-ingestion.bpmn for individual scene ingestion
source ~/.eoepca/state
curl -s https://raw.githubusercontent.com/EOEPCA/registration-harvester/refs/heads/main/workflows/landsat-scene-ingestion.bpmn | \
curl -s -X POST "https://registration-harvester-api.${INGRESS_HOST}/flowable-rest/service/repository/deployments" \
-u "${FLOWABLE_ADMIN_USER}:${FLOWABLE_ADMIN_PASSWORD}" \
-F "landsat-scene-ingestion.bpmn=@-;filename=landsat-scene-ingestion.bpmn;type=text/xml" | jq
Execute Landsat Harvesting
Start a Landsat harvesting job:
source ~/.eoepca/state
# Get process ID
processes="$( \
curl -s "https://registration-harvester-api.${INGRESS_HOST}/flowable-rest/service/repository/process-definitions" \
-u "${FLOWABLE_ADMIN_USER}:${FLOWABLE_ADMIN_PASSWORD}" \
)"
landsat_process_id="$(echo "$processes" | jq -r '[.data[] | select(.name == "Landsat Workflow")][0].id')"
# Start harvesting
curl -s -X POST "https://registration-harvester-api.${INGRESS_HOST}/flowable-rest/service/runtime/process-instances" \
-u "${FLOWABLE_ADMIN_USER}:${FLOWABLE_ADMIN_PASSWORD}" \
-H "Content-Type: application/json" \
-d @- <<EOF | jq
{
"processDefinitionId": "$landsat_process_id",
"variables": [
{
"name": "datetime_interval",
"type": "string",
"value": "2024-11-13T10:00:00Z/2024-11-13T11:00:00Z"
},
{
"name": "collections",
"type": "string",
"value": "landsat-c2l2-sr"
},
{
"name": "bbox",
"type": "string",
"value": "-7,46,3,52"
}
]
}
EOF
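Because the workflow variables are passed as plain strings, a malformed `bbox` is only noticed once the workflow runs. The following sketch checks the value locally first. It assumes the conventional `west,south,east,north` ordering (the source only shows the example value `-7,46,3,52`), does not handle boxes that cross the antimeridian, and `valid_bbox` is a hypothetical helper name.

```shell
# Hypothetical check: accept a bbox string "west,south,east,north" made of
# four plain decimal numbers with west < east and south < north.
# Assumption: this ordering matches what the workflow expects.
valid_bbox() {
  printf '%s\n' "$1" | awk -F',' '
    NF != 4 { exit 1 }
    {
      # Every field must be a plain decimal number.
      for (i = 1; i <= 4; i++)
        if ($i !~ /^-?[0-9]+(\.[0-9]+)?$/) exit 1
      # Minimum corner must lie strictly below the maximum corner.
      if (!($1 + 0 < $3 + 0 && $2 + 0 < $4 + 0)) exit 1
    }'
}

# Example usage:
#   valid_bbox "-7,46,3,52" || echo "bbox looks malformed" >&2
```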
Monitor Harvesting Progress
Check worker logs:
Use Ctrl-C to exit the log stream.
Note that the harvesting may take some time, depending on download speeds and the number of scenes to be harvested. Therefore the following monitoring steps may be subject to delay.
Monitor process instances:
source ~/.eoepca/state
curl -s "https://registration-harvester-api.${INGRESS_HOST}/flowable-rest/service/runtime/process-instances" \
-u "${FLOWABLE_ADMIN_USER}:${FLOWABLE_ADMIN_PASSWORD}" \
| jq -r '.data[] | "\(.startTime) | \(.id) | \(.processDefinitionName)"'
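Since harvesting can run for a long time, repeating the query by hand gets tedious. A generic polling helper is one way to wait for a condition. This is a sketch under stated assumptions: `wait_until` is a hypothetical name, and the elapsed time is approximated by counting pauses, so time spent inside the command itself is not included.

```shell
# Hypothetical polling helper: run a command repeatedly until it succeeds
# or the (approximate) timeout elapses.
#   $1 = timeout in seconds, $2 = pause between attempts in seconds,
#   remaining arguments = command to run
wait_until() {
  local timeout_s="$1" pause_s="$2" elapsed=0
  shift 2
  until "$@"; do
    elapsed=$((elapsed + pause_s))
    [ "$elapsed" -gt "$timeout_s" ] && return 1
    sleep "$pause_s"
  done
}

# Example usage. harvest_done is a hypothetical condition that treats an empty
# list of runtime process instances as "finished", which only holds if no other
# workflows are running on the same engine:
#   harvest_done() {
#     [ "$(curl -s "https://registration-harvester-api.${INGRESS_HOST}/flowable-rest/service/runtime/process-instances" \
#            -u "${FLOWABLE_ADMIN_USER}:${FLOWABLE_ADMIN_PASSWORD}" | jq '.data | length')" = "0" ]
#   }
#   wait_until 3600 30 harvest_done && echo "No running process instances"
```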
Check registered items:
Once harvesting completes, check the catalogue:
source ~/.eoepca/state
xdg-open "https://resource-catalogue.${INGRESS_HOST}/collections/landsat-ot-c2-l2/items"
Retain the eodata volume
Given the time and bandwidth required to retrieve the harvested data, you may want to ensure that the Persistent Volume is retained for future reuse, for example to reconnect with the downloaded data if the Resource Registration BB is re-deployed.
Depending on your RWX storage class, the Retain reclaim policy may already be set.
Check the reclaim policy of the eodata persistent volume:
EODATA_PV=$(kubectl get pvc "eodata" -n "resource-registration" -o jsonpath='{.spec.volumeName}')
POLICY=$(kubectl get pv "$EODATA_PV" -o jsonpath='{.spec.persistentVolumeReclaimPolicy}')
echo -e "\nVolume Reclaim Policy is: $POLICY\n"
If the policy is not Retain, patch the persistent volume as follows:
EODATA_PV=$(kubectl get pvc "eodata" -n "resource-registration" -o jsonpath='{.spec.volumeName}')
kubectl patch pv "$EODATA_PV" -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'
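The check and the patch above can also be folded into a single step that is safe to re-run. The following is a sketch rather than part of the guide's scripts: `ensure_retain` is a hypothetical wrapper around the same `kubectl` calls, and it only patches when the policy is not already Retain.

```shell
# Hypothetical wrapper around the kubectl commands shown above:
# patch the volume bound to a PVC only if its reclaim policy is not Retain.
#   $1 = PVC name, $2 = namespace
ensure_retain() {
  local pvc="$1" ns="$2" pv policy
  pv="$(kubectl get pvc "$pvc" -n "$ns" -o jsonpath='{.spec.volumeName}')" || return 1
  [ -n "$pv" ] || { echo "PVC ${pvc} is not bound to a volume" >&2; return 1; }
  policy="$(kubectl get pv "$pv" -o jsonpath='{.spec.persistentVolumeReclaimPolicy}')" || return 1
  if [ "$policy" = "Retain" ]; then
    echo "${pv}: already Retain"
  else
    kubectl patch pv "$pv" -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'
  fi
}

# Example usage:
#   ensure_retain eodata resource-registration
```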
Delivery of data assets
The default harvesting approach illustrated above stores the harvested assets in an eodata persistent volume. The metadata records registered with the catalogue assume that these assets are delivered via the base URL https://eodata.${INGRESS_HOST}/, so the asset hrefs in the registered STAC Items are rooted under this base URL.
Example - Service for asset access
As an example, a simple NGINX service can be deployed to serve these assets under the service URL https://eodata.${INGRESS_HOST}/, so that the asset hrefs registered in the STAC Items resolve correctly.
Visualise with STAC Browser
Use STAC Browser to navigate the harvested STAC Collection and the referenced assets.
source ~/.eoepca/state
xdg-open "https://radiantearth.github.io/stac-browser/#/external/resource-catalogue.${INGRESS_HOST}/stac/"
Additional Harvester Types
The Registration Harvester supports additional data sources beyond Landsat:
- Sentinel data from Copernicus Data Space Ecosystem (CDSE)
- Generic STAC catalogues
Deployment of these additional harvesters follows a similar pattern but requires specific configuration and credentials. Refer to the Registration Harvester Documentation for details.
Uninstallation
Remove all Resource Registration components:
source ~/.eoepca/state
# Remove workers
helm uninstall registration-harvester-worker-landsat -n resource-registration
# Remove ingresses
kubectl delete -f registration-harvester/generated-ingress.yaml
kubectl delete -f registration-api/generated-ingress.yaml
kubectl delete -f registration-harvester/generated-eodata-server.yaml 2>/dev/null
# Remove core components
helm uninstall registration-harvester-api-engine -n resource-registration
helm uninstall registration-api -n resource-registration
# Remove IAM resources
kubectl delete client.openidclient.keycloak.m.crossplane.io/${RESOURCE_REGISTRATION_IAM_CLIENT_ID} -n iam-management
kubectl delete secret/${RESOURCE_REGISTRATION_IAM_CLIENT_ID}-keycloak-client -n iam-management
# Remove namespace (optional - will delete all data)
kubectl delete namespace resource-registration