Processing - OpenEO ArgoWorkflows with Dask

Note: This Building Block is under active development. Some features may still change, so use it with care and expect behaviour to shift as updates are rolled out.

OpenEO ArgoWorkflows provides a Kubernetes-native implementation of the OpenEO API specification using Dask for distributed processing. This deployment offers an alternative to the GeoTrellis backend, leveraging Dask’s parallel computing capabilities for Earth observation data processing.

Note: OIDC authentication is configured by default for OpenEO ArgoWorkflows. The deployment integrates with external OIDC providers (e.g., EGI AAI) for authentication. Refer to the IAM Deployment Guide if you need to set up your own OIDC Provider.


Prerequisites

Before deploying, ensure your environment meets these requirements:

| Component | Requirement | Documentation Link |
|---|---|---|
| Kubernetes | Cluster (tested on v1.28+) | Installation Guide |
| Helm | Version 3.5 or newer | Installation Guide |
| kubectl | Configured for cluster access | Installation Guide |
| Ingress | Properly installed | Installation Guide |
| Cert Manager | Properly installed | Installation Guide |
| OIDC Provider | Required for authentication | Installation Guide |
| STAC Catalogue | Required for data access | eoAPI Deployment |

Clone the Deployment Guide Repository:

git clone https://github.com/EOEPCA/deployment-guide
cd deployment-guide/scripts/processing/openeo-argo

Validate your environment:

bash check-prerequisites.sh
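The checks performed by the script are not listed here. As a rough manual equivalent for the Helm requirement in the table above, the following sketch compares two version strings. The helper name `version_ge` is hypothetical, and it assumes a `sort` that supports `-V` (GNU coreutils).

```shell
# Succeeds when version $1 is greater than or equal to version $2.
# A leading "v" (as printed by Helm) is ignored.
# Assumes `sort -V` is available (GNU coreutils).
version_ge() {
  have=${1#v}
  want=${2#v}
  # The lower of the two versions sorts first.
  lowest=$(printf '%s\n%s\n' "$have" "$want" | sort -V | head -n 1)
  [ "$lowest" = "$want" ]
}
```

For example, `version_ge "$(helm version --template '{{.Version}}')" 3.5 && echo "Helm OK"` would check the installed Helm client against the minimum listed above.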


Deployment Steps

1. Run the Configuration Script

bash configure-openeo-argo.sh

You’ll be prompted for:

| Parameter | Description | Example |
|---|---|---|
| `INGRESS_HOST` | Base domain for ingress hosts | `example.com` |
| `PERSISTENT_STORAGECLASS` | Kubernetes storage class for persistent volumes | `standard` |
| `CLUSTER_ISSUER` | Cert-manager Cluster Issuer for TLS certificates | `letsencrypt-prod` |
| `OPENEO_ARGO_ENABLE_OIDC` | Enable OIDC authentication (yes/no) | `yes` |
| `OIDC_ISSUER_URL` | OIDC provider URL (if OIDC enabled) | `https://auth.example.com/realms/eoepca` |
| `OIDC_ORGANISATION` | OIDC organisation identifier (if OIDC enabled) | `eoepca` |
| `STAC_CATALOG_ENDPOINT` | STAC catalog URL | `https://eoapi.example.com/stac` |
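Before moving on, it can be useful to confirm that none of these values ended up empty. The sketch below assumes the configuration script stores the answers in `~/.eoepca/state` (the file sourced in later steps); that storage location is an assumption here, and `missing_vars` is a hypothetical helper name.

```shell
# Print the names of variables that are unset or empty, one per line.
# Returns non-zero if at least one is missing.
# Only pass trusted variable names: the lookup uses eval.
missing_vars() {
  missing=0
  for name in "$@"; do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "$name"
      missing=1
    fi
  done
  return "$missing"
}
```

A possible use, after `source ~/.eoepca/state`: `missing_vars INGRESS_HOST PERSISTENT_STORAGECLASS CLUSTER_ISSUER STAC_CATALOG_ENDPOINT`. An empty output means all listed values are set.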

2. Add Helm Repositories

helm repo add argo https://argoproj.github.io/argo-helm
helm repo add dask https://helm.dask.org
helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo update

3. Prepare the Helm Chart

Clone the charts repository and build dependencies:

git clone https://github.com/jzvolensky/charts
helm dependency update charts/eodc/openeo-argo
helm dependency build charts/eodc/openeo-argo

4. Deploy OpenEO ArgoWorkflows

helm upgrade -i openeo charts/eodc/openeo-argo \
    --namespace openeo \
    --create-namespace \
    --values generated-values.yaml \
    --timeout 10m

5. Deploy Ingress

kubectl apply -f generated-ingress.yaml

6. Deploy Basic Auth Proxy (if OIDC disabled)

If you disabled OIDC authentication during configuration:

kubectl apply -f generated-proxy-auth.yaml

7. Configure OIDC Client (if using custom OIDC)

A Keycloak client is required for the ingress protection of the Processing BB openEO Argo Engine. The client can be created using the Crossplane Keycloak provider via the Client CRD.

source ~/.eoepca/state
cat <<EOF | kubectl apply -f -
apiVersion: openidclient.keycloak.m.crossplane.io/v1alpha1
kind: Client
metadata:
  name: openeo-argo
  namespace: iam-management
spec:
  forProvider:
    realmId: ${REALM}
    clientId: openeo-argo
    name: openEO Argo Engine
    description: openEO Argo Engine OIDC
    enabled: true
    accessType: PUBLIC
    rootUrl: ${HTTP_SCHEME}://openeo.${INGRESS_HOST}
    baseUrl: ${HTTP_SCHEME}://openeo.${INGRESS_HOST}
    adminUrl: ${HTTP_SCHEME}://openeo.${INGRESS_HOST}
    directAccessGrantsEnabled: true
    standardFlowEnabled: true
    oauth2DeviceAuthorizationGrantEnabled: true
    useRefreshTokens: true
    validRedirectUris:
      - "/*"
      - "https://editor.openeo.org/*"
    webOrigins:
      - "+"
  providerConfigRef:
    name: provider-keycloak
    kind: ProviderConfig
EOF

The Client should be created successfully.

Then, in the Keycloak admin console, open Clients → openeo-public → Client scopes tab and remove the roles scope (or any other scope) from “Assigned default client scopes” if it adds an audience to the token.
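To see whether a default scope is adding an audience, it can help to look at the `aud` claim of an issued access token. The sketch below decodes the payload segment of a JWT (Keycloak access tokens are JWTs). The helper name `jwt_payload` is hypothetical, and the code assumes a `base64` that accepts `-d`, as in GNU coreutils. It does not verify the signature.

```shell
# Print the decoded JSON payload (second segment) of a JWT.
# Does NOT verify the signature; for inspection only.
jwt_payload() {
  # base64url -> base64: swap the URL-safe characters back.
  payload=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # Restore the '=' padding that base64url encoding strips.
  case $(( ${#payload} % 4 )) in
    2) payload="${payload}==" ;;
    3) payload="${payload}=" ;;
  esac
  printf '%s' "$payload" | base64 -d
}
```

With a token in `ACCESS_TOKEN`, `jwt_payload "$ACCESS_TOKEN" | jq '.aud'` would show the audience, if one is present.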


Validation

Automated Validation

bash validation.sh

This verifies:

- All pods in the openeo namespace are running
- PostgreSQL and Redis are operational
- API endpoints return valid responses

Manual Validation

Check pod status:

kubectl get pods -n openeo
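In a namespace with many pods, scanning the STATUS column by eye is error-prone. A minimal sketch that filters the listing down to pods that are neither Running nor Completed follows. The helper name `pods_not_running` is hypothetical, and it only reads the STATUS column, so a pod that is Running but not yet Ready is not reported.

```shell
# Read `kubectl get pods --no-headers` output on stdin and print the names
# of pods whose STATUS (third column) is neither Running nor Completed.
pods_not_running() {
  awk '$3 != "Running" && $3 != "Completed" { print $1 }'
}
```

Usage: `kubectl get pods -n openeo --no-headers | pods_not_running`. An empty output means every pod is in one of the two expected states.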

API Health Check:

source ~/.eoepca/state

# Without authentication (basic info only)
curl -s https://openeo.${INGRESS_HOST}/openeo/1.1.0 | jq .

# With basic auth (if OIDC disabled)
curl -s -u eoepcauser:eoepcapass https://openeo.${INGRESS_HOST}/openeo/1.1.0 | jq .

List available processes:

curl -s https://openeo.${INGRESS_HOST}/openeo/1.1.0/processes | jq '[.processes[].id] | sort'

Check Argo Workflows:

kubectl get workflows -n openeo


API Usage

Submit and monitor a job:

# Load the deployment state (as in the validation steps above)
source ~/.eoepca/state

# Get access token
ACCESS_TOKEN=$(curl -s -X POST \
    "${OIDC_ISSUER_URL}/protocol/openid-connect/token" \
    -d "grant_type=password" \
    -d "username=${KEYCLOAK_TEST_USER}" \
    -d "password=${KEYCLOAK_TEST_PASSWORD}" \
    -d "client_id=openeo-argo" \
    -d "scope=openid" | jq -r '.access_token')
# openEO bearer tokens have the form oidc/<provider_id>/<access_token>
AUTH_TOKEN="oidc/eoepca/${ACCESS_TOKEN}"

# Create a job
JOB_ID=$(curl -s -i -X POST "https://openeo.${INGRESS_HOST}/openeo/1.1.0/jobs" \
  -H "Authorization: Bearer ${AUTH_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
    "process": {
      "process_graph": {
        "load": {
          "process_id": "load_collection",
          "arguments": {
            "id": "your-collection-id",
            "spatial_extent": {"west": -34.0, "south": 38.8, "east": -33.0, "north": 39.5},
            "temporal_extent": ["2025-10-20", "2025-10-31"]
          }
        },
        "save": {
          "process_id": "save_result",
          "arguments": {
            "data": {"from_node": "load"},
            "format": "GTiff"
          },
          "result": true
        }
      }
    },
    "title": "Test Job"
  }' | grep -i "^openeo-identifier:" | cut -d' ' -f2 | tr -d '\r\n')

echo "Created job: ${JOB_ID}"

# Start the job
curl -s -X POST "https://openeo.${INGRESS_HOST}/openeo/1.1.0/jobs/${JOB_ID}/results" \
  -H "Authorization: Bearer ${AUTH_TOKEN}"

# Check status
curl -s "https://openeo.${INGRESS_HOST}/openeo/1.1.0/jobs/${JOB_ID}" \
  -H "Authorization: Bearer ${AUTH_TOKEN}" | jq '{id, status, title}'

# List all jobs
curl -s "https://openeo.${INGRESS_HOST}/openeo/1.1.0/jobs" \
  -H "Authorization: Bearer ${AUTH_TOKEN}" | jq .
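The status check above is a single request, while a batch job usually takes some time to complete. The following sketch polls the same endpoint until the job reaches one of the final statuses defined by the openEO API (`finished`, `error`, `canceled`). The function names, the 10-second default interval and the 60-attempt limit are illustrative choices, and `INGRESS_HOST` and `AUTH_TOKEN` are expected to be set as in the commands above.

```shell
# Succeeds when the given openEO batch job status is final.
is_terminal_status() {
  case "$1" in
    finished|error|canceled) return 0 ;;
    *) return 1 ;;
  esac
}

# Poll a job until it reaches a final status or the attempts run out.
# Usage: wait_for_job JOB_ID [INTERVAL_SECONDS] [MAX_ATTEMPTS]
# Returns zero only if the job ends as "finished".
wait_for_job() {
  job_id=$1
  interval=${2:-10}
  attempts=${3:-60}
  status=""
  while [ "$attempts" -gt 0 ]; do
    status=$(curl -s "https://openeo.${INGRESS_HOST}/openeo/1.1.0/jobs/${job_id}" \
      -H "Authorization: Bearer ${AUTH_TOKEN}" | jq -r '.status')
    echo "Job ${job_id}: ${status}"
    if is_terminal_status "$status"; then
      break
    fi
    attempts=$((attempts - 1))
    sleep "$interval"
  done
  [ "$status" = "finished" ]
}
```

Usage: `wait_for_job "${JOB_ID}" && echo "Job finished"`.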

Note: The STAC catalogue must contain collections with data formatted for OpenEO processing. Check the available collections at your STAC endpoint and ensure the spatial/temporal extent matches actual data.
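One way to check the spatial part of this is to compare the requested `spatial_extent` with the bounding box a collection advertises in its STAC metadata (`extent.spatial.bbox`, ordered west, south, east, north). The sketch below tests two boxes for overlap. `bbox_intersects` is a hypothetical helper, and it does not handle boxes that cross the antimeridian.

```shell
# Succeeds when two bounding boxes overlap.
# Arguments: west1 south1 east1 north1 west2 south2 east2 north2
# Boxes that cross the antimeridian are not handled.
bbox_intersects() {
  awk -v w1="$1" -v s1="$2" -v e1="$3" -v n1="$4" \
      -v w2="$5" -v s2="$6" -v e2="$7" -v n2="$8" \
      'BEGIN { exit !((w1 <= e2) && (w2 <= e1) && (s1 <= n2) && (s2 <= n1)) }'
}
```

For the request used above, `bbox_intersects -34.0 38.8 -33.0 39.5 W S E N` would take `W S E N` from the collection, for example from the output of `curl -s "${STAC_CATALOG_ENDPOINT}/collections/your-collection-id" | jq -c '.extent.spatial.bbox[0]'`, assuming `STAC_CATALOG_ENDPOINT` is set in your shell.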


Further Reading