Processing - OpenEO ArgoWorkflows with Dask⚓︎

OpenEO ArgoWorkflows provides a Kubernetes-native implementation of the OpenEO API specification using Dask for distributed processing. This deployment offers an alternative to the GeoTrellis backend, leveraging Dask’s parallel computing capabilities for Earth observation data processing.

Note: OIDC authentication is configured by default for OpenEO ArgoWorkflows. The deployment integrates with external OIDC providers (e.g., EGI AAI) for authentication. Refer to the IAM Deployment Guide if you need to set up your own OIDC Provider.


Prerequisites⚓︎

Before deploying, ensure your environment meets these requirements:

| Component | Requirement | Documentation Link |
|-----------|-------------|--------------------|
| Kubernetes | Cluster (tested on v1.28) | Installation Guide |
| Helm | Version 3.5 or newer | Installation Guide |
| kubectl | Configured for cluster access | Installation Guide |
| Ingress | Properly installed | Installation Guide |
| Cert Manager | Properly installed | Installation Guide |
| OIDC Provider | Required for authentication | Installation Guide |

Clone the Deployment Guide Repository:

git clone https://github.com/EOEPCA/deployment-guide
cd deployment-guide/scripts/processing/openeo-argo

Validate your environment:

bash check-prerequisites.sh


Deployment Steps⚓︎

1. Run the Configuration Script⚓︎

bash configure-openeo-argo.sh

You’ll be prompted for:

  • INGRESS_HOST: Base domain for ingress hosts (e.g. example.com)
  • PERSISTENT_STORAGECLASS: Kubernetes storage class for persistent volumes
  • CLUSTER_ISSUER: Cert-manager Cluster Issuer for TLS certificates
  • STAC_CATALOG_URL: STAC catalog endpoint (e.g. ${HTTP_SCHEME}://eoapi.${INGRESS_HOST}/stac)
  • OIDC_ISSUER_URL: OIDC provider URL (e.g. https://aai.egi.eu/auth/realms/egi)
  • OIDC_ORGANISATION: OIDC organisation identifier (e.g. egi)
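
The configuration script records your answers (reloaded later via ~/.eoepca/state) and writes the generated manifests referenced in the steps below. To confirm they were produced:

ls generated-*.yaml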

2. Deploy OpenEO ArgoWorkflows⚓︎

The deployment consists of the core API service with PostgreSQL and Redis as supporting services.

# Add the required Helm repositories
helm repo add argo https://argoproj.github.io/argo-helm
helm repo add dask https://helm.dask.org
helm repo update

# Clone the repository containing the OpenEO ArgoWorkflows Helm chart
git clone https://github.com/jzvolensky/charts

# Deploy OpenEO ArgoWorkflows
helm dependency update charts/eodc/openeo-argo
helm dependency build charts/eodc/openeo-argo


helm upgrade -i openeo charts/eodc/openeo-argo \
    --namespace openeo \
    --create-namespace \
    --values generated-values.yaml \
    --wait --timeout 10m

Once the release is installed (Ctrl-C out of the command if --wait does not return), create a long-lived token secret for the openeo-argo-access-sa service account:

kubectl apply -f - <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: openeo-argo-access-sa.service-account-token
  namespace: openeo
  annotations:
    kubernetes.io/service-account.name: openeo-argo-access-sa
type: kubernetes.io/service-account-token
EOF
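
To confirm the release and the token secret are in place, check the openeo namespace (pod names vary with the chart version):

kubectl get pods -n openeo
kubectl get secret openeo-argo-access-sa.service-account-token -n openeo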

If you disabled OIDC authentication, deploy the basic auth proxy:

kubectl apply -f generated-proxy-auth.yaml

3. Deploy Ingress⚓︎

Apply the ingress configuration:

kubectl apply -f generated-ingress.yaml
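
You can verify that the ingress and its TLS certificate were created (resource names depend on the generated manifest):

kubectl get ingress -n openeo
kubectl get certificate -n openeo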

4. Configure OIDC Client (if using custom OIDC)⚓︎

If you’re using your own OIDC provider rather than EGI AAI, create the client:

bash ../../utils/create-client.sh

When prompted:

  • Client ID: Use openeo-argo
  • Redirect URLs: Include https://openeo.${INGRESS_HOST} and https://editor.openeo.org


Validation⚓︎

1. Automated Validation⚓︎

bash validation.sh

This verifies:

  • All pods in the openeo namespace are running
  • PostgreSQL and Redis are operational
  • API endpoints return valid responses
  • The Dask executor image is accessible

2. API Health Check⚓︎

# Load the variables (e.g. INGRESS_HOST) captured by the configuration script
source ~/.eoepca/state
curl -L https://openeo.${INGRESS_HOST}/ | jq .

Expected output: API metadata including version, endpoints, and backend capabilities.
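
For a narrower check, extract just the version fields from the capabilities document (field names as defined by the openEO API):

curl -sL https://openeo.${INGRESS_HOST}/ | jq '{api_version, backend_version, stac_version}'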

3. Service Discovery⚓︎

# List available collections
curl -L https://openeo.${INGRESS_HOST}/collections | jq .

# List available processes
curl -L https://openeo.${INGRESS_HOST}/processes | jq .
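
To print only the collection identifiers, filter the response (the openEO API wraps them in a collections array):

curl -sL https://openeo.${INGRESS_HOST}/collections | jq -r '.collections[].id'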

Usage⚓︎

OpenEO Web Editor⚓︎

Test the deployment using the OpenEO Web Editor:

xdg-open "https://editor.openeo.org?server=https://openeo.${INGRESS_HOST}"

Login Process:

  1. Select your OIDC provider (e.g., EGI or EOEPCA)
  2. Authenticate with your credentials
  3. Upon successful login, explore collections and build processing graphs

Python Client Usage⚓︎

Setup⚓︎

python3 -m venv venv
source venv/bin/activate
pip install openeo

Connect and Authenticate⚓︎

import openeo
import os

# Connect to the service (read the ingress host from the environment)
connection = openeo.connect(f"https://openeo.{os.environ['INGRESS_HOST']}")

# Authenticate via OIDC
connection.authenticate_oidc()
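
authenticate_oidc() typically starts a device-code flow: the client prints a URL and a code to confirm in your browser. After logging in, a quick sanity check is to list the collection identifiers the backend exposes:

# Should match the /collections endpoint checked during validation
print(connection.list_collection_ids())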

Submit a Dask-Powered Job⚓︎

# Load a collection
datacube = connection.load_collection(
    "SENTINEL2_L2A",
    spatial_extent={"west": 11.4, "south": 46.5, "east": 11.5, "north": 46.6},
    temporal_extent=["2024-06-01", "2024-06-30"],
    bands=["B04", "B08"]
)

# Calculate NDVI
red = datacube.band("B04")
nir = datacube.band("B08")
ndvi = (nir - red) / (nir + red)

# Submit as batch job
job = ndvi.create_job(title="NDVI Calculation with Dask")
job.start_and_wait()

# Download results
job.download_results("ndvi_results/")
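
If a job fails or produces unexpected output, its log entries (including messages from the Dask execution) can be retrieved through the same client:

# Inspect the batch job's logs; level/message fields follow the openEO API
for entry in job.logs():
    print(entry.get("level"), entry.get("message"))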

Monitor Dask Cluster⚓︎

The Dask cluster automatically scales based on workload. Monitor active workers:

# Get job details including Dask cluster information
job_info = job.describe()
print(f"Job status: {job_info['status']}")
print(f"Dask workers: {job_info.get('usage', {}).get('dask_workers', 'N/A')}")

Direct API Usage⚓︎

Submit a Synchronous Processing Request⚓︎

# Get an access token via the OIDC password grant (adjust for your provider;
# requires OIDC_ISSUER_URL, OIDC_USERNAME, OIDC_PASSWORD and OIDC_CLIENT_ID in your environment)
ACCESS_TOKEN=$(curl -s -X POST \
    "${OIDC_ISSUER_URL}/protocol/openid-connect/token" \
    -d "grant_type=password" \
    -d "username=${OIDC_USERNAME}" \
    -d "password=${OIDC_PASSWORD}" \
    -d "client_id=${OIDC_CLIENT_ID}" \
    -d "scope=openid" | jq -r '.access_token')

# Submit the processing request; the response body is the result file itself
curl -X POST "https://openeo.${INGRESS_HOST}/result" \
  -H "Authorization: Bearer ${ACCESS_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
    "process": {
      "process_graph": {
        "load": {
          "process_id": "load_collection",
          "arguments": {
            "id": "SENTINEL2_L2A",
            "spatial_extent": {
              "west": 11.4, "south": 46.5,
              "east": 11.5, "north": 46.6
            },
            "temporal_extent": ["2024-06-01", "2024-06-10"]
          }
        },
        "save": {
          "process_id": "save_result",
          "arguments": {
            "data": {"from_node": "load"},
            "format": "GTiff"
          },
          "result": true
        }
      }
    }
  }' \
  --output result.tiff

Further Reading⚓︎