On Your Environment
Install the Agent Manager on an existing Kubernetes cluster — AWS EKS, Google GKE, Azure AKS, or any distribution with LoadBalancer support.
The Quick Start Guide installs everything in a single command using a dev container with k3d. Use this page when you need to install on a managed cluster.
What You Will Get
Agent Manager is a two-layer system installed in two phases:
- Phase 1 — OpenChoreo (base layer): OpenChoreo is an open-source platform that provides the Kubernetes infrastructure Agent Manager runs on. It consists of four planes: a Control Plane for API and configuration, a Data Plane for running workloads and gateways, a Workflow Plane for builds and CI pipelines, and an Observability Plane for traces, logs, and metrics via OpenSearch.
- Phase 2 — Agent Manager: The AI agent management platform installed on top of OpenChoreo. It includes the Console (web UI), AMP API (backend), AI Gateway, PostgreSQL (database), Secrets Extension (OpenBao for runtime secret injection), Traces Observer (trace querying), and Evaluation Engine (automated agent evaluations).
This guide installs both layers on your existing Kubernetes cluster.
This setup is for development and exploration. For production deployments, see the Production Considerations section.
Prerequisites
Cluster Requirements
| Requirement | Minimum |
|---|---|
| Kubernetes version | 1.32+ |
| Nodes | 3 |
| CPU per node | 4 cores |
| RAM per node | 8 GB |
| LoadBalancer support | Required |
| Default StorageClass | Required |
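A quick pre-flight check against these requirements (a sketch, assuming kubectl already points at the target cluster — LoadBalancer support is easiest to confirm later, when the first gateway Service is created in Step 5):
# Server version (1.32+ required) and node count/size
kubectl version 2>/dev/null | grep -i server
kubectl get nodes -o wide
# A default StorageClass should be marked "(default)"
kubectl get storageclass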
Supported Providers
- Amazon Web Services (EKS)
- Google Cloud Platform (GKE)
- Microsoft Azure (AKS)
- Any Kubernetes distribution with LoadBalancer support
Required Tools
| Tool | Version | Purpose |
|---|---|---|
| kubectl | v1.32+ | Kubernetes CLI |
| Helm | v3.12+ | Kubernetes package manager |
| curl / dig | — | Resolving LoadBalancer hostnames and testing endpoints |
Verify the installed tool versions:
kubectl version --client && helm version
Permissions
You need sufficient privileges to:
- Create namespaces and deploy Helm charts
- Create LoadBalancer services
- Manage cert-manager Issuers and Certificates
- Create CRDs and ClusterRoles
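To spot-check these privileges before starting (not exhaustive, but each command should print "yes"):
kubectl auth can-i create namespaces
kubectl auth can-i create clusterroles
kubectl auth can-i create customresourcedefinitions
kubectl auth can-i create services -n default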
Configuration Variables
Set these once before starting the installation. Most subsequent commands reference these variables, though some examples (troubleshooting, uninstall) use literal values — substitute the corresponding variable if you have customised the defaults.
export VERSION="0.12.0"
export HELM_CHART_REGISTRY="ghcr.io/wso2"
export AMP_NS="wso2-amp"
export BUILD_CI_NS="openchoreo-workflow-plane"
export OBSERVABILITY_NS="openchoreo-observability-plane"
export DEFAULT_NS="default"
export DATA_PLANE_NS="openchoreo-data-plane"
export SECRETS_NS="amp-secrets"
export THUNDER_NS="amp-thunder"
OpenChoreo API URL — used by backend services to reach the OpenChoreo Control Plane API:
export OPENCHOREO_INTERNAL_URL="http://openchoreo-api.openchoreo-control-plane.svc.cluster.local:8080"
Thunder (Identity Provider) URLs — Thunder must be reachable from the browser for OAuth login. Set THUNDER_PUBLIC_URL to the URL where the browser will access Thunder:
# Port-forwarding (default for dev):
export THUNDER_PUBLIC_URL="http://localhost:8090"
# Public deployment example:
# export THUNDER_PUBLIC_URL="https://thunder.yourdomain.com"
# In-cluster URL (used by backend services for JWKS/token calls):
export THUNDER_INTERNAL_URL="http://amp-thunder-extension-service.${THUNDER_NS}.svc.cluster.local:8090"
Console URLs — set these to the URLs the browser will use to reach the console and APIs:
# Port-forwarding (default for dev):
export CONSOLE_PUBLIC_URL="http://localhost:3000"
export API_PUBLIC_URL="http://localhost:9000"
export OBS_API_PUBLIC_URL="http://localhost:9098"
export INSTRUMENTATION_URL="http://localhost:22893/otel"
# Public deployment example:
# export CONSOLE_PUBLIC_URL="https://console.yourdomain.com"
# export API_PUBLIC_URL="https://api.yourdomain.com"
# export OBS_API_PUBLIC_URL="https://obs.yourdomain.com"
# export INSTRUMENTATION_URL="https://otel.yourdomain.com/otel"
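Before moving on, you can confirm all variables are set in the current shell (a quick sanity check; adjust the pattern if you renamed any variables):
env | grep -E '^(VERSION|HELM_CHART_REGISTRY|AMP_NS|BUILD_CI_NS|OBSERVABILITY_NS|DEFAULT_NS|DATA_PLANE_NS|SECRETS_NS|THUNDER_NS|OPENCHOREO_INTERNAL_URL|THUNDER_PUBLIC_URL|THUNDER_INTERNAL_URL|CONSOLE_PUBLIC_URL|API_PUBLIC_URL|OBS_API_PUBLIC_URL|INSTRUMENTATION_URL)=' | sort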
Phase 1: OpenChoreo Platform
OpenChoreo organises its infrastructure into four planes, each handling a different concern:
- Control Plane — API server and configuration management for the platform
- Data Plane — runs deployed workloads and API gateways
- Workflow Plane — builds and CI pipelines for agent deployments
- Observability Plane — trace, log, and metrics collection via OpenSearch
This phase also installs Thunder (the identity provider) as a prerequisite, since the Control Plane and Observability Plane require Thunder's OIDC endpoints for JWT validation. Estimated time: ~20-30 minutes (varies by cluster and network).
Step 1: Install Cluster Prerequisites
Gateway API CRDs (v1.4.1) — standard Kubernetes resources for managing network gateways and routing:
kubectl apply --server-side --force-conflicts \
-f https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.4.1/experimental-install.yaml
The --force-conflicts flag is needed if your cluster already has Gateway API CRDs managed by another controller (e.g., Traefik on k3s/Rancher Desktop).
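To confirm the Gateway API CRDs are present:
kubectl get crd | grep 'gateway.networking.k8s.io'
# Expect gatewayclasses, gateways, httproutes, etc. in the output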
cert-manager (v1.19.2) — automates TLS certificate issuance and renewal:
helm upgrade --install cert-manager oci://quay.io/jetstack/charts/cert-manager \
--namespace cert-manager \
--create-namespace \
--version v1.19.2 \
--set crds.enabled=true \
--set startupapicheck.timeout=5m \
--wait --timeout 360s
External Secrets Operator (v1.3.2) — syncs secrets from external stores (like OpenBao) into Kubernetes:
helm upgrade --install external-secrets oci://ghcr.io/external-secrets/charts/external-secrets \
--namespace external-secrets \
--create-namespace \
--version 1.3.2 \
--set installCRDs=true \
--wait --timeout 180s
kgateway (v2.2.1) — the network gateway for OpenChoreo planes:
helm upgrade --install kgateway-crds oci://cr.kgateway.dev/kgateway-dev/charts/kgateway-crds \
--create-namespace \
--namespace openchoreo-control-plane \
--version v2.2.1
helm upgrade --install kgateway oci://cr.kgateway.dev/kgateway-dev/charts/kgateway \
--namespace openchoreo-control-plane \
--create-namespace \
--version v2.2.1 \
--set controller.extraEnv.KGW_ENABLE_GATEWAY_API_EXPERIMENTAL_FEATURES=true
Rancher Desktop / k3s users
k3s ships with Traefik, which binds to host ports 80/443 and conflicts with OpenChoreo's kgateway. Remove Traefik before proceeding:
helm uninstall traefik -n kube-system
helm uninstall traefik-crd -n kube-system
After removing Traefik, re-apply the Gateway API CRDs (Traefik's CRD chart may have removed them):
kubectl apply --server-side --force-conflicts \
-f https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.4.1/experimental-install.yaml
Step 2: Setup Secrets Store (OpenBao)
OpenBao provides secret management for the Workflow Plane and deployed agents:
helm upgrade --install openbao oci://ghcr.io/openbao/charts/openbao \
--namespace openbao \
--create-namespace \
--version 0.25.6 \
--values https://raw.githubusercontent.com/wso2/agent-manager/amp/v${VERSION}/deployments/single-cluster/values-openbao.yaml \
--timeout 180s
kubectl wait --for=condition=Ready pod -l app.kubernetes.io/name=openbao -n openbao --timeout=120s
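Optionally confirm OpenBao came up healthy — a sketch assuming the default single-replica pod name openbao-0 and the bao CLI shipped in the image (dev mode starts unsealed):
kubectl exec -n openbao openbao-0 -- bao status
# "Initialized: true" and "Sealed: false" indicate a healthy dev-mode server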
Configure the External Secrets ClusterSecretStore:
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: ServiceAccount
metadata:
name: external-secrets-openbao
namespace: openbao
---
apiVersion: external-secrets.io/v1
kind: ClusterSecretStore
metadata:
name: default
spec:
provider:
vault:
server: "http://openbao.openbao.svc:8200"
path: "secret"
version: "v2"
auth:
kubernetes:
mountPath: "kubernetes"
role: "openchoreo-secret-writer-role"
serviceAccountRef:
name: "external-secrets-openbao"
namespace: "openbao"
EOF
OpenBao is installed in dev mode (in-memory backend) for this guide. For production, disable dev mode and configure persistent storage.
Step 3: Setup TLS
Create a self-signed CA chain for cluster-wide certificate issuance:
kubectl apply -f - <<'EOF'
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
name: selfsigned-bootstrap
spec:
selfSigned: {}
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
name: openchoreo-ca
namespace: cert-manager
spec:
isCA: true
commonName: openchoreo-ca
secretName: openchoreo-ca-secret
privateKey:
algorithm: ECDSA
size: 256
issuerRef:
name: selfsigned-bootstrap
kind: ClusterIssuer
---
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
name: openchoreo-ca
spec:
ca:
secretName: openchoreo-ca-secret
EOF
kubectl wait --for=condition=Ready certificate/openchoreo-ca -n cert-manager --timeout=60s
For production, replace the self-signed CA with a trusted certificate authority (Let's Encrypt, AWS ACM, etc.).
Step 4: Install Thunder Extension (Identity Provider)
Thunder provides authentication and user management for the entire platform — login, API keys, and OAuth token exchange. It must be installed before the Control Plane because the Control Plane, Observability Plane, and Agent Manager all validate JWTs issued by Thunder.
helm install amp-thunder-extension \
oci://${HELM_CHART_REGISTRY}/wso2-amp-thunder-extension \
--version ${VERSION} \
--namespace ${THUNDER_NS} \
--create-namespace \
--set thunder.configuration.server.publicUrl="${THUNDER_PUBLIC_URL}" \
--set thunder.configuration.jwt.issuer="${THUNDER_PUBLIC_URL}" \
--set thunder.configuration.gateClient.hostname="localhost" \
--set thunder.configuration.gateClient.port=8090 \
--timeout 1800s
kubectl wait --for=condition=Available \
deployment -l app.kubernetes.io/instance=amp-thunder-extension \
-n ${THUNDER_NS} --timeout=300s
Thunder persists its configuration (including the issuer URL) in a database on first boot. If you need to change THUNDER_PUBLIC_URL after installation, you must uninstall the chart, delete its PVC, and reinstall — a helm upgrade alone will not change the issuer in issued tokens.
kubectl exec -n ${THUNDER_NS} deploy/amp-thunder-extension-deployment -- \
wget -qO- http://localhost:8090/.well-known/openid-configuration 2>/dev/null \
| grep -o '"issuer":"[^"]*"'
# Expected: "issuer":"${THUNDER_PUBLIC_URL}" (must match your THUNDER_PUBLIC_URL value)
Step 5: Install OpenChoreo Control Plane
Do an initial install with placeholder hostnames to provision the LoadBalancer:
helm upgrade --install openchoreo-control-plane \
oci://ghcr.io/openchoreo/helm-charts/openchoreo-control-plane \
--version 1.0.0-rc.1 \
--namespace openchoreo-control-plane \
--create-namespace \
--values - <<'EOF'
openchoreoApi:
http:
hostnames:
- "api.placeholder.tld"
backstage:
enabled: false
baseUrl: ""
http:
hostnames:
- ""
security:
oidc:
issuer: "https://thunder.placeholder.tld"
gateway:
tls:
enabled: false
EOF
Wait for the LoadBalancer IP and derive the base domain:
echo "Waiting for LoadBalancer IP..."
kubectl get svc gateway-default -n openchoreo-control-plane -w
Once the IP appears, set the domain:
CP_LB_IP=$(kubectl get svc gateway-default -n openchoreo-control-plane \
-o jsonpath='{.status.loadBalancer.ingress[0].ip}')
if [ -z "$CP_LB_IP" ]; then
CP_LB_HOSTNAME=$(kubectl get svc gateway-default -n openchoreo-control-plane \
-o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
CP_LB_IP=$(dig +short "$CP_LB_HOSTNAME" | head -1)
fi
export CP_BASE_DOMAIN="openchoreo.${CP_LB_IP//./-}.nip.io"
echo "Control Plane domain: ${CP_BASE_DOMAIN}"
EKS LoadBalancers return a hostname instead of an IP. Use dig to resolve it. For internet-facing access, add annotation: service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
Create a wildcard TLS certificate:
kubectl apply -f - <<EOF
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
name: cp-gateway-tls
namespace: openchoreo-control-plane
spec:
secretName: cp-gateway-tls
issuerRef:
name: openchoreo-ca
kind: ClusterIssuer
dnsNames:
- "*.${CP_BASE_DOMAIN}"
- "${CP_BASE_DOMAIN}"
privateKey:
rotationPolicy: Always
EOF
kubectl wait --for=condition=Ready certificate/cp-gateway-tls \
-n openchoreo-control-plane --timeout=60s
Reconfigure with real hostnames, TLS, and Thunder OIDC:
helm upgrade openchoreo-control-plane \
oci://ghcr.io/openchoreo/helm-charts/openchoreo-control-plane \
--version 1.0.0-rc.1 \
--namespace openchoreo-control-plane \
--reuse-values \
--values - <<EOF
openchoreoApi:
config:
server:
publicUrl: "https://api.${CP_BASE_DOMAIN}"
security:
authentication:
jwt:
jwks:
skip_tls_verify: true
http:
hostnames:
- "api.${CP_BASE_DOMAIN}"
backstage:
enabled: false
baseUrl: ""
http:
hostnames:
- ""
security:
oidc:
issuer: "${THUNDER_PUBLIC_URL}"
wellKnownEndpoint: "${THUNDER_INTERNAL_URL}/.well-known/openid-configuration"
jwksUrl: "${THUNDER_INTERNAL_URL}/oauth2/jwks"
authorizationUrl: "${THUNDER_PUBLIC_URL}/oauth2/authorize"
tokenUrl: "${THUNDER_INTERNAL_URL}/oauth2/token"
gateway:
tls:
enabled: true
hostname: "*.${CP_BASE_DOMAIN}"
certificateRefs:
- name: cp-gateway-tls
EOF
kubectl wait --for=condition=Available \
deployment --all -n openchoreo-control-plane --timeout=300s
skip_tls_verify: true disables JWKS TLS certificate validation. This is required here because the self-signed CA is not yet trusted by the Control Plane. For production, use CA-signed certificates and set skip_tls_verify: false (or remove the override entirely).
What the configuration does
- Backstage disabled (AMP provides its own console)
- OIDC issuer set to THUNDER_PUBLIC_URL (matches the iss claim in Thunder-issued JWTs)
- OIDC JWKS URL points to Thunder's in-cluster service (avoids external DNS dependency)
- OpenChoreo API exposed at api.${CP_BASE_DOMAIN}
- TLS enabled with wildcard certificate
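At this point you can sanity-check the public API endpoint. The -k flag is needed because the gateway still presents the self-signed CA; any HTTP status code (rather than a connection error) confirms DNS, the LoadBalancer, and TLS routing are wired up:
curl -ks -o /dev/null -w '%{http_code}\n' "https://api.${CP_BASE_DOMAIN}/"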
Step 6: Setup Data Plane
Copy the cluster-gateway CA certificate:
kubectl create namespace openchoreo-data-plane --dry-run=client -o yaml | kubectl apply -f -
CA_CRT=$(kubectl get secret cluster-gateway-ca \
-n openchoreo-control-plane -o jsonpath='{.data.ca\.crt}' | base64 -d)
kubectl create configmap cluster-gateway-ca \
--from-literal=ca.crt="$CA_CRT" \
-n openchoreo-data-plane --dry-run=client -o yaml | kubectl apply -f -
TLS_CRT=$(kubectl get secret cluster-gateway-ca \
-n openchoreo-control-plane -o jsonpath='{.data.tls\.crt}' | base64 -d)
TLS_KEY=$(kubectl get secret cluster-gateway-ca \
-n openchoreo-control-plane -o jsonpath='{.data.tls\.key}' | base64 -d)
kubectl create secret generic cluster-gateway-ca \
--from-literal=tls.crt="$TLS_CRT" \
--from-literal=tls.key="$TLS_KEY" \
--from-literal=ca.crt="$CA_CRT" \
-n openchoreo-data-plane --dry-run=client -o yaml | kubectl apply -f -
Install the Data Plane:
helm install openchoreo-data-plane \
oci://ghcr.io/openchoreo/helm-charts/openchoreo-data-plane \
--version 1.0.0-rc.1 \
--namespace openchoreo-data-plane \
--create-namespace \
--set gateway.tls.enabled=false \
--set clusterAgent.tls.generateCerts=true \
--values https://raw.githubusercontent.com/wso2/agent-manager/amp/v${VERSION}/deployments/single-cluster/values-dp.yaml
Wait for the Data Plane LoadBalancer and configure TLS:
kubectl get svc gateway-default -n openchoreo-data-plane -w
DP_LB_IP=$(kubectl get svc gateway-default -n openchoreo-data-plane \
-o jsonpath='{.status.loadBalancer.ingress[0].ip}')
if [ -z "$DP_LB_IP" ]; then
DP_LB_HOSTNAME=$(kubectl get svc gateway-default -n openchoreo-data-plane \
-o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
DP_LB_IP=$(dig +short "$DP_LB_HOSTNAME" | head -1)
fi
export DP_DOMAIN="apps.openchoreo.${DP_LB_IP//./-}.nip.io"
echo "Data Plane domain: ${DP_DOMAIN}"
kubectl apply -f - <<EOF
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
name: dp-gateway-tls
namespace: openchoreo-data-plane
spec:
secretName: dp-gateway-tls
issuerRef:
name: openchoreo-ca
kind: ClusterIssuer
dnsNames:
- "*.${DP_DOMAIN}"
- "${DP_DOMAIN}"
privateKey:
rotationPolicy: Always
EOF
kubectl wait --for=condition=Ready certificate/dp-gateway-tls \
-n openchoreo-data-plane --timeout=60s
helm upgrade openchoreo-data-plane \
oci://ghcr.io/openchoreo/helm-charts/openchoreo-data-plane \
--version 1.0.0-rc.1 \
--namespace openchoreo-data-plane \
--reuse-values \
--values - <<EOF
gateway:
tls:
enabled: true
hostname: "*.${DP_DOMAIN}"
certificateRefs:
- name: dp-gateway-tls
EOF
kubectl wait --for=condition=Available \
deployment --all -n openchoreo-data-plane --timeout=600s
Register the Data Plane:
CA_CERT=$(kubectl get secret cluster-agent-tls \
-n openchoreo-data-plane -o jsonpath='{.data.ca\.crt}' | base64 -d)
kubectl apply -f - <<EOF
apiVersion: openchoreo.dev/v1alpha1
kind: ClusterDataPlane
metadata:
name: default
namespace: default
spec:
planeID: default
clusterAgent:
clientCA:
value: |
$(echo "$CA_CERT" | sed 's/^/ /')
gateway:
ingress:
external:
name: gateway-default
namespace: openchoreo-data-plane
http:
host: "${DP_DOMAIN}"
listenerName: http
port: 80
https:
host: "${DP_DOMAIN}"
listenerName: https
port: 443
secretStoreRef:
name: default
EOF
Step 7: Setup Workflow Plane
Copy the cluster-gateway CA certificate:
kubectl create namespace openchoreo-workflow-plane --dry-run=client -o yaml | kubectl apply -f -
CA_CRT=$(kubectl get secret cluster-gateway-ca \
-n openchoreo-control-plane -o jsonpath='{.data.ca\.crt}' | base64 -d)
kubectl create configmap cluster-gateway-ca \
--from-literal=ca.crt="$CA_CRT" \
-n openchoreo-workflow-plane --dry-run=client -o yaml | kubectl apply -f -
TLS_CRT=$(kubectl get secret cluster-gateway-ca \
-n openchoreo-control-plane -o jsonpath='{.data.tls\.crt}' | base64 -d)
TLS_KEY=$(kubectl get secret cluster-gateway-ca \
-n openchoreo-control-plane -o jsonpath='{.data.tls\.key}' | base64 -d)
kubectl create secret generic cluster-gateway-ca \
--from-literal=tls.crt="$TLS_CRT" \
--from-literal=tls.key="$TLS_KEY" \
--from-literal=ca.crt="$CA_CRT" \
-n openchoreo-workflow-plane --dry-run=client -o yaml | kubectl apply -f -
The Workflow Plane needs a container registry to store built agent images. The registry endpoint is configured in Phase 2 Step 3 (Platform Resources) via the global.registry.endpoint or global.baseDomain Helm values. For local development, deploy an in-cluster docker-registry in this namespace — see the k3d guide for an example.
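If you do not have a registry available, a minimal in-cluster sketch along these lines can work for throwaway development. The docker-registry name and registry:2 image are illustrative and image storage is ephemeral — the k3d guide remains the reference:
kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: docker-registry
  namespace: openchoreo-workflow-plane
spec:
  replicas: 1
  selector:
    matchLabels:
      app: docker-registry
  template:
    metadata:
      labels:
        app: docker-registry
    spec:
      containers:
        - name: registry
          image: registry:2
          ports:
            - containerPort: 5000
---
apiVersion: v1
kind: Service
metadata:
  name: docker-registry
  namespace: openchoreo-workflow-plane
spec:
  selector:
    app: docker-registry
  ports:
    - port: 5000
      targetPort: 5000
EOF
# If you use this, set global.registry.endpoint to
# docker-registry.openchoreo-workflow-plane.svc.cluster.local:5000 in Phase 2 Step 3
# and leave registry.tlsVerify at its default (false).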
Install the Workflow Plane:
helm install openchoreo-workflow-plane \
oci://ghcr.io/openchoreo/helm-charts/openchoreo-workflow-plane \
--version 1.0.0-rc.1 \
--namespace openchoreo-workflow-plane \
--create-namespace \
--set clusterAgent.tls.generateCerts=true \
--timeout 600s
kubectl wait --for=condition=Available \
deployment --all -n openchoreo-workflow-plane --timeout=600s
Register the Workflow Plane:
BP_CA_CERT=$(kubectl get secret cluster-agent-tls \
-n openchoreo-workflow-plane -o jsonpath='{.data.ca\.crt}' | base64 -d)
kubectl apply -f - <<EOF
apiVersion: openchoreo.dev/v1alpha1
kind: ClusterWorkflowPlane
metadata:
name: default
namespace: default
spec:
planeID: default
clusterAgent:
clientCA:
value: |
$(echo "$BP_CA_CERT" | sed 's/^/ /')
secretStoreRef:
name: default
EOF
Step 8: Setup Observability Plane
Copy the cluster-gateway CA certificate:
kubectl create namespace openchoreo-observability-plane --dry-run=client -o yaml | kubectl apply -f -
CA_CRT=$(kubectl get secret cluster-gateway-ca \
-n openchoreo-control-plane -o jsonpath='{.data.ca\.crt}' | base64 -d)
kubectl create configmap cluster-gateway-ca \
--from-literal=ca.crt="$CA_CRT" \
-n openchoreo-observability-plane --dry-run=client -o yaml | kubectl apply -f -
TLS_CRT=$(kubectl get secret cluster-gateway-ca \
-n openchoreo-control-plane -o jsonpath='{.data.tls\.crt}' | base64 -d)
TLS_KEY=$(kubectl get secret cluster-gateway-ca \
-n openchoreo-control-plane -o jsonpath='{.data.tls\.key}' | base64 -d)
kubectl create secret generic cluster-gateway-ca \
--from-literal=tls.crt="$TLS_CRT" \
--from-literal=tls.key="$TLS_KEY" \
--from-literal=ca.crt="$CA_CRT" \
-n openchoreo-observability-plane --dry-run=client -o yaml | kubectl apply -f -
Create the ExternalSecrets for OpenSearch and Observer credentials:
kubectl apply -f - <<'EOF'
apiVersion: external-secrets.io/v1
kind: ExternalSecret
metadata:
name: opensearch-admin-credentials
namespace: openchoreo-observability-plane
spec:
refreshInterval: 1h
secretStoreRef:
kind: ClusterSecretStore
name: default
target:
name: opensearch-admin-credentials
data:
- secretKey: username
remoteRef:
key: opensearch-username
property: value
- secretKey: password
remoteRef:
key: opensearch-password
property: value
---
apiVersion: external-secrets.io/v1
kind: ExternalSecret
metadata:
name: observer-secret
namespace: openchoreo-observability-plane
spec:
refreshInterval: 1h
secretStoreRef:
kind: ClusterSecretStore
name: default
target:
name: observer-secret
data:
- secretKey: OPENSEARCH_USERNAME
remoteRef:
key: opensearch-username
property: value
- secretKey: OPENSEARCH_PASSWORD
remoteRef:
key: opensearch-password
property: value
- secretKey: UID_RESOLVER_OAUTH_CLIENT_SECRET
remoteRef:
key: observer-oauth-client-secret
property: value
EOF
Wait for the ExternalSecrets to sync:
kubectl wait -n openchoreo-observability-plane \
--for=condition=Ready externalsecret/opensearch-admin-credentials \
externalsecret/observer-secret --timeout=60s
Apply the custom OpenTelemetry Collector ConfigMap (required for trace ingestion):
kubectl apply -f https://raw.githubusercontent.com/wso2/agent-manager/amp/v${VERSION}/deployments/values/oc-collector-configmap.yaml \
-n openchoreo-observability-plane
Install the Observability Plane:
helm install openchoreo-observability-plane \
oci://ghcr.io/openchoreo/helm-charts/openchoreo-observability-plane \
--version 1.0.0-rc.1 \
--namespace openchoreo-observability-plane \
--create-namespace \
--set gateway.tls.enabled=false \
--set clusterAgent.tls.generateCerts=true \
--set observer.controlPlaneApiUrl="http://openchoreo-api.openchoreo-control-plane.svc.cluster.local:8080" \
--set observer.extraEnv.AUTH_SERVER_BASE_URL="${THUNDER_PUBLIC_URL}" \
--set security.oidc.jwksUrl="${THUNDER_INTERNAL_URL}/oauth2/jwks" \
--set security.oidc.tokenUrl="${THUNDER_INTERNAL_URL}/oauth2/token" \
--set-string security.oidc.jwksUrlTlsInsecureSkipVerify=true \
--values https://raw.githubusercontent.com/wso2/agent-manager/amp/v${VERSION}/deployments/single-cluster/values-op.yaml \
--timeout 25m
kubectl wait --for=condition=Available \
deployment --all -n openchoreo-observability-plane --timeout=900s
for sts in $(kubectl get statefulset -n openchoreo-observability-plane -o name 2>/dev/null); do
kubectl rollout status "${sts}" -n openchoreo-observability-plane --timeout=900s
done
Install observability modules (logs, metrics, tracing):
# Logs module
helm upgrade --install observability-logs-opensearch \
oci://ghcr.io/openchoreo/helm-charts/observability-logs-opensearch \
--create-namespace \
--namespace openchoreo-observability-plane \
--version 0.3.8 \
--set openSearchSetup.openSearchSecretName="opensearch-admin-credentials" \
--timeout 10m
# Enable Fluent Bit log collection
helm upgrade observability-logs-opensearch \
oci://ghcr.io/openchoreo/helm-charts/observability-logs-opensearch \
--namespace openchoreo-observability-plane \
--version 0.3.8 \
--reuse-values \
--set fluent-bit.enabled=true \
--timeout 10m
# Metrics module
helm upgrade --install observability-metrics-prometheus \
oci://ghcr.io/openchoreo/helm-charts/observability-metrics-prometheus \
--create-namespace \
--namespace openchoreo-observability-plane \
--version 0.2.4 \
--timeout 10m
# Tracing module (uses the custom OTel Collector ConfigMap)
helm upgrade --install observability-traces-opensearch \
oci://ghcr.io/openchoreo/helm-charts/observability-tracing-opensearch \
--create-namespace \
--namespace openchoreo-observability-plane \
--version 0.3.7 \
--set openSearch.enabled=false \
--set openSearchSetup.openSearchSecretName="opensearch-admin-credentials" \
--set opentelemetry-collector.configMap.existingName="amp-opentelemetry-collector-config" \
--timeout 10m
Configure TLS for the Observability Plane gateway:
OBS_LB_IP=$(kubectl get svc gateway-default -n openchoreo-observability-plane \
-o jsonpath='{.status.loadBalancer.ingress[0].ip}')
if [ -z "$OBS_LB_IP" ]; then
OBS_LB_HOSTNAME=$(kubectl get svc gateway-default -n openchoreo-observability-plane \
-o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
OBS_LB_IP=$(dig +short "$OBS_LB_HOSTNAME" | head -1)
fi
export OBS_DOMAIN="observer.${OBS_LB_IP//./-}.nip.io"
kubectl apply -f - <<EOF
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
name: obs-gateway-tls
namespace: openchoreo-observability-plane
spec:
secretName: obs-gateway-tls
issuerRef:
name: openchoreo-ca
kind: ClusterIssuer
dnsNames:
- "*.${OBS_LB_IP//./-}.nip.io"
- "${OBS_DOMAIN}"
privateKey:
rotationPolicy: Always
EOF
kubectl wait --for=condition=Ready certificate/obs-gateway-tls \
-n openchoreo-observability-plane --timeout=60s
helm upgrade openchoreo-observability-plane \
oci://ghcr.io/openchoreo/helm-charts/openchoreo-observability-plane \
--version 1.0.0-rc.1 \
--namespace openchoreo-observability-plane \
--reuse-values \
--set gateway.tls.enabled=true \
--set "gateway.tls.hostname=*.${OBS_LB_IP//./-}.nip.io" \
--set "gateway.tls.certificateRefs[0].name=obs-gateway-tls" \
--timeout 10m
Register the Observability Plane and link it to other planes:
OP_CA_CERT=$(kubectl get secret cluster-agent-tls \
-n openchoreo-observability-plane -o jsonpath='{.data.ca\.crt}' | base64 -d)
kubectl apply -f - <<EOF
apiVersion: openchoreo.dev/v1alpha1
kind: ObservabilityPlane
metadata:
name: default
namespace: default
spec:
planeID: default
clusterAgent:
clientCA:
value: |
$(echo "$OP_CA_CERT" | sed 's/^/ /')
observerURL: http://observer.openchoreo-observability-plane.svc.cluster.local:8080
EOF
# Link Data Plane to Observability
kubectl patch clusterdataplane default -n default --type merge \
-p '{"spec":{"observabilityPlaneRef":{"kind":"ClusterObservabilityPlane","name":"default"}}}'
# Link Workflow Plane to Observability
kubectl patch clusterworkflowplane default -n default --type merge \
-p '{"spec":{"observabilityPlaneRef":{"kind":"ClusterObservabilityPlane","name":"default"}}}'
Step 9: Verify OpenChoreo Installation
Before proceeding to Phase 2, confirm all planes are running:
echo "--- Control Plane ---"
kubectl get pods -n openchoreo-control-plane
echo "--- Data Plane ---"
kubectl get pods -n openchoreo-data-plane
echo "--- Workflow Plane ---"
kubectl get pods -n openchoreo-workflow-plane
echo "--- Observability Plane ---"
kubectl get pods -n openchoreo-observability-plane
echo "--- Thunder ---"
kubectl get pods -n amp-thunder
echo "--- Plane Registrations ---"
kubectl get clusterdataplane,clusterworkflowplane,observabilityplane -n default
All pods should be in Running or Completed state.
Phase 2: Agent Manager Installation
With OpenChoreo and Thunder running, you can now install the Agent Manager components — the API, console, and extensions that provide the AI agent management capabilities.
The Agent Manager installs as a set of Helm charts on top of OpenChoreo. The components fall into two groups based on install order:
- Agent Manager Core: Gateway Operator, Agent Manager, and Platform Resources (agent component types, workflow templates, etc.). Each depends on the one before it.
- Extensions: the Secrets Management, Observability, and Evaluation extensions, plus the AI Gateway Extension.
Thunder (identity provider) must be installed before proceeding — see the Thunder installation step in Phase 1. The variables THUNDER_PUBLIC_URL, THUNDER_INTERNAL_URL, CONSOLE_PUBLIC_URL, API_PUBLIC_URL, OBS_API_PUBLIC_URL, and INSTRUMENTATION_URL must be set from the Configuration Variables section.
Core Components
Install these in order — each depends on the one before it.
Step 1: Gateway Operator
Manages API Gateway resources and enables secure, authenticated trace ingestion into the Observability Plane.
helm install gateway-operator \
oci://ghcr.io/wso2/api-platform/helm-charts/gateway-operator \
--version 0.5.0 \
--namespace ${DATA_PLANE_NS} \
--set logging.level=debug \
--set gateway.helm.chartVersion=1.0.0 \
--timeout 600s
Wait for the operator to be ready:
kubectl wait --for=condition=Available \
deployment -l app.kubernetes.io/name=gateway-operator \
-n ${DATA_PLANE_NS} --timeout=300s
Apply the Gateway Operator configuration (JWT/JWKS authentication and rate limiting):
kubectl apply -f https://raw.githubusercontent.com/wso2/agent-manager/amp/v${VERSION}/deployments/values/api-platform-operator-full-config.yaml
Grant RBAC for WSO2 API Platform CRDs to the Data Plane cluster-agent:
kubectl apply -f - <<EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: wso2-api-platform-gateway-module
rules:
- apiGroups: ["gateway.api-platform.wso2.com"]
resources: ["restapis", "apigateways"]
verbs: ["*"]
- apiGroups: ["gateway.kgateway.dev"]
resources: ["backends"]
verbs: ["*"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: wso2-api-platform-gateway-module
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: wso2-api-platform-gateway-module
subjects:
- kind: ServiceAccount
name: cluster-agent-dataplane
namespace: ${DATA_PLANE_NS}
EOF
Deploy the observability gateway and trace API:
kubectl apply -f https://raw.githubusercontent.com/wso2/agent-manager/amp/v${VERSION}/deployments/values/obs-gateway.yaml
kubectl wait --for=condition=Programmed \
apigateway/obs-gateway -n ${DATA_PLANE_NS} --timeout=180s
kubectl apply -f https://raw.githubusercontent.com/wso2/agent-manager/amp/v${VERSION}/deployments/values/otel-collector-rest-api.yaml
kubectl wait --for=condition=Programmed \
restapi/traces-api-secure -n ${DATA_PLANE_NS} --timeout=120s
kubectl get apigateway obs-gateway -n ${DATA_PLANE_NS}
# STATUS should show "Programmed"
Step 2: Agent Manager (API + Console + PostgreSQL)
The core platform: a Go API server, a React web console, and a PostgreSQL database.
helm install amp \
oci://${HELM_CHART_REGISTRY}/wso2-agent-manager \
--version ${VERSION} \
--namespace ${AMP_NS} \
--create-namespace \
--set console.config.instrumentationUrl="${INSTRUMENTATION_URL}" \
--set console.config.auth.baseUrl="${THUNDER_PUBLIC_URL}" \
--set console.config.auth.signInRedirectURL="${CONSOLE_PUBLIC_URL}/login" \
--set console.config.auth.signOutRedirectURL="${CONSOLE_PUBLIC_URL}/login" \
--set console.config.apiBaseUrl="${API_PUBLIC_URL}" \
--set console.config.obsApiBaseUrl="${OBS_API_PUBLIC_URL}" \
--set agentManagerService.config.keyManager.issuer="${THUNDER_PUBLIC_URL}" \
--set agentManagerService.config.keyManager.jwksUrl="${THUNDER_INTERNAL_URL}/oauth2/jwks" \
--set agentManagerService.config.oidc.tokenUrl="${THUNDER_INTERNAL_URL}/oauth2/token" \
--set agentManagerService.config.openChoreo.baseURL="${OPENCHOREO_INTERNAL_URL}" \
--timeout 1800s
Wait for all components:
# PostgreSQL
kubectl wait --for=jsonpath='{.status.readyReplicas}'=1 \
statefulset/amp-postgresql -n ${AMP_NS} --timeout=600s
# API server
kubectl wait --for=condition=Available \
deployment/amp-api -n ${AMP_NS} --timeout=600s
# Console
kubectl wait --for=condition=Available \
deployment/amp-console -n ${AMP_NS} --timeout=600s
kubectl get pods -n ${AMP_NS}
# Expected: amp-postgresql-0 (Running), amp-api-xxx (Running), amp-console-xxx (Running)
Step 3: Platform Resources
Creates the default Organization, Project, Environment, DeploymentPipeline, and workflow template resources that the console needs on first login. This chart also configures the container registry endpoint used by build workflows to push agent images.
helm install amp-platform-resources \
oci://${HELM_CHART_REGISTRY}/wso2-amp-platform-resources-extension \
--version ${VERSION} \
--namespace ${DEFAULT_NS} \
--timeout 1800s
Container registry configuration
The chart defaults are configured for a local k3d cluster with an in-cluster registry at host.k3d.internal:10082. For other environments, override the registry settings:
# Example: external registry with a base domain
helm install amp-platform-resources \
oci://${HELM_CHART_REGISTRY}/wso2-amp-platform-resources-extension \
--version ${VERSION} \
--namespace ${DEFAULT_NS} \
--set global.baseDomain="yourdomain.com" \
--set global.defaultResources.registry.tlsVerify=true \
--timeout 1800s
# Registry endpoint will be: registry.yourdomain.com
# Example: explicit registry endpoint
helm install amp-platform-resources \
oci://${HELM_CHART_REGISTRY}/wso2-amp-platform-resources-extension \
--version ${VERSION} \
--namespace ${DEFAULT_NS} \
--set global.registry.endpoint="your-registry.example.com:5000" \
--set global.defaultResources.registry.tlsVerify=true \
--timeout 1800s

| Value | Default | Description |
|---|---|---|
| global.registry.endpoint | host.k3d.internal:10082 | Registry endpoint for pushing images |
| global.baseDomain | "" | When set, the registry endpoint becomes registry.<baseDomain> |
| global.defaultResources.registry.tlsVerify | false | Enable TLS verification for registry connections |
Extensions
These can be installed in any order after Core is ready.
Step 4: Secrets Extension (OpenBao)
Provides runtime secret injection for deployed agents. Uses OpenBao as the secrets backend.
helm install amp-secrets \
oci://${HELM_CHART_REGISTRY}/wso2-amp-secrets-extension \
--version ${VERSION} \
--namespace ${SECRETS_NS} \
--create-namespace \
--set openbao.server.dev.enabled=true \
--timeout 600s
kubectl wait --for=jsonpath='{.status.readyReplicas}'=1 \
statefulset/amp-secrets-openbao -n ${SECRETS_NS} --timeout=300s
Dev mode uses an in-memory backend — secrets are lost on restart. For production, disable dev mode and configure persistent storage.
Step 5: Observability Extension (Traces Observer)
Deploys the Traces Observer service that queries and serves trace data to the console.
helm install amp-observability-traces \
oci://${HELM_CHART_REGISTRY}/wso2-amp-observability-extension \
--version ${VERSION} \
--namespace ${OBSERVABILITY_NS} \
--timeout 1800s
kubectl wait --for=condition=Available \
deployment/amp-traces-observer -n ${OBSERVABILITY_NS} --timeout=600s
Step 6: Evaluation Extension
Installs workflow templates for running automated evaluations (accuracy, safety, reasoning, tool usage) against agent traces.
helm install amp-evaluation-extension \
oci://${HELM_CHART_REGISTRY}/wso2-amp-evaluation-extension \
--version ${VERSION} \
--namespace ${BUILD_CI_NS} \
--timeout 1800s
The default publisher.apiKey must match publisherApiKey.value in the Agent Manager chart. Both default to amp-internal-api-key.
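If you want to replace the default shared key, a sketch of overriding it on both charts (the my-shared-key value below is illustrative):
# Agent Manager side
helm upgrade amp oci://${HELM_CHART_REGISTRY}/wso2-agent-manager \
  --version ${VERSION} --namespace ${AMP_NS} --reuse-values \
  --set publisherApiKey.value="my-shared-key"
# Evaluation Extension side
helm upgrade amp-evaluation-extension oci://${HELM_CHART_REGISTRY}/wso2-amp-evaluation-extension \
  --version ${VERSION} --namespace ${BUILD_CI_NS} --reuse-values \
  --set publisher.apiKey="my-shared-key"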
Step 7: AI Gateway Extension
Registers the AI Gateway with the Agent Manager and deploys the gateway stack. Install this last — it requires the Agent Manager API to be healthy and Thunder to be ready for token exchange.
The gateway.vhost is the URL that deployed agents use to reach the AI Gateway. It must be set to the in-cluster service URL so that agent workloads running inside the cluster can route LLM traffic through the gateway.
helm install amp-ai-gateway \
oci://${HELM_CHART_REGISTRY}/wso2-amp-ai-gateway-extension \
--version ${VERSION} \
--namespace ${DATA_PLANE_NS} \
--set apiGateway.controlPlane.host="amp-api-gateway-manager.${AMP_NS}.svc.cluster.local:9243" \
--set agentManager.apiUrl="http://amp-api.${AMP_NS}.svc.cluster.local:9000/api/v1" \
--set agentManager.idp.tokenUrl="${THUNDER_INTERNAL_URL}/oauth2/token" \
--set gateway.vhost="http://default-ai-gateway-gateway-runtime.${DATA_PLANE_NS}.svc.cluster.local:8084" \
--timeout 1800s
kubectl wait --for=condition=complete job/amp-gateway-bootstrap \
-n ${DATA_PLANE_NS} --timeout=300s
kubectl get jobs -n ${DATA_PLANE_NS} | grep amp-gateway-bootstrap
# STATUS should show "Complete"
Exposing the AI Gateway publicly
The AI Gateway's LoadBalancer service is already externally reachable if your cluster supports it. To use a public URL as the vhost instead of the in-cluster service URL:
# Get the AI Gateway's external IP
AI_GW_IP=$(kubectl get svc default-ai-gateway-gateway-runtime -n ${DATA_PLANE_NS} \
-o jsonpath='{.status.loadBalancer.ingress[0].ip}')
# Use a nip.io domain or your own DNS record
export AI_GATEWAY_VHOST="http://ai-gateway.${AI_GW_IP//./-}.nip.io:8084"
# Set during install:
# --set gateway.vhost="${AI_GATEWAY_VHOST}"
For production, point a DNS record (e.g., ai-gateway.yourdomain.com) at the LoadBalancer IP and configure TLS termination. Agents running outside the cluster will need this public URL to reach the gateway.
Verify and Access the Platform
Run a full status check to confirm everything is running:
# All pods across key namespaces
kubectl get pods -n openchoreo-control-plane
kubectl get pods -n openchoreo-data-plane
kubectl get pods -n openchoreo-workflow-plane
kubectl get pods -n openchoreo-observability-plane
kubectl get pods -n wso2-amp
kubectl get pods -n amp-thunder
kubectl get pods -n amp-secrets
# Helm releases
helm list -A | grep -E 'openchoreo|amp|gateway'
Via LoadBalancer
| Service | URL |
|---|---|
| OpenChoreo API | https://api.${CP_BASE_DOMAIN} |
Via Port Forwarding (Agent Manager)
# Agent Manager Console
kubectl port-forward -n wso2-amp svc/amp-console 3000:3000 &
# Agent Manager API
kubectl port-forward -n wso2-amp svc/amp-api 9000:9000 &
# Thunder (required for OAuth login)
kubectl port-forward -n amp-thunder svc/amp-thunder-extension-service 8090:8090 &
# Traces Observer
kubectl port-forward -n openchoreo-observability-plane svc/amp-traces-observer 9098:9098 &
# Observability Gateway (HTTP)
kubectl port-forward -n openchoreo-data-plane svc/obs-gateway-gateway-gateway-runtime 22893:22893 &
# AI Gateway (HTTP) — for testing from outside the cluster
kubectl port-forward -n openchoreo-data-plane svc/default-ai-gateway-gateway-runtime 8084:8084 &
After port forwarding:
| Service | URL |
|---|---|
| Agent Manager Console | http://localhost:3000 |
| Agent Manager API | http://localhost:9000 |
| Thunder | http://localhost:8090 |
| Traces Observer | http://localhost:9098 |
| Observability Gateway | http://localhost:22893/otel |
| AI Gateway | http://localhost:8084 |
Default credentials: admin / admin
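A quick smoke test of the forwarded endpoints — any HTTP status code confirms the corresponding port-forward is alive (the exact code varies by service):
for url in http://localhost:3000 http://localhost:9000 http://localhost:8090 http://localhost:9098; do
  printf '%s -> ' "$url"
  curl -s -o /dev/null -w '%{http_code}\n' "$url"
done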
Cloud Provider Notes
AWS EKS
- LoadBalancers return a hostname instead of an IP — use dig to resolve it
- For internet-facing access, annotate LoadBalancer services: service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
- Ensure security groups allow HTTP/HTTPS traffic
Google Cloud Platform (GKE)
- LoadBalancers return IPs directly — no special handling needed
- Ensure firewall rules allow HTTP/HTTPS traffic to LoadBalancers
Microsoft Azure (AKS)
- LoadBalancers return IPs directly — no special handling needed
- Ensure Network Security Groups allow HTTP/HTTPS traffic
Rancher Desktop / k3s
- Remove Traefik before installation (see Step 1)
- Single-node clusters work for development but may run low on resources with all observability modules
- LoadBalancer IPs are assigned via the built-in k3s servicelb
- cgroup pids controller issue — see "Build workflow fails with cgroup pids error (Rancher Desktop)" in Troubleshooting
Cleanup
Remove all Agent Manager and OpenChoreo resources:
# 1. Delete plane registrations
kubectl delete clusterdataplane default -n default
kubectl delete clusterworkflowplane default -n default
kubectl delete observabilityplane default -n default
# 2. Uninstall all Helm releases
helm uninstall amp -n wso2-amp
helm uninstall amp-ai-gateway -n openchoreo-data-plane
helm uninstall amp-thunder-extension -n amp-thunder
helm uninstall amp-secrets -n amp-secrets
helm uninstall amp-observability-traces -n openchoreo-observability-plane
helm uninstall amp-evaluation-extension -n openchoreo-workflow-plane
helm uninstall amp-platform-resources -n default
helm uninstall gateway-operator -n openchoreo-data-plane
helm uninstall openchoreo-observability-plane -n openchoreo-observability-plane
helm uninstall openchoreo-workflow-plane -n openchoreo-workflow-plane
helm uninstall openchoreo-data-plane -n openchoreo-data-plane
helm uninstall openchoreo-control-plane -n openchoreo-control-plane
helm uninstall openbao -n openbao
helm uninstall external-secrets -n external-secrets
helm uninstall cert-manager -n cert-manager
# 3. Delete namespaces
kubectl delete namespace wso2-amp amp-thunder amp-secrets \
openchoreo-observability-plane openchoreo-workflow-plane \
openchoreo-data-plane openchoreo-control-plane \
openbao external-secrets cert-manager
Production Considerations
This installation is designed for development and exploration. For production:
- Use proper domains — Replace nip.io with registered domain names and configure DNS
- Wildcard TLS certificates — Use DNS-01 validation for wildcard certificates from a trusted CA
- Identity provider — Replace Thunder dev mode with a proper IdP (Asgardeo, Auth0, Okta)
- Thunder URL — Set THUNDER_PUBLIC_URL to a publicly accessible domain with proper TLS
- Secrets backend — Disable OpenBao dev mode; configure persistent storage and proper auth
- Observability storage — Configure persistent volumes for OpenSearch
- High availability — Deploy multiple replicas across availability zones
- Resource sizing — Adjust requests/limits based on workload
- Security hardening — Apply network policies, RBAC, pod security standards
Troubleshooting
LoadBalancer not getting external IP
kubectl describe svc <service-name> -n <namespace>
For EKS, ensure the AWS Load Balancer Controller is installed and the service has the correct annotations.
On k3s/Rancher Desktop, check if another service (like Traefik) is already using the required ports:
kubectl get svc -A --field-selector spec.type=LoadBalancer
Certificate not being issued
kubectl describe certificate <cert-name> -n <namespace>
kubectl get clusterissuers
kubectl get certificaterequests -n <namespace>
Plane registration issues
kubectl get clusterdataplane default -n default -o yaml
kubectl logs -n openchoreo-control-plane -l app.kubernetes.io/name=openchoreo-control-plane
Agent Manager API returns 401 for environment/gateway calls
This typically means the OpenChoreo Control Plane's OIDC issuer does not match the iss claim in Thunder-issued JWTs. Verify:
# Check what issuer Thunder puts in tokens
kubectl exec -n amp-thunder deploy/amp-thunder-extension-deployment -- \
wget -qO- http://localhost:8090/.well-known/openid-configuration 2>/dev/null \
| grep -o '"issuer":"[^"]*"'
# Check what the Control Plane expects
kubectl get configmap openchoreo-api-config -n openchoreo-control-plane -o yaml \
| grep issuer
Both must match exactly. If they don't, update the Control Plane's security.oidc.issuer to match Thunder's issuer.
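A sketch of correcting the Control Plane issuer in place, assuming the release name and chart version used earlier in this guide:
helm upgrade openchoreo-control-plane \
  oci://ghcr.io/openchoreo/helm-charts/openchoreo-control-plane \
  --version 1.0.0-rc.1 \
  --namespace openchoreo-control-plane \
  --reuse-values \
  --set security.oidc.issuer="${THUNDER_PUBLIC_URL}"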
Console shows "refused to connect" on login
The console redirects to Thunder for OAuth login. Thunder must be accessible from the browser at the URL configured in THUNDER_PUBLIC_URL. For port-forwarding setups, ensure Thunder is forwarded:
kubectl port-forward -n amp-thunder svc/amp-thunder-extension-service 8090:8090 &
If you need to change Thunder's public URL after installation, you must uninstall, delete the PVC, and reinstall:
helm uninstall amp-thunder-extension -n amp-thunder
kubectl delete pvc -n amp-thunder --all
# Then reinstall with the new THUNDER_PUBLIC_URL
OpenSearch connectivity issues
kubectl get pods -n openchoreo-observability-plane -l app=opensearch
kubectl run -it --rm debug --image=curlimages/curl --restart=Never -- \
curl -v http://opensearch.openchoreo-observability-plane.svc.cluster.local:9200
Build workflow fails with cgroup pids error (Rancher Desktop)
If the build workflow fails with:
Error: OCI runtime error: crun: the requested cgroup controller `pids` is not available
Error: exit status 126
This happens on Rancher Desktop because the underlying Lima VM (Alpine Linux) does not delegate the pids cgroup controller to containers. The Podman containers inside the build workflow cannot create the required cgroup namespace.
Fix: Patch the ClusterWorkflowTemplates that use Podman to inject a containers.conf that disables cgroup management.
The patch commands below require python3 to be installed on your machine.
Run these commands to patch each template:
# Patch gcp-buildpacks-build (build-image step)
kubectl get clusterworkflowtemplate gcp-buildpacks-build -o json | \
python3 -c "
import json, sys
data = json.load(sys.stdin)
script = data['spec']['templates'][0]['container']['args'][0]
fix = '''set -e
# Fix: disable cgroup management for Podman (Rancher Desktop cgroup pids workaround)
cat > /tmp/containers.conf <<CCONF
[engine]
cgroup_manager = \"cgroupfs\"
events_logger = \"file\"
[containers]
pids_limit = 0
CCONF
export CONTAINERS_CONF=/tmp/containers.conf
'''
data['spec']['templates'][0]['container']['args'][0] = script.replace('set -e\n', fix, 1)
json.dump(data, sys.stdout)
" | kubectl apply -f -
# Patch publish-image
kubectl get clusterworkflowtemplate publish-image -o json | \
python3 -c "
import json, sys
data = json.load(sys.stdin)
script = data['spec']['templates'][0]['container']['args'][0]
fix = '''set -e
# Fix: disable cgroup management for Podman (Rancher Desktop cgroup pids workaround)
cat > /tmp/containers.conf <<CCONF
[engine]
cgroup_manager = \"cgroupfs\"
events_logger = \"file\"
[containers]
pids_limit = 0
CCONF
export CONTAINERS_CONF=/tmp/containers.conf
'''
data['spec']['templates'][0]['container']['args'][0] = script.replace('set -e\n', fix, 1)
json.dump(data, sys.stdout)
" | kubectl apply -f -
# Patch amp-generate-workload
kubectl get clusterworkflowtemplate amp-generate-workload -o json | \
python3 -c "
import json, sys
data = json.load(sys.stdin)
script = data['spec']['templates'][0]['container']['args'][0]
fix = '''# Fix: disable cgroup management for Podman (Rancher Desktop cgroup pids workaround)
cat > /tmp/containers.conf <<CCONF
[engine]
cgroup_manager = \"cgroupfs\"
events_logger = \"file\"
[containers]
pids_limit = 0
CCONF
export CONTAINERS_CONF=/tmp/containers.conf
'''
data['spec']['templates'][0]['container']['args'][0] = fix + script
json.dump(data, sys.stdout)
" | kubectl apply -f -
After patching, re-trigger the build workflow. These patches are applied in-cluster and will be overwritten if the Helm chart (amp-platform-resources) is reinstalled.
This issue affects Rancher Desktop specifically because it runs k3s inside a Lima VM with Alpine Linux, which uses OpenRC instead of systemd. The pids cgroup controller is not delegated to containers by default. Other Kubernetes distributions (EKS, GKE, AKS) are not affected.