On Your Environment
Install the Agent Manager on an existing Kubernetes cluster — AWS EKS, Google GKE, Azure AKS, or any distribution with LoadBalancer support.
The Quick Start Guide installs everything in a single command using a dev container with k3d. Use this page when you need to install on a managed cluster.
What You Will Get
Agent Manager is a two-layer system installed in two phases:
- Phase 1 — OpenChoreo (base layer): OpenChoreo is an open-source platform that provides the Kubernetes infrastructure Agent Manager runs on. It consists of four planes: a Control Plane for API and configuration, a Data Plane for running workloads and gateways, a Workflow Plane for builds and CI pipelines, and an Observability Plane for traces, logs, and metrics via OpenSearch.
- Phase 2 — Agent Manager: The AI agent management platform installed on top of OpenChoreo. It includes the Console (web UI), AMP API (backend), AI Gateway, PostgreSQL (database), Secrets Extension (OpenBao for runtime secret injection), Traces Observer (trace querying), and Evaluation Engine (automated agent evaluations).
This guide installs both layers on your existing Kubernetes cluster.
This setup is for development and exploration. For production deployments, see the Production Considerations section.
Prerequisites
Cluster Requirements
| Requirement | Minimum |
|---|---|
| Kubernetes version | 1.32+ |
| Nodes | 3 |
| CPU per node | 4 cores |
| RAM per node | 8 GB |
| LoadBalancer support | Required |
| Default StorageClass | Required |
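A quick pre-flight check against these requirements (a sketch, assuming kubectl already points at the target cluster — LoadBalancer support is easiest to confirm later, when the first gateway Service is created in Step 5):
# Server version (1.32+ required) and node count/size
kubectl version 2>/dev/null | grep -i server
kubectl get nodes -o wide
# A default StorageClass should be marked "(default)"
kubectl get storageclass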
Supported Providers
- Amazon Web Services (EKS)
- Google Cloud Platform (GKE)
- Microsoft Azure (AKS)
- Any Kubernetes distribution with LoadBalancer support
Required Tools
| Tool | Version | Purpose |
|---|---|---|
| kubectl | v1.32+ | Kubernetes CLI |
| Helm | v3.12+ | Kubernetes package manager |
| curl / dig | — | Resolving LoadBalancer hostnames and testing endpoints |
Verify the installed tool versions:
kubectl version --client && helm version
Permissions
You need sufficient privileges to:
- Create namespaces and deploy Helm charts
- Create LoadBalancer services
- Manage cert-manager Issuers and Certificates
- Create CRDs and ClusterRoles
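To spot-check these privileges before starting (not exhaustive, but each command should print "yes"):
kubectl auth can-i create namespaces
kubectl auth can-i create clusterroles
kubectl auth can-i create customresourcedefinitions
kubectl auth can-i create services -n default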
Configuration Variables
Set these once before starting the installation. Most subsequent commands reference these variables, though some examples (troubleshooting, uninstall) use literal values — substitute the corresponding variable if you have customised the defaults.
export VERSION="0.12.0"
export HELM_CHART_REGISTRY="ghcr.io/wso2"
export AMP_NS="wso2-amp"
export BUILD_CI_NS="openchoreo-workflow-plane"
export OBSERVABILITY_NS="openchoreo-observability-plane"
export DEFAULT_NS="default"
export DATA_PLANE_NS="openchoreo-data-plane"
export SECRETS_NS="amp-secrets"
export THUNDER_NS="amp-thunder"
OpenChoreo API URL — used by backend services to reach the OpenChoreo Control Plane API:
export OPENCHOREO_INTERNAL_URL="http://openchoreo-api.openchoreo-control-plane.svc.cluster.local:8080"
Thunder (Identity Provider) URLs — Thunder must be reachable from the browser for OAuth login. Set THUNDER_PUBLIC_URL to the URL where the browser will access Thunder:
# Port-forwarding (default for dev):
export THUNDER_PUBLIC_URL="http://localhost:8090"
# Public deployment example:
# export THUNDER_PUBLIC_URL="https://thunder.yourdomain.com"
# In-cluster URL (used by backend services for JWKS/token calls):
export THUNDER_INTERNAL_URL="http://amp-thunder-extension-service.${THUNDER_NS}.svc.cluster.local:8090"
Console URLs — set these to the URLs the browser will use to reach the console and APIs:
# Port-forwarding (default for dev):
export CONSOLE_PUBLIC_URL="http://localhost:3000"
export API_PUBLIC_URL="http://localhost:9000"
export OBS_API_PUBLIC_URL="http://localhost:9098"
export INSTRUMENTATION_URL="http://localhost:22893/otel"
# Public deployment example:
# export CONSOLE_PUBLIC_URL="https://console.yourdomain.com"
# export API_PUBLIC_URL="https://api.yourdomain.com"
# export OBS_API_PUBLIC_URL="https://obs.yourdomain.com"
# export INSTRUMENTATION_URL="https://otel.yourdomain.com/otel"
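Before moving on, you can confirm all variables are set in the current shell (a quick sanity check; adjust the pattern if you renamed any variables):
env | grep -E '^(VERSION|HELM_CHART_REGISTRY|AMP_NS|BUILD_CI_NS|OBSERVABILITY_NS|DEFAULT_NS|DATA_PLANE_NS|SECRETS_NS|THUNDER_NS|OPENCHOREO_INTERNAL_URL|THUNDER_PUBLIC_URL|THUNDER_INTERNAL_URL|CONSOLE_PUBLIC_URL|API_PUBLIC_URL|OBS_API_PUBLIC_URL|INSTRUMENTATION_URL)=' | sort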
Phase 1: OpenChoreo Platform
OpenChoreo organises its infrastructure into four planes, each handling a different concern:
- Control Plane — API server and configuration management for the platform
- Data Plane — runs deployed workloads and API gateways
- Workflow Plane — builds and CI pipelines for agent deployments
- Observability Plane — trace, log, and metrics collection via OpenSearch
This phase also installs Thunder (the identity provider) as a prerequisite, since the Control Plane and Observability Plane require Thunder's OIDC endpoints for JWT validation. Estimated time: ~20-30 minutes (varies by cluster and network).
Step 1: Install Cluster Prerequisites
Gateway API CRDs (v1.4.1) — standard Kubernetes resources for managing network gateways and routing:
kubectl apply --server-side --force-conflicts \
-f https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.4.1/experimental-install.yaml
The --force-conflicts flag is needed if your cluster already has Gateway API CRDs managed by another controller (e.g., Traefik on k3s/Rancher Desktop).
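To confirm the Gateway API CRDs are present:
kubectl get crd | grep 'gateway.networking.k8s.io'
# Expect gatewayclasses, gateways, httproutes, etc. in the output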
cert-manager (v1.19.2) — automates TLS certificate issuance and renewal:
helm upgrade --install cert-manager oci://quay.io/jetstack/charts/cert-manager \
--namespace cert-manager \
--create-namespace \
--version v1.19.2 \
--set crds.enabled=true \
--set startupapicheck.timeout=5m \
--wait --timeout 360s
External Secrets Operator (v1.3.2) — syncs secrets from external stores (like OpenBao) into Kubernetes:
helm upgrade --install external-secrets oci://ghcr.io/external-secrets/charts/external-secrets \
--namespace external-secrets \
--create-namespace \
--version 1.3.2 \
--set installCRDs=true \
--wait --timeout 180s
kgateway (v2.2.1) — the network gateway for OpenChoreo planes:
helm upgrade --install kgateway-crds oci://cr.kgateway.dev/kgateway-dev/charts/kgateway-crds \
--create-namespace \
--namespace openchoreo-control-plane \
--version v2.2.1
helm upgrade --install kgateway oci://cr.kgateway.dev/kgateway-dev/charts/kgateway \
--namespace openchoreo-control-plane \
--create-namespace \
--version v2.2.1 \
--set controller.extraEnv.KGW_ENABLE_GATEWAY_API_EXPERIMENTAL_FEATURES=true
Rancher Desktop / k3s users
k3s ships with Traefik, which binds to host ports 80/443 and conflicts with OpenChoreo's kgateway. Remove Traefik before proceeding:
helm uninstall traefik -n kube-system
helm uninstall traefik-crd -n kube-system
After removing Traefik, re-apply the Gateway API CRDs (Traefik's CRD chart may have removed them):
kubectl apply --server-side --force-conflicts \
-f https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.4.1/experimental-install.yaml
Step 2: Setup Secrets Store (OpenBao)
OpenBao provides secret management for the Workflow Plane and deployed agents:
helm upgrade --install openbao oci://ghcr.io/openbao/charts/openbao \
--namespace openbao \
--create-namespace \
--version 0.25.6 \
--values https://raw.githubusercontent.com/wso2/agent-manager/amp/v${VERSION}/deployments/single-cluster/values-openbao.yaml \
--timeout 180s
kubectl wait --for=condition=Ready pod -l app.kubernetes.io/name=openbao -n openbao --timeout=120s
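Optionally confirm OpenBao came up healthy — a sketch assuming the default single-replica pod name openbao-0 and the bao CLI shipped in the image (dev mode starts unsealed):
kubectl exec -n openbao openbao-0 -- bao status
# "Initialized: true" and "Sealed: false" indicate a healthy dev-mode server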
Configure the External Secrets ClusterSecretStore:
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: ServiceAccount
metadata:
name: external-secrets-openbao
namespace: openbao
---
apiVersion: external-secrets.io/v1
kind: ClusterSecretStore
metadata:
name: default
spec:
provider:
vault:
server: "http://openbao.openbao.svc:8200"
path: "secret"
version: "v2"
auth:
kubernetes:
mountPath: "kubernetes"
role: "openchoreo-secret-writer-role"
serviceAccountRef:
name: "external-secrets-openbao"
namespace: "openbao"
EOF
OpenBao is installed in dev mode (in-memory backend) for this guide. For production, disable dev mode and configure persistent storage.
Step 3: Setup TLS
Create a self-signed CA chain for cluster-wide certificate issuance:
kubectl apply -f - <<'EOF'
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
name: selfsigned-bootstrap
spec:
selfSigned: {}
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
name: openchoreo-ca
namespace: cert-manager
spec:
isCA: true
commonName: openchoreo-ca
secretName: openchoreo-ca-secret
privateKey:
algorithm: ECDSA
size: 256
issuerRef:
name: selfsigned-bootstrap
kind: ClusterIssuer
---
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
name: openchoreo-ca
spec:
ca:
secretName: openchoreo-ca-secret
EOF
kubectl wait --for=condition=Ready certificate/openchoreo-ca -n cert-manager --timeout=60s
For production, replace the self-signed CA with a trusted certificate authority (Let's Encrypt, AWS ACM, etc.).
Step 4: Install Thunder Extension (Identity Provider)
Thunder provides authentication and user management for the entire platform — login, API keys, and OAuth token exchange. It must be installed before the Control Plane because the Control Plane, Observability Plane, and Agent Manager all validate JWTs issued by Thunder.
helm install amp-thunder-extension \
oci://${HELM_CHART_REGISTRY}/wso2-amp-thunder-extension \
--version ${VERSION} \
--namespace ${THUNDER_NS} \
--create-namespace \
--set thunder.configuration.server.publicUrl="${THUNDER_PUBLIC_URL}" \
--set thunder.configuration.jwt.issuer="${THUNDER_PUBLIC_URL}" \
--set thunder.configuration.gateClient.hostname="localhost" \
--set thunder.configuration.gateClient.port=8090 \
--timeout 1800s
kubectl wait --for=condition=Available \
deployment -l app.kubernetes.io/instance=amp-thunder-extension \
-n ${THUNDER_NS} --timeout=300s
Thunder persists its configuration (including the issuer URL) in a database on first boot. If you need to change THUNDER_PUBLIC_URL after installation, you must uninstall the chart, delete its PVC, and reinstall — a helm upgrade alone will not change the issuer in issued tokens.
kubectl exec -n ${THUNDER_NS} deploy/amp-thunder-extension-deployment -- \
wget -qO- http://localhost:8090/.well-known/openid-configuration 2>/dev/null \
| grep -o '"issuer":"[^"]*"'
# Expected: "issuer":"${THUNDER_PUBLIC_URL}" (must match your THUNDER_PUBLIC_URL value)
Step 5: Install OpenChoreo Control Plane
Do an initial install with placeholder hostnames to provision the LoadBalancer:
helm upgrade --install openchoreo-control-plane \
oci://ghcr.io/openchoreo/helm-charts/openchoreo-control-plane \
--version 1.0.0-rc.1 \
--namespace openchoreo-control-plane \
--create-namespace \
--values - <<'EOF'
openchoreoApi:
http:
hostnames:
- "api.placeholder.tld"
backstage:
enabled: false
baseUrl: ""
http:
hostnames:
- ""
security:
oidc:
issuer: "https://thunder.placeholder.tld"
gateway:
tls:
enabled: false
EOF
Wait for the LoadBalancer IP and derive the base domain:
echo "Waiting for LoadBalancer IP..."
kubectl get svc gateway-default -n openchoreo-control-plane -w
Once the IP appears, set the domain:
CP_LB_IP=$(kubectl get svc gateway-default -n openchoreo-control-plane \
-o jsonpath='{.status.loadBalancer.ingress[0].ip}')
if [ -z "$CP_LB_IP" ]; then
CP_LB_HOSTNAME=$(kubectl get svc gateway-default -n openchoreo-control-plane \
-o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
CP_LB_IP=$(dig +short "$CP_LB_HOSTNAME" | head -1)
fi
export CP_BASE_DOMAIN="openchoreo.${CP_LB_IP//./-}.nip.io"
echo "Control Plane domain: ${CP_BASE_DOMAIN}"
EKS LoadBalancers return a hostname instead of an IP. Use dig to resolve it. For internet-facing access, add annotation: service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
Create a wildcard TLS certificate:
kubectl apply -f - <<EOF
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
name: cp-gateway-tls
namespace: openchoreo-control-plane
spec:
secretName: cp-gateway-tls
issuerRef:
name: openchoreo-ca
kind: ClusterIssuer
dnsNames:
- "*.${CP_BASE_DOMAIN}"
- "${CP_BASE_DOMAIN}"
privateKey:
rotationPolicy: Always
EOF
kubectl wait --for=condition=Ready certificate/cp-gateway-tls \
-n openchoreo-control-plane --timeout=60s
Reconfigure with real hostnames, TLS, and Thunder OIDC:
helm upgrade openchoreo-control-plane \
oci://ghcr.io/openchoreo/helm-charts/openchoreo-control-plane \
--version 1.0.0-rc.1 \
--namespace openchoreo-control-plane \
--reuse-values \
--values - <<EOF
openchoreoApi:
config:
server:
publicUrl: "https://api.${CP_BASE_DOMAIN}"
security:
authentication:
jwt:
jwks:
skip_tls_verify: true
http:
hostnames:
- "api.${CP_BASE_DOMAIN}"
backstage:
enabled: false
baseUrl: ""
http:
hostnames:
- ""
security:
oidc:
issuer: "${THUNDER_PUBLIC_URL}"
wellKnownEndpoint: "${THUNDER_INTERNAL_URL}/.well-known/openid-configuration"
jwksUrl: "${THUNDER_INTERNAL_URL}/oauth2/jwks"
authorizationUrl: "${THUNDER_PUBLIC_URL}/oauth2/authorize"
tokenUrl: "${THUNDER_INTERNAL_URL}/oauth2/token"
gateway:
tls:
enabled: true
hostname: "*.${CP_BASE_DOMAIN}"
certificateRefs:
- name: cp-gateway-tls
EOF
kubectl wait --for=condition=Available \
deployment --all -n openchoreo-control-plane --timeout=300s
skip_tls_verify: true disables JWKS TLS certificate validation. This is required here because the self-signed CA is not yet trusted by the Control Plane. For production, use CA-signed certificates and set skip_tls_verify: false (or remove the override entirely).
What the configuration does
- Backstage disabled (AMP provides its own console)
- OIDC issuer set to THUNDER_PUBLIC_URL (matches the iss claim in Thunder-issued JWTs)
- OIDC JWKS URL points to Thunder's in-cluster service (avoids external DNS dependency)
- OpenChoreo API exposed at api.${CP_BASE_DOMAIN}
- TLS enabled with wildcard certificate
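At this point you can sanity-check the public API endpoint. The -k flag is needed because the gateway still presents the self-signed CA; any HTTP status code (rather than a connection error) confirms DNS, the LoadBalancer, and TLS routing are wired up:
curl -ks -o /dev/null -w '%{http_code}\n' "https://api.${CP_BASE_DOMAIN}/"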
Step 6: Setup Data Plane
Copy the cluster-gateway CA certificate:
kubectl create namespace openchoreo-data-plane --dry-run=client -o yaml | kubectl apply -f -
CA_CRT=$(kubectl get secret cluster-gateway-ca \
-n openchoreo-control-plane -o jsonpath='{.data.ca\.crt}' | base64 -d)
kubectl create configmap cluster-gateway-ca \
--from-literal=ca.crt="$CA_CRT" \
-n openchoreo-data-plane --dry-run=client -o yaml | kubectl apply -f -
TLS_CRT=$(kubectl get secret cluster-gateway-ca \
-n openchoreo-control-plane -o jsonpath='{.data.tls\.crt}' | base64 -d)
TLS_KEY=$(kubectl get secret cluster-gateway-ca \
-n openchoreo-control-plane -o jsonpath='{.data.tls\.key}' | base64 -d)
kubectl create secret generic cluster-gateway-ca \
--from-literal=tls.crt="$TLS_CRT" \
--from-literal=tls.key="$TLS_KEY" \
--from-literal=ca.crt="$CA_CRT" \
-n openchoreo-data-plane --dry-run=client -o yaml | kubectl apply -f -
Install the Data Plane:
helm install openchoreo-data-plane \
oci://ghcr.io/openchoreo/helm-charts/openchoreo-data-plane \
--version 1.0.0-rc.1 \
--namespace openchoreo-data-plane \
--create-namespace \
--set gateway.tls.enabled=false \
--set clusterAgent.tls.generateCerts=true \
--values https://raw.githubusercontent.com/wso2/agent-manager/amp/v${VERSION}/deployments/single-cluster/values-dp.yaml
Wait for the Data Plane LoadBalancer and configure TLS:
kubectl get svc gateway-default -n openchoreo-data-plane -w
DP_LB_IP=$(kubectl get svc gateway-default -n openchoreo-data-plane \
-o jsonpath='{.status.loadBalancer.ingress[0].ip}')
if [ -z "$DP_LB_IP" ]; then
DP_LB_HOSTNAME=$(kubectl get svc gateway-default -n openchoreo-data-plane \
-o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
DP_LB_IP=$(dig +short "$DP_LB_HOSTNAME" | head -1)
fi
export DP_DOMAIN="apps.openchoreo.${DP_LB_IP//./-}.nip.io"
echo "Data Plane domain: ${DP_DOMAIN}"
kubectl apply -f - <<EOF
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
name: dp-gateway-tls
namespace: openchoreo-data-plane
spec:
secretName: dp-gateway-tls
issuerRef:
name: openchoreo-ca
kind: ClusterIssuer
dnsNames:
- "*.${DP_DOMAIN}"
- "${DP_DOMAIN}"
privateKey:
rotationPolicy: Always
EOF
kubectl wait --for=condition=Ready certificate/dp-gateway-tls \
-n openchoreo-data-plane --timeout=60s
helm upgrade openchoreo-data-plane \
oci://ghcr.io/openchoreo/helm-charts/openchoreo-data-plane \
--version 1.0.0-rc.1 \
--namespace openchoreo-data-plane \
--reuse-values \
--values - <<EOF
gateway:
tls:
enabled: true
hostname: "*.${DP_DOMAIN}"
certificateRefs:
- name: dp-gateway-tls
EOF
kubectl wait --for=condition=Available \
deployment --all -n openchoreo-data-plane --timeout=600s
Register the Data Plane:
CA_CERT=$(kubectl get secret cluster-agent-tls \
-n openchoreo-data-plane -o jsonpath='{.data.ca\.crt}' | base64 -d)
kubectl apply -f - <<EOF
apiVersion: openchoreo.dev/v1alpha1
kind: ClusterDataPlane
metadata:
name: default
namespace: default
spec:
planeID: default
clusterAgent:
clientCA:
value: |
$(echo "$CA_CERT" | sed 's/^/ /')
gateway:
ingress:
external:
name: gateway-default
namespace: openchoreo-data-plane
http:
host: "${DP_DOMAIN}"
listenerName: http
port: 80
https:
host: "${DP_DOMAIN}"
listenerName: https
port: 443
secretStoreRef:
name: default
EOF
Step 7: Setup Workflow Plane
Copy the cluster-gateway CA certificate:
kubectl create namespace openchoreo-workflow-plane --dry-run=client -o yaml | kubectl apply -f -
CA_CRT=$(kubectl get secret cluster-gateway-ca \
-n openchoreo-control-plane -o jsonpath='{.data.ca\.crt}' | base64 -d)
kubectl create configmap cluster-gateway-ca \
--from-literal=ca.crt="$CA_CRT" \
-n openchoreo-workflow-plane --dry-run=client -o yaml | kubectl apply -f -
TLS_CRT=$(kubectl get secret cluster-gateway-ca \
-n openchoreo-control-plane -o jsonpath='{.data.tls\.crt}' | base64 -d)
TLS_KEY=$(kubectl get secret cluster-gateway-ca \
-n openchoreo-control-plane -o jsonpath='{.data.tls\.key}' | base64 -d)
kubectl create secret generic cluster-gateway-ca \
--from-literal=tls.crt="$TLS_CRT" \
--from-literal=tls.key="$TLS_KEY" \
--from-literal=ca.crt="$CA_CRT" \
-n openchoreo-workflow-plane --dry-run=client -o yaml | kubectl apply -f -
The Workflow Plane needs a container registry to store built agent images. The registry endpoint is configured in Phase 2 Step 3 (Platform Resources) via the global.registry.endpoint or global.baseDomain Helm values. For local development, deploy an in-cluster docker-registry in this namespace — see the k3d guide for an example.
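If you do not have a registry available, a minimal in-cluster sketch along these lines can work for throwaway development. The docker-registry name and registry:2 image are illustrative and image storage is ephemeral — the k3d guide remains the reference:
kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: docker-registry
  namespace: openchoreo-workflow-plane
spec:
  replicas: 1
  selector:
    matchLabels:
      app: docker-registry
  template:
    metadata:
      labels:
        app: docker-registry
    spec:
      containers:
        - name: registry
          image: registry:2
          ports:
            - containerPort: 5000
---
apiVersion: v1
kind: Service
metadata:
  name: docker-registry
  namespace: openchoreo-workflow-plane
spec:
  selector:
    app: docker-registry
  ports:
    - port: 5000
      targetPort: 5000
EOF
# If you use this, set global.registry.endpoint to
# docker-registry.openchoreo-workflow-plane.svc.cluster.local:5000 in Phase 2 Step 3
# and leave registry.tlsVerify at its default (false).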
Install the Workflow Plane:
helm install openchoreo-workflow-plane \
oci://ghcr.io/openchoreo/helm-charts/openchoreo-workflow-plane \
--version 1.0.0-rc.1 \
--namespace openchoreo-workflow-plane \
--create-namespace \
--set clusterAgent.tls.generateCerts=true \
--timeout 600s
kubectl wait --for=condition=Available \
deployment --all -n openchoreo-workflow-plane --timeout=600s
Register the Workflow Plane:
BP_CA_CERT=$(kubectl get secret cluster-agent-tls \
-n openchoreo-workflow-plane -o jsonpath='{.data.ca\.crt}' | base64 -d)
kubectl apply -f - <<EOF
apiVersion: openchoreo.dev/v1alpha1
kind: ClusterWorkflowPlane
metadata:
name: default
namespace: default
spec:
planeID: default
clusterAgent:
clientCA:
value: |
$(echo "$BP_CA_CERT" | sed 's/^/ /')
secretStoreRef:
name: default
EOF
Step 8: Setup Observability Plane
Copy the cluster-gateway CA certificate:
kubectl create namespace openchoreo-observability-plane --dry-run=client -o yaml | kubectl apply -f -
CA_CRT=$(kubectl get secret cluster-gateway-ca \
-n openchoreo-control-plane -o jsonpath='{.data.ca\.crt}' | base64 -d)
kubectl create configmap cluster-gateway-ca \
--from-literal=ca.crt="$CA_CRT" \
-n openchoreo-observability-plane --dry-run=client -o yaml | kubectl apply -f -
TLS_CRT=$(kubectl get secret cluster-gateway-ca \
-n openchoreo-control-plane -o jsonpath='{.data.tls\.crt}' | base64 -d)
TLS_KEY=$(kubectl get secret cluster-gateway-ca \
-n openchoreo-control-plane -o jsonpath='{.data.tls\.key}' | base64 -d)
kubectl create secret generic cluster-gateway-ca \
--from-literal=tls.crt="$TLS_CRT" \
--from-literal=tls.key="$TLS_KEY" \
--from-literal=ca.crt="$CA_CRT" \
-n openchoreo-observability-plane --dry-run=client -o yaml | kubectl apply -f -
Create the ExternalSecrets for OpenSearch and Observer credentials:
kubectl apply -f - <<'EOF'
apiVersion: external-secrets.io/v1
kind: ExternalSecret
metadata:
name: opensearch-admin-credentials
namespace: openchoreo-observability-plane
spec:
refreshInterval: 1h
secretStoreRef:
kind: ClusterSecretStore
name: default
target:
name: opensearch-admin-credentials
data:
- secretKey: username
remoteRef:
key: opensearch-username
property: value
- secretKey: password
remoteRef:
key: opensearch-password
property: value
---
apiVersion: external-secrets.io/v1
kind: ExternalSecret
metadata:
name: observer-secret
namespace: openchoreo-observability-plane
spec:
refreshInterval: 1h
secretStoreRef:
kind: ClusterSecretStore
name: default
target:
name: observer-secret
data:
- secretKey: OPENSEARCH_USERNAME
remoteRef:
key: opensearch-username
property: value
- secretKey: OPENSEARCH_PASSWORD
remoteRef:
key: opensearch-password
property: value
- secretKey: UID_RESOLVER_OAUTH_CLIENT_SECRET
remoteRef:
key: observer-oauth-client-secret
property: value
EOF
Wait for the ExternalSecrets to sync:
kubectl wait -n openchoreo-observability-plane \
--for=condition=Ready externalsecret/opensearch-admin-credentials \
externalsecret/observer-secret --timeout=60s
Apply the custom OpenTelemetry Collector ConfigMap (required for trace ingestion):
kubectl apply -f https://raw.githubusercontent.com/wso2/agent-manager/amp/v${VERSION}/deployments/values/oc-collector-configmap.yaml \
-n openchoreo-observability-plane
Install the Observability Plane:
helm install openchoreo-observability-plane \
oci://ghcr.io/openchoreo/helm-charts/openchoreo-observability-plane \
--version 1.0.0-rc.1 \
--namespace openchoreo-observability-plane \
--create-namespace \
--set gateway.tls.enabled=false \
--set clusterAgent.tls.generateCerts=true \
--set observer.controlPlaneApiUrl="http://openchoreo-api.openchoreo-control-plane.svc.cluster.local:8080" \
--set observer.extraEnv.AUTH_SERVER_BASE_URL="${THUNDER_PUBLIC_URL}" \
--set security.oidc.jwksUrl="${THUNDER_INTERNAL_URL}/oauth2/jwks" \
--set security.oidc.tokenUrl="${THUNDER_INTERNAL_URL}/oauth2/token" \
--set-string security.oidc.jwksUrlTlsInsecureSkipVerify=true \
--values https://raw.githubusercontent.com/wso2/agent-manager/amp/v${VERSION}/deployments/single-cluster/values-op.yaml \
--timeout 25m
kubectl wait --for=condition=Available \
deployment --all -n openchoreo-observability-plane --timeout=900s
for sts in $(kubectl get statefulset -n openchoreo-observability-plane -o name 2>/dev/null); do
kubectl rollout status "${sts}" -n openchoreo-observability-plane --timeout=900s
done
Install observability modules (logs, metrics, tracing):
# Logs module
helm upgrade --install observability-logs-opensearch \
oci://ghcr.io/openchoreo/helm-charts/observability-logs-opensearch \
--create-namespace \
--namespace openchoreo-observability-plane \
--version 0.3.8 \
--set openSearchSetup.openSearchSecretName="opensearch-admin-credentials" \
--timeout 10m
# Enable Fluent Bit log collection
helm upgrade observability-logs-opensearch \
oci://ghcr.io/openchoreo/helm-charts/observability-logs-opensearch \
--namespace openchoreo-observability-plane \
--version 0.3.8 \
--reuse-values \
--set fluent-bit.enabled=true \
--timeout 10m
# Metrics module
helm upgrade --install observability-metrics-prometheus \
oci://ghcr.io/openchoreo/helm-charts/observability-metrics-prometheus \
--create-namespace \
--namespace openchoreo-observability-plane \
--version 0.2.4 \
--timeout 10m
# Tracing module (uses the custom OTel Collector ConfigMap)
helm upgrade --install observability-traces-opensearch \
oci://ghcr.io/openchoreo/helm-charts/observability-tracing-opensearch \
--create-namespace \
--namespace openchoreo-observability-plane \
--version 0.3.7 \
--set openSearch.enabled=false \
--set openSearchSetup.openSearchSecretName="opensearch-admin-credentials" \
--set opentelemetry-collector.configMap.existingName="amp-opentelemetry-collector-config" \
--timeout 10m
Configure TLS for the Observability Plane gateway:
OBS_LB_IP=$(kubectl get svc gateway-default -n openchoreo-observability-plane \
-o jsonpath='{.status.loadBalancer.ingress[0].ip}')
if [ -z "$OBS_LB_IP" ]; then
OBS_LB_HOSTNAME=$(kubectl get svc gateway-default -n openchoreo-observability-plane \
-o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
OBS_LB_IP=$(dig +short "$OBS_LB_HOSTNAME" | head -1)
fi
export OBS_DOMAIN="observer.${OBS_LB_IP//./-}.nip.io"
kubectl apply -f - <<EOF
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
name: obs-gateway-tls
namespace: openchoreo-observability-plane
spec:
secretName: obs-gateway-tls
issuerRef:
name: openchoreo-ca
kind: ClusterIssuer
dnsNames:
- "*.${OBS_LB_IP//./-}.nip.io"
- "${OBS_DOMAIN}"
privateKey:
rotationPolicy: Always
EOF
kubectl wait --for=condition=Ready certificate/obs-gateway-tls \
-n openchoreo-observability-plane --timeout=60s
helm upgrade openchoreo-observability-plane \
oci://ghcr.io/openchoreo/helm-charts/openchoreo-observability-plane \
--version 1.0.0-rc.1 \
--namespace openchoreo-observability-plane \
--reuse-values \
--set gateway.tls.enabled=true \
--set "gateway.tls.hostname=*.${OBS_LB_IP//./-}.nip.io" \
--set "gateway.tls.certificateRefs[0].name=obs-gateway-tls" \
--timeout 10m
Register the Observability Plane and link it to other planes:
OP_CA_CERT=$(kubectl get secret cluster-agent-tls \
-n openchoreo-observability-plane -o jsonpath='{.data.ca\.crt}' | base64 -d)
kubectl apply -f - <<EOF
apiVersion: openchoreo.dev/v1alpha1
kind: ObservabilityPlane
metadata:
name: default
namespace: default
spec:
planeID: default
clusterAgent:
clientCA:
value: |
$(echo "$OP_CA_CERT" | sed 's/^/ /')
observerURL: http://observer.openchoreo-observability-plane.svc.cluster.local:8080
EOF
# Link Data Plane to Observability
kubectl patch clusterdataplane default -n default --type merge \
-p '{"spec":{"observabilityPlaneRef":{"kind":"ClusterObservabilityPlane","name":"default"}}}'
# Link Workflow Plane to Observability
kubectl patch clusterworkflowplane default -n default --type merge \
-p '{"spec":{"observabilityPlaneRef":{"kind":"ClusterObservabilityPlane","name":"default"}}}'
Step 9: Verify OpenChoreo Installation
Before proceeding to Phase 2, confirm all planes are running:
echo "--- Control Plane ---"
kubectl get pods -n openchoreo-control-plane
echo "--- Data Plane ---"
kubectl get pods -n openchoreo-data-plane
echo "--- Workflow Plane ---"
kubectl get pods -n openchoreo-workflow-plane
echo "--- Observability Plane ---"
kubectl get pods -n openchoreo-observability-plane
echo "--- Thunder ---"
kubectl get pods -n amp-thunder
echo "--- Plane Registrations ---"
kubectl get clusterdataplane,clusterworkflowplane,observabilityplane -n default
All pods should be in Running or Completed state.
Phase 2: Agent Manager Installation
With OpenChoreo and Thunder running, you can now install the Agent Manager components — the API, console, and extensions that provide the AI agent management capabilities.
The Agent Manager installs as a set of Helm charts on top of OpenChoreo. The components fall into two groups based on install order:
- Agent Manager Core: Gateway Operator, Agent Manager, and Platform Resources (agent component types, workflow templates, etc.). Each depends on the one before it.
- Extensions: the Secrets Management, Observability, and Evaluation extensions, plus the AI Gateway Extension.
Thunder (identity provider) must be installed before proceeding — see the Thunder installation step in Phase 1. The variables THUNDER_PUBLIC_URL, THUNDER_INTERNAL_URL, CONSOLE_PUBLIC_URL, API_PUBLIC_URL, OBS_API_PUBLIC_URL, and INSTRUMENTATION_URL must be set from the Configuration Variables section.
Core Components
Install these in order — each depends on the one before it.
Step 1: Gateway Operator
Manages API Gateway resources and enables secure, authenticated trace ingestion into the Observability Plane.
helm install gateway-operator \
oci://ghcr.io/wso2/api-platform/helm-charts/gateway-operator \
--version 0.5.0 \
--namespace ${DATA_PLANE_NS} \
--set logging.level=debug \
--set gateway.helm.chartVersion=1.0.0 \
--timeout 600s
Wait for the operator to be ready:
kubectl wait --for=condition=Available \
deployment -l app.kubernetes.io/name=gateway-operator \
-n ${DATA_PLANE_NS} --timeout=300s
Apply the Gateway Operator configuration (JWT/JWKS authentication and rate limiting):
kubectl apply -f https://raw.githubusercontent.com/wso2/agent-manager/amp/v${VERSION}/deployments/values/api-platform-operator-full-config.yaml
Grant RBAC for WSO2 API Platform CRDs to the Data Plane cluster-agent:
kubectl apply -f - <<EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: wso2-api-platform-gateway-module
rules:
- apiGroups: ["gateway.api-platform.wso2.com"]
resources: ["restapis", "apigateways"]
verbs: ["*"]
- apiGroups: ["gateway.kgateway.dev"]
resources: ["backends"]
verbs: ["*"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: wso2-api-platform-gateway-module
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: wso2-api-platform-gateway-module
subjects:
- kind: ServiceAccount
name: cluster-agent-dataplane
namespace: ${DATA_PLANE_NS}
EOF
Deploy the observability gateway and trace API:
kubectl apply -f https://raw.githubusercontent.com/wso2/agent-manager/amp/v${VERSION}/deployments/values/obs-gateway.yaml
kubectl wait --for=condition=Programmed \
apigateway/obs-gateway -n ${DATA_PLANE_NS} --timeout=180s
kubectl apply -f https://raw.githubusercontent.com/wso2/agent-manager/amp/v${VERSION}/deployments/values/otel-collector-rest-api.yaml
kubectl wait --for=condition=Programmed \
restapi/traces-api-secure -n ${DATA_PLANE_NS} --timeout=120s
kubectl get apigateway obs-gateway -n ${DATA_PLANE_NS}
# STATUS should show "Programmed"
Step 2: Agent Manager (API + Console + PostgreSQL)
The core platform: a Go API server, a React web console, and a PostgreSQL database.
helm install amp \
oci://${HELM_CHART_REGISTRY}/wso2-agent-manager \
--version ${VERSION} \
--namespace ${AMP_NS} \
--create-namespace \
--set console.config.instrumentationUrl="${INSTRUMENTATION_URL}" \
--set console.config.auth.baseUrl="${THUNDER_PUBLIC_URL}" \
--set console.config.auth.signInRedirectURL="${CONSOLE_PUBLIC_URL}/login" \
--set console.config.auth.signOutRedirectURL="${CONSOLE_PUBLIC_URL}/login" \
--set console.config.apiBaseUrl="${API_PUBLIC_URL}" \
--set console.config.obsApiBaseUrl="${OBS_API_PUBLIC_URL}" \
--set agentManagerService.config.keyManager.issuer="${THUNDER_PUBLIC_URL}" \
--set agentManagerService.config.keyManager.jwksUrl="${THUNDER_INTERNAL_URL}/oauth2/jwks" \
--set agentManagerService.config.oidc.tokenUrl="${THUNDER_INTERNAL_URL}/oauth2/token" \
--set agentManagerService.config.openChoreo.baseURL="${OPENCHOREO_INTERNAL_URL}" \
--timeout 1800s
Wait for all components:
# PostgreSQL
kubectl wait --for=jsonpath='{.status.readyReplicas}'=1 \
statefulset/amp-postgresql -n ${AMP_NS} --timeout=600s
# API server
kubectl wait --for=condition=Available \
deployment/amp-api -n ${AMP_NS} --timeout=600s
# Console
kubectl wait --for=condition=Available \
deployment/amp-console -n ${AMP_NS} --timeout=600s
kubectl get pods -n ${AMP_NS}
# Expected: amp-postgresql-0 (Running), amp-api-xxx (Running), amp-console-xxx (Running)
Step 3: Platform Resources
Creates the default Organization, Project, Environment, DeploymentPipeline, and workflow template resources that the console needs on first login. This chart also configures the container registry endpoint used by build workflows to push agent images.
helm install amp-platform-resources \
oci://${HELM_CHART_REGISTRY}/wso2-amp-platform-resources-extension \
--version ${VERSION} \
--namespace ${DEFAULT_NS} \
--timeout 1800s
Container registry configuration
The chart defaults are configured for a local k3d cluster with an in-cluster registry at host.k3d.internal:10082. For other environments, override the registry settings:
# Example: external registry with a base domain
helm install amp-platform-resources \
oci://${HELM_CHART_REGISTRY}/wso2-amp-platform-resources-extension \
--version ${VERSION} \
--namespace ${DEFAULT_NS} \
--set global.baseDomain="yourdomain.com" \
--set global.defaultResources.registry.tlsVerify=true \
--timeout 1800s
# Registry endpoint will be: registry.yourdomain.com
# Example: explicit registry endpoint
helm install amp-platform-resources \
oci://${HELM_CHART_REGISTRY}/wso2-amp-platform-resources-extension \
--version ${VERSION} \
--namespace ${DEFAULT_NS} \
--set global.registry.endpoint="your-registry.example.com:5000" \
--set global.defaultResources.registry.tlsVerify=true \
--timeout 1800s

| Value | Default | Description |
|---|---|---|
| global.registry.endpoint | host.k3d.internal:10082 | Registry endpoint for pushing images |
| global.baseDomain | "" | When set, the registry endpoint becomes registry.<baseDomain> |
| global.defaultResources.registry.tlsVerify | false | Enable TLS verification for registry connections |
Extensions
These can be installed in any order after Core is ready.
Step 4: Secrets Extension (OpenBao)
Provides runtime secret injection for deployed agents. Uses OpenBao as the secrets backend.
helm install amp-secrets \
oci://${HELM_CHART_REGISTRY}/wso2-amp-secrets-extension \
--version ${VERSION} \
--namespace ${SECRETS_NS} \
--create-namespace \
--set openbao.server.dev.enabled=true \
--timeout 600s
kubectl wait --for=jsonpath='{.status.readyReplicas}'=1 \
statefulset/amp-secrets-openbao -n ${SECRETS_NS} --timeout=300s
Dev mode uses an in-memory backend — secrets are lost on restart. For production, disable dev mode and configure persistent storage.
Step 5: Observability Extension (Traces Observer)
Deploys the Traces Observer service that queries and serves trace data to the console.
helm install amp-observability-traces \
oci://${HELM_CHART_REGISTRY}/wso2-amp-observability-extension \
--version ${VERSION} \
--namespace ${OBSERVABILITY_NS} \
--timeout 1800s
kubectl wait --for=condition=Available \
deployment/amp-traces-observer -n ${OBSERVABILITY_NS} --timeout=600s
Step 6: Evaluation Extension
Installs workflow templates for running automated evaluations (accuracy, safety, reasoning, tool usage) against agent traces.
helm install amp-evaluation-extension \
oci://${HELM_CHART_REGISTRY}/wso2-amp-evaluation-extension \
--version ${VERSION} \
--namespace ${BUILD_CI_NS} \
--timeout 1800s
The default publisher.apiKey must match publisherApiKey.value in the Agent Manager chart. Both default to amp-internal-api-key.
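If you want to replace the default shared key, a sketch of overriding it on both charts (the my-shared-key value below is illustrative):
# Agent Manager side
helm upgrade amp oci://${HELM_CHART_REGISTRY}/wso2-agent-manager \
  --version ${VERSION} --namespace ${AMP_NS} --reuse-values \
  --set publisherApiKey.value="my-shared-key"
# Evaluation Extension side
helm upgrade amp-evaluation-extension oci://${HELM_CHART_REGISTRY}/wso2-amp-evaluation-extension \
  --version ${VERSION} --namespace ${BUILD_CI_NS} --reuse-values \
  --set publisher.apiKey="my-shared-key"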
Step 7: AI Gateway Extension
Registers the AI Gateway with the Agent Manager and deploys the gateway stack. Install this last — it requires the Agent Manager API to be healthy and Thunder to be ready for token exchange.
The gateway.vhost is the URL that deployed agents use to reach the AI Gateway. It must be set to the in-cluster service URL so that agent workloads running inside the cluster can route LLM traffic through the gateway.
helm install amp-ai-gateway \
oci://${HELM_CHART_REGISTRY}/wso2-amp-ai-gateway-extension \
--version ${VERSION} \
--namespace ${DATA_PLANE_NS} \
--set apiGateway.controlPlane.host="amp-api-gateway-manager.${AMP_NS}.svc.cluster.local:9243" \
--set agentManager.apiUrl="http://amp-api.${AMP_NS}.svc.cluster.local:9000/api/v1" \
--set agentManager.idp.tokenUrl="${THUNDER_INTERNAL_URL}/oauth2/token" \
--set gateway.vhost="http://default-ai-gateway-gateway-runtime.${DATA_PLANE_NS}.svc.cluster.local:8084" \
--timeout 1800s
kubectl wait --for=condition=complete job/amp-gateway-bootstrap \
-n ${DATA_PLANE_NS} --timeout=300s
kubectl get jobs -n ${DATA_PLANE_NS} | grep amp-gateway-bootstrap
# STATUS should show "Complete"
Exposing the AI Gateway publicly
The AI Gateway's LoadBalancer service is already externally reachable if your cluster supports it. To use a public URL as the vhost instead of the in-cluster service URL:
# Get the AI Gateway's external IP
AI_GW_IP=$(kubectl get svc default-ai-gateway-gateway-runtime -n ${DATA_PLANE_NS} \
-o jsonpath='{.status.loadBalancer.ingress[0].ip}')
# Use a nip.io domain or your own DNS record
export AI_GATEWAY_VHOST="http://ai-gateway.${AI_GW_IP//./-}.nip.io:8084"
# Set during install:
# --set gateway.vhost="${AI_GATEWAY_VHOST}"
For production, point a DNS record (e.g., ai-gateway.yourdomain.com) at the LoadBalancer IP and configure TLS termination. Agents running outside the cluster will need this public URL to reach the gateway.
Verify and Access the Platform
Run a full status check to confirm everything is running:
# All pods across key namespaces
kubectl get pods -n openchoreo-control-plane
kubectl get pods -n openchoreo-data-plane
kubectl get pods -n openchoreo-workflow-plane
kubectl get pods -n openchoreo-observability-plane
kubectl get pods -n wso2-amp
kubectl get pods -n amp-thunder
kubectl get pods -n amp-secrets
# Helm releases
helm list -A | grep -E 'openchoreo|amp|gateway'
Via LoadBalancer
| Service | URL |
|---|---|
| OpenChoreo API | https://api.${CP_BASE_DOMAIN} |
Via Port Forwarding (Agent Manager)
# Agent Manager Console
kubectl port-forward -n wso2-amp svc/amp-console 3000:3000 &
# Agent Manager API
kubectl port-forward -n wso2-amp svc/amp-api 9000:9000 &
# Thunder (required for OAuth login)
kubectl port-forward -n amp-thunder svc/amp-thunder-extension-service 8090:8090 &
# Traces Observer
kubectl port-forward -n openchoreo-observability-plane svc/amp-traces-observer 9098:9098 &
# Observability Gateway (HTTP)
kubectl port-forward -n openchoreo-data-plane svc/obs-gateway-gateway-gateway-runtime 22893:22893 &
# AI Gateway (HTTP) — for testing from outside the cluster
kubectl port-forward -n openchoreo-data-plane svc/default-ai-gateway-gateway-runtime 8084:8084 &
After port forwarding:
| Service | URL |
|---|---|
| Agent Manager Console | http://localhost:3000 |
| Agent Manager API | http://localhost:9000 |
| Thunder | http://localhost:8090 |
| Traces Observer | http://localhost:9098 |
| Observability Gateway | http://localhost:22893/otel |
| AI Gateway | http://localhost:8084 |
Default credentials: admin / admin
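A quick smoke test of the forwarded endpoints — any HTTP status code confirms the corresponding port-forward is alive (the exact code varies by service):
for url in http://localhost:3000 http://localhost:9000 http://localhost:8090 http://localhost:9098; do
  printf '%s -> ' "$url"
  curl -s -o /dev/null -w '%{http_code}\n' "$url"
done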
Cloud Provider Notes
AWS EKS
- LoadBalancers return a hostname instead of an IP — use dig to resolve it
- For internet-facing access, annotate LoadBalancer services: service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
- Ensure security groups allow HTTP/HTTPS traffic
Google Cloud Platform (GKE)
- LoadBalancers return IPs directly — no special handling needed
- Ensure firewall rules allow HTTP/HTTPS traffic to LoadBalancers
Microsoft Azure (AKS)
- LoadBalancers return IPs directly — no special handling needed
- Ensure Network Security Groups allow HTTP/HTTPS traffic
Rancher Desktop / k3s
- Remove Traefik before installation (see Step 1)
- Single-node clusters work for development but may run low on resources with all observability modules
- LoadBalancer IPs are assigned via the built-in k3s servicelb
- cgroup pids controller issue — see "Build workflow fails with cgroup pids error (Rancher Desktop)" in Troubleshooting
Cleanup
Remove all Agent Manager and OpenChoreo resources:
# 1. Delete plane registrations
kubectl delete clusterdataplane default -n default
kubectl delete clusterworkflowplane default -n default
kubectl delete observabilityplane default -n default
# 2. Uninstall all Helm releases
helm uninstall amp -n wso2-amp
helm uninstall amp-ai-gateway -n openchoreo-data-plane
helm uninstall amp-thunder-extension -n amp-thunder
helm uninstall amp-secrets -n amp-secrets
helm uninstall amp-observability-traces -n openchoreo-observability-plane
helm uninstall amp-evaluation-extension -n openchoreo-workflow-plane
helm uninstall amp-platform-resources -n default
helm uninstall gateway-operator -n openchoreo-data-plane
helm uninstall openchoreo-observability-plane -n openchoreo-observability-plane
helm uninstall openchoreo-workflow-plane -n openchoreo-workflow-plane
helm uninstall openchoreo-data-plane -n openchoreo-data-plane
helm uninstall openchoreo-control-plane -n openchoreo-control-plane
helm uninstall openbao -n openbao
helm uninstall external-secrets -n external-secrets
helm uninstall cert-manager -n cert-manager
# 3. Delete namespaces
kubectl delete namespace wso2-amp amp-thunder amp-secrets \
openchoreo-observability-plane openchoreo-workflow-plane \
openchoreo-data-plane openchoreo-control-plane \
openbao external-secrets cert-manager
Production Considerations
This installation is designed for development and exploration. For production:
- Use proper domains — Replace nip.io with registered domain names and configure DNS
- Wildcard TLS certificates — Use DNS-01 validation for wildcard certificates from a trusted CA
- Identity provider — Replace Thunder dev mode with a proper IdP (Asgardeo, Auth0, Okta)
- Thunder URL — Set THUNDER_PUBLIC_URL to a publicly accessible domain with proper TLS
- Secrets backend — Disable OpenBao dev mode; configure persistent storage and proper auth
- Observability storage — Configure persistent volumes for OpenSearch
- High availability — Deploy multiple replicas across availability zones
- Resource sizing — Adjust requests/limits based on workload
- Security hardening — Apply network policies, RBAC, pod security standards
Troubleshooting
LoadBalancer not getting external IP
kubectl describe svc <service-name> -n <namespace>
For EKS, ensure the AWS Load Balancer Controller is installed and the service has the correct annotations.
On k3s/Rancher Desktop, check if another service (like Traefik) is already using the required ports:
kubectl get svc -A --field-selector spec.type=LoadBalancer
Certificate not being issued
kubectl describe certificate <cert-name> -n <namespace>
kubectl get clusterissuers
kubectl get certificaterequests -n <namespace>
Plane registration issues
kubectl get clusterdataplane default -n default -o yaml
kubectl logs -n openchoreo-control-plane -l app.kubernetes.io/name=openchoreo-control-plane
Agent Manager API returns 401 for environment/gateway calls
This typically means the OpenChoreo Control Plane's OIDC issuer does not match the iss claim in Thunder-issued JWTs. Verify:
# Check what issuer Thunder puts in tokens
kubectl exec -n amp-thunder deploy/amp-thunder-extension-deployment -- \
wget -qO- http://localhost:8090/.well-known/openid-configuration 2>/dev/null \
| grep -o '"issuer":"[^"]*"'
# Check what the Control Plane expects
kubectl get configmap openchoreo-api-config -n openchoreo-control-plane -o yaml \
| grep issuer
Both must match exactly. If they don't, update the Control Plane's security.oidc.issuer to match Thunder's issuer.
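A sketch of correcting the Control Plane issuer in place, assuming the release name and chart version used earlier in this guide:
helm upgrade openchoreo-control-plane \
  oci://ghcr.io/openchoreo/helm-charts/openchoreo-control-plane \
  --version 1.0.0-rc.1 \
  --namespace openchoreo-control-plane \
  --reuse-values \
  --set security.oidc.issuer="${THUNDER_PUBLIC_URL}"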
Console shows "refused to connect" on login
The console redirects to Thunder for OAuth login. Thunder must be accessible from the browser at the URL configured in THUNDER_PUBLIC_URL. For port-forwarding setups, ensure Thunder is forwarded:
kubectl port-forward -n amp-thunder svc/amp-thunder-extension-service 8090:8090 &
If you need to change Thunder's public URL after installation, you must uninstall, delete the PVC, and reinstall:
helm uninstall amp-thunder-extension -n amp-thunder
kubectl delete pvc -n amp-thunder --all
# Then reinstall with the new THUNDER_PUBLIC_URL
OpenSearch connectivity issues
kubectl get pods -n openchoreo-observability-plane -l app=opensearch
kubectl run -it --rm debug --image=curlimages/curl --restart=Never -- \
curl -v http://opensearch.openchoreo-observability-plane.svc.cluster.local:9200
Build workflow fails with cgroup pids error (Rancher Desktop)
If the build workflow fails with:
Error: OCI runtime error: crun: the requested cgroup controller `pids` is not available
Error: exit status 126
This happens on Rancher Desktop because the underlying Lima VM (Alpine Linux) does not delegate the pids cgroup controller to containers. The Podman containers inside the build workflow cannot create the required cgroup namespace.
Fix: Patch the ClusterWorkflowTemplates that use Podman to inject a containers.conf that disables cgroup management.
The patch commands below require python3 to be installed on your machine.
Run these commands to patch each template:
# Patch gcp-buildpacks-build (build-image step)
kubectl get clusterworkflowtemplate gcp-buildpacks-build -o json | \
python3 -c "
import json, sys
data = json.load(sys.stdin)
script = data['spec']['templates'][0]['container']['args'][0]
fix = '''set -e
# Fix: disable cgroup management for Podman (Rancher Desktop cgroup pids workaround)
cat > /tmp/containers.conf <<CCONF
[engine]
cgroup_manager = \"cgroupfs\"
events_logger = \"file\"
[containers]
pids_limit = 0
CCONF
export CONTAINERS_CONF=/tmp/containers.conf
'''
data['spec']['templates'][0]['container']['args'][0] = script.replace('set -e\n', fix, 1)
json.dump(data, sys.stdout)
" | kubectl apply -f -
# Patch publish-image
kubectl get clusterworkflowtemplate publish-image -o json | \
python3 -c "
import json, sys
data = json.load(sys.stdin)
script = data['spec']['templates'][0]['container']['args'][0]
fix = '''set -e
# Fix: disable cgroup management for Podman (Rancher Desktop cgroup pids workaround)
cat > /tmp/containers.conf <<CCONF
[engine]
cgroup_manager = \"cgroupfs\"
events_logger = \"file\"
[containers]
pids_limit = 0
CCONF
export CONTAINERS_CONF=/tmp/containers.conf
'''
data['spec']['templates'][0]['container']['args'][0] = script.replace('set -e\n', fix, 1)
json.dump(data, sys.stdout)
" | kubectl apply -f -
# Patch amp-generate-workload
kubectl get clusterworkflowtemplate amp-generate-workload -o json | \
python3 -c "
import json, sys
data = json.load(sys.stdin)
script = data['spec']['templates'][0]['container']['args'][0]
fix = '''# Fix: disable cgroup management for Podman (Rancher Desktop cgroup pids workaround)
cat > /tmp/containers.conf <<CCONF
[engine]
cgroup_manager = \"cgroupfs\"
events_logger = \"file\"
[containers]
pids_limit = 0
CCONF
export CONTAINERS_CONF=/tmp/containers.conf
'''
data['spec']['templates'][0]['container']['args'][0] = fix + script
json.dump(data, sys.stdout)
" | kubectl apply -f -
After patching, re-trigger the build workflow. These patches are applied in-cluster and will be overwritten if the Helm chart (amp-platform-resources) is reinstalled.
This issue affects Rancher Desktop specifically because it runs k3s inside a Lima VM with Alpine Linux, which uses OpenRC instead of systemd. The pids cgroup controller is not delegated to containers by default. Other Kubernetes distributions (EKS, GKE, AKS) are not affected.