Run Agent Manager on a VM with Docker
Both installation paths on this page, Simple and Advanced, are intended for evaluation, demos, and proof-of-concept use only. Do not use them to run production workloads or to handle sensitive or regulated data.
For production, run Agent Manager on a properly operated Kubernetes platform with high availability, a managed and backed-up database, secret management, monitoring, and a hardened, redundant ingress following your organization's production practices. Use these installers to try Agent Manager out, not to run it for real.
Install Agent Manager on a Linux VM where Docker is the only host dependency. Pick the path that fits you:
- Simple — give the installer the VM's public IP and it does everything else: hostnames are derived from the IP via sslip.io and TLS certificates are issued automatically by Let's Encrypt. No domain, no DNS setup, no certificate handling. Best for demos and quick evaluations.
- Advanced — a config-file-driven installer for custom domains and operator-managed TLS: use your own custom domain, bring your own certificates, or front the VM with a load balancer that terminates TLS. Adds pre-flight validation of your config, certificates, and DNS.
- Simple (IP + automatic TLS)
- Advanced (custom domain / BYOC / load balancer)
The simple installer exposes the platform over HTTPS using sslip.io hostnames derived from the VM's public IP, so there's no domain registration and no client /etc/hosts edits.
Prerequisites
You only need an SSH client to log into the VM; everything else runs on the VM.
- Git to clone this repository and fetch the installer. This is the one tool you need before running the script (the script can't install what you use to download it). Most images have it; on a minimal one install it with sudo apt-get update && sudo apt-get install -y git (Debian/Ubuntu).
- Docker is required: the whole stack runs on it (k3d runs the Kubernetes cluster as Docker containers, and Caddy runs as a container). If Docker isn't already installed, the script installs it for you, along with k3d, kubectl, helm, and lsof.
- A Linux VM with a static (reserved) public IP and SSH access (sudo). The install derives every hostname, TLS certificate, and OAuth issuer from the IP (*.amp.<IP>.sslip.io), so a changing IP breaks the install, and stopping the VM (for example to resize its disk) releases an ephemeral IP. Reserve the address before installing. If the IP ever changes, reinstall against the new IP.
- At least 50 GB of disk. Building and running agents pushes the in-cluster image store past 13 GB; on a smaller disk the node hits DiskPressure, which evicts pods and can take cluster DNS down mid-build.
- At least 4 vCPUs and 8 GB of RAM to run the full k3d + OpenChoreo + Agent Manager stack comfortably.
- Inbound 443/tcp open in the cloud security group / firewall, and only 443. Certificates issue via the TLS-ALPN-01 ACME challenge, which runs inside the :443 TLS handshake, so no inbound port 80 is ever needed. The :443 exposure must be TCP passthrough (not a TLS-terminating load balancer in front), since the challenge happens inside the handshake.
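Before running the installer, the sizing figures above can be checked on the VM with a few standard commands. The following is a minimal sketch, not part of the installer: the function names report and preflight are hypothetical, and the comparison is approximate because a provisioned 50 GB disk or 8 GB of RAM shows up slightly smaller once the filesystem and kernel take their share.

```shell
# Print "ok" when actual >= minimum, "LOW" otherwise (integer comparison).
# Returns non-zero on "LOW" so callers can aggregate results.
report() {
  name=$1 actual=$2 minimum=$3
  if [ "$actual" -ge "$minimum" ]; then
    printf 'ok: %s = %s (minimum %s)\n' "$name" "$actual" "$minimum"
  else
    printf 'LOW: %s = %s (minimum %s)\n' "$name" "$actual" "$minimum"
    return 1
  fi
}

# Read the VM's resources and compare them with the documented minimums.
preflight() {
  cpus=$(nproc)
  # MemTotal is reported in kB; round to the nearest GiB.
  mem_gb=$(awk '/^MemTotal:/ {printf "%d", $2 / 1024 / 1024 + 0.5}' /proc/meminfo)
  # Total size of the root filesystem in GiB (POSIX df output, 1K blocks).
  disk_gb=$(df -Pk / | awk 'NR==2 {printf "%d", $2 / 1024 / 1024 + 0.5}')
  status=0
  report vCPUs "$cpus" 4 || status=1
  report RAM_GB "$mem_gb" 8 || status=1
  report disk_GB "$disk_gb" 50 || status=1
  return "$status"
}

# Usage on the VM:
#   preflight
```

A "LOW" line that is only one unit under the minimum is usually rounding rather than a real shortfall; treat the output as a prompt to double-check the VM size.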
Install
SSH into the VM, get the installer, and run it with sudo:
# on the VM
git clone https://github.com/wso2/agent-manager.git
cd agent-manager/deployments/vm
git checkout tags/amp/v0.16.0
sudo ./install-vm.sh \
--host <VM_PUBLIC_IP> \
--version 0.16.0 \
--email you@example.com
Pass --host the VM's public IPv4 address — a cloud VM usually can't read its own public IP (it's NAT'd behind the address you used to SSH in), so the installer needs it to build the *.amp.<IP>.sslip.io hostnames.
The installer runs in two phases — bootstrap (Docker + tools + firewall) and the platform install + Caddy startup. Allow 8–15 minutes. It needs sudo because it installs Docker, opens the firewall, and creates the cluster.
Options
| Flag | Default | Purpose |
|---|---|---|
| --host | (required) | The VM's public IPv4 address |
| --version | (required) | Agent Manager release to install; use the same amp/v* tag you checked out above |
| --email | (none) | ACME contact for expiry notices |
| --no-external-gateways | off | Drop the gateway control-plane endpoint if you won't connect external gateways |
What gets exposed
The installer fronts the stack with Caddy, an open-source web server that terminates TLS, obtains and renews Let's Encrypt certificates automatically, and reverse-proxies each public hostname to the right service. It runs as a single amp-caddy Docker container and is the only process listening on the internet-facing ports.
Only :443 faces the internet; all other service ports are bound to the VM's loopback and reached only by Caddy.
Every public hostname resolves to the VM's IP (via sslip.io) and arrives at Caddy on :443; Caddy terminates TLS and reverse-proxies to the matching loopback port. Certificates are obtained over that same :443 using the TLS-ALPN-01 challenge, so no inbound port 80 is needed. The deployed-agent wildcard gets its certificate on demand at first request.
| URL | Purpose |
|---|---|
| https://console.amp.<IP>.sslip.io | Console UI |
| https://api.amp.<IP>.sslip.io | Agent Manager API (used by amctl) |
| https://thunder.amp.<IP>.sslip.io | Thunder OAuth (login) |
| https://observer.amp.<IP>.sslip.io | Traces Observer |
| https://gateway.amp.<IP>.sslip.io/otel | OTel trace ingest from deployed agents |
| https://<org>-<project>.agents.<IP>.sslip.io/... | Deployed-agent invocation endpoints (one wildcard host per org/project) |
| https://cp.amp.<IP>.sslip.io | Gateway control plane: connect external gateways here (on by default) |
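The hostname scheme in the table can be sketched as a small shell helper. This only illustrates the naming pattern and is not code from the installer; the function names amp_urls and agent_url are hypothetical, and 203.0.113.10 is a documentation-range placeholder for the VM's public IP.

```shell
# Print the fixed service URLs derived from a public IPv4 address.
# (The trace-ingest endpoint additionally uses the /otel path on the gateway host.)
amp_urls() {
  ip=$1
  for svc in console api thunder observer gateway cp; do
    echo "https://$svc.amp.$ip.sslip.io"
  done
}

# Print the per-project host used for deployed-agent invocation.
agent_url() {
  org=$1 project=$2 ip=$3
  echo "https://$org-$project.agents.$ip.sslip.io"
}

# Usage:
#   amp_urls 203.0.113.10
#   agent_url default demo 203.0.113.10
```

Because sslip.io resolves any name that embeds an IP address to that address, none of these hosts needs a DNS record of its own.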
Log in
Open https://console.amp.<IP>.sslip.io and sign in as the seeded Agent Manager admin user amp-admin (password amp-admin). This user holds the AMP admin role, which grants every application permission.
From the 0.16.0 release, role-based access control is enforced on the API (rbacEnabled), so the token must carry the right scopes. Note that Thunder's own system account (admin / admin, shown in the bootstrap logs) is not granted the Agent Manager application role — signing in with it lets you reach the console but every API call fails with 403 insufficient permissions. Always use amp-admin.
Deployed-agent invocation
When you deploy an agent, its endpoint is published on a per-project host <org>-<project>.agents.<IP>.sslip.io and routed by Caddy to the OpenChoreo data-plane gateway. Because these hostnames are dynamic (a new one per org/project), Caddy issues their TLS certificates on demand at the first request (via the same ACME challenge as the fixed hosts), rather than up front. Invocations are authenticated with a user token that the gateway validates against the public Thunder issuer.
Because issuance is on demand and uses TLS-ALPN-01 (the challenge runs inside the :443 handshake), the very first request to a newly-deployed agent host can fail with a one-time certificate error — most visibly ERR_CERTIFICATE_TRANSPARENCY_REQUIRED in Chrome. That first connection is consumed by Caddy answering the ACME challenge, so the browser briefly sees the challenge certificate instead of the real one. Issuance completes within a second or two; reload the page (or open it in a fresh tab) and it serves the trusted Let's Encrypt certificate. This only affects the first hit per new agent host — the certificate is then cached in the amp-caddy-data volume.
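For scripted callers, the same "reload once" advice can be expressed as a short retry loop. This is a minimal sketch under the assumption that the caller uses curl, which exits non-zero on a certificate error; the retry helper is hypothetical and the URL in the usage comment keeps the placeholders from this page.

```shell
# Run a command up to <attempts> times, sleeping <delay> seconds between tries.
# Returns 0 on the first success, 1 if every attempt fails.
retry() {
  attempts=$1 delay=$2
  shift 2
  n=1
  while :; do
    "$@" && return 0
    [ "$n" -ge "$attempts" ] && return 1
    n=$((n + 1))
    sleep "$delay"
  done
}

# Usage: warm up a newly deployed agent host so the on-demand certificate
# is issued before real traffic arrives.
#   retry 5 2 curl -fsS -o /dev/null "https://<org>-<project>.agents.<IP>.sslip.io/"
```

A couple of attempts a few seconds apart should be enough, since issuance normally completes within a second or two.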
amp-api advertises each agent endpoint with the https:// scheme (the installer sets tlsEnabled on the service), so the console — and any other caller — invokes it over TLS directly through the wildcard site.
TLS
Caddy obtains and auto-renews trusted Let's Encrypt certificates on first start — no manual certificate steps. Issuance uses the TLS-ALPN-01 challenge, which runs inside the :443 TLS handshake, so only inbound 443 is ever required and there is no port-80 dependency. Certificates and the ACME account persist in the amp-caddy-data Docker volume, so restarts do not re-request them.
Because the challenge happens inside the TLS handshake, the public :443 must reach Caddy as raw TCP — do not put a TLS-terminating load balancer in front of the VM. There is no :80 listener, so plain http:// URLs are not served (no automatic http→https redirect); always use the https:// URLs the installer prints.
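To confirm which certificate is being served on :443, the standard openssl tools can inspect it from any machine that can reach the VM. The sketch below is illustrative only; cert_info is a hypothetical helper name.

```shell
# Show issuer, subject, and expiry of the certificate a host serves on :443.
# Returns non-zero if no certificate could be retrieved.
cert_info() {
  host=$1
  openssl s_client -connect "$host:443" -servername "$host" </dev/null 2>/dev/null \
    | openssl x509 -noout -issuer -subject -enddate
}

# Usage:
#   cert_info console.amp.<IP>.sslip.io
```

An issuer naming Let's Encrypt indicates that issuance succeeded; if nothing is returned, check docker logs amp-caddy and the inbound :443 rule.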
Persistence and teardown
Application data (PostgreSQL), issued certificates, and the k3d cluster persist across Docker/host restarts via named volumes. To tear down completely, delete the cluster, then remove the Caddy front door and its volumes (which hold the issued certificates and ACME account):
cd agent-manager/deployments/quick-start
sudo ./uninstall.sh --delete-cluster # delete the k3d cluster (workloads + app data)
sudo docker rm -f amp-caddy # remove the Caddy front door
sudo docker volume rm amp-caddy-data amp-caddy-config # drop the cached certs + ACME account
Use sudo — the installer runs Docker and k3d as root. Plain ./uninstall.sh (without --delete-cluster) only removes the Helm releases and leaves the cluster running; uninstall.sh does not touch the Caddy container or its volumes, so remove those separately as shown.
Connect an external gateway
Agent Manager can drive external WSO2 AI gateways. The control-plane endpoint https://cp.amp.<IP>.sslip.io is exposed by default for this. In the console, open Infrastructure → Gateways, generate a registration token, and follow the generated commands — they point the gateway at cp.amp.<IP>.sslip.io:443, where it opens a control WebSocket and pulls its configuration. If you do not need external gateways, install with --no-external-gateways to drop this endpoint.
Security: the registration token grants a gateway your LLM-provider API keys and proxy credentials. Treat it as a secret, revoke/regenerate it from the Gateways page when a gateway is decommissioned, and optionally restrict cp.amp... to known gateway source IPs at the firewall.
Troubleshooting
- Certificates never issue / hosts unreachable from outside: open inbound :443 in your cloud security group / NACL, and make sure the public :443 reaches the VM as raw TCP; a TLS-terminating load balancer in front breaks the TLS-ALPN-01 challenge. The installer can't verify external reachability from inside the VM, so this surfaces as Caddy failing to obtain certificates (docker logs amp-caddy).
- Certificate not issued: check docker logs amp-caddy. Let's Encrypt rate limits on sslip.io are high but not infinite; if hit, retry shortly.
- Login redirect mismatch: confirm you reached the console via its console.amp.<IP>.sslip.io URL, not the raw IP.
- 403 insufficient permissions on API calls: you are signed in as Thunder's system admin account, which has no Agent Manager application role. Sign out and sign back in as amp-admin (see Log in).
- Certificate error on first agent invocation (ERR_CERTIFICATE_TRANSPARENCY_REQUIRED or similar): the per-agent certificate is issued on demand, and the first request races with that issuance. Reload the page after a second or two; it only happens once per new agent host (see Deployed-agent invocation).
The advanced installer (install-advanced.sh) is for deployments that need a real domain or operator-managed certificates. It is driven by a config file and supports three TLS modes. Like the simple installer, it runs on the VM with sudo.
Use the advanced installer when you want any of:
- a custom domain (e.g. console.amp.mycompany.com) instead of an IP-derived sslip.io name;
- bring-your-own certificates (BYOC) issued by a corporate CA or pulled from a secrets store, rather than Let's Encrypt;
- TLS terminated upstream by a cloud load balancer or corporate proxy that already owns the public certificate.
If none of those apply, prefer the Simple tab.
Prerequisites
The compute, disk, and tooling prerequisites are the same as the Simple tab:
- Git to clone this repository and fetch the installer. This is the one tool you need before running the script (the script can't install what you use to download it); on a minimal image install it with sudo apt-get update && sudo apt-get install -y git (Debian/Ubuntu).
- Docker: the whole stack runs on it. If it isn't already installed, the script installs it for you, along with k3d, kubectl, helm, lsof, and openssl.
- A Linux VM with at least 4 vCPUs, 8 GB RAM, and 50 GB of disk, with SSH access (sudo).
In addition, the advanced installer needs:
- Control of your own DNS for the chosen domain. The installer derives all service hostnames from a single base domain (DOMAIN_BASE), so you create DNS records under that domain (see DNS).
- The right inbound port open, depending on the TLS mode (see TLS modes): 443 for letsencrypt and byoc, or your chosen forward port for upstream.
Configure
Generate an annotated config template and edit it:
# on the VM
git clone https://github.com/wso2/agent-manager.git
cd agent-manager/deployments/vm
git checkout tags/amp/v0.16.0
./install-advanced.sh --init > amp-config.env
# edit amp-config.env
The config file is plain shell (sourced by the installer). The keys are:
| Key | Required | Purpose |
|---|---|---|
| AMP_VERSION | yes | Agent Manager release to install; use the same amp/v* tag you checked out above (0.16.0) |
| DOMAIN_BASE | yes | Base domain; service hosts are derived as <svc>.<DOMAIN_BASE> |
| TLS_MODE | yes | letsencrypt, byoc, or upstream |
| ACME_EMAIL | letsencrypt | ACME contact for expiry notices |
| TLS_CERT_FILE / TLS_KEY_FILE | byoc | Paths to the operator certificate and private key |
| UPSTREAM_LISTEN_PORT | upstream | Plain-HTTP port Caddy listens on behind the LB (default 80). Must not be a loopback-bound cluster port (3000/8080/9000/9098/9243/19080/22893); 80 is safe |
| UPSTREAM_TRUSTED_PROXIES | upstream | Space-separated CIDRs of the LB whose X-Forwarded-* headers Caddy trusts (default 0.0.0.0/0) |
| EXTERNAL_GATEWAYS | no | true (default) exposes the cp endpoint for external data-plane gateways |
| HOST_CONSOLE, HOST_API, HOST_THUNDER, HOST_OBSERVER, HOST_GATEWAY, HOST_CP | no | Override an individual service hostname (default <svc>.<DOMAIN_BASE>) |
| AGENTS_BASE | no | Base for deployed-agent hostnames (default agents.<DOMAIN_BASE>) |
With DOMAIN_BASE=amp.mycompany.com, the derived hosts are console.amp.mycompany.com, api.amp.mycompany.com, thunder.amp.mycompany.com, observer.amp.mycompany.com, gateway.amp.mycompany.com, cp.amp.mycompany.com, and deployed agents at <org>-<project>.agents.amp.mycompany.com.
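As an illustration, a filled-in amp-config.env for letsencrypt mode might look like the sketch below. The values are examples taken from this page, not defaults, and the derived_hosts function is a hypothetical helper (not part of the installer) that mirrors the documented derivation rules so you can preview the names before creating DNS records.

```shell
# amp-config.env: example values for letsencrypt mode (plain shell assignments).
AMP_VERSION="0.16.0"
DOMAIN_BASE="amp.mycompany.com"
TLS_MODE="letsencrypt"
ACME_EMAIL="you@example.com"
EXTERNAL_GATEWAYS="true"

# Hypothetical preview helper: prints each service host, honouring the
# optional HOST_* and AGENTS_BASE overrides described in the table.
derived_hosts() {
  echo "${HOST_CONSOLE:-console.$DOMAIN_BASE}"
  echo "${HOST_API:-api.$DOMAIN_BASE}"
  echo "${HOST_THUNDER:-thunder.$DOMAIN_BASE}"
  echo "${HOST_OBSERVER:-observer.$DOMAIN_BASE}"
  echo "${HOST_GATEWAY:-gateway.$DOMAIN_BASE}"
  echo "${HOST_CP:-cp.$DOMAIN_BASE}"
  echo "*.${AGENTS_BASE:-agents.$DOMAIN_BASE}"
}
```

The installer's own --dry-run output remains the authoritative list of derived hosts; this helper is only a quick way to see the pattern.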
TLS modes
In every mode the URLs published to browsers and clients are https:// — that is what the user sees. Only how TLS is terminated differs.
| Mode | How TLS is handled | Inbound port to open | When to use |
|---|---|---|---|
| letsencrypt | Caddy obtains and renews trusted Let's Encrypt certificates automatically (TLS-ALPN-01, inside the :443 handshake) | 443 (raw TCP, no proxy in front) | You control DNS for the domain and want automatic certificates |
| byoc | Caddy serves your supplied certificate and key on :443; no ACME | 443 | Certificates come from a corporate CA or a secrets store |
| upstream | A cloud load balancer / proxy in front terminates TLS; Caddy listens plain-HTTP on UPSTREAM_LISTEN_PORT and only routes by Host | the LB's forward port (the LB owns 443 publicly) | You already run an edge load balancer that holds the public certificate |
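The relationship between TLS_MODE, its mode-specific keys, and the inbound port can be summarised in a few lines of shell. This is a sketch of the rules in the tables above, assuming the config has already been sourced; check_tls_mode and its messages are hypothetical and do not reproduce the installer's own validation.

```shell
# Validate TLS_MODE and the keys that mode requires, then print the
# inbound port that must be reachable on the VM.
check_tls_mode() {
  case "${TLS_MODE:-}" in
    letsencrypt)
      [ -n "${ACME_EMAIL:-}" ] || { echo "ACME_EMAIL is required for letsencrypt" >&2; return 1; }
      echo 443 ;;
    byoc)
      if [ -z "${TLS_CERT_FILE:-}" ] || [ -z "${TLS_KEY_FILE:-}" ]; then
        echo "TLS_CERT_FILE and TLS_KEY_FILE are required for byoc" >&2
        return 1
      fi
      echo 443 ;;
    upstream)
      # The load balancer owns 443; the VM only needs the forward port.
      echo "${UPSTREAM_LISTEN_PORT:-80}" ;;
    *)
      echo "TLS_MODE must be letsencrypt, byoc, or upstream" >&2
      return 1 ;;
  esac
}

# Usage:
#   . ./amp-config.env && check_tls_mode
```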
BYOC certificate requirements
Deployed-agent endpoints live one DNS level deeper than the service hosts, at <org>-<project>.<AGENTS_BASE>. A standard *.<DOMAIN_BASE> wildcard does not cover that tier, and there is no ACME in byoc mode to issue per-host certificates on demand. So your single certificate must carry SANs covering both *.<DOMAIN_BASE> and *.<AGENTS_BASE>. The installer's pre-flight checks this and fails fast (naming the missing SAN) if it is absent, along with verifying the cert and key match and the cert is not expired.
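For example, a self-signed test certificate covering both tiers (openssl req -x509 emits a certificate rather than a signing request, and -addext needs OpenSSL 1.1.1 or later):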
openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
-keyout privkey.pem -out fullchain.pem -subj "/CN=amp.mycompany.com" \
-addext "subjectAltName=DNS:*.amp.mycompany.com,DNS:*.agents.amp.mycompany.com"
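You can run checks similar to the pre-flight yourself before installing. The sketch below uses standard openssl subcommands; the function names are hypothetical, the installer's real checks may differ, and the -ext option assumes OpenSSL 1.1.1 or later. wildcard_matches illustrates why two SANs are needed: in a certificate, * stands for exactly one left-most label.

```shell
# TLS wildcard rule: '*.<suffix>' matches exactly one extra label.
wildcard_matches() {
  pattern=$1 host=$2
  case $pattern in
    '*.'*) suffix=${pattern#\*.} ;;
    *) [ "$pattern" = "$host" ]; return ;;
  esac
  case $host in
    *."$suffix") label=${host%."$suffix"} ;;
    *) return 1 ;;
  esac
  case $label in
    '' | *.*) return 1 ;;   # empty or multi-label prefix: no match
    *) return 0 ;;
  esac
}

# Succeeds when the certificate and private key share the same public key.
cert_matches_key() {
  cert_pub=$(openssl x509 -in "$1" -noout -pubkey 2>/dev/null) || return 1
  key_pub=$(openssl pkey -in "$2" -pubout 2>/dev/null) || return 1
  [ -n "$cert_pub" ] && [ "$cert_pub" = "$key_pub" ]
}

# Succeeds when the certificate lists the given DNS name as a SAN.
cert_covers_san() {
  openssl x509 -in "$1" -noout -ext subjectAltName 2>/dev/null \
    | tr ',' '\n' | sed 's/^[[:space:]]*//; s/[[:space:]]*$//' \
    | grep -Fxq "DNS:$2"
}

# Succeeds when the certificate has not yet expired.
cert_not_expired() {
  openssl x509 -in "$1" -noout -checkend 0 >/dev/null 2>&1
}

# Usage:
#   cert_matches_key fullchain.pem privkey.pem &&
#   cert_covers_san fullchain.pem '*.amp.mycompany.com' &&
#   cert_covers_san fullchain.pem '*.agents.amp.mycompany.com' &&
#   cert_not_expired fullchain.pem && echo "certificate looks usable"
```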
Upstream (load-balancer) topology
In upstream mode the load balancer owns :443 and the public certificate. Configure it to forward each derived hostname to the VM's UPSTREAM_LISTEN_PORT over plain HTTP, and to set the X-Forwarded-Proto: https header — Caddy trusts it so the backends still see the original https scheme. Because the LB fronts DNS, the installer's DNS check is advisory (not a hard failure) in this mode.
Because the listen port carries plain HTTP and Caddy trusts the forwarded scheme, lock down who can reach it: restrict UPSTREAM_LISTEN_PORT to the load balancer at the firewall, and set UPSTREAM_TRUSTED_PROXIES to the LB's source CIDRs so only the LB can set X-Forwarded-*. The default (0.0.0.0/0) trusts any source and relies solely on the firewall — fine if the port is firewalled to the LB, but scoping both is safer. For a GCP Application Load Balancer the source ranges are 130.211.0.0/22 and 35.191.0.0/16, so:
UPSTREAM_TRUSTED_PROXIES="130.211.0.0/22 35.191.0.0/16"
DNS
For letsencrypt and byoc, point A records for every service host at the VM's public IP, plus a wildcard for the deployed-agent tier. Two wildcard records are the simplest:
*.amp.mycompany.com A <VM_PUBLIC_IP> # covers console/api/thunder/observer/gateway/cp
*.agents.amp.mycompany.com A <VM_PUBLIC_IP> # covers deployed agents (one level deeper)
The second record is separate because deployed-agent hostnames sit one level below the service hosts, and a *.amp.mycompany.com wildcard does not match x.agents.amp.mycompany.com. If you use a proxying DNS provider (for example Cloudflare's orange-cloud), set these records to DNS-only — a proxy that terminates TLS in front of the VM breaks the TLS-ALPN-01 challenge.
In letsencrypt mode these records must resolve to the VM before you run the installer — ACME issuance fails otherwise, and the installer's DNS pre-flight hard-fails with the exact records to create. (The check accepts the VM's public egress IP as well as its local interface IPs, so it works correctly on NAT'd cloud VMs.) In upstream mode, point DNS at the load balancer instead; the VM-side check is advisory.
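To confirm the records yourself before running the installer, you can query them with dig. This is a minimal sketch that assumes dig is installed (it is not among the tools the installer is documented to add); resolves_to is a hypothetical helper, and probe is an arbitrary label used to exercise the wildcard records.

```shell
# Succeeds when <host> has an A record equal to <ip>.
# +time/+tries keep the lookup short when the resolver is unreachable.
resolves_to() {
  dig +short +time=2 +tries=1 A "$1" 2>/dev/null | grep -Fxq "$2"
}

# Usage (replace the IP with the VM's public address):
#   resolves_to console.amp.mycompany.com 203.0.113.10 && echo "service tier ok"
#   resolves_to probe.agents.amp.mycompany.com 203.0.113.10 && echo "agent tier ok"
```

Checking one name under each wildcard is enough to tell whether both records exist.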
Install
Validate and preview without touching the cluster first:
sudo ./install-advanced.sh --config amp-config.env --dry-run
This loads the config, runs the cert and (in letsencrypt) DNS pre-flight, and prints the derived hosts, helm overrides, and the rendered Caddyfile. When it looks right, run the real install:
sudo ./install-advanced.sh --config amp-config.env
It runs in two phases — bootstrap (Docker + tools + firewall) and the platform install + Caddy startup — and takes 8–15 minutes. It needs sudo because it installs Docker, opens the firewall, and creates the cluster. On completion it prints the access URLs.
Persistence and teardown
Application data (PostgreSQL), issued certificates, and the k3d cluster persist across Docker/host restarts via named volumes. In letsencrypt mode the amp-caddy-data volume caches issued certificates and the ACME account, so restarts do not re-request them. To tear down completely:
cd agent-manager/deployments/quick-start
sudo ./uninstall.sh --delete-cluster # delete the k3d cluster (workloads + app data)
sudo docker rm -f amp-caddy # remove the Caddy front door
sudo docker volume rm amp-caddy-data amp-caddy-config # drop the cached certs + ACME account
Use sudo (Docker and k3d run as root). Plain ./uninstall.sh without --delete-cluster only removes the Helm releases and leaves the cluster running; uninstall.sh does not touch the Caddy container or its volumes, so remove those separately as shown.
Changing the domain or hostnames after install requires a teardown first. The platform install is idempotent in the "create if missing" sense — on a re-run it skips releases that already exist, so editing DOMAIN_BASE (or the HOST_* overrides) and re-running does not reconfigure the already-installed services; only Caddy's front-door TLS changes, leaving the apps advertising the old hostnames. To move an existing install to a different domain, tear it down (sudo ./uninstall.sh --delete-cluster, then remove amp-caddy and its volumes as in Persistence and teardown) and install again with the new config. (Switching only the TLS_MODE between letsencrypt/byoc/upstream while keeping the same hostnames is fine — that only re-renders Caddy.)
Connect an external gateway
This works the same as in the Simple tab: the control-plane endpoint https://cp.<DOMAIN_BASE> is exposed by default. Generate a registration token from Infrastructure → Gateways and follow the generated commands. Set EXTERNAL_GATEWAYS=false to drop the endpoint if you do not connect external gateways. The registration token grants a gateway your LLM-provider API keys — treat it as a secret and revoke it when a gateway is decommissioned.
Troubleshooting
- Config rejected before install: the installer prints which key is missing or invalid (e.g. an unknown TLS_MODE, or byoc without TLS_CERT_FILE). Fix amp-config.env and re-run.
- Certificate validation failed (byoc): the cert and key do not match, the cert is expired, or its SANs do not cover a service host or the *.<AGENTS_BASE> wildcard. The message names the specific problem; reissue the certificate with the required SANs (see BYOC certificate requirements).
- DNS pre-flight failed (letsencrypt): one or more hostnames do not resolve to the VM. Create the A records listed under DNS and re-run. The message names the hosts and the expected IP.
- Certificates never issue / hosts unreachable (letsencrypt): open inbound :443 in your cloud security group, and ensure it reaches the VM as raw TCP; a TLS-terminating load balancer in front breaks TLS-ALPN-01. If you have such a load balancer, use upstream mode instead. Check docker logs amp-caddy.
- Changed the domain but the console still shows the old hostnames: re-running with a new DOMAIN_BASE does not reconfigure existing releases. Tear down and reinstall (see Persistence and teardown).
- 403 insufficient permissions on API calls: sign in as amp-admin, not Thunder's system admin account.