# Evaluation Monitors

This tutorial walks you through creating an evaluation monitor, viewing results, and managing monitors in the AMP Console.

## Prerequisites[​](#prerequisites "Direct link to Prerequisites")

* A running AMP instance (see [Quick Start](/agent-manager/docs/v0.11.x/getting-started/quick-start/.md))
* An agent registered in AMP with an active environment
* Agent traces being collected (see [Observe Your First Agent](/agent-manager/docs/v0.11.x/tutorials/observe-first-agent/.md))
* For LLM-as-Judge evaluators: an API key for a [supported LLM provider](/agent-manager/docs/v0.11.x/concepts/evaluation/.md#supported-llm-providers)

***

## Create a Monitor[​](#create-a-monitor "Direct link to Create a Monitor")

### Step 1: Navigate to Evaluation[​](#step-1-navigate-to-evaluation "Direct link to Step 1: Navigate to Evaluation")

1. Open the AMP Console and select your agent.
2. Click the **Evaluation** tab.
3. Click **Add Monitor**.

![Evaluation tab with Add Monitor button](/agent-manager/assets/images/evaluation-tab-4433e0b08c99d5e1703edf27f6626d9d.png)

***

### Step 2: Configure Monitor Details[​](#step-2-configure-monitor-details "Direct link to Step 2: Configure Monitor Details")

Fill in the monitor configuration:

* **Monitor Title**: A descriptive name for the monitor (e.g., "Production Quality Monitor").

* **Identifier**: Auto-generated from the title. You can customize it (must be lowercase with hyphens, 3–60 characters).

* **Data Collection Type**: Choose one:

  * **Past Traces**: evaluate traces from a specific time window. Set a **Start Time** and **End Time**. The evaluation runs immediately after creation.
  * **Future Traces**: evaluate new traces on a recurring schedule. Set an **interval** in minutes (minimum 5 minutes).

![Monitor details form](/agent-manager/assets/images/create-step1-251e3f9de1f8422dbc8a8263884bc364.png)

**Choosing a monitor type**

Use **Past Traces** when you want to assess historical agent behavior, such as reviewing last week's interactions after a deployment. Use **Future Traces** for ongoing production quality monitoring.
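The identifier rules above can be sketched as a small helper. This is illustrative only: the exact pattern the Console enforces is an assumption here (lowercase alphanumeric segments joined by single hyphens, 3–60 characters total), as is the slug-generation logic.

```python
import re

# Hypothetical pattern mirroring the identifier rule described above.
IDENTIFIER_RE = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")

def slugify(title: str) -> str:
    """Derive an identifier from a monitor title, roughly as the Console auto-generates one."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    return slug.strip("-")

def is_valid_identifier(identifier: str) -> bool:
    """Check length and character rules for a customized identifier."""
    return 3 <= len(identifier) <= 60 and bool(IDENTIFIER_RE.match(identifier))
```

For example, `slugify("Production Quality Monitor")` yields `production-quality-monitor`, which passes the validation check.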

***

### Step 3: Select and Configure Evaluators[​](#step-3-select-and-configure-evaluators "Direct link to Step 3: Select and Configure Evaluators")

1. Browse the evaluator grid. Each card shows the evaluator name, tags, and a brief description.
2. Click an evaluator card to open its details and configuration.
3. Configure parameters as needed. For example, set `max_latency_ms` for the Latency evaluator, or choose a model for an LLM-as-Judge evaluator.
4. Click **Add Evaluator** to include it in the monitor.
5. Repeat for all evaluators you want to use. You must select at least one.

For a full reference of available evaluators and their parameters, see [Built-in Evaluators](/agent-manager/docs/v0.11.x/concepts/evaluation/.md#built-in-evaluators). You can also create your own (see [Custom Evaluators](/agent-manager/docs/v0.11.x/tutorials/custom-evaluators/.md)).

![Evaluator selection grid](/agent-manager/assets/images/create-step2-evaluators-e2510d3b27b92c86f4d2300f3301b216.png)

![Evaluator configuration drawer](/agent-manager/assets/images/evaluator-drawer-d703bac41269fc05bde4b7552498071b.png)
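Conceptually, the result of this step is a list of named evaluators, each with its own parameters, and the list must be non-empty. The structure below is a hypothetical sketch, not AMP's actual configuration schema; the evaluator names and the `max_latency_ms` and model parameters follow the examples above.

```python
# Hypothetical in-memory representation of the evaluators added in this step.
monitor_evaluators = [
    {"name": "latency", "params": {"max_latency_ms": 2000}},
    {"name": "accuracy", "params": {"model": "openai/gpt-4o-mini"}},
]

def validate_evaluators(evaluators: list[dict]) -> None:
    """A monitor must include at least one evaluator."""
    if not evaluators:
        raise ValueError("Select at least one evaluator before creating the monitor.")

validate_evaluators(monitor_evaluators)
```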

***

### Step 4: Configure LLM Providers (LLM-as-Judge only)[​](#step-4-configure-llm-providers-llm-as-judge-only "Direct link to Step 4: Configure LLM Providers (LLM-as-Judge only)")

If you selected any LLM-as-Judge evaluators, you need to configure at least one LLM provider. Skip this step if you only selected rule-based evaluators.

1. In the evaluator configuration panel, find the **LLM Providers** section.
2. Select a provider from the dropdown (OpenAI, Anthropic, Google AI Studio, Groq, or Mistral AI).
3. Enter your API key.
4. Click **Add** to save the credentials.

The **model** field on LLM-as-Judge evaluators uses `provider/model` format (e.g., `openai/gpt-4o-mini`, `anthropic/claude-sonnet-4-6`). The available models depend on the providers you have configured.
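Since the model field follows `provider/model` format, splitting on the first slash recovers both parts. A minimal sketch (the validation behavior is an assumption, not the Console's actual handling):

```python
def parse_model_field(value: str) -> tuple[str, str]:
    """Split a 'provider/model' string into its provider and model parts."""
    provider, _, model = value.partition("/")
    if not provider or not model:
        raise ValueError(f"Expected 'provider/model', got {value!r}")
    return provider, model
```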

**Tip**

You only need to add each provider once per monitor. All evaluators using that provider share the same credentials.

![LLM provider configuration](/agent-manager/assets/images/llm-provider-config-df8267ee276724f1f14c857231ce30df.png)

***

### Step 5: Create the Monitor[​](#step-5-create-the-monitor "Direct link to Step 5: Create the Monitor")

Review your configuration and click **Create Monitor**.

* **Historical monitors** (Past Traces) start evaluating immediately. Results appear in the dashboard once the run completes.
* **Continuous monitors** (Future Traces) start in Active status. The first evaluation runs within 60 seconds, then repeats at the configured interval.

***

## View Monitor Results[​](#view-monitor-results "Direct link to View Monitor Results")

After creation, you'll see your monitor in the monitor list. Click a monitor to open its dashboard.

### Dashboard Overview[​](#dashboard-overview "Direct link to Dashboard Overview")

The monitor dashboard provides several views of your evaluation results:

* **Time Range Selector**: filter results by Last 24 Hours, Last 3 Days, Last 7 Days, or Last 30 Days. Historical monitors show their fixed trace window instead.

* **Agent Performance Chart**: a radar chart showing mean scores across all evaluators, giving a quick visual summary of agent strengths and weaknesses.

* **Evaluation Summary**: shows the weighted average score and total evaluation count, with **per-level statistics**:

  * **Trace level**: number of traces evaluated, evaluator count, and skip rate
  * **Agent level**: number of agent executions evaluated, evaluator count, and skip rate
  * **LLM level**: number of LLM invocations evaluated, evaluator count, and skip rate

  Only levels with configured evaluators appear in the summary.

* **Run Summary**: latest run status with quick access to run history.

* **Performance by Evaluator**: a time-series chart showing how each evaluator's score trends over time. Useful for spotting regressions or improvements.

![Monitor dashboard](/agent-manager/assets/images/monitor-dashboard-2737f9a2afd23cf404a75ef0e93e1be8.png)
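A weighted average of this kind can be illustrated by weighting each evaluator's mean score by the number of evaluations behind it. This is a sketch of one plausible weighting; AMP's exact formula may differ.

```python
def weighted_average(results: list[dict]) -> float:
    """Average evaluator scores, weighted by how many evaluations back each score."""
    total = sum(r["count"] for r in results)
    if total == 0:
        return 0.0
    return sum(r["score"] * r["count"] for r in results) / total

# Hypothetical per-evaluator summaries: mean score and evaluation count.
results = [
    {"evaluator": "latency", "score": 0.9, "count": 100},
    {"evaluator": "accuracy", "score": 0.6, "count": 50},
]
```

With these numbers the weighted average is 0.8: the latency score contributes twice as much as the accuracy score because it covers twice as many evaluations.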

### Score Breakdowns[​](#score-breakdowns "Direct link to Score Breakdowns")

When your monitor includes agent-level or LLM-level evaluators, the dashboard shows additional breakdown tables below the performance chart.

#### Score Breakdown by Agent[​](#score-breakdown-by-agent "Direct link to Score Breakdown by Agent")

A table with one row per agent found in the evaluated traces. Each row shows:

* **Agent name**: the agent's identifier from the trace
* **Evaluator scores**: mean score for each agent-level evaluator, displayed as color-coded percentage chips. A dash (–) indicates the evaluator was skipped for that agent.
* **Count**: the number of agent executions evaluated

This helps you identify which agent in a multi-agent system needs improvement.
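Conceptually, both breakdown tables group individual scores by a key (agent name here, model name in the next table) and take the mean per group. A minimal sketch with made-up score records:

```python
from collections import defaultdict

def breakdown(scores: list[dict], key: str) -> dict[str, float]:
    """Mean score per group, grouping score records by the given key."""
    groups: dict[str, list[float]] = defaultdict(list)
    for record in scores:
        groups[record[key]].append(record["score"])
    return {name: sum(vals) / len(vals) for name, vals in groups.items()}

# Hypothetical per-execution scores from an agent-level evaluator.
agent_scores = [
    {"agent": "planner", "score": 0.8},
    {"agent": "planner", "score": 0.6},
    {"agent": "retriever", "score": 0.9},
]
```

Here `breakdown(agent_scores, "agent")` would produce one row per agent, with `planner` averaging 0.7 across its two executions.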

#### Score Breakdown by Model[​](#score-breakdown-by-model "Direct link to Score Breakdown by Model")

A table with one row per LLM model used across the evaluated traces. Each row shows:

* **Model name**: the LLM model identifier (e.g., `gpt-4o`, `claude-sonnet-4-6`)
* **Evaluator scores**: mean score for each LLM-level evaluator
* **Count**: the number of LLM invocations evaluated

This helps you compare quality across different models used by your agents.

### Run History[​](#run-history "Direct link to Run History")

The dashboard also shows a history of all evaluation runs. Each run displays:

* **Status**: pending, running, success, or failed
* **Trace window**: the start and end time of traces evaluated
* **Timestamps**: when the run started and completed

You can take actions on individual runs:

* **Rerun**: re-execute the evaluation run.
* **View Logs**: see detailed execution logs for troubleshooting.

![Run history](/agent-manager/assets/images/run-history-4c920ef26f152b9f32882fe5a3dad22f.png)

### Run Logs[​](#run-logs "Direct link to Run Logs")

Click **View Logs** on any run to open the log viewer. This displays the application logs from the monitor's evaluation job, useful for diagnosing failed or unexpected runs.

![Run logs](/agent-manager/assets/images/run-logs-f9041130b81ca131b58421d6cda105f0.png)

***

## View Scores in Trace View[​](#view-scores-in-trace-view "Direct link to View Scores in Trace View")

Evaluation scores are also visible directly in the trace view, making it easy to investigate specific agent interactions without switching to the monitor dashboard.

### Score Column in Traces Table[​](#score-column-in-traces-table "Direct link to Score Column in Traces Table")

The traces list includes a **Score** column showing the average evaluator score for each trace. Scores are color-coded (green for high scores, red for low), giving you a quick visual indicator of which traces need attention.

![Traces table with score column](/agent-manager/assets/images/traces-table-scores-1e5ee859f46ec5bcd34a2d1a58af38b8.png)
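The color coding amounts to a threshold mapping from score to color. A sketch, with thresholds that are assumptions rather than the Console's actual cutoffs:

```python
def score_color(score: float) -> str:
    """Map a 0-1 score to a traffic-light color (thresholds are assumed)."""
    if score >= 0.8:
        return "green"
    if score >= 0.5:
        return "yellow"
    return "red"
```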

### Scores in Span Details[​](#scores-in-span-details "Direct link to Scores in Span Details")

Click any trace to open the trace timeline. Select a span to view its details panel:

1. **Score chips in the header**: evaluator scores appear as color-coded percentage chips in the span's basic info section, alongside duration, token count, and model information.

2. **Scores tab**: a dedicated tab shows each evaluator's result:

   * **Evaluator name**: prefixed with the monitor name when the same evaluator appears in multiple monitors (e.g., `production-monitor / Accuracy`)
   * **Score chip**: color-coded percentage (green for high, red for low)
   * **Explanation**: markdown-rendered explanation from the evaluator describing why this score was given
   * **Skipped evaluators**: shown with a skip reason instead of a score

Trace-level scores appear on the root span. Agent-level and LLM-level scores appear on their respective agent and LLM spans.

![Span details with scores tab](/agent-manager/assets/images/span-scores-tab-f96f8977d464d73b5ba34f4277e7d9ef.png)

***

## Start and Suspend a Monitor[​](#start-and-suspend-a-monitor "Direct link to Start and Suspend a Monitor")

This applies to **continuous monitors** only.

* **Suspend**: Click the **pause** button in the actions column. The monitor stops running on schedule but retains all configuration and historical results. You can resume it at any time.
* **Start**: Click the **play** button on a suspended monitor. Evaluation resumes within 60 seconds.

**Info**

Historical monitors cannot be started or suspended. They run once when created.

***

## Edit a Monitor[​](#edit-a-monitor "Direct link to Edit a Monitor")

1. Click the **edit** (pencil) icon in the monitor list actions column.
2. The monitor configuration wizard opens with the current settings.
3. Update the fields you want to change: display name, evaluators, evaluator parameters, LLM provider credentials, interval (for continuous monitors), or time range (for historical monitors).
4. Click **Save** to apply the changes.

**Info**

The monitor type (continuous or historical) cannot be changed after creation.

***

## Delete a Monitor[​](#delete-a-monitor "Direct link to Delete a Monitor")

1. Click the **delete** (trash) icon in the monitor list actions column.
2. Confirm the deletion in the dialog.

Deletion permanently removes the monitor and all its associated run history and scores. This action cannot be undone.
