iSharkFly-Docs/amazon-bedrock-agentcore-samples

Mirror, synced 2026-05-22 22:53:35 +00:00

Files

Latest commit: afarntrog 3ff6e8ee2b, update evals package name (#985), 2026-02-18 16:57:48 -05:00

Name                           Last updated                Last commit
00-prereqs                     2025-12-02 12:39:56 -08:00  feat: policy & eval samples (#712)
01-creating-custom-evaluators  2025-12-02 12:39:56 -08:00  feat: policy & eval samples (#712)
02-running-evaluations         2025-12-02 12:39:56 -08:00  feat: policy & eval samples (#712)
03-advanced                    2026-02-18 16:57:48 -05:00  update evals package name (#985)
04-using-evaluation-results    2025-12-17 09:03:40 -05:00  Adding evaluation analyzer agent to analyze LLMaaJ responses and suggest improvment (#785)
images                         2025-12-02 12:39:56 -08:00  feat: policy & eval samples (#712)
README.md                      2025-12-02 12:39:56 -08:00  feat: policy & eval samples (#712)
requirements.txt               2025-12-12 18:44:04 -05:00  Updates policy workshop (#769)

README.md

Overview

Amazon Bedrock AgentCore Evaluations helps you optimize your agent's quality based on real-world interactions.

Key Features

While AgentCore Observability provides operational insights into agent health, AgentCore Evaluations focuses on agent decision quality and performance outcomes.

It provides built-in and custom evaluators with both on-demand and online evaluation capabilities.

Built-in and Custom Evaluators

AgentCore Evaluations offers 13 built-in evaluators for critical dimensions like correctness, helpfulness, and safety, plus the ability to create custom evaluators for business-specific requirements.

Test your agents during development and deployment using the on-demand evaluations API, or monitor production agents with the online evaluations API.

On-demand Evaluations

Run synchronous, on-demand evaluations using built-in and custom metrics on individual traces.

The system uses OpenTelemetry (OTEL) traces to perform scoring and returns a response that includes:

Score value
Explanation for the score
Token usage
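
As a minimal sketch of how these three response parts might be handled on the client side, the snippet below models one result and aggregates several of them. The class name `EvaluationResult`, its field names, and the input/output token split are assumptions made for illustration; they are not the actual response schema of the evaluations API.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class EvaluationResult:
    """Hypothetical container for one on-demand evaluation response."""
    evaluator: str      # name of the built-in or custom metric (assumed field)
    score: float        # score value
    explanation: str    # explanation for the score
    input_tokens: int   # token usage, assumed to be split into input/output
    output_tokens: int

    @property
    def total_tokens(self) -> int:
        return self.input_tokens + self.output_tokens


def summarize(results: list[EvaluationResult]) -> dict:
    """Aggregate per-trace results into a mean score and a token total."""
    if not results:
        return {"count": 0, "mean_score": None, "total_tokens": 0}
    return {
        "count": len(results),
        "mean_score": mean(r.score for r in results),
        "total_tokens": sum(r.total_tokens for r in results),
    }
```

Tracking token usage next to the score makes it possible to estimate what an evaluation run costs as well as what it reports.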

Online Evaluations

In production, you need continuous performance monitoring across all interactions without manually evaluating each trace. A statistical sample is often sufficient for generating meaningful performance metrics.

AgentCore Evaluations' online capabilities enable automatic sampling and evaluation:

Define your sample size and trace selection criteria
Choose your evaluation metrics (built-in or custom)
AgentCore Evaluations handles the rest, generating the performance data you need to monitor your agent at scale
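
To make the sampling idea concrete, here is a small sketch of one common approach: filter traces by a selection criterion, then keep a fixed fraction of them using a hash of the trace ID. This is an illustration only, not a description of how AgentCore Evaluations samples internally; the `trace_id` key and both function names are hypothetical.

```python
import hashlib


def in_sample(trace_id: str, sample_rate: float) -> bool:
    """Return True if this trace falls in the sample.

    The decision depends only on the trace ID, so the same trace is
    always either included or excluded for a given rate.
    """
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be between 0 and 1")
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")   # integer in [0, 2**64)
    threshold = int(sample_rate * 2**64)         # cut-off for the chosen rate
    return bucket < threshold


def select_traces(traces, sample_rate, predicate=lambda trace: True):
    """Apply the selection criteria first, then sample the matching traces."""
    return [
        t for t in traces
        if predicate(t) and in_sample(t["trace_id"], sample_rate)
    ]
```

Hash-based sampling is deterministic, which keeps repeated runs comparable; random sampling would work equally well when reproducibility does not matter.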

Tutorials overview

In these tutorials we will cover the following functionality:

Prerequisites: Create a sample agent to use throughout the evaluation tutorials
Create a custom evaluator: Learn about built-in and custom metrics, and create a custom metric for evaluating your agents
Using on-demand and online evaluations: Learn how to use on-demand and online evaluations to build, optimize, and monitor your agent at scale
Advanced: Explore advanced capabilities, including using the boto3 SDK to query Amazon CloudWatch Logs for on-demand evaluation and creating local dashboards to visualize experiments with different agent configurations
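
As a rough sketch of the CloudWatch Logs querying mentioned in the advanced tutorial, the helper below starts a Logs Insights query and polls until it finishes. It relies on the `start_query` and `get_query_results` operations of the boto3 CloudWatch Logs client; the function name, the log group, and the query string are placeholders, and the tutorial notebooks may structure this differently.

```python
import time


def run_logs_insights_query(logs_client, log_group, query, start_time, end_time,
                            poll_seconds=1.0, max_polls=60):
    """Run a CloudWatch Logs Insights query and return rows as dictionaries.

    logs_client is expected to be boto3.client("logs"); it is passed in so
    the function can also be exercised with a stub.
    """
    started = logs_client.start_query(
        logGroupName=log_group,
        startTime=int(start_time),  # epoch seconds
        endTime=int(end_time),
        queryString=query,
    )
    query_id = started["queryId"]

    for _ in range(max_polls):
        response = logs_client.get_query_results(queryId=query_id)
        status = response["status"]
        if status == "Complete":
            # Each row is a list of {"field": ..., "value": ...} pairs.
            return [
                {cell["field"]: cell["value"] for cell in row}
                for row in response["results"]
            ]
        if status in ("Failed", "Cancelled", "Timeout"):
            raise RuntimeError(f"query {query_id} ended with status {status}")
        time.sleep(poll_seconds)

    raise TimeoutError(f"query {query_id} did not complete after {max_polls} polls")
```

A call might look like `run_logs_insights_query(boto3.client("logs"), "/example/agent-log-group", "fields @timestamp, @message | sort @timestamp desc | limit 20", start, end)`, where the log group name is hypothetical and should be replaced with the one your agent writes to.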