iSharkFly-Docs/amazon-bedrock-agentcore-samples

Mirror synced 2026-05-22 14:43:35 +00:00

Vasa 27b7022a8c market-trends-agent: add code-based evaluators + observability wiring (#1413)

* Add code-based evaluators to market-trends-agent

- Add 5 Lambda-backed code-based evaluators (schema_validator, stock_price_drift,
  pii_regex, pii_comprehend, workflow_contract_gsr) with online evaluation config
- Add evaluator deploy/invoke/results scripts under evaluators/scripts/
- Enable LangchainInstrumentor so gen_ai.tool.* spans flow to AgentCore Observability
- Replace hardcoded us-east-1 with AWS_REGION env var fallback across agent and tests
- Rewrite deploy.py to use CodeBuild + bedrock-agentcore-control directly (no starter toolkit dep)
- Pin boto3 >= 1.42.0 for Evaluations control-plane APIs
- Update README: evaluator documentation, IAM split, troubleshooting, cleanup ordering
- Update architecture diagram to reflect evaluator layer
- Remove Dockerfile and .dockerignore (container built by CodeBuild, no local Docker needed)

* Fix F821 missing os import and harden stock_price_drift URL fetch

- test_broker_card.py: add 'import os' (F821 from linter)
- stock_price_drift/lambda_function.py: reject non-https reference URLs
  before urlopen() and annotate with nosec B310 / noqa S310 (Bandit)

* Apply ruff format to market-trends-agent files (python-lint CI fix)

* Re-trigger CI (previous scan job hit ECONNRESET during artifact upload)

2026-04-30 14:48:42 -07:00

6.3 KiB

Changelog

[Unreleased]

Fixed

`evaluators/scripts/deploy.py` — production control plane endpoint

- Removed the `CP_ENDPOINT` env var and its gamma default (https://gamma.us-west-2.elcapcp.genesis-primitives.aws.dev). That endpoint is internal-only and not reachable from customer accounts.
- Changed `_cp_client()` to use the `bedrock-agentcore-control` boto3 service (production control plane). Evaluators registered there are visible to the production data plane (`bedrock-agentcore`). This resolves the `ResourceNotFoundException` that occurred when evaluators were registered on the gamma control plane.
- Removed the hardcoded `AGENT_RUNTIME_ARN` default, which pointed to a specific account and runtime. Added `_resolve_agent_arn()`, which reads the `AGENT_RUNTIME_ARN` env var or falls back to the `.agent_arn` file written by `deploy.py`. It exits with a clear error message if neither is set.
- Changed `_create_online_config()` to accept `agent_runtime_arn` as a parameter instead of reading the module-level constant. This makes the function easier to test and reason about.

`evaluators/iam/trust-policy.json` — remove internal service principal

- Removed `preprod.genesis-service.aws.internal` from the trust policy's `Principal.Service` list. This was an Amazon-internal pre-production service principal that is not valid in customer accounts and would cause IAM role assumption to fail at runtime.
- The trust policy now contains only `bedrock-agentcore.amazonaws.com`.
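
For reference, a trust policy restricted to the AgentCore service principal can be built as shown below. This is a sketch. It shows only the standard `Version` / `Statement` / `sts:AssumeRole` shape. The real `trust-policy.json` may carry extra conditions (for example, source-account checks) that this entry does not mention. `trusted_services()` is a hypothetical helper for asserting that no internal principal slips back in.

```python
import json

# The only service principal left in the trust policy after this fix.
AGENTCORE_PRINCIPAL = "bedrock-agentcore.amazonaws.com"


def build_trust_policy() -> dict:
    """Build an IAM trust policy that lets only AgentCore assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": [AGENTCORE_PRINCIPAL]},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def trusted_services(policy: dict) -> list[str]:
    """Collect every Principal.Service entry, accepting str or list values."""
    services: list[str] = []
    for stmt in policy.get("Statement", []):
        principal = stmt.get("Principal", {})
        if not isinstance(principal, dict):
            continue  # e.g. "*" principals carry no Service list
        svc = principal.get("Service", [])
        services.extend([svc] if isinstance(svc, str) else svc)
    return services


if __name__ == "__main__":
    print(json.dumps(build_trust_policy(), indent=2))
```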

`evaluators/scripts/invoke.py` — remove hardcoded account ARN

Removed the hardcoded `AGENT_RUNTIME_ARN` default, which pointed to a specific account. Replaced it with the same `_resolve_agent_arn()` pattern used in `evaluators/scripts/deploy.py`, which reads the `AGENT_RUNTIME_ARN` env var or the `.agent_arn` file.

Removed

`pyproject.toml` — starter toolkit dependency

Removed bedrock-agentcore-starter-toolkit from the project dependencies. This package was used only in deploy.py for the Runtime class; the agent code itself uses the bedrock-agentcore SDK directly (BedrockAgentCoreApp, MemoryClient).

Changed

`cleanup.py` — replace starter toolkit with SDK and boto3

- Removed `from bedrock_agentcore_starter_toolkit import Runtime` and `self.runtime = Runtime()`.
- Added `boto3.client("bedrock-agentcore-control")` as `self.agentcore_control`. Runtime deletion now calls `agentcore_control.delete_agent_runtime(agentRuntimeId=agent_id)` directly.
- Memory cleanup is unchanged and still uses `bedrock_agentcore.memory.MemoryClient` (SDK).
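
Here is a sketch of the runtime deletion step, written as a standalone function that takes the `bedrock-agentcore-control` client as an argument. Treating a missing runtime as already deleted is an assumption added here so that reruns of cleanup do not fail. It relies on the client exposing a modeled `ResourceNotFoundException` through `client.exceptions`, as boto3 clients do for modeled errors.

```python
def delete_runtime(agentcore_control, agent_id: str) -> bool:
    """Delete an AgentCore runtime; return False if it was already gone.

    agentcore_control is expected to be boto3.client("bedrock-agentcore-control").
    """
    try:
        agentcore_control.delete_agent_runtime(agentRuntimeId=agent_id)
        return True
    except agentcore_control.exceptions.ResourceNotFoundException:
        # Assumption: a missing runtime counts as cleaned up, so reruns succeed.
        return False
```

With a real client, the call would be `delete_runtime(boto3.client("bedrock-agentcore-control"), agent_id)`.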

`deploy.py` — replace starter toolkit with SDK and boto3

- Removed `from bedrock_agentcore_starter_toolkit import Runtime` and all uses of `Runtime.configure()`, `Runtime.launch()`, and `Runtime.status()`.
- Added the `from botocore.exceptions import ClientError` import.
- Added a `_trigger_codebuild()` method that starts the existing CodeBuild project (`bedrock-agentcore-{agent_name}-builder`) via boto3 and polls until the build completes. If the project does not exist, it raises `RuntimeError` with instructions telling the user to run `agentcore deploy` once to bootstrap it.
- Added an `_ensure_runtime()` method that uses `boto3.client("bedrock-agentcore-control")` to list existing runtimes and then either updates the matching runtime or creates a new one. This replaces the starter toolkit's `Runtime.launch()`.
- Rewrote `deploy_agent()` to call `_trigger_codebuild()` and then `_ensure_runtime()` instead of the toolkit. Memory creation and IAM creation are unchanged, since they already used the SDK and boto3, respectively.
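
The build-then-deploy flow can be sketched with plain boto3 calls. This is not the code in `deploy.py`. Several choices here are assumptions:

- The methods are rewritten as free functions that take the clients as arguments, which makes them easy to test with stubs.
- The 10-second poll interval is arbitrary.
- `runtime_config` stands in for the artifact, role, and network fields the real `_ensure_runtime()` sends, since this entry does not list them.
- Runtimes are matched by `agentRuntimeName`, an assumption about how "the matching one" is identified.

The missing-project case catches the modeled `ResourceNotFoundException` (a `ClientError` subclass) via `codebuild.exceptions`. Verify the request and response field names against your boto3 version.

```python
import time

# CodeBuild statuses that end a build without success; IN_PROGRESS keeps polling.
_FAILED = {"FAILED", "FAULT", "TIMED_OUT", "STOPPED"}


def trigger_codebuild(codebuild, agent_name: str, poll_seconds: float = 10.0) -> str:
    """Start the bootstrap CodeBuild project and block until it finishes.

    codebuild is expected to be boto3.client("codebuild").
    """
    project = f"bedrock-agentcore-{agent_name}-builder"
    try:
        build_id = codebuild.start_build(projectName=project)["build"]["id"]
    except codebuild.exceptions.ResourceNotFoundException as err:
        raise RuntimeError(
            f"CodeBuild project {project!r} not found. "
            "Run `agentcore deploy` once to bootstrap it, then retry."
        ) from err

    while True:
        builds = codebuild.batch_get_builds(ids=[build_id])["builds"]
        status = builds[0]["buildStatus"]
        if status == "SUCCEEDED":
            return build_id
        if status in _FAILED:
            raise RuntimeError(f"Build {build_id} ended with status {status}")
        time.sleep(poll_seconds)  # still running: wait, then poll again


def ensure_runtime(control, name: str, runtime_config: dict) -> str:
    """Update the runtime named `name` if it exists, otherwise create it.

    control is expected to be boto3.client("bedrock-agentcore-control").
    runtime_config holds the remaining request fields (hypothetical here).
    """
    kwargs: dict = {}
    while True:  # walk every page of list_agent_runtimes
        page = control.list_agent_runtimes(**kwargs)
        for rt in page.get("agentRuntimes", []):
            if rt.get("agentRuntimeName") == name:
                control.update_agent_runtime(
                    agentRuntimeId=rt["agentRuntimeId"], **runtime_config
                )
                return rt["agentRuntimeId"]
        token = page.get("nextToken")
        if not token:
            break
        kwargs = {"nextToken": token}

    resp = control.create_agent_runtime(agentRuntimeName=name, **runtime_config)
    return resp["agentRuntimeId"]
```

Calling `trigger_codebuild(...)` before `ensure_runtime(...)` means the runtime is only created or updated after the container image has built successfully.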

Fixed (discovered during live testing)

`evaluators/scripts/invoke.py` — missing `Path` import

Added `from pathlib import Path`, which was missing after the `_resolve_agent_arn()` refactor.

`evaluators/scripts/deploy.py` — `aws/spans` added to data source

The online eval config was initially created with only the runtime log group (`/aws/bedrock-agentcore/runtimes/…-DEFAULT`). The actual OTel spans (with `gen_ai.tool.name`, `session.id`, etc.) live in `aws/spans`, so `_create_online_config()` now includes both log groups.
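
Here is a small sketch of the data-source fix: whatever builds the online eval config should list the runtime log group together with `aws/spans`. The helper name is hypothetical. The runtime log group is passed in rather than constructed, because its full name is not spelled out here.

```python
# Per this entry, OTel spans (gen_ai.tool.name, session.id, ...) are written
# to the shared "aws/spans" log group rather than the runtime log group.
SPANS_LOG_GROUP = "aws/spans"


def online_config_log_groups(runtime_log_group: str) -> list[str]:
    """Return the log groups an online evaluation config should read from."""
    groups = [runtime_log_group, SPANS_LOG_GROUP]
    # Keep order but drop duplicates in case the caller already passed aws/spans.
    return list(dict.fromkeys(groups))
```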

`evaluators/workflow_contract_gsr/lambda_function.py` — agent-agnostic contract

DEFAULT_CONTRACT originally used LangGraph tool names only (identify_broker, get_broker_financial_profile, update_broker_financial_interests, parse_broker_profile_from_message). Updated to also cover the Strands agent's tool names (update_broker_profile, get_broker_profile) and removed the identify_broker group (not a separate tool in the Strands implementation). Both agent styles now score correctly against the contract.
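
One way to make a contract agent-agnostic is to define each required step as a set of acceptable tool names. A step counts as satisfied if any of its tools was called. The step names, the grouping of tools into steps, and the fraction-of-steps score below are all hypothetical. Only the tool names come from this entry.

```python
# Hypothetical contract: step name -> tool names that satisfy that step.
# Only the tool names come from the changelog; the grouping is illustrative.
EXAMPLE_CONTRACT: dict[str, set[str]] = {
    "read_profile": {"get_broker_financial_profile", "get_broker_profile"},
    "update_profile": {
        "update_broker_financial_interests",
        "parse_broker_profile_from_message",
        "update_broker_profile",
    },
}


def contract_score(
    called_tools: list[str], contract: dict[str, set[str]] = EXAMPLE_CONTRACT
) -> float:
    """Return the fraction of contract steps matched by at least one called tool."""
    if not contract:
        return 1.0  # an empty contract is trivially satisfied
    called = set(called_tools)
    satisfied = sum(1 for tools in contract.values() if called & tools)
    return satisfied / len(contract)
```

Because each step accepts either agent's tool name, a LangGraph trace and a Strands trace that do the same work receive the same score.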

`evaluators/schema_validator/lambda_function.py` — status-only span support

Strands agents emit gen_ai.tool.status: "success" in span attributes but do not embed output text (gen_ai.tool.call.result is absent). Added a fallback in _tool_output_text() to return the status string when no richer output is available. Added _is_status_only() helper so _validate_get_stock_data() and _validate_search_news() pass on status-only spans rather than failing. Agents that do embed result text continue to be validated structurally as before.
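
The status-only fallback can be sketched as follows. These functions mirror the roles of `_tool_output_text()` and `_is_status_only()` but are not their actual code. Two placeholders are assumptions: requiring the status to equal `"success"`, and the `$` check applied to spans that do carry result text.

```python
RESULT_KEY = "gen_ai.tool.call.result"
STATUS_KEY = "gen_ai.tool.status"


def tool_output_text(attrs: dict) -> str:
    """Prefer the embedded tool result; fall back to the status string."""
    result = attrs.get(RESULT_KEY)
    if result:
        return str(result)
    return str(attrs.get(STATUS_KEY, ""))


def is_status_only(attrs: dict) -> bool:
    """True when a span reports a status but carries no result text."""
    return not attrs.get(RESULT_KEY) and STATUS_KEY in attrs


def check_stock_span(attrs: dict) -> bool:
    """Hypothetical stock-data check that tolerates status-only spans."""
    if is_status_only(attrs):
        # Assumption: a status-only span passes only if the tool succeeded.
        return attrs[STATUS_KEY] == "success"
    # Placeholder structural check for spans that embed result text.
    return "$" in tool_output_text(attrs)
```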

Added

`README.md` — custom code-based evaluators documentation

- Added a full "Evaluating Your Agent with Custom Code-Based Evaluators" section covering:
  - How code-based evaluators work (data flow diagram)
  - Description of all five evaluators with level, folder, and what each checks
  - Evaluator label reference table
  - IAM requirements for the execution roles
  - Step-by-step setup instructions (`evaluators/scripts/deploy.py`)
  - Traffic generation guide (`evaluators/scripts/invoke.py`) with per-scenario expected outcomes
  - Results viewing guide (`evaluators/scripts/results.py`)
  - AgentCore CLI reference: `agentcore eval evaluator create`, `agentcore add online-eval`, `agentcore run eval`, `agentcore evals history`, `agentcore logs evals`, `agentcore pause/resume online-eval`
  - Evaluator cleanup instructions
- Added evaluators to the architecture diagram and component table.
- Corrected the LLM model name in the architecture section (Claude Haiku 4.5, matching the code).
- Added a link to the official AWS docs: https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/code-based-evaluators.html
