Changelog
[Unreleased]
Fixed
evaluators/scripts/deploy.py — production control plane endpoint
- Removed the `CP_ENDPOINT` env var and its gamma default (`https://gamma.us-west-2.elcapcp.genesis-primitives.aws.dev`). That endpoint is internal-only and not accessible from customer accounts.
- Changed `_cp_client()` to use the `bedrock-agentcore-control` boto3 service (production control plane). Evaluators registered here are visible to the production data plane (`bedrock-agentcore`), which resolves the `ResourceNotFoundException` that occurred when evaluators were registered on the gamma CP.
- Removed the hardcoded `AGENT_RUNTIME_ARN` default (pointing to a specific account/runtime). Added `_resolve_agent_arn()`, which reads from the `AGENT_RUNTIME_ARN` env var or falls back to the `.agent_arn` file written by `deploy.py`. Exits with a clear error message if neither is set.
- Fixed `_create_online_config()` to accept `agent_runtime_arn` as a parameter instead of reading the module-level constant, making the function easier to test and reason about.
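The lookup order described for `_resolve_agent_arn()` can be sketched as follows. This is a minimal illustration, not the script's actual code: the function name without the underscore, the `env` and `arn_file` parameters, and the wording of the error message are assumptions made so the example can be run in isolation.

```python
import os
import sys
from pathlib import Path

# Assumed location; the changelog only names the file ".agent_arn".
AGENT_ARN_FILE = Path(".agent_arn")


def resolve_agent_arn(env=None, arn_file=AGENT_ARN_FILE):
    """Return the runtime ARN from the environment, else from the ARN file."""
    env = os.environ if env is None else env
    arn = env.get("AGENT_RUNTIME_ARN", "").strip()
    if arn:
        return arn
    if arn_file.is_file():
        arn = arn_file.read_text().strip()
        if arn:
            return arn
    # Neither source is available: stop with an actionable message.
    sys.exit(
        "AGENT_RUNTIME_ARN is not set and no .agent_arn file was found. "
        "Export AGENT_RUNTIME_ARN or run deploy.py first."
    )
```

Passing the environment and file path as parameters keeps the function testable without touching the real process environment.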
evaluators/iam/trust-policy.json — remove internal service principal
- Removed `preprod.genesis-service.aws.internal` from the trust policy `Principal.Service` list. This was an Amazon-internal pre-production service principal that is not valid in customer accounts and would cause IAM role assumption to fail at runtime.
- Trust policy now contains only `bedrock-agentcore.amazonaws.com`.
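For orientation, a trust policy with a single service principal typically has the shape below. Only the principal name comes from this changelog; the rest is the standard IAM trust policy layout, and the real `trust-policy.json` may carry additional fields such as conditions.

```python
import json

# Standard IAM trust policy shape (assumed); only the service principal
# is taken from the changelog entry above.
TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": ["bedrock-agentcore.amazonaws.com"]},
            "Action": "sts:AssumeRole",
        }
    ],
}

print(json.dumps(TRUST_POLICY, indent=2))
```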
evaluators/scripts/invoke.py — remove hardcoded account ARN
- Removed the hardcoded `AGENT_RUNTIME_ARN` default (pointing to a specific account). Replaced with the same `_resolve_agent_arn()` pattern used in `evaluators/scripts/deploy.py`: reads from the `AGENT_RUNTIME_ARN` env var or the `.agent_arn` file.
Removed
pyproject.toml — starter toolkit dependency
- Removed `bedrock-agentcore-starter-toolkit` from the project dependencies. This package was used only in `deploy.py` for the `Runtime` class; the agent code itself uses the `bedrock-agentcore` SDK directly (`BedrockAgentCoreApp`, `MemoryClient`).
Changed
cleanup.py — replace starter toolkit with SDK and boto3
- Removed `from bedrock_agentcore_starter_toolkit import Runtime` and `self.runtime = Runtime()`.
- Added `boto3.client("bedrock-agentcore-control")` as `self.agentcore_control`. Runtime deletion now calls `agentcore_control.delete_agent_runtime(agentRuntimeId=agent_id)` directly.
- Memory cleanup continues to use `bedrock_agentcore.memory.MemoryClient` (SDK), unchanged.
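A rough sketch of the direct deletion call, assuming the client is passed in rather than stored on `self`. The `delete_runtime` helper and its broad exception handling are illustrative only; the actual `cleanup.py` may handle errors differently.

```python
def delete_runtime(agentcore_control, agent_id):
    """Delete one runtime via the control-plane client; return True on success."""
    try:
        agentcore_control.delete_agent_runtime(agentRuntimeId=agent_id)
        return True
    except Exception as exc:  # in real use this would be botocore's ClientError
        print(f"Could not delete runtime {agent_id}: {exc}")
        return False
```

In real use the first argument would be `boto3.client("bedrock-agentcore-control")`.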
deploy.py — replace starter toolkit with SDK and boto3
- Removed `from bedrock_agentcore_starter_toolkit import Runtime` and all uses of `Runtime.configure()`, `Runtime.launch()`, and `Runtime.status()`.
- Added `from botocore.exceptions import ClientError` import.
- Added `_trigger_codebuild()` method: triggers the existing CodeBuild project (`bedrock-agentcore-{agent_name}-builder`) via boto3 and polls for completion. Raises `RuntimeError` with clear instructions if the project does not exist (pointing the user to run `agentcore deploy` once to bootstrap it).
- Added `_ensure_runtime()` method: uses `boto3.client("bedrock-agentcore-control")` to list existing runtimes and either update the matching one or create a new runtime. Replaces the starter toolkit's `Runtime.launch()`.
- Rewrote `deploy_agent()` to call `_trigger_codebuild()` then `_ensure_runtime()` instead of the toolkit. Memory creation and IAM creation remain unchanged (already used the SDK and boto3 respectively).
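The build-then-upsert flow above might look roughly like this. It is a sketch under assumptions: the functions take boto3 clients as arguments instead of being methods, `runtime_args` stands in for whatever artifact, role, and network settings the real script passes, and only the first page of `list_agent_runtimes` is inspected.

```python
import time


def trigger_codebuild(codebuild, agent_name, poll_seconds=10):
    """Start the builder project and block until the build finishes."""
    project = f"bedrock-agentcore-{agent_name}-builder"
    try:
        build_id = codebuild.start_build(projectName=project)["build"]["id"]
    except codebuild.exceptions.ResourceNotFoundException as exc:
        raise RuntimeError(
            f"CodeBuild project {project} not found; "
            "run `agentcore deploy` once to bootstrap it."
        ) from exc
    while True:
        build = codebuild.batch_get_builds(ids=[build_id])["builds"][0]
        status = build["buildStatus"]
        if status == "SUCCEEDED":
            return build_id
        if status != "IN_PROGRESS":
            raise RuntimeError(f"Build {build_id} ended with status {status}")
        time.sleep(poll_seconds)


def ensure_runtime(control, agent_name, runtime_args):
    """Update the runtime named agent_name if it exists, else create it."""
    # First page only; a fuller version would follow nextToken.
    for rt in control.list_agent_runtimes().get("agentRuntimes", []):
        if rt["agentRuntimeName"] == agent_name:
            control.update_agent_runtime(
                agentRuntimeId=rt["agentRuntimeId"], **runtime_args
            )
            return rt["agentRuntimeId"]
    created = control.create_agent_runtime(
        agentRuntimeName=agent_name, **runtime_args
    )
    return created["agentRuntimeId"]
```

Injecting the clients keeps both steps testable with stand-ins, so no AWS credentials are needed to exercise the control flow.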
Fixed (discovered during live testing)
evaluators/scripts/invoke.py — missing Path import
- Added `from pathlib import Path` (was missing after the `_resolve_agent_arn()` refactor).
evaluators/scripts/deploy.py — aws/spans added to data source
- The online eval config was initially created with only the runtime log group (`/aws/bedrock-agentcore/runtimes/…-DEFAULT`). The actual OTel spans (with `gen_ai.tool.name`, `session.id`, etc.) live in `aws/spans`. Updated `_create_online_config()` to include both log groups.
evaluators/workflow_contract_gsr/lambda_function.py — agent-agnostic contract
- `DEFAULT_CONTRACT` originally used LangGraph tool names only (`identify_broker`, `get_broker_financial_profile`, `update_broker_financial_interests`, `parse_broker_profile_from_message`). Updated to also cover the Strands agent's tool names (`update_broker_profile`, `get_broker_profile`) and removed the `identify_broker` group (not a separate tool in the Strands implementation). Both agent styles now score correctly against the contract.
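One way to make a contract agent-agnostic is to treat each step as a group of interchangeable tool names. The sketch below illustrates that idea only: the `CONTRACT` layout, the pairing of names into groups, and the fraction-based score are all assumptions, not the evaluator's actual `DEFAULT_CONTRACT` or scoring rule.

```python
# Hypothetical contract: each group lists tool names that satisfy the same step.
# The pairing of LangGraph and Strands names here is illustrative.
CONTRACT = [
    {"get_broker_financial_profile", "get_broker_profile"},
    {"update_broker_financial_interests", "update_broker_profile"},
]


def contract_score(called_tools, contract=CONTRACT):
    """Fraction of contract groups with at least one matching tool call."""
    called = set(called_tools)
    satisfied = sum(1 for group in contract if group & called)
    return satisfied / len(contract) if contract else 1.0
```

With this layout, a session that calls either framework's tool names for both steps scores the same, which is the behaviour the entry above describes.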
evaluators/schema_validator/lambda_function.py — status-only span support
- Strands agents emit `gen_ai.tool.status: "success"` in span attributes but do not embed output text (`gen_ai.tool.call.result` is absent). Added a fallback in `_tool_output_text()` to return the status string when no richer output is available. Added `_is_status_only()` helper so `_validate_get_stock_data()` and `_validate_search_news()` pass on status-only spans rather than failing. Agents that do embed result text continue to be validated structurally as before.
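The fallback can be pictured as below. This is a simplified sketch: the helper names drop the leading underscore, span attributes are modelled as a plain dict, `validate_tool_span` is a hypothetical stand-in for the per-tool validators, and requiring the status to equal `"success"` is an assumption the changelog does not spell out.

```python
def tool_output_text(attrs):
    """Prefer embedded result text; fall back to the status string."""
    result = attrs.get("gen_ai.tool.call.result")
    if result:
        return str(result)
    return str(attrs.get("gen_ai.tool.status", ""))


def is_status_only(attrs):
    """True when the span carries a status but no result text."""
    has_result = bool(attrs.get("gen_ai.tool.call.result"))
    return not has_result and bool(attrs.get("gen_ai.tool.status"))


def validate_tool_span(attrs, structural_check):
    """Pass successful status-only spans; otherwise run the structural check."""
    if is_status_only(attrs):
        # Assumption: only a "success" status counts as a pass.
        return attrs["gen_ai.tool.status"] == "success"
    return structural_check(tool_output_text(attrs))
```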
Added
README.md — custom code-based evaluators documentation
- Added a full "Evaluating Your Agent with Custom Code-Based Evaluators" section covering:
  - How code-based evaluators work (data flow diagram)
  - Description of all five evaluators with level, folder, and what each checks
  - Evaluator label reference table
  - IAM requirements for the execution roles
  - Step-by-step setup instructions (`evaluators/scripts/deploy.py`)
  - Traffic generation guide (`evaluators/scripts/invoke.py`) with per-scenario expected outcomes
  - Results viewing guide (`evaluators/scripts/results.py`)
  - AgentCore CLI reference: `agentcore eval evaluator create`, `agentcore add online-eval`, `agentcore run eval`, `agentcore evals history`, `agentcore logs evals`, `agentcore pause/resume online-eval`
  - Evaluator cleanup instructions
- Added evaluators to the architecture diagram and component table.
- Corrected LLM model name in architecture section (Claude Haiku 4.5, matching the code).
- Added link to official AWS docs: https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/code-based-evaluators.html