Multi-Agent Runtime on Amazon Bedrock AgentCore (Terraform)
This Terraform module deploys a multi-agent system using Amazon Bedrock AgentCore Runtime with Agent-to-Agent (A2A) communication capabilities.
Table of Contents
- Overview
- Architecture
- What's Included
- Prerequisites
- Quick Start
- Deployment Process
- Authentication Model
- Testing
- Agent Capabilities
- Customization
- File Structure
- Monitoring and Observability
- Security
- Pricing
- Troubleshooting
- Cleanup
- Advanced Topics
- Next Steps
- Resources
- Contributing
- License
Overview
This pattern demonstrates deploying a multi-agent system with two coordinating agents that communicate via the Agent-to-Agent (A2A) protocol. Agent1 (Orchestrator) can delegate specialized tasks to Agent2 (Specialist), enabling modular and scalable agent architectures.
Key Features:
- Two-agent architecture with A2A communication
- Automated Docker image building via CodeBuild
- S3-based source code management with change detection
- IAM-based security with least-privilege access
- Sequential deployment ensuring proper dependencies
This makes it ideal for:
- Building complex multi-agent workflows
- Implementing agent specialization patterns
- Creating scalable agent orchestration systems
- Learning A2A communication protocols
Architecture
System Components
Agent1 (Orchestrator Agent)
- Receives initial user requests
- Orchestrates workflow between multiple agents
- Contains a specialized tool (call_specialist_agent) to invoke Agent2
- Has IAM permissions to invoke Agent2's runtime
- Environment variable AGENT2_ARN enables A2A communication
Agent2 (Specialist Agent)
- Independent specialist agent with domain-specific capabilities
- Provides data analysis and processing functions
- Can be invoked by Agent1 via A2A protocol
- No dependencies on other agents
Agent-to-Agent (A2A) Communication
The A2A communication pattern enables:
- Orchestration: Agent1 coordinates complex workflows
- Specialization: Agent2 focuses on specific capabilities
- Scalability: Easy to add more specialized agents
- Security: IAM-based authorization between agents
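As a rough illustration of the orchestrator-to-specialist call described above, the following Python sketch shows what a call_specialist_agent helper might look like. It is not the code shipped in agent-orchestrator-code/agent.py. The payload shape ({"prompt": ...}) and the idea of passing the client in as an argument are assumptions made for illustration; the client is expected to behave like boto3.client("bedrock-agentcore").

```python
import json
import os
import uuid


def call_specialist_agent(query, client, agent_arn=None):
    """Send a query to the specialist runtime and return the decoded reply.

    `client` is expected to behave like boto3.client("bedrock-agentcore").
    """
    # Fall back to the AGENT2_ARN environment variable set on Agent1
    agent_arn = agent_arn or os.environ["AGENT2_ARN"]
    response = client.invoke_agent_runtime(
        agentRuntimeArn=agent_arn,
        # Session IDs must be at least 33 characters; a UUID4 string is 36
        runtimeSessionId=str(uuid.uuid4()),
        # Payload shape is an assumption; match whatever the specialist expects
        payload=json.dumps({"prompt": query}).encode("utf-8"),
    )
    # The reply body arrives as a streaming object under the "response" key
    return response["response"].read().decode("utf-8")
```

Inside Agent1 this could be called as call_specialist_agent("analyze sales data", boto3.client("bedrock-agentcore")). Taking the client as a parameter keeps the helper easy to exercise with a stub in unit tests.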
What's Included
This Terraform configuration creates:
- 2 S3 Buckets: Source code storage for both agents with versioning
- 2 ECR Repositories: Container registries for ARM64 Docker images
- 2 CodeBuild Projects: Automated image building and pushing
- 3 IAM Roles:
- Agent1 execution role (with A2A permissions)
- Agent2 execution role (standard permissions)
- CodeBuild service role
- 2 Agent Runtimes:
- Agent1 (Orchestrator) with AGENT2_ARN environment variable
- Agent2 (Specialist) independent runtime
- Build Automation: Automatic rebuild on code changes (MD5-based detection)
- Supporting Resources: S3 lifecycle policies, ECR lifecycle policies, IAM policies
Total: ~30 AWS resources deployed and managed by Terraform
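The MD5-based change detection mentioned above happens inside Terraform. The Python sketch below only illustrates the general idea: hash every file in a source directory and rebuild when the digest changes. The function name source_fingerprint is hypothetical, and the module's actual hashing scheme may differ.

```python
import hashlib
from pathlib import Path


def source_fingerprint(source_dir):
    """Return one MD5 hex digest covering every file under source_dir."""
    digest = hashlib.md5()
    root = Path(source_dir)
    # Sort paths so the digest does not depend on filesystem listing order
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        # Include the relative path so renames also change the fingerprint
        digest.update(path.relative_to(root).as_posix().encode("utf-8"))
        digest.update(path.read_bytes())
    return digest.hexdigest()
```

Comparing source_fingerprint("agent-orchestrator-code") before and after an edit shows whether a rebuild would be triggered under this scheme.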
Prerequisites
Required Tools
- Terraform (>= 1.6)
  - Recommended: tfenv for version management
  - Or download directly: terraform.io/downloads
  Note: brew install terraform provides v1.5.7 (deprecated). Use tfenv or a direct download for >= 1.6.
- AWS CLI (configured with credentials)
  aws configure
- Python 3.11+ (for testing scripts)
  python --version # Verify Python 3.11 or later
  pip install boto3
- Docker (for local testing, optional)
AWS Account Requirements
- AWS Account with appropriate permissions
- Access to Amazon Bedrock AgentCore service
- Permissions to create:
- S3 buckets
- ECR repositories
- CodeBuild projects
- IAM roles and policies
- AgentCore Runtime resources
Quick Start
1. Configure Variables
Copy the example variables file and customize:
cp terraform.tfvars.example terraform.tfvars
Edit terraform.tfvars with your preferred values:
- orchestrator_name: Name for the orchestrator agent (default: "OrchestratorAgent")
- specialist_name: Name for the specialist agent (default: "SpecialistAgent")
- stack_name: Stack identifier (default: "agentcore-multi-agent")
- aws_region: AWS region for deployment (default: "us-west-2")
- network_mode: PUBLIC or PRIVATE networking
2. Initialize Terraform
See State Management Options in the main README for detailed guidance on local vs. remote state.
Quick start with local state:
terraform init
For team collaboration, use remote state - see the main README for setup instructions.
3. Deploy
Method 1: Using Deploy Script (Recommended)
chmod +x deploy.sh
./deploy.sh
The script validates configuration, shows the plan, and deploys all resources.
Method 2: Direct Terraform Commands
terraform plan
terraform apply
Note: Deployment includes creating infrastructure, building Docker images sequentially (Agent2 first, then Agent1), and establishing A2A communication. Total deployment time: ~5-10 minutes
4. Verify Deployment
# View all outputs
terraform output
# Get Agent ARNs
terraform output orchestrator_runtime_arn
terraform output specialist_runtime_arn
Deployment Process
Sequential Build Process
The deployment follows a strict sequence to ensure proper dependencies:
1. S3 Buckets Creation (orchestrator & specialist)
2. ECR Repositories Creation (orchestrator & specialist)
3. IAM Roles Creation (with A2A permissions)
4. CodeBuild Projects Creation (orchestrator & specialist)
5. Agent2 Docker Build → Agent2 Runtime Creation
6. Agent1 Docker Build → Agent1 Runtime Creation (depends on Agent2)
Critical Dependencies:
- Agent1 runtime depends on Agent2 runtime being created first
- Agent1 build depends on Agent2 build completing successfully
- Agent1 receives AGENT2_ARN as an environment variable
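The dependency rules above amount to a topological ordering. Terraform resolves this itself from resource references and depends_on, so the Python sketch below is purely illustrative. The step names are hypothetical labels, and the assumption that the CodeBuild projects depend on the buckets, repositories, and roles is inferred from the numbered sequence rather than from the module's code.

```python
from graphlib import TopologicalSorter

# Each key depends on the steps in its value set (names are illustrative)
DEPENDENCIES = {
    "codebuild_projects": {"s3_buckets", "ecr_repositories", "iam_roles"},
    "agent2_build": {"codebuild_projects"},
    "agent2_runtime": {"agent2_build"},
    "agent1_build": {"agent2_build"},
    "agent1_runtime": {"agent1_build", "agent2_runtime"},
}


def deployment_order(dependencies=DEPENDENCIES):
    """Return one valid ordering in which every step follows its prerequisites."""
    return list(TopologicalSorter(dependencies).static_order())
```

Any ordering returned by deployment_order() places the Agent2 build and runtime ahead of the Agent1 runtime, which is the guarantee the sequential build relies on.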
Build Triggers
The infrastructure automatically triggers Docker image builds:
- When source code changes (MD5 hash detection)
- When infrastructure changes require rebuild
- Sequential: Agent2 builds first, then Agent1
Authentication Model
This pattern uses IAM-based authentication with workload identity tokens:
- Service Principal: Agents assume IAM roles via bedrock-agentcore.amazonaws.com
- Workload Identity: Agents obtain access tokens for secure operations
- A2A Authorization: Agent1 has InvokeAgentRuntime permission for Agent2
- API Access: Direct AWS API invocation using IAM credentials
Note: This is a backend infrastructure pattern with no user authentication layer. For user-facing applications, you would add Cognito or API Gateway authorizers separately.
Testing
The included test_multi_agent.py script is infrastructure-agnostic and works with any deployment method (Terraform, CDK, CloudFormation, or manual).
Prerequisites for Testing
Before testing, ensure you have the required packages installed:
Option A: Using uv (Recommended)
uv venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
uv pip install boto3 # Required for agent invocation
Option B: System-wide installation
pip install boto3 # Required for agent invocation
Note: boto3 is required for the test script to invoke both agent runtimes via AWS API.
Basic Testing
# Get ARNs from Terraform
ORCHESTRATOR_ARN=$(terraform output -raw orchestrator_runtime_arn)
SPECIALIST_ARN=$(terraform output -raw specialist_runtime_arn)
# Test both agents
python test_multi_agent.py $ORCHESTRATOR_ARN $SPECIALIST_ARN
Test Scenarios
The script runs two tests:
- Simple Query: Basic orchestrator invocation
- A2A Communication: Orchestrator delegates to specialist via A2A protocol
Expected Output
TEST 1: Simple Query (Orchestrator) ✅
TEST 2: Complex Query with A2A Communication ✅
✅ ALL TESTS PASSED
Agent Capabilities
Agent1 (Orchestrator)
Tools:
- call_specialist_agent: Invokes Agent2 for specialized processing
  - Parameters: query (string)
  - Returns: Processed results from Agent2
Use Cases:
- Complex workflow orchestration
- Multi-step data processing
- Delegation to specialized agents
Agent2 (Specialist)
Capabilities:
- Domain-specific data analysis
- Detailed information processing
- Expert-level responses
Use Cases:
- Data analysis and transformation
- Domain-specific processing
- Specialized computations
Customization
Modify Agent Code
- Edit Agent Files
  # Orchestrator Agent
  vim agent-orchestrator-code/agent.py
  vim agent-orchestrator-code/requirements.txt
  # Specialist Agent
  vim agent-specialist-code/agent.py
  vim agent-specialist-code/requirements.txt
- Redeploy
  terraform apply # Automatically detects changes and rebuilds
Add More Agents
To add a new agent (e.g., Coordinator):
- Create coordinator-code/ directory with implementation
- Add coordinator.tf for the runtime resource
- Update s3.tf, ecr.tf, iam.tf, codebuild.tf
- Create buildspec-coordinator.yml
- Update main.tf for build sequence
- Update outputs.tf and variables.tf
Modify Network Configuration
Change from PUBLIC to PRIVATE networking:
# terraform.tfvars
network_mode = "PRIVATE"
PRIVATE mode requires VPC configuration, which is not included in this module.
File Structure
multi-agent-runtime/
├── agent-orchestrator-code/ # Orchestrator agent source code
│ ├── agent.py # Main agent implementation
│ ├── Dockerfile # Container definition
│ └── requirements.txt # Python dependencies
├── agent-specialist-code/ # Specialist agent source code
│ ├── agent.py # Main agent implementation
│ ├── Dockerfile # Container definition
│ └── requirements.txt # Python dependencies
├── orchestrator.tf # Orchestrator runtime configuration
├── specialist.tf # Specialist runtime configuration
├── main.tf # Main Terraform configuration
├── variables.tf # Input variables
├── outputs.tf # Output definitions
├── iam.tf # IAM roles and policies
├── s3.tf # S3 buckets for source code
├── ecr.tf # ECR repositories
├── codebuild.tf # CodeBuild projects
├── versions.tf # Terraform and provider versions
├── buildspec-orchestrator.yml # Orchestrator build specification
├── buildspec-specialist.yml # Specialist build specification
├── terraform.tfvars.example # Example variable values
├── backend.tf.example # Example backend configuration
├── deploy.sh # Deployment automation script
├── destroy.sh # Cleanup automation script
├── test_multi_agent.py # Infrastructure-agnostic test script
└── README.md # This file
Monitoring and Observability
CloudWatch Logs
# Orchestrator logs
aws logs tail /aws/bedrock-agentcore/agentcore-multi-agent-orchestrator-runtime --follow
# Specialist logs
aws logs tail /aws/bedrock-agentcore/agentcore-multi-agent-specialist-runtime --follow
Metrics
Access metrics in CloudWatch:
- Agent invocation count
- Agent execution duration
- Error rates
- A2A call metrics
AWS Console
Monitor in AWS Console:
- Bedrock AgentCore: Console Link
- ECR Repositories: View Docker images
- CodeBuild: Monitor build status
- CloudWatch: View logs and metrics
Security
IAM Permissions
Agent1 Execution Role:
- Standard AgentCore permissions
- Critical: bedrock-agentcore:InvokeAgentRuntime for Agent2
Agent2 Execution Role:
- Standard AgentCore permissions only
- No cross-agent invocation permissions needed
CodeBuild Role:
- S3 access to both agent source buckets
- ECR push access to both repositories
- CloudWatch Logs write access
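To make the Agent1 A2A permission concrete, the Python sketch below builds a policy document that allows invoking a single runtime. The module defines its real policies in iam.tf, so treat this as an approximation: the function name, the Sid, and the extra wildcard resource entry (intended to cover sub-resources such as endpoints) are assumptions.

```python
import json


def a2a_invoke_policy(specialist_runtime_arn):
    """Build an IAM policy document allowing invocation of one runtime."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "InvokeSpecialistRuntime",
                "Effect": "Allow",
                "Action": "bedrock-agentcore:InvokeAgentRuntime",
                # Scope to the specialist runtime; the /* entry is an assumption
                "Resource": [
                    specialist_runtime_arn,
                    f"{specialist_runtime_arn}/*",
                ],
            }
        ],
    }


def policy_json(specialist_runtime_arn):
    """Serialize the policy for use with IAM APIs or for review."""
    return json.dumps(a2a_invoke_policy(specialist_runtime_arn), indent=2)
```

Scoping the Resource to the specialist's ARN, rather than "*", is what keeps the orchestrator's cross-agent access least-privilege.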
Network Security
- Agents run in specified network mode (PUBLIC/PRIVATE)
- ECR repositories have account-level access controls
- S3 buckets block public access
- IAM policies follow least-privilege principle
Secrets Management
For sensitive data:
- Use AWS Secrets Manager
- Pass secret ARNs as environment variables
- Retrieve secrets at runtime in agent code
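A minimal sketch of the runtime retrieval step, assuming the secret's ARN is passed in an environment variable and the secret value is a JSON string. The variable name API_SECRET_ARN and the helper name load_secret are hypothetical; the client is expected to behave like boto3.client("secretsmanager").

```python
import json
import os


def load_secret(client, env_var="API_SECRET_ARN"):
    """Fetch and JSON-decode the secret whose ARN is stored in env_var."""
    # The ARN comes from the environment; the secret value itself never does
    secret_arn = os.environ[env_var]
    response = client.get_secret_value(SecretId=secret_arn)
    # Assumes a JSON string secret; binary secrets use SecretBinary instead
    return json.loads(response["SecretString"])
```

The agent's execution role would also need secretsmanager:GetSecretValue on that secret for this call to succeed.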
Pricing
For current pricing information, please refer to:
- Amazon Bedrock Pricing
- Amazon ECR Pricing
- AWS CodeBuild Pricing
- Amazon S3 Pricing
- Amazon CloudWatch Pricing
Note: Actual costs depend on your usage patterns, AWS region, and specific services consumed.
Troubleshooting
Common Issues
Issue: Agent1 fails to invoke Agent2
- Solution: Verify AGENT2_ARN environment variable is set
- Check: IAM permissions include InvokeAgentRuntime
Issue: Build fails
- Solution: Check CodeBuild logs in CloudWatch
- Check: Verify source code is in correct directories
Issue: Runtime not created
- Solution: Verify ECR image exists and is tagged correctly
- Check: Review Terraform state for errors
Debug Commands
# Check Terraform state
terraform show
# Validate configuration
terraform validate
# View specific resource
terraform state show aws_bedrockagentcore_agent_runtime.orchestrator
# Get detailed build logs
PROJECT_NAME=$(terraform output -raw orchestrator_codebuild_project)
aws codebuild batch-get-builds --ids $(aws codebuild list-builds-for-project --project-name $PROJECT_NAME --query 'ids[0]' --output text)
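The same build lookup can be scripted in Python, which may be handy inside a test or CI step. This is a sketch under the assumption that the client behaves like boto3.client("codebuild"); the helper name latest_build_status is hypothetical.

```python
def latest_build_status(client, project_name):
    """Return (build_id, status) for the most recent build, or None if none exist."""
    # Request newest-first ordering explicitly
    build_ids = client.list_builds_for_project(
        projectName=project_name, sortOrder="DESCENDING"
    )["ids"]
    if not build_ids:
        return None
    # Fetch details for the newest build only
    build = client.batch_get_builds(ids=build_ids[:1])["builds"][0]
    return build["id"], build["buildStatus"]
```

The project name can be taken from terraform output -raw orchestrator_codebuild_project, as in the shell commands above.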
Cleanup
Automated Cleanup
chmod +x destroy.sh
./destroy.sh
The script shows the destruction plan, requires confirmation, and destroys all resources.
Manual Cleanup
terraform destroy
Important: Verify in AWS Console that all resources are deleted:
- Bedrock AgentCore runtimes
- ECR repositories
- S3 buckets
- CodeBuild projects
- IAM roles
Advanced Topics
Adding Custom Tools
- Define tool schema in agent code
- Implement tool handler function
- Register tool with agent
- Rebuild and deploy
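The first three steps above can be sketched in a framework-agnostic way. The registry, the tool decorator, and the word_count example below are all hypothetical; the agent framework used in agent.py will have its own registration mechanism, which should be preferred.

```python
# Registry mapping tool names to their schema and handler
TOOLS = {}


def tool(name, description, parameters):
    """Decorator that records a handler together with its schema."""
    def register(handler):
        TOOLS[name] = {
            "description": description,
            "parameters": parameters,
            "handler": handler,
        }
        return handler
    return register


# Steps 1-3: schema, handler, and registration in one place
@tool(
    name="word_count",
    description="Count the words in a piece of text",
    parameters={"text": "string"},
)
def word_count(text):
    return len(text.split())


def dispatch(name, arguments):
    """Look up a registered tool and call it with keyword arguments."""
    return TOOLS[name]["handler"](**arguments)
```

With this in place, dispatch("word_count", {"text": "hello multi agent"}) routes a tool call to the handler by name.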
Implementing Memory
Add session management in agent code:
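# Process-local store; contents are lost when the runtime restarts
session_data = {}

def handle_request(input_text, session_id):
    # Create an empty context the first time a session ID is seen
    if session_id not in session_data:
        session_data[session_id] = {}
    # Use session_data[session_id] for context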
Multi-Region Deployment
For multi-region:
- Configure backend for state locking
- Deploy to each region separately
- Use Route53 for failover
- Consider cross-region replication for S3/ECR
Next Steps
- Test the deployment
  python test_multi_agent.py $(terraform output -raw orchestrator_runtime_arn) $(terraform output -raw specialist_runtime_arn)
- Customize agents for your specific use case
  - Add domain-specific tools to agents
  - Implement custom business logic
  - Integrate with external APIs
- Explore related patterns
  - MCP Server Pattern - MCP protocol with JWT auth
  - AgentCore Samples - More examples
- Add production features
  - Monitoring and alerting
  - Custom authentication layer (if needed)
  - VPC deployment for private networking
  - CI/CD pipeline integration
Resources
- Amazon Bedrock AgentCore Documentation
- Terraform AWS Provider
- Agent-to-Agent Communication
- Model Context Protocol
Contributing
We welcome contributions! Please see our Contributing Guide for details.
License
This project is licensed under the MIT-0 license. See the LICENSE file for details.
