mirror of synced 2026-05-22 22:53:35 +00:00

Lakehouse agent - ensures identity propagation from logged in federated user to ensure row level security on data fetched from databases supported by MCP tools on MCP server (#853)

* Lakehouse agent which supports role-based access control and row-level data access protection at the data layer

* Updated the auth flow and moved instructions into notebooks

* Completed a draft

* notebooks added

* Fix: Add aws_session_utils.py to MCP server and update import

* Debug: Add version print statement to verify deployment

* Replace print statements with logging module for OpenTelemetry capture

* Update agent JWT config to accept both app and M2M client IDs

* End to end testing. Readme updated with screenshots

* End to end testing. Readme updated with screenshots

* Added streamlit notebook

* Removed redundant files

* Added Gi and Sunita to CONTRIBUTORS.md

* Added alias

* Update time needed to test values

* added cleanup notebook

* Added architecture diagram

* Ensured local files are cleaned up which was causing stale state of the MCP server and leading to Auth issues

* Ignore errors for optional parameters

* Fixed cleanup issues and ensured end to end cleanup and restore

* Explanation of oauth and authentication flow from user to MCP

* Fixed typo

* updated README

* Correct README to remove Lake Formation references. Lake Formation does not support dynamic column filters. Reverting to use interceptors only

* Added TODO with current limitations of Lakeformation

* Added TODO with current limitations of Lakeformation

* Cleanup of acct id masking file

* Added SSO based credential loading which will default to region of the SSO profile as fallback if no valid credentials are available in .env

* Fixed region resolution to follow a consistent resolution pattern

---------

Signed-off-by: Sunita Koppar <47020304+skopp002@users.noreply.github.com>
Co-authored-by: Sunita Koppar <skoppar@amazon.com>
Co-authored-by: Gi Kim <giryoong@amazon.com>
Sunita Koppar
2026-01-26 12:19:00 -08:00
committed by GitHub
parent da34591f01
commit 90ff84e1b2
66 changed files with 13441 additions and 1 deletions
@@ -0,0 +1,60 @@
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
# Virtual environments
venv/
env/
ENV/
.venv/
.parent_venv/
# Jupyter Notebook
.ipynb_checkpoints
*.ipynb_checkpoints/
# IDE
.vscode/
.idea/
*.swp
*.swo
*~
# OS
.DS_Store
Thumbs.db
# AWS
.aws/
*.pem
.env
.env.local
.env.*.local
# Logs
*.log
logs/
# Distribution / packaging
dist/
build/
*.egg-info/
# Testing
.pytest_cache/
.coverage
htmlcov/
# Temporary files
*.tmp
*.bak
*.swp
# AgentCore specific
.bedrock_agentcore/
.bedrock_agentcore.yaml
.agentcore.yaml
.agentcore.json
@@ -0,0 +1,369 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Lakehouse Agent - Prerequisites Setup\n",
"\n",
"This notebook helps you set up the initial configuration in AWS Systems Manager (SSM) Parameter Store.\n",
"\n",
"**What this notebook does:**\n",
"- Validates your AWS credentials and region\n",
"- Reads AWS account credentials from the `.env` file in the current directory\n",
"- Creates the S3 bucket for lakehouse data storage\n",
"- Creates initial SSM parameters with the `/app/lakehouse-agent/` prefix\n",
"- Validates the configuration\n",
"\n",
"**Prerequisites:**\n",
"- AWS credentials configured (via AWS CLI or environment variables)\n",
"- Python 3.10 or later\n",
"- boto3 installed: `pip install boto3`\n",
"\n",
"**IAM Permissions Required:**\n",
"- `ssm:PutParameter`\n",
"- `ssm:GetParameter`\n",
"- `sts:GetCallerIdentity`\n",
"- `s3:CreateBucket`\n",
"- `s3:ListBucket` (required by the `HeadBucket` existence check)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import boto3\n",
"import json\n",
"from datetime import datetime\n",
"\n",
"print(\"✅ Imports successful\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## AWS credentials in `.env`\n",
"\n",
"Ensure a `.env` file exists in the current working directory with your AWS credentials:\n",
"\n",
"- `AWS_ACCESS_KEY_ID=\"your aws key\"`\n",
"- `AWS_SECRET_ACCESS_KEY=\"your aws secret\"`\n",
"- `AWS_SESSION_TOKEN=\"session token\"` (needed only for temporary credentials)\n",
"- `AWS_DEFAULT_REGION=\"your preferred region\"`"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Load AWS credentials and initialize session\n",
"from utils.notebook_init import init_aws\n",
"\n",
"# This will:\n",
"# 1. Load credentials from .env file (if it exists)\n",
"# 2. Create and validate AWS session\n",
"# 3. Return session, region, and account_id for use in this notebook\n",
"session, region, account_id = init_aws()\n",
"\n",
"# Initialize AWS clients with the validated session\n",
"ssm_client = session.client('ssm', region_name=region)\n",
"sts_client = session.client('sts', region_name=region)\n",
"\n",
"# Store for later use\n",
"AWS_REGION = region\n",
"AWS_ACCOUNT_ID = account_id\n",
"\n",
"print(f'\\n✅ Setup complete')\n",
"print(f' Account ID: {account_id}')\n",
"print(f' Region: {region}')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 1: Define Initial Configuration\n",
"\n",
"Set your initial configuration values. These will be stored in SSM Parameter Store with the `/app/lakehouse-agent/` prefix.\n",
"\n",
"**Important Notes:**\n",
"- **AWS_REGION and AWS_ACCOUNT_ID** are auto-detected and NOT stored in SSM\n",
"- **S3_BUCKET_NAME**: Provide just the base name (e.g., `lk-agent`)\n",
"  - This notebook will create the S3 bucket with full name: `{account_id}-{region}-{base_name}`\n",
"  - The full bucket name will be saved to SSM for all subsequent notebooks\n",
"  - Example: `XXXXXXXXXXXX-us-east-1-lk-agent`"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Initial configuration - UPDATE THESE VALUES\n",
"config = {\n",
"    # S3 Configuration\n",
"    # NOTE: Provide just the base name (e.g., 'lk-agent')\n",
"    # Step 2.5 of this notebook creates the bucket with the full name:\n",
"    # {account_id}-{region}-{base_name}\n",
"    # Example: XXXXXXXXXXXX-us-east-1-lk-agent\n",
"    'S3_BUCKET_NAME': 'lk-agent', # CHANGE THIS - use a unique base name\n",
"    'S3_CLAIMS_PREFIX': 'lakehouse-data/claims/',\n",
"    'S3_USERS_PREFIX': 'lakehouse-data/users/',\n",
"    'S3_ATHENA_RESULTS_PREFIX': 'athena-results/',\n",
"\n",
"    # Athena Configuration\n",
"    'DATABASE_NAME': 'lakehouse_db',\n",
"    'ATHENA_WORKGROUP': 'primary',\n",
"\n",
"    # Security Configuration\n",
"    'SECURITY_MODE': 'lakeformation',\n",
"    'LOCAL_DEVELOPMENT': 'false',\n",
"    'LOG_LEVEL': 'INFO',\n",
"\n",
"    # Test Users\n",
"    'TEST_USER_1': 'user001@example.com',\n",
"    'TEST_USER_2': 'user002@example.com',\n",
"    'TEST_USER_3': 'adjuster001@example.com',\n",
"    'TEST_PASSWORD': 'TempPass123!'\n",
"}\n",
"\n",
"print(\"📋 Initial Configuration:\")\n",
"for key, value in config.items():\n",
"    print(f\" {key}: {value}\")\n",
"    if key == 'S3_BUCKET_NAME':\n",
"        # Show the derived name that Step 2.5 will actually create\n",
"        print(f\" → Full bucket name will be: {account_id}-{region}-{value}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 2: Create SSM Parameters\n",
"\n",
"This will create all parameters in SSM Parameter Store with the `/app/lakehouse-agent/` prefix.\n",
"\n",
"**Sensitive parameters** (names containing SECRET, PASSWORD, KEY, or TOKEN) will be created as SecureString."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def is_sensitive(key):\n",
"    \"\"\"Check if parameter should be SecureString\"\"\"\n",
"    sensitive_keywords = ['SECRET', 'PASSWORD', 'KEY', 'TOKEN']\n",
"    return any(keyword in key.upper() for keyword in sensitive_keywords)\n",
"\n",
"def create_ssm_parameter(key, value, overwrite=False):\n",
"    \"\"\"Create or update SSM parameter\"\"\"\n",
"    # Convert to SSM parameter name (lowercase with /app/lakehouse-agent/ prefix)\n",
"    # Convert underscores to hyphens for consistency\n",
"    param_name = f\"/app/lakehouse-agent/{key.lower().replace('_', '-')}\"\n",
"    param_type = 'SecureString' if is_sensitive(key) else 'String'\n",
"\n",
"    try:\n",
"        ssm_client.put_parameter(\n",
"            Name=param_name,\n",
"            Value=str(value),\n",
"            Type=param_type,\n",
"            Description=f\"Lakehouse Agent - {key}\",\n",
"            Overwrite=overwrite\n",
"        )\n",
"        return True, param_type\n",
"    except ssm_client.exceptions.ParameterAlreadyExists:\n",
"        return False, param_type\n",
"    except Exception as e:\n",
"        print(f\"❌ Error creating {param_name}: {e}\")\n",
"        return None, param_type\n",
"\n",
"# Create parameters\n",
"print(\"🔄 Creating SSM Parameters...\\n\")\n",
"created = 0\n",
"skipped = 0\n",
"failed = 0\n",
"\n",
"for key, value in config.items():\n",
"    result, param_type = create_ssm_parameter(key, value, overwrite=False)\n",
"    param_name = f\"/app/lakehouse-agent/{key.lower().replace('_', '-')}\"\n",
"\n",
"    if result is True:\n",
"        print(f\"✅ Created {param_name} ({param_type})\")\n",
"        created += 1\n",
"    elif result is False:\n",
"        print(f\"⏭️  Skipped {param_name} (already exists)\")\n",
"        skipped += 1\n",
"    else:\n",
"        failed += 1\n",
"\n",
"print(f\"\\n📊 Summary:\")\n",
"print(f\" Created: {created}\")\n",
"print(f\" Skipped: {skipped}\")\n",
"print(f\" Failed: {failed}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 2.5: Create S3 Bucket\n",
"\n",
"Create the S3 bucket that will be used for all lakehouse data storage. The bucket will be created with the full name format: `{account_id}-{region}-{base_name}` and the full name will be saved to SSM."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from botocore.exceptions import ClientError\n",
"\n",
"# Create S3 bucket with full name\n",
"bucket_base_name = config['S3_BUCKET_NAME']\n",
"full_bucket_name = f\"{account_id}-{region}-{bucket_base_name}\"\n",
"\n",
"print(f\"📦 Creating S3 bucket: {full_bucket_name}\\n\")\n",
"\n",
"# Initialize S3 client\n",
"s3_client = session.client('s3', region_name=region)\n",
"\n",
"try:\n",
"    # Check if bucket already exists\n",
"    s3_client.head_bucket(Bucket=full_bucket_name)\n",
"    print(f\"✅ Bucket {full_bucket_name} already exists\")\n",
"    bucket_existed = True\n",
"except ClientError:\n",
"    # head_bucket failed (bucket missing or not accessible), so try to create it\n",
"    try:\n",
"        if region == 'us-east-1':\n",
"            # us-east-1 rejects an explicit LocationConstraint\n",
"            s3_client.create_bucket(Bucket=full_bucket_name)\n",
"        else:\n",
"            s3_client.create_bucket(\n",
"                Bucket=full_bucket_name,\n",
"                CreateBucketConfiguration={'LocationConstraint': region}\n",
"            )\n",
"        print(f\"✅ Created S3 bucket: {full_bucket_name}\")\n",
"        bucket_existed = False\n",
"    except Exception as e:\n",
"        print(f\"❌ Error creating bucket: {e}\")\n",
"        raise\n",
"\n",
"# Update SSM parameter with the full bucket name\n",
"print(f\"\\n💾 Saving full bucket name to SSM...\")\n",
"try:\n",
"    ssm_client.put_parameter(\n",
"        Name='/app/lakehouse-agent/s3-bucket-name',\n",
"        Value=full_bucket_name,\n",
"        Type='String',\n",
"        Description='S3 bucket name for lakehouse data storage (full name)',\n",
"        Overwrite=True\n",
"    )\n",
"    print(f\"✅ Updated SSM parameter /app/lakehouse-agent/s3-bucket-name\")\n",
"    print(f\" Value: {full_bucket_name}\")\n",
"except Exception as e:\n",
"    print(f\"❌ Error updating SSM parameter: {e}\")\n",
"    raise\n",
"\n",
"print(f\"\\n✅ S3 bucket setup complete!\")\n",
"print(f\" Bucket: s3://{full_bucket_name}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 3: Validate Configuration\n",
"\n",
"Let's verify all parameters were created successfully and the S3 bucket is accessible."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def validate_ssm_parameters():\n",
"    \"\"\"Validate all required parameters exist in SSM\"\"\"\n",
"    print(\"🔍 Validating SSM Parameters...\\n\")\n",
"\n",
"    missing = []\n",
"    found = []\n",
"\n",
"    for key in config.keys():\n",
"        param_name = f\"/app/lakehouse-agent/{key.lower().replace('_', '-')}\"\n",
"        try:\n",
"            response = ssm_client.get_parameter(Name=param_name)\n",
"            param_type = response['Parameter']['Type']\n",
"\n",
"            if param_type == 'SecureString':\n",
"                value = '****** (encrypted)'\n",
"            else:\n",
"                value = response['Parameter']['Value']\n",
"\n",
"            print(f\"✅ {param_name}: {value}\")\n",
"            found.append(param_name)\n",
"        except ssm_client.exceptions.ParameterNotFound:\n",
"            print(f\"❌ {param_name}: NOT FOUND\")\n",
"            missing.append(param_name)\n",
"\n",
"    print(f\"\\n📊 Validation Summary:\")\n",
"    print(f\" Found: {len(found)}\")\n",
"    print(f\" Missing: {len(missing)}\")\n",
"\n",
"    if missing:\n",
"        print(f\"\\n⚠️  Missing parameters: {', '.join(missing)}\")\n",
"        return False\n",
"    else:\n",
"        print(f\"\\n✅ All parameters validated successfully!\")\n",
"        return True\n",
"\n",
"validate_ssm_parameters()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Next Steps\n",
"\n",
"✅ **Prerequisites Complete!**\n",
"\n",
"Your configuration is ready:\n",
"- ✅ SSM Parameter Store configured\n",
"- ✅ S3 bucket created with the full name `{account_id}-{region}-{base_name}`\n",
"- ✅ All parameters validated\n",
"\n",
"**Next:** Run `01-deploy-athena.ipynb` to create the Athena database and tables.\n",
"\n",
"The Athena deployment will automatically use the S3 bucket created in this notebook."
]
}
],
"metadata": {
"kernelspec": {
"display_name": ".venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.13.5"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
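The naming rules used by the prerequisites notebook above can be checked without touching AWS. The sketch below restates them as pure functions. `to_param_name` and `is_sensitive` mirror the notebook's `create_ssm_parameter` and `is_sensitive`; `full_bucket_name` and its shape check against the general S3 bucket naming rules (3 to 63 characters; lowercase letters, digits, hyphens, and periods; starting and ending with a letter or digit) are illustrative additions that are not part of the commit. The account ID in the usage lines is a placeholder.

```python
import re

PREFIX = "/app/lakehouse-agent/"
SENSITIVE_KEYWORDS = ("SECRET", "PASSWORD", "KEY", "TOKEN")

# Basic S3 bucket-name shape check (hypothetical helper, not exhaustive)
_BUCKET_RE = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")


def to_param_name(key: str) -> str:
    """Map a config key such as 'S3_BUCKET_NAME' to its SSM parameter name."""
    return PREFIX + key.lower().replace("_", "-")


def is_sensitive(key: str) -> bool:
    """True when the key name contains one of the sensitive keywords."""
    return any(word in key.upper() for word in SENSITIVE_KEYWORDS)


def full_bucket_name(account_id: str, region: str, base_name: str) -> str:
    """Compose '{account_id}-{region}-{base_name}' and check its shape."""
    name = f"{account_id}-{region}-{base_name}"
    if not _BUCKET_RE.match(name):
        raise ValueError(f"not a valid S3 bucket name: {name!r}")
    return name


if __name__ == "__main__":
    print(to_param_name("S3_BUCKET_NAME"))
    print(is_sensitive("TEST_PASSWORD"), is_sensitive("LOG_LEVEL"))
    print(full_bucket_name("123456789012", "us-east-1", "lk-agent"))
```

Because the keyword test is a substring match, a key such as `MONKEY_COUNT` would also be stored as a SecureString; with the keys in the notebook's `config`, only `TEST_PASSWORD` matches.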
@@ -0,0 +1,266 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Lakehouse Agent - Deploy Athena Database\n",
"\n",
"This notebook deploys the Athena database and tables for the lakehouse data layer.\n",
"\n",
"**What this notebook does:**\n",
"- Uses the S3 bucket created in prerequisites setup\n",
"- Uploads sample claims and users data to S3\n",
"- Creates Athena database: `lakehouse_db`\n",
"- Creates tables: `claims` and `users`\n",
"- Verifies deployment with test queries\n",
"\n",
"**Prerequisites:**\n",
"- ✅ Completed `00-prerequisites-setup.ipynb` (S3 bucket must be created)\n",
"- ✅ SSM parameters configured with `/app/lakehouse-agent/` prefix\n",
"- ✅ AWS credentials with Athena, S3, and Glue permissions\n",
"\n",
"**IAM Permissions Required:**\n",
"- `athena:*`\n",
"- `s3:*`\n",
"- `glue:*`\n",
"- `ssm:GetParameter`\n",
"\n",
"**Duration:** ~10 minutes"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# AWS Initialization - Load credentials and create session\n",
"from utils.notebook_init import init_aws\n",
"import subprocess\n",
"from pathlib import Path\n",
"\n",
"# This will:\n",
"# 1. Load credentials from .env file (if it exists)\n",
"# 2. Create and validate AWS session (env vars take precedence over SSO)\n",
"# 3. Return session, region, and account_id for use in this notebook\n",
"session, region, account_id = init_aws()\n",
"\n",
"print(f\"✅ Ready to proceed with AWS operations\")\n",
"print(f\" Account ID: {account_id}\")\n",
"print(f\" Region: {region}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 1: Validate Prerequisites\n",
"\n",
"Check that all required SSM parameters from the previous notebook exist."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Initialize AWS clients using the validated session from cell 1\n",
"ssm_client = session.client('ssm', region_name=region)\n",
"\n",
"# Check required parameters with new naming convention\n",
"print(\"🔍 Validating prerequisites...\\n\")\n",
"\n",
"required_params = [\n",
"    '/app/lakehouse-agent/s3-bucket-name',\n",
"    '/app/lakehouse-agent/database-name',\n",
"    '/app/lakehouse-agent/athena-workgroup'\n",
"]\n",
"\n",
"missing = []\n",
"config_values = {}\n",
"\n",
"for param in required_params:\n",
"    try:\n",
"        response = ssm_client.get_parameter(Name=param)\n",
"        value = response['Parameter']['Value']\n",
"        # Extract key name for config_values dict\n",
"        key = param.split('/')[-1]\n",
"        config_values[key] = value\n",
"        print(f\"✅ {param}: {value}\")\n",
"    except ssm_client.exceptions.ParameterNotFound:\n",
"        print(f\"❌ {param}: NOT FOUND\")\n",
"        missing.append(param)\n",
"\n",
"if missing:\n",
"    print(f\"\\n❌ Missing parameters: {', '.join(missing)}\")\n",
"    print(\"Please run 00-prerequisites-setup.ipynb first\")\n",
"else:\n",
"    print(\"\\n✅ All prerequisites validated!\")\n",
"\n",
"    # Load configuration from SSM\n",
"    BUCKET_NAME = config_values['s3-bucket-name']\n",
"    DATABASE_NAME = config_values['database-name']\n",
"    WORKGROUP = config_values.get('athena-workgroup', 'primary')\n",
"\n",
"    print(f\"\\n📋 Configuration:\")\n",
"    print(f\" Bucket: {BUCKET_NAME}\")\n",
"    print(f\" Database: {DATABASE_NAME}\")\n",
"    print(f\" Workgroup: {WORKGROUP}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 2: Deploy Athena Database\n",
"\n",
"Run the Athena setup script to create the database, tables, and upload sample data.\n",
"\n",
"**Note**: The S3 bucket was already created in the prerequisites notebook. This step will use that existing bucket and create Athena tables on top of it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(\"🚀 Running Athena setup...\\n\")\n",
"\n",
"print(f\"📦 Using S3 bucket from SSM: {BUCKET_NAME}\")\n",
"print(f\" The setup script will read this from SSM Parameter Store\")\n",
"print()\n",
"\n",
"# Run setup_athena.py WITHOUT --bucket-name argument\n",
"# The script will automatically read the bucket name from SSM Parameter Store\n",
"result = subprocess.run(\n",
"    ['python', 'setup_athena.py'],\n",
"    cwd='deployment/athena-setup',\n",
"    capture_output=True,\n",
"    text=True\n",
")\n",
"\n",
"print(result.stdout)\n",
"\n",
"if result.returncode != 0:\n",
"    print(\"❌ Error during Athena setup:\")\n",
"    print(result.stderr)\n",
"else:\n",
"    print(\"\\n✅ Athena setup completed successfully!\")\n",
"    print(\"\\n💾 Database and tables created using existing S3 bucket\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 3: Validate Deployment\n",
"\n",
"Verify that the database and tables were created successfully."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(\"🔍 Validating Athena deployment...\\n\")\n",
"\n",
"# Use session from cell 1 to create AWS clients\n",
"athena_client = session.client('athena', region_name=region)\n",
"glue_client = session.client('glue', region_name=region)\n",
"\n",
"# Check database exists\n",
"try:\n",
"    response = glue_client.get_database(Name=DATABASE_NAME)\n",
"    print(f\"✅ Database '{DATABASE_NAME}' exists\")\n",
"except glue_client.exceptions.EntityNotFoundException:\n",
"    print(f\"❌ Database '{DATABASE_NAME}' not found\")\n",
"\n",
"# Check tables exist\n",
"try:\n",
"    response = glue_client.get_tables(DatabaseName=DATABASE_NAME)\n",
"    tables = response['TableList']\n",
"\n",
"    print(f\"\\n📋 Tables in {DATABASE_NAME}:\")\n",
"    for table in tables:\n",
"        table_name = table['Name']\n",
"        column_count = len(table['StorageDescriptor']['Columns'])\n",
"        print(f\" • {table_name} ({column_count} columns)\")\n",
"\n",
"    if len(tables) >= 2:\n",
"        print(\"\\n✅ All tables created successfully\")\n",
"    else:\n",
"        print(f\"\\n⚠️  Expected 2 tables, found {len(tables)}\")\n",
"\n",
"except Exception as e:\n",
"    print(f\"❌ Error checking tables: {e}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Next Steps\n",
"\n",
"✅ **Athena Database Deployment Complete!**\n",
"\n",
"Your Athena database is now set up with:\n",
"- Database: `lakehouse_db`\n",
"- Tables: `claims` (9 sample claims), `users` (3 test users)\n",
"- S3 data location: `s3://{BUCKET_NAME}/lakehouse-data/`\n",
"\n",
"**Next:** Run `02-deploy-cognito.ipynb` to set up authentication.\n",
"\n",
"### Test Queries\n",
"\n",
"You can test the deployment with these queries in the Athena console:\n",
"\n",
"```sql\n",
"-- Count all claims\n",
"SELECT COUNT(*) as total_claims FROM lakehouse_db.claims;\n",
"\n",
"-- View claims for user001\n",
"SELECT claim_id, claim_type, claim_status, claim_amount \n",
"FROM lakehouse_db.claims \n",
"WHERE user_id = 'user001@example.com';\n",
"\n",
"-- View all users\n",
"SELECT * FROM lakehouse_db.users;\n",
"```\n",
"\n",
"### Verify SSM Parameters\n",
"\n",
"The setup script automatically saves configuration to SSM:\n",
"```bash\n",
"aws ssm get-parameter --name /app/lakehouse-agent/s3-bucket-name\n",
"aws ssm get-parameter --name /app/lakehouse-agent/database-name\n",
"```"
]
}
],
"metadata": {
"kernelspec": {
"display_name": ".venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.13.5"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
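Step 2 of the notebook above launches `setup_athena.py` with a bare `python` command, which resolves through `PATH` and may not be the interpreter the notebook kernel is running. A hedged alternative is sketched below. `run_python` is a hypothetical helper (not part of the commit) that reuses `sys.executable` and raises on a non-zero exit code instead of only printing, so later cells do not continue after a failed step.

```python
import subprocess
import sys


def run_python(args, cwd=None):
    """Run the current interpreter with `args`; return stdout or raise.

    `args` is the argument list after the interpreter, for example
    ['setup_athena.py'] with cwd='deployment/athena-setup'.
    """
    result = subprocess.run(
        [sys.executable, *args],
        cwd=cwd,
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        # Surface stderr in the exception so the failing step is obvious
        raise RuntimeError(
            f"{' '.join(args)} exited with {result.returncode}:\n{result.stderr}"
        )
    return result.stdout


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere; swap in the real script
    print(run_python(["-c", "print('athena setup placeholder')"]))
```

In the notebook this would be called as `run_python(['setup_athena.py'], cwd='deployment/athena-setup')`, assuming the script path from the cell above.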
@@ -0,0 +1,193 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Deploy Cognito\n",
"\n",
"Set up user authentication with Amazon Cognito."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Prerequisites\n",
"\n",
"- ✅ Run `01-deploy-athena.ipynb` first\n",
"\n",
"## What This Notebook Does\n",
"\n",
"1. Creates Cognito User Pool\n",
"2. Creates App Client with OAuth scopes\n",
"3. Creates test users\n",
"4. Saves configuration to SSM\n",
"\n",
"## Next Notebook\n",
"\n",
"- **03-deploy-mcp-server.ipynb**"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# AWS Initialization - Load credentials and create session\n",
"from utils.notebook_init import init_aws\n",
"from pathlib import Path\n",
"\n",
"# This will:\n",
"# 1. Load credentials from .env file (if it exists)\n",
"# 2. Create and validate AWS session (env vars take precedence over SSO)\n",
"# 3. Return session, region, and account_id for use in this notebook\n",
"session, region, account_id = init_aws()\n",
"\n",
"# Initialize AWS clients\n",
"cognito_client = session.client('cognito-idp', region_name=region)\n",
"ssm_client = session.client('ssm', region_name=region)\n",
"\n",
"print(f\"✅ Ready to proceed with AWS operations\")\n",
"print(f\" Account ID: {account_id}\")\n",
"print(f\" Region: {region}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 1: Run Cognito Setup"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import subprocess\n",
"\n",
"result = subprocess.run(\n",
"    ['python', 'setup_cognito.py', '--region', region],\n",
"    cwd='deployment/cognito-setup',\n",
"    capture_output=True,\n",
"    text=True\n",
")\n",
"\n",
"print(result.stdout)\n",
"if result.returncode != 0:\n",
"    print('❌ Error:', result.stderr)\n",
"else:\n",
"    print('\\n✅ Cognito setup complete!')\n",
"    print('\\n📋 Configuration automatically saved to SSM Parameter Store')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 2: Verify Cognito Configuration in SSM\n",
"\n",
"The setup_cognito.py script automatically saves all configuration to SSM Parameter Store.\n",
"Run this cell to verify the parameters were saved correctly."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Verify Cognito Configuration in SSM Parameter Store\n",
"\n",
"print(\"Verifying Cognito parameters in SSM Parameter Store...\\n\")\n",
"\n",
"# List of parameters to check\n",
"parameters_to_check = [\n",
"    '/app/lakehouse-agent/cognito-user-pool-id',\n",
"    '/app/lakehouse-agent/cognito-app-client-id',\n",
"    '/app/lakehouse-agent/cognito-domain',\n",
"    '/app/lakehouse-agent/cognito-resource-server-id',\n",
"]\n",
"\n",
"# Check each parameter\n",
"all_found = True\n",
"for param_name in parameters_to_check:\n",
"    try:\n",
"        response = ssm_client.get_parameter(Name=param_name)\n",
"        value = response['Parameter']['Value']\n",
"        # Truncate long values for display (this shortens, it does not mask)\n",
"        display_value = value[:30] + '...' if len(value) > 30 else value\n",
"        print(f'✅ {param_name}')\n",
"        print(f' Value: {display_value}')\n",
"    except ssm_client.exceptions.ParameterNotFound:\n",
"        print(f'❌ {param_name} - NOT FOUND')\n",
"        all_found = False\n",
"    except Exception as e:\n",
"        print(f'⚠️  {param_name} - ERROR: {e}')\n",
"        all_found = False\n",
"\n",
"# Check secure parameters (without displaying values)\n",
"secure_params = [\n",
"    '/app/lakehouse-agent/cognito-app-client-secret',\n",
"]\n",
"\n",
"for param_name in secure_params:\n",
"    try:\n",
"        response = ssm_client.get_parameter(Name=param_name, WithDecryption=False)\n",
"        print(f'✅ {param_name} (SecureString)')\n",
"        print(f' Value: ***MASKED***')\n",
"    except ssm_client.exceptions.ParameterNotFound:\n",
"        print(f'❌ {param_name} - NOT FOUND')\n",
"        all_found = False\n",
"    except Exception as e:\n",
"        print(f'⚠️  {param_name} - ERROR: {e}')\n",
"        # Treat lookup errors as unverified, matching the loop above\n",
"        all_found = False\n",
"\n",
"if all_found:\n",
"    print('\\n✅ All Cognito parameters verified in SSM Parameter Store!')\n",
"else:\n",
"    print('\\n⚠️  Some parameters are missing. Re-run the setup_cognito.py script.')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Summary\n",
"\n",
"✅ **Cognito Deployment Complete!**\n",
"\n",
"**Test Users Created:**\n",
"- user001@example.com / TempPass123!\n",
"- user002@example.com / TempPass123!\n",
"- adjuster001@example.com / TempPass123!\n",
"\n",
"**Next Steps:**\n",
"Run **03-deploy-mcp-server.ipynb**"
]
}
],
"metadata": {
"kernelspec": {
"display_name": ".venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.2"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
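The verification cell in the notebook above shortens long values to 30 characters before printing, which trims the output but still reveals the start of each value. Where a value should stay hidden, a masking helper along these lines could be used instead. `mask_value` and its `visible` parameter are hypothetical names chosen for illustration and are not part of the commit.

```python
def mask_value(value: str, visible: int = 4) -> str:
    """Hide all but the last `visible` characters of `value`.

    Values no longer than `visible` are masked entirely so that short
    secrets are never echoed in full.
    """
    if visible <= 0 or len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


if __name__ == "__main__":
    print(mask_value("us-east-1_EXAMPLE123"))
    print(mask_value("abc"))
```

The output keeps the original length, which is convenient for spotting empty or truncated values but does reveal how long the value is; a fixed-width mask would avoid that.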
@@ -0,0 +1,172 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Deploy MCP Server\n",
"\n",
"Deploy the MCP Server to AgentCore Runtime."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Prerequisites\n",
"\n",
"- ✅ Run `02-deploy-cognito.ipynb` first\n",
"- ✅ Docker installed and running\n",
"\n",
"## What This Notebook Does\n",
"\n",
"1. Deploys MCP Server to AgentCore Runtime\n",
"2. Configures JWT authentication with Cognito\n",
"3. Saves Runtime ARN to SSM\n",
"\n",
"## Next Notebook\n",
"\n",
"- **04-deploy-gateway.ipynb**"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# AWS Initialization - Load credentials and create session\n",
"from utils.notebook_init import init_aws\n",
"import os\n",
"from pathlib import Path\n",
"\n",
"# This will:\n",
"# 1. Load credentials from .env file (if it exists)\n",
"# 2. Create and validate AWS session (env vars take precedence over SSO)\n",
"# 3. Return session, region, and account_id for use in this notebook\n",
"session, region, account_id = init_aws()\n",
"\n",
"# Initialize AWS clients\n",
"ssm_client = session.client('ssm', region_name=region)\n",
"os.environ[\"AWS_DEFAULT_REGION\"] = region\n",
"\n",
"print('✅ Ready to proceed with AWS operations')\n",
"print(f' Account ID: {account_id}')\n",
"print(f' Region: {region}')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 1: Deploy MCP Server\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import subprocess\n",
"\n",
"# Run deploy_runtime.py with --yes flag to skip interactive confirmation\n",
"# This allows the script to run without blocking in the notebook\n",
"result = subprocess.run(\n",
"    ['python', 'deploy_runtime.py', '--yes'],\n",
"    cwd='deployment/mcp-lakehouse-server',\n",
"    capture_output=True,\n",
"    text=True\n",
")\n",
"\n",
"print(result.stdout)\n",
"if result.returncode != 0:\n",
"    print('❌ Error:', result.stderr)\n",
"else:\n",
"    print('\\n✅ MCP Server deployed!')\n",
"    print('\\n📋 Runtime configuration saved to SSM Parameter Store')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 2: Verify MCP Server Deployment\n",
"\n",
"The deploy_runtime.py script automatically saves the Runtime ARN to SSM.\n",
"Run this cell to verify the deployment."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Verify MCP Server Runtime configuration in SSM\n",
"print(\"Verifying MCP Server Runtime in SSM...\\n\")\n",
"\n",
"parameters_to_check = [\n",
"    '/app/lakehouse-agent/mcp-server-runtime-arn',\n",
"    '/app/lakehouse-agent/mcp-server-runtime-id',\n",
"]\n",
"\n",
"all_found = True\n",
"for param_name in parameters_to_check:\n",
"    try:\n",
"        response = ssm_client.get_parameter(Name=param_name)\n",
"        value = response['Parameter']['Value']\n",
"        print(f'✅ {param_name}')\n",
"        print(f' Value: {value}')\n",
"    except ssm_client.exceptions.ParameterNotFound:\n",
"        print(f'❌ {param_name} - NOT FOUND')\n",
"        all_found = False\n",
"    except Exception as e:\n",
"        print(f'⚠️  {param_name} - ERROR: {e}')\n",
"        all_found = False\n",
"\n",
"if all_found:\n",
"    print('\\n✅ MCP Server Runtime configuration verified in SSM!')\n",
"else:\n",
"    print('\\n⚠️  MCP Server Runtime parameters missing.')\n",
"    print(' The deploy_runtime.py script should have saved these automatically.')\n",
"    print(' Check the deployment output for errors.')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Summary\n",
"\n",
"✅ **MCP Server Deployment Complete!**\n",
"\n",
"The MCP Server Runtime has been deployed and configuration saved to SSM Parameter Store.\n",
"\n",
"**Next Steps:**\n",
"Run **04-deploy-gateway.ipynb** to deploy the AgentCore Gateway"
]
}
],
"metadata": {
"kernelspec": {
"display_name": ".venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.2"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
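The SSM verification loop in the notebook above also appears, with small variations, in the Athena and Cognito notebooks. One way to share it is sketched below. `missing_parameters` is a hypothetical helper, and `FakeSSM` is an in-memory stand-in written only so the sketch runs without AWS access; with boto3, a real `session.client('ssm')` exposes the same `get_parameter` call and `exceptions.ParameterNotFound` class that the notebooks already use.

```python
class FakeSSM:
    """In-memory stand-in for the small part of the SSM client used below."""

    class exceptions:
        class ParameterNotFound(Exception):
            pass

    def __init__(self, store):
        self._store = dict(store)

    def get_parameter(self, Name, WithDecryption=False):
        if Name not in self._store:
            raise self.exceptions.ParameterNotFound(Name)
        return {"Parameter": {"Name": Name, "Value": self._store[Name]}}


def missing_parameters(ssm_client, names):
    """Return the parameter names from `names` that SSM does not contain."""
    missing = []
    for name in names:
        try:
            ssm_client.get_parameter(Name=name)
        except ssm_client.exceptions.ParameterNotFound:
            missing.append(name)
    return missing


if __name__ == "__main__":
    ssm = FakeSSM({"/app/lakehouse-agent/mcp-server-runtime-arn": "arn:example"})
    required = [
        "/app/lakehouse-agent/mcp-server-runtime-arn",
        "/app/lakehouse-agent/mcp-server-runtime-id",
    ]
    print(missing_parameters(ssm, required))
```

Only `ParameterNotFound` is treated as "missing" here; other errors (for example, access denied) propagate to the caller rather than being reported as an absent parameter.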
@@ -0,0 +1,237 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Deploy Gateway & Interceptor\n",
"\n",
"Deploy the AgentCore Gateway and Interceptor Lambda."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Prerequisites\n",
"\n",
"- ✅ Run `03-deploy-mcp-server.ipynb` first\n",
"- ✅ MCP Server Runtime ARN saved to SSM\n",
"\n",
"## What This Notebook Does\n",
"\n",
"1. Deploys Gateway Interceptor Lambda\n",
"2. Creates AgentCore Gateway\n",
"3. Configures OAuth token validation\n",
"4. Saves Gateway ARN to SSM\n",
"\n",
"## Next Notebook\n",
"\n",
"- **05-deploy-agent.ipynb**"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# AWS Initialization - Load credentials and create session\n",
"from utils.notebook_init import init_aws\n",
"from pathlib import Path\n",
"\n",
"# This will:\n",
"# 1. Load credentials from .env file (if it exists)\n",
"# 2. Create and validate AWS session (env vars take precedence over SSO)\n",
"# 3. Return session, region, and account_id for use in this notebook\n",
"session, AWS_REGION, AWS_ACCOUNT_ID = init_aws()\n",
"\n",
"# Initialize AWS clients\n",
"lambda_client = session.client('lambda', region_name=AWS_REGION)\n",
"ssm_client = session.client('ssm', region_name=AWS_REGION)\n",
"\n",
"print('✅ Ready to proceed with AWS operations')\n",
"print(f' Account ID: {AWS_ACCOUNT_ID}')\n",
"print(f' Region: {AWS_REGION}')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 1: Deploy Interceptor Lambda"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import subprocess\n",
"\n",
"# Run deploy.sh to deploy the interceptor Lambda function\n",
"result = subprocess.run(\n",
"    ['bash', 'deploy.sh'],\n",
"    cwd='deployment/gateway-setup/interceptor',\n",
"    capture_output=True,\n",
"    text=True\n",
")\n",
"\n",
"print(result.stdout)\n",
"if result.returncode != 0:\n",
"    print('❌ Error:', result.stderr)\n",
"else:\n",
"    print('\\n✅ Interceptor Lambda deployed!')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 2: Get Required ARNs"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Get Interceptor Lambda ARN from SSM (saved by the interceptor deployment in Step 1)\n",
"INTERCEPTOR_ARN = ssm_client.get_parameter(\n",
"    Name='/app/lakehouse-agent/interceptor-lambda-arn'\n",
")['Parameter']['Value']\n",
"print(f'✅ Interceptor ARN: {INTERCEPTOR_ARN}')\n",
"\n",
"# Get MCP Server Runtime ARN from SSM\n",
"MCP_SERVER_RUNTIME_ARN = ssm_client.get_parameter(\n",
"    Name='/app/lakehouse-agent/mcp-server-runtime-arn'\n",
")['Parameter']['Value']\n",
"print(f'✅ MCP Server ARN: {MCP_SERVER_RUNTIME_ARN}')\n",
"\n",
"# Get Cognito User Pool ID from SSM and build the User Pool ARN from it\n",
"COGNITO_USER_POOL_ID = ssm_client.get_parameter(\n",
"    Name='/app/lakehouse-agent/cognito-user-pool-id'\n",
")['Parameter']['Value']\n",
"COGNITO_USER_POOL_ARN = f'arn:aws:cognito-idp:{AWS_REGION}:{AWS_ACCOUNT_ID}:userpool/{COGNITO_USER_POOL_ID}'\n",
"print(f'✅ Cognito User Pool ARN: {COGNITO_USER_POOL_ARN}')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 3: Create AgentCore Gateway\n",
"\n",
"This will create the gateway and automatically configure it with the MCP server and interceptor."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Create AgentCore Gateway\n",
"result = subprocess.run([\n",
"    'python', 'create_gateway.py',\n",
"    '--yes'  # Auto-confirm for notebook execution\n",
"], cwd='deployment/gateway-setup', capture_output=True, text=True)\n",
"\n",
"print(result.stdout)\n",
"if result.returncode != 0:\n",
"    print('❌ Error:', result.stderr)\n",
"else:\n",
"    print('\\n✅ Gateway created!')\n",
"    print('\\n📋 Gateway ARN saved to SSM Parameter Store')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 4: Verify Gateway Configuration\n",
"\n",
"The create_gateway.py script automatically saves the Gateway ARN to SSM.\n",
"Run this cell to verify the deployment."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Verify Gateway configuration in SSM\n",
"print(\"Verifying Gateway configuration in SSM...\\n\")\n",
"\n",
"parameters_to_check = [\n",
"    '/app/lakehouse-agent/gateway-arn',\n",
"    '/app/lakehouse-agent/gateway-id',\n",
"    '/app/lakehouse-agent/gateway-url',\n",
"]\n",
"\n",
"all_found = True\n",
"for param_name in parameters_to_check:\n",
"    try:\n",
"        response = ssm_client.get_parameter(Name=param_name)\n",
"        value = response['Parameter']['Value']\n",
"        print(f'✅ {param_name}')\n",
"        print(f' Value: {value}')\n",
"    except ssm_client.exceptions.ParameterNotFound:\n",
"        print(f'❌ {param_name} - NOT FOUND')\n",
"        all_found = False\n",
"    except Exception as e:\n",
"        print(f'⚠️ {param_name} - ERROR: {e}')\n",
"        all_found = False\n",
"\n",
"if all_found:\n",
"    print('\\n✅ Gateway configuration verified in SSM!')\n",
"else:\n",
"    print('\\n⚠️ Gateway parameters missing.')\n",
"    print(' The create_gateway.py script should have saved these automatically.')\n",
"    print(' Check the deployment output for errors.')"
]
},
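{
"cell_type": "markdown",
"metadata": {},
"source": [
"The check above reads one parameter per call. As a minimal sketch only (not part of the deployment scripts), the same check could use the SSM `get_parameters` batch call, which returns found values under `Parameters` and unknown names under `InvalidParameters`, and accepts at most 10 names per call. `check_parameters` and `FakeSSM` are hypothetical names introduced for illustration; `FakeSSM` is a stand-in so the sketch runs without AWS access, and the stored value is made up.\n",
"\n",
"```python\n",
"def check_parameters(ssm, names):\n",
"    # Returns (found, missing): a dict of name -> value and a list of missing names.\n",
"    response = ssm.get_parameters(Names=list(names))\n",
"    found = {p['Name']: p['Value'] for p in response['Parameters']}\n",
"    missing = list(response.get('InvalidParameters', []))\n",
"    return found, missing\n",
"\n",
"\n",
"class FakeSSM:\n",
"    # Hypothetical stand-in that mimics the shape of the get_parameters response\n",
"    def __init__(self, store):\n",
"        self.store = store\n",
"\n",
"    def get_parameters(self, Names):\n",
"        return {\n",
"            'Parameters': [\n",
"                {'Name': n, 'Value': self.store[n]} for n in Names if n in self.store\n",
"            ],\n",
"            'InvalidParameters': [n for n in Names if n not in self.store],\n",
"        }\n",
"\n",
"\n",
"fake = FakeSSM({'/app/lakehouse-agent/gateway-id': 'example-gateway-id'})\n",
"found, missing = check_parameters(fake, [\n",
"    '/app/lakehouse-agent/gateway-id',\n",
"    '/app/lakehouse-agent/gateway-url',\n",
"])\n",
"print('found:', found)\n",
"print('missing:', missing)\n",
"```\n",
"\n",
"In this notebook, passing `ssm_client` instead of `fake` should behave the same way, assuming the IAM identity is allowed to call `ssm:GetParameters`."
]
},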
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Summary\n",
"\n",
"✅ **Gateway & Interceptor Deployment Complete!**\n",
"\n",
"**What was created:**\n",
"- Interceptor Lambda (JWT validation)\n",
"- AgentCore Gateway (routing)\n",
"\n",
"All configuration saved to SSM Parameter Store.\n",
"\n",
"**Next Steps:**\n",
"Run **05-deploy-agent.ipynb**"
]
}
],
"metadata": {
"kernelspec": {
"display_name": ".venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.2"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
@@ -0,0 +1,184 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Deploy Lakehouse Agent\n",
"\n",
"Deploy the Lakehouse Agent to AgentCore Runtime."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Prerequisites\n",
"\n",
"- ✅ Run `04-deploy-gateway.ipynb` first\n",
"- ✅ Gateway ARN saved to SSM\n",
"- ✅ Docker installed and running\n",
"\n",
"## What This Notebook Does\n",
"\n",
"1. Deploys Lakehouse Agent to AgentCore Runtime\n",
"2. Configures Gateway integration\n",
"3. Saves Agent Runtime ARN to SSM\n",
"\n",
"## Next Notebook\n",
"\n",
"- **06-streamlit-ui-deployment.ipynb**"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# AWS Initialization - Load credentials and create session\n",
"from utils.notebook_init import init_aws\n",
"from pathlib import Path\n",
"\n",
"# This will:\n",
"# 1. Load credentials from .env file (if it exists)\n",
"# 2. Create and validate AWS session (env vars take precedence over SSO)\n",
"# 3. Return session, region, and account_id for use in this notebook\n",
"session, region, account_id = init_aws()\n",
"\n",
"# Initialize AWS clients\n",
"ssm_client = session.client('ssm', region_name=region)\n",
"\n",
"# Load Gateway ARN from SSM\n",
"try:\n",
"    gateway_arn = ssm_client.get_parameter(\n",
"        Name='/app/lakehouse-agent/gateway-arn'\n",
"    )['Parameter']['Value']\n",
"    print('✅ Ready to proceed with AWS operations')\n",
"    print(f' Account ID: {account_id}')\n",
"    print(f' Region: {region}')\n",
"    print(f' Gateway ARN: {gateway_arn}')\n",
"except ssm_client.exceptions.ParameterNotFound:\n",
"    # The session is valid, but the deployment cannot proceed without the Gateway ARN\n",
"    print('⚠️ AWS session initialized, but a prerequisite is missing')\n",
"    print(f' Account ID: {account_id}')\n",
"    print(f' Region: {region}')\n",
"    print('❌ Gateway ARN not found in SSM')\n",
"    print(' Please run 04-deploy-gateway.ipynb first')\n",
"    gateway_arn = None"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 1: Deploy Lakehouse Agent"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import subprocess\n",
"\n",
"# Run deploy_lakehouse_agent.py with --yes flag to skip interactive prompts\n",
"result = subprocess.run(\n",
"    ['python', 'deploy_lakehouse_agent.py', '--yes'],\n",
"    cwd='deployment/lakehouse-agent',\n",
"    capture_output=True,\n",
"    text=True\n",
")\n",
"\n",
"print(result.stdout)\n",
"if result.returncode != 0:\n",
"    print('❌ Error:', result.stderr)\n",
"else:\n",
"    print('\\n✅ Lakehouse Agent deployed!')\n",
"    print('\\n📋 Configuration automatically saved to SSM')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 2: Verify Agent Deployment\n",
"\n",
"The deploy_lakehouse_agent.py script automatically saves the Runtime ARN to SSM.\n",
"Run this cell to verify the deployment."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Verify Agent configuration in SSM\n",
"print(\"Verifying Agent Runtime configuration in SSM...\\n\")\n",
"\n",
"parameters_to_check = [\n",
"    '/app/lakehouse-agent/agent-runtime-arn',\n",
"    '/app/lakehouse-agent/agent-runtime-id',\n",
"    '/app/lakehouse-agent/agent-name',\n",
"]\n",
"\n",
"all_found = True\n",
"for param_name in parameters_to_check:\n",
"    try:\n",
"        response = ssm_client.get_parameter(Name=param_name)\n",
"        value = response['Parameter']['Value']\n",
"        print(f'✅ {param_name}')\n",
"        print(f' Value: {value}')\n",
"    except ssm_client.exceptions.ParameterNotFound:\n",
"        if 'agent-runtime-arn' in param_name:\n",
"            # The Runtime ARN is the only required parameter\n",
"            print(f'❌ {param_name} - NOT FOUND (required)')\n",
"            all_found = False\n",
"        else:\n",
"            print(f'⚠️ {param_name} - NOT FOUND (optional)')\n",
"    except Exception as e:\n",
"        print(f'⚠️ {param_name} - ERROR: {e}')\n",
"\n",
"if all_found:\n",
"    print('\\n✅ Agent Runtime configuration verified in SSM!')\n",
"else:\n",
"    print('\\n❌ Agent Runtime ARN missing in SSM.')\n",
"    print(' The deploy_lakehouse_agent.py script should have saved this automatically.')\n",
"    print(' Check the deployment output for errors.')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Summary\n",
"\n",
"✅ **Lakehouse Agent Deployment Complete!**\n",
"\n",
"The Agent Runtime has been deployed and configuration saved to SSM Parameter Store.\n",
"\n",
"**Next Steps:**\n",
"Run **06-streamlit-ui-deployment.ipynb** to test the complete system"
]
}
],
"metadata": {
"kernelspec": {
"display_name": ".venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.2"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
@@ -0,0 +1,395 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Test Deployment and Run Streamlit App\n",
"\n",
"Test the complete lakehouse agent system end-to-end."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Prerequisites\n",
"\n",
"- ✅ Run `05-deploy-agent.ipynb` first\n",
"- ✅ All components deployed\n",
"\n",
"## What This Notebook Does\n",
"\n",
"1. Tests OAuth token generation from Cognito\n",
"2. Tests agent invocation with bearer token\n",
"3. Validates end-to-end flow (User → Agent → Gateway → MCP)\n",
"4. Verifies agent responses with conversational AI\n",
"5. Launches Streamlit UI for interactive testing\n",
"\n",
"## Important Notes\n",
"\n",
"⚠️ **Run cells in order**: Start with the Setup cell (the first code cell, directly below) to initialize the AWS session and clients before running other cells."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# ============================================================================\n",
"# SETUP CELL - Run this first to initialize AWS session and clients\n",
"# ============================================================================\n",
"\n",
"# AWS Initialization - Load credentials and create session\n",
"from utils.notebook_init import init_aws\n",
"from pathlib import Path\n",
"import json\n",
"import base64\n",
"import requests\n",
"import uuid\n",
"import urllib.parse\n",
"\n",
"# This will:\n",
"# 1. Load credentials from .env file (if it exists)\n",
"# 2. Create and validate AWS session (env vars take precedence over SSO)\n",
"# 3. Return session, region, and account_id for use in this notebook\n",
"session, region, account_id = init_aws()\n",
"\n",
"# Initialize AWS clients\n",
"ssm_client = session.client('ssm', region_name=region)\n",
"\n",
"print('✅ Ready to proceed with AWS operations')\n",
"print(f' Account ID: {account_id}')\n",
"print(f' Region: {region}')\n",
"print('\\n📝 Architecture: User → Agent Runtime → Gateway → MCP Server')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 1: Get OAuth Token from Cognito"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import base64  # Import here for cell independence\n",
"import requests\n",
"import json\n",
"\n",
"# Get Cognito configuration from SSM\n",
"COGNITO_DOMAIN = ssm_client.get_parameter(\n",
"    Name='/app/lakehouse-agent/cognito-domain'\n",
")['Parameter']['Value']\n",
"\n",
"CLIENT_ID = ssm_client.get_parameter(\n",
"    Name='/app/lakehouse-agent/cognito-app-client-id'\n",
")['Parameter']['Value']\n",
"\n",
"CLIENT_SECRET = ssm_client.get_parameter(\n",
"    Name='/app/lakehouse-agent/cognito-app-client-secret',\n",
"    WithDecryption=True\n",
")['Parameter']['Value']\n",
"\n",
"print(f'🔐 Cognito Configuration:')\n",
"print(f' Domain: {COGNITO_DOMAIN}')\n",
"print(f' Client ID: {CLIENT_ID}')\n",
"\n",
"# Request token using the client credentials grant (HTTP Basic auth with client ID and secret)\n",
"token_url = f'{COGNITO_DOMAIN}/oauth2/token'\n",
"credentials = f'{CLIENT_ID}:{CLIENT_SECRET}'\n",
"encoded_credentials = base64.b64encode(credentials.encode()).decode()\n",
"\n",
"headers = {\n",
"    'Authorization': f'Basic {encoded_credentials}',\n",
"    'Content-Type': 'application/x-www-form-urlencoded'\n",
"}\n",
"\n",
"data = {\n",
"    'grant_type': 'client_credentials',\n",
"    'scope': 'lakehouse-api/claims.query'\n",
"}\n",
"\n",
"print('\\n🔑 Requesting OAuth token...')\n",
"# A timeout keeps the cell from hanging if the Cognito domain is unreachable\n",
"response = requests.post(token_url, headers=headers, data=data, timeout=30)\n",
"\n",
"if response.status_code == 200:\n",
"    token_data = response.json()\n",
"    ACCESS_TOKEN = token_data['access_token']\n",
"    print('✅ OAuth token obtained successfully!')\n",
"    print(f' Token type: {token_data.get(\"token_type\")}')\n",
"    print(f' Expires in: {token_data.get(\"expires_in\")} seconds')\n",
"else:\n",
"    print(f'❌ Failed to get token: {response.status_code}')\n",
"    print(response.text)\n",
"    ACCESS_TOKEN = None"
]
},
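{
"cell_type": "markdown",
"metadata": {},
"source": [
"When the agent invocation in the next step returns 401, it can help to look at the claims inside the access token. The following is a minimal debugging sketch, not part of the deployed code: `decode_jwt_claims` is a hypothetical helper that base64url-decodes the payload segment **without verifying the signature**, so it must not be used for authorization decisions. The sample token is built locally from made-up claims so the sketch runs without Cognito.\n",
"\n",
"```python\n",
"import base64\n",
"import json\n",
"import time\n",
"\n",
"\n",
"def decode_jwt_claims(token):\n",
"    # Return the payload claims of a JWT WITHOUT verifying its signature.\n",
"    parts = token.split('.')\n",
"    if len(parts) != 3:\n",
"        raise ValueError('expected a JWT with three dot-separated segments')\n",
"    payload = parts[1]\n",
"    # base64url segments omit padding; restore it before decoding\n",
"    payload += '=' * (-len(payload) % 4)\n",
"    return json.loads(base64.urlsafe_b64decode(payload))\n",
"\n",
"\n",
"def _b64url(obj):\n",
"    # Encode a dict the way JWT segments are encoded (base64url, no padding)\n",
"    raw = json.dumps(obj).encode()\n",
"    return base64.urlsafe_b64encode(raw).rstrip(b'=').decode()\n",
"\n",
"\n",
"# Hypothetical claims, for illustration only\n",
"sample_claims = {\n",
"    'client_id': 'example-client-id',\n",
"    'scope': 'lakehouse-api/claims.query',\n",
"    'exp': int(time.time()) + 3600,\n",
"}\n",
"sample_token = '.'.join([_b64url({'alg': 'none'}), _b64url(sample_claims), 'signature'])\n",
"\n",
"claims = decode_jwt_claims(sample_token)\n",
"print(claims['client_id'], claims['scope'])\n",
"print('expired:', claims['exp'] < time.time())\n",
"```\n",
"\n",
"In this notebook, calling `decode_jwt_claims(ACCESS_TOKEN)` after Step 1 would let you compare `client_id` and `exp` against the allowed clients configured on the Agent Runtime's JWT authorizer."
]
},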
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 2: Test Agent Invocation\n",
"\n",
"**Architecture Flow:**\n",
"1. User → Agent Runtime (OAuth token in Authorization header for JWT validation)\n",
"2. Agent receives token in payload (JWT authorizer consumes header, doesn't pass through)\n",
"3. Agent → Gateway (passes token from payload)\n",
"4. Gateway → MCP Server (with user context)\n",
"\n",
"**Note:** The bearer token must be passed in BOTH the Authorization header (for JWT validation) AND the payload (for the agent code to use when calling Gateway). This is because the JWT authorizer consumes the Authorization header and doesn't pass it through to the agent code."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import urllib.parse  # Import here for cell independence\n",
"import uuid\n",
"import json\n",
"import requests\n",
"\n",
"if ACCESS_TOKEN:\n",
"    # Get Agent Runtime ARN from SSM\n",
"    try:\n",
"        AGENT_RUNTIME_ARN = ssm_client.get_parameter(\n",
"            Name='/app/lakehouse-agent/agent-runtime-arn'\n",
"        )['Parameter']['Value']\n",
"\n",
"        print(f'🤖 Agent Runtime Configuration:')\n",
"        print(f' Runtime ARN: {AGENT_RUNTIME_ARN}')\n",
"        print(f' Region: {region}')\n",
"    except ssm_client.exceptions.ParameterNotFound:\n",
"        print('❌ Agent Runtime ARN not found in SSM')\n",
"        print(' Please run 05-deploy-agent.ipynb first')\n",
"        AGENT_RUNTIME_ARN = None\n",
"\n",
"    if AGENT_RUNTIME_ARN:\n",
"        # Construct the AgentCore Runtime invocation URL\n",
"        # URL encode the agent ARN\n",
"        escaped_agent_arn = urllib.parse.quote(AGENT_RUNTIME_ARN, safe='')\n",
"        AGENT_RUNTIME_URL = f\"https://bedrock-agentcore.{region}.amazonaws.com/runtimes/{escaped_agent_arn}/invocations?qualifier=DEFAULT\"\n",
"\n",
"        print(f' Runtime URL: {AGENT_RUNTIME_URL}')\n",
"\n",
"        # Generate session ID for this invocation\n",
"        session_id = f\"test-session-{uuid.uuid4()}\"\n",
"\n",
"        # Prepare payload with bearer token for Gateway calls\n",
"        # Note: Token must be in BOTH header (for JWT auth) and payload (for agent to use)\n",
"        payload = {\n",
"            'prompt': 'Show me all my claims',\n",
"            'bearer_token': ACCESS_TOKEN  # Pass token in payload for agent to use with Gateway\n",
"        }\n",
"\n",
"        # Prepare headers with OAuth token and session ID\n",
"        headers = {\n",
"            \"Authorization\": f\"Bearer {ACCESS_TOKEN}\",\n",
"            \"Content-Type\": \"application/json\",\n",
"            \"X-Amzn-Bedrock-AgentCore-Runtime-Session-Id\": session_id\n",
"        }\n",
"\n",
"        print(f'\\n🚀 Invoking Agent Runtime...')\n",
"        print(f' Prompt: {payload[\"prompt\"]}')\n",
"        print(f' Session ID: {session_id}')\n",
"        print(f' Auth: Bearer token in header (for JWT validation) and payload (for Gateway)')\n",
"\n",
"        try:\n",
"            # Call the agent runtime\n",
"            response = requests.post(\n",
"                AGENT_RUNTIME_URL,\n",
"                headers=headers,\n",
"                data=json.dumps(payload),\n",
"                timeout=60\n",
"            )\n",
"\n",
"            print(f'\\n📊 Response Status: {response.status_code}')\n",
"\n",
"            if response.status_code == 200:\n",
"                try:\n",
"                    result = response.json()\n",
"                    print(f'\\n✅ Agent Response:')\n",
"                    print(json.dumps(result, indent=2))\n",
"\n",
"                    # Display the content if available\n",
"                    if 'content' in result:\n",
"                        print(f'\\n📝 Agent Output:')\n",
"                        print(result['content'])\n",
"\n",
"                    if 'tool_calls' in result:\n",
"                        print(f'\\n🔧 Tool Calls: {result[\"tool_calls\"]}')\n",
"\n",
"                except json.JSONDecodeError:\n",
"                    print(f'Response: {response.text[:500]}')\n",
"\n",
"            elif response.status_code == 401:\n",
"                print(f'❌ Unauthorized - OAuth token validation failed')\n",
"                print(f' Check that:')\n",
"                print(f' 1. Agent Runtime has JWT authorizer configured')\n",
"                print(f' 2. Client ID matches the allowed clients')\n",
"                print(f' 3. Token has not expired')\n",
"                print(f'\\n Response: {response.text[:500]}')\n",
"\n",
"            elif response.status_code == 403:\n",
"                print(f'❌ Forbidden - User not authorized')\n",
"                print(f' Response: {response.text[:500]}')\n",
"\n",
"            elif response.status_code == 424:\n",
"                print(f'❌ Failed Dependency - Runtime returned 500 error')\n",
"                print(f' Response: {response.text[:500]}')\n",
"                print(f'\\n This means the agent code is crashing.')\n",
"                print(f' Common causes:')\n",
"                print(f' 1. Missing Gateway ARN in SSM (/app/lakehouse-agent/gateway-arn)')\n",
"                print(f' 2. Agent runtime IAM role lacks SSM permissions')\n",
"                print(f' 3. Agent runtime IAM role lacks bedrock-agentcore-control:GetGateway permission')\n",
"                print(f' 4. Bearer token not being passed correctly')\n",
"                print(f'\\n 👉 Run the cells below to diagnose:')\n",
"                print(f' - \"Verify Configuration\" cell to check SSM parameters')\n",
"                print(f' - \"Check CloudWatch Logs\" cell to see agent error logs')\n",
"\n",
"            else:\n",
"                print(f'❌ Request failed')\n",
"                print(f' Response: {response.text[:500]}')\n",
"\n",
"        except requests.exceptions.Timeout:\n",
"            print(f'\\n❌ Request timed out after 60 seconds')\n",
"            print(f' Check CloudWatch logs:')\n",
"            print(f' - Agent Runtime: /aws/bedrock-agentcore/runtime/{AGENT_RUNTIME_ARN.split(\"/\")[-1]}')\n",
"            print(f' - Gateway Interceptor: /aws/lambda/lakehouse-gateway-interceptor')\n",
"\n",
"        except Exception as e:\n",
"            print(f'\\n❌ Error: {e}')\n",
"            import traceback\n",
"            traceback.print_exc()\n",
"else:\n",
"    print('⚠️ Skipping agent test - no access token')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 3: Launch Streamlit UI\n",
"\n",
"Launch the interactive Streamlit UI for conversational testing with the agent."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import subprocess\n",
"import os\n",
"\n",
"print('🚀 Launching Streamlit UI...')\n",
"print('\\n📝 Instructions:')\n",
"print(' - Streamlit will open in your browser automatically')\n",
"print(' - Login with: user001@example.com / TempPass123!')\n",
"print(' - Try queries like: \"Show me all claims\" or \"Get claims summary\"')\n",
"print(' - Interrupt the kernel (or press Ctrl+C if running in a terminal) to stop Streamlit')\n",
"print('\\n⏳ Starting Streamlit server...')\n",
"\n",
"# Run streamlit from the streamlit-ui directory (this call blocks until Streamlit exits)\n",
"try:\n",
"    streamlit_dir = os.path.join(os.getcwd(), 'streamlit-ui')\n",
"    subprocess.run(\n",
"        ['streamlit', 'run', 'streamlit_app.py'],\n",
"        cwd=streamlit_dir,\n",
"        check=True\n",
"    )\n",
"except KeyboardInterrupt:\n",
"    print('\\n\\n✅ Streamlit stopped')\n",
"except FileNotFoundError:\n",
"    print('\\n❌ streamlit-ui directory or streamlit_app.py not found')\n",
"    print(' Make sure you are running this from the lakehouse-agent directory')\n",
"except Exception as e:\n",
"    print(f'\\n❌ Error launching Streamlit: {e}')\n",
"    print('\\n💡 Manual launch:')\n",
"    print(' cd streamlit-ui')\n",
"    print(' streamlit run streamlit_app.py')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Summary\n",
"\n",
"✅ **Testing Complete!**\n",
"\n",
"**Architecture Validated:**\n",
"```\n",
"User (with OAuth token)\n",
" ↓\n",
"AgentCore Runtime (Lakehouse Agent)\n",
" ├─ Validates OAuth token from Authorization header (JWT authorizer)\n",
" ├─ Agent code reads bearer token from request payload\n",
" ↓\n",
"AgentCore Gateway\n",
" ├─ Receives bearer token from agent\n",
" ├─ Interceptor Lambda validates token\n",
" ├─ Adds user identity (X-User-Principal header)\n",
" ↓\n",
"MCP Athena Server\n",
" └─ Executes queries with user context\n",
"```\n",
"\n",
"**What was tested:**\n",
"- OAuth token generation from Cognito (client credentials grant)\n",
"- Agent runtime invocation with bearer token in header and payload\n",
"- Agent → Gateway → MCP Server flow\n",
"- User identity propagation through headers\n",
"- Interactive Streamlit UI for conversational testing\n",
"\n",
"**Additional Testing:**\n",
"\n",
"1. **Test with different users:**\n",
" - user001@example.com / TempPass123!\n",
" - user002@example.com / TempPass123!\n",
"\n",
"2. **Verify User Context:**\n",
" - Check that Gateway interceptor extracts user identity\n",
" - Verify X-User-Principal header is added to MCP requests\n",
" - Confirm user identity appears in CloudWatch logs\n",
"\n",
"**Troubleshooting:**\n",
"- Check CloudWatch logs for:\n",
" - Agent Runtime logs: `/aws/bedrock-agentcore/runtime/<runtime-id>`\n",
" - Gateway Interceptor logs: `/aws/lambda/lakehouse-gateway-interceptor`\n",
" - Look for: \"Bearer token extracted\", \"User: <email>\", \"Request authorized\"\n",
"- Verify SSM parameters are set correctly\n",
"- Ensure all components are deployed in correct order"
]
}
],
"metadata": {
"kernelspec": {
"display_name": ".venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.2"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
@@ -0,0 +1,796 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Lakehouse Agent - Optional Cleanup\n",
"\n",
"This notebook helps you clean up all AWS resources created by notebooks 00-06.\n",
"\n",
"**⚠️ WARNING: This will delete all resources created during deployment!**\n",
"\n",
"**What this notebook does:**\n",
"- Deletes Agent Runtime (from notebook 05)\n",
"- Deletes Gateway, Targets, OAuth Providers, and IAM Roles (from notebook 04)\n",
"- Deletes Interceptor Lambda (from notebook 04)\n",
"- Deletes MCP Server Runtime (from notebook 03)\n",
"- Deletes Cognito User Pool and users (from notebook 02)\n",
"- Deletes Athena database and tables (from notebook 01)\n",
"- Optionally deletes S3 bucket and data (from notebook 00)\n",
"- Deletes SSM parameters\n",
"- Deletes local configuration files\n",
"\n",
"**Prerequisites:**\n",
"- AWS credentials configured\n",
"- Python 3.10 or later\n",
"- boto3 installed"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import boto3\n",
"import json\n",
"import time\n",
"from datetime import datetime\n",
"\n",
"print(\"✅ Imports successful\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Initialize AWS Session"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Load AWS credentials and initialize session\n",
"from utils.notebook_init import init_aws\n",
"\n",
"session, region, account_id = init_aws()\n",
"\n",
"# Initialize AWS clients\n",
"ssm_client = session.client('ssm', region_name=region)\n",
"s3_client = session.client('s3', region_name=region)\n",
"athena_client = session.client('athena', region_name=region)\n",
"glue_client = session.client('glue', region_name=region)\n",
"cognito_client = session.client('cognito-idp', region_name=region)\n",
"lambda_client = session.client('lambda', region_name=region)\n",
"iam_client = session.client('iam', region_name=region)\n",
"bedrock_agent_client = session.client('bedrock-agentcore-control', region_name=region)\n",
"\n",
"print(f'\\n✅ AWS Session initialized')\n",
"print(f' Account ID: {account_id}')\n",
"print(f' Region: {region}')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 1: Load Configuration from SSM\n",
"\n",
"Load all configuration parameters to identify resources to delete."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def get_ssm_parameter(name, default=None):\n",
"    \"\"\"Get SSM parameter value, return default if not found\"\"\"\n",
"    try:\n",
"        response = ssm_client.get_parameter(Name=name)\n",
"        return response['Parameter']['Value']\n",
"    except ssm_client.exceptions.ParameterNotFound:\n",
"        return default\n",
"    except Exception as e:\n",
"        print(f\"⚠️ Error reading {name}: {e}\")\n",
"        return default\n",
"\n",
"print(\"📋 Loading configuration from SSM Parameter Store...\\n\")\n",
"\n",
"# Load all configuration\n",
"config = {\n",
"    'agent_runtime_arn': get_ssm_parameter('/app/lakehouse-agent/agent-runtime-arn'),\n",
"    'agent_runtime_id': get_ssm_parameter('/app/lakehouse-agent/agent-runtime-id'),\n",
"    'gateway_arn': get_ssm_parameter('/app/lakehouse-agent/gateway-arn'),\n",
"    'gateway_id': get_ssm_parameter('/app/lakehouse-agent/gateway-id'),\n",
"    'interceptor_lambda_arn': get_ssm_parameter('/app/lakehouse-agent/interceptor-lambda-arn'),\n",
"    'mcp_server_runtime_arn': get_ssm_parameter('/app/lakehouse-agent/mcp-server-runtime-arn'),\n",
"    'mcp_server_runtime_id': get_ssm_parameter('/app/lakehouse-agent/mcp-server-runtime-id'),\n",
"    'cognito_user_pool_id': get_ssm_parameter('/app/lakehouse-agent/cognito-user-pool-id'),\n",
"    'cognito_domain': get_ssm_parameter('/app/lakehouse-agent/cognito-domain'),\n",
"    'database_name': get_ssm_parameter('/app/lakehouse-agent/database-name'),\n",
"    's3_bucket_name': get_ssm_parameter('/app/lakehouse-agent/s3-bucket-name'),\n",
"}\n",
"\n",
"# Display configuration\n",
"print(\"Resources found:\")\n",
"for key, value in config.items():\n",
"    if value:\n",
"        display_value = value[:60] + '...' if len(value) > 60 else value\n",
"        print(f\" ✅ {key}: {display_value}\")\n",
"    else:\n",
"        print(f\" ⏭️ {key}: Not found\")\n",
"\n",
"# Count resources\n",
"resource_count = sum(1 for v in config.values() if v)\n",
"print(f\"\\n📊 Found {resource_count} resources to clean up\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 2: Delete Agent Runtime (from notebook 05)\n",
"\n",
"Delete the Lakehouse Agent Runtime."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(\"🗑️ Deleting Agent Runtime...\\n\")\n",
"\n",
"if config['agent_runtime_id']:\n",
"    try:\n",
"        bedrock_agent_client.delete_agent_runtime(\n",
"            agentRuntimeId=config['agent_runtime_id']\n",
"        )\n",
"        print(f\"✅ Deleted Agent Runtime: {config['agent_runtime_id']}\")\n",
"        print(\" Waiting for deletion to complete...\")\n",
"        time.sleep(10)\n",
"    except bedrock_agent_client.exceptions.ResourceNotFoundException:\n",
"        print(f\"⏭️ Agent Runtime not found (may have been deleted already)\")\n",
"    except Exception as e:\n",
"        print(f\"❌ Error deleting Agent Runtime: {e}\")\n",
"else:\n",
"    print(\"⏭️ No Agent Runtime found in configuration\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 3: Delete Gateway (from notebook 04)\n",
"\n",
"Delete the AgentCore Gateway."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(\"🗑️ Deleting Gateway...\\n\")\n",
"\n",
"if config['gateway_id']:\n",
"    try:\n",
"        # First, delete all gateway targets\n",
"        print(f\"Listing targets for gateway: {config['gateway_id']}\")\n",
"        # Initialize so the wait step below does not raise NameError if listing fails\n",
"        targets = []\n",
"        try:\n",
"            list_response = bedrock_agent_client.list_gateway_targets(\n",
"                gatewayIdentifier=config['gateway_id']\n",
"            )\n",
"\n",
"            targets = list_response.get('items', [])\n",
"            if targets:\n",
"                print(f\" Found {len(targets)} target(s) to delete\\n\")\n",
"                for target in targets:\n",
"                    target_id = target['targetId']\n",
"                    target_name = target.get('name', 'unnamed')\n",
"                    try:\n",
"                        bedrock_agent_client.delete_gateway_target(\n",
"                            gatewayIdentifier=config['gateway_id'],\n",
"                            targetId=target_id\n",
"                        )\n",
"                        print(f\" ✅ Deleted target: {target_name} ({target_id})\")\n",
"                    except Exception as e:\n",
"                        print(f\" ⚠️ Could not delete target {target_name}: {e}\")\n",
"            else:\n",
"                print(\" No targets found\")\n",
"        except Exception as e:\n",
"            print(f\" ⚠️ Could not list targets: {e}\")\n",
"\n",
"        # Wait for targets to finish deleting\n",
"        if targets:\n",
"            print(\"\\n⏳ Waiting for targets to finish deleting...\")\n",
"            max_attempts = 12  # 12 attempts * 5 seconds = 60 seconds max\n",
"            for attempt in range(max_attempts):\n",
"                try:\n",
"                    list_response = bedrock_agent_client.list_gateway_targets(\n",
"                        gatewayIdentifier=config['gateway_id']\n",
"                    )\n",
"                    remaining_targets = list_response.get('items', [])\n",
"\n",
"                    if not remaining_targets:\n",
"                        print(\" ✅ All targets deleted successfully\")\n",
"                        break\n",
"\n",
"                    print(f\" Still {len(remaining_targets)} target(s) remaining... (attempt {attempt+1}/{max_attempts})\")\n",
"                    time.sleep(5)\n",
"                except bedrock_agent_client.exceptions.ResourceNotFoundException:\n",
"                    print(\" ✅ Gateway already deleted during target cleanup\")\n",
"                    break\n",
"                except Exception as e:\n",
"                    print(f\" ⚠️ Error checking targets: {e}\")\n",
"                    break\n",
"            else:\n",
"                # for/else: runs only when the loop finished without a break\n",
"                print(\" ⚠️ Timeout waiting for targets to delete, proceeding anyway...\")\n",
"\n",
"        # Now delete the gateway\n",
"        print(f\"\\nDeleting gateway: {config['gateway_id']}\")\n",
"        bedrock_agent_client.delete_gateway(\n",
"            gatewayIdentifier=config['gateway_id']\n",
"        )\n",
"        print(f\"✅ Deleted Gateway: {config['gateway_id']}\")\n",
"        print(\" Waiting for deletion to complete...\")\n",
"        time.sleep(10)\n",
"    except bedrock_agent_client.exceptions.ResourceNotFoundException:\n",
"        print(f\"⏭️ Gateway not found (may have been deleted already)\")\n",
"    except Exception as e:\n",
"        print(f\"❌ Error deleting Gateway: {e}\")\n",
"else:\n",
"    print(\"⏭️ No Gateway found in configuration\")"
]
},
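{
"cell_type": "markdown",
"metadata": {},
"source": [
"The wait loop in the cell above follows a general poll-until-done pattern: check a condition a fixed number of times, sleep between checks, and give up after the last attempt. As a sketch only, that pattern could be factored into a small helper. `wait_until` and its parameters are hypothetical names, not part of this repository or of boto3, and the list-based predicate below is a stand-in for the real target listing.\n",
"\n",
"```python\n",
"import time\n",
"\n",
"\n",
"def wait_until(predicate, max_attempts=12, delay_seconds=5.0, sleep=time.sleep):\n",
"    # Call predicate() up to max_attempts times, sleeping between attempts.\n",
"    # Returns True as soon as predicate() is truthy, False on timeout.\n",
"    for attempt in range(max_attempts):\n",
"        if predicate():\n",
"            return True\n",
"        if attempt < max_attempts - 1:\n",
"            sleep(delay_seconds)\n",
"    return False\n",
"\n",
"\n",
"# Stand-in for 'no gateway targets remain': becomes true on the third check\n",
"remaining = [2, 1, 0]\n",
"done = wait_until(lambda: remaining.pop(0) == 0, max_attempts=5, delay_seconds=0)\n",
"print('all targets deleted' if done else 'timed out')\n",
"```\n",
"\n",
"With the real client, the predicate would wrap the same `list_gateway_targets` call used above and return true once its `items` list is empty."
]
},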
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 3.5: Delete OAuth2 Credential Providers\n",
"\n",
"Delete OAuth2 credential providers created for Gateway-to-Runtime authentication.\n",
"\n",
"**Important:** These providers store Cognito client credentials and must be deleted to prevent stale credentials from being reused."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(\"🗑️ Deleting OAuth2 Credential Providers...\\n\")\n",
"\n",
"try:\n",
"    # List all OAuth2 credential providers\n",
"    response = bedrock_agent_client.list_oauth2_credential_providers()\n",
"\n",
"    # Try different possible key names for the list\n",
"    providers = response.get('oauth2CredentialProviders',\n",
"                             response.get('credentialProviders',\n",
"                                          response.get('items', [])))\n",
"\n",
"    if providers:\n",
"        deleted_count = 0\n",
"        for provider in providers:\n",
"            provider_name = provider.get('name', 'unknown')\n",
"\n",
"            # Only delete lakehouse-related providers\n",
"            if 'lakehouse' in provider_name.lower():\n",
"                try:\n",
"                    # Delete using the provider name\n",
"                    bedrock_agent_client.delete_oauth2_credential_provider(\n",
"                        name=provider_name\n",
"                    )\n",
"                    print(f\" ✅ Deleted OAuth provider: {provider_name}\")\n",
"                    deleted_count += 1\n",
"                except Exception as e:\n",
"                    print(f\" ⚠️ Could not delete provider {provider_name}: {e}\")\n",
"\n",
"        if deleted_count > 0:\n",
"            print(f\"\\n✅ Deleted {deleted_count} OAuth2 provider(s)\")\n",
"        else:\n",
"            print(\"\\n⏭️ No lakehouse-related OAuth providers found\")\n",
"    else:\n",
"        print(\"⏭️ No OAuth2 providers found\")\n",
"\n",
"except Exception as e:\n",
"    print(f\"❌ Error listing OAuth2 providers: {e}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 3.6: Delete IAM Roles\n",
"\n",
"Delete IAM roles created for the Gateway."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(\"🗑️ Deleting IAM Roles...\\n\")\n",
"\n",
"# IAM role name for gateway\n",
"role_name = 'agentcore-lakehouse-gateway-role'\n",
"\n",
"try:\n",
"    # Check if role exists\n",
"    iam_client.get_role(RoleName=role_name)\n",
"\n",
"    print(f\"Found IAM role: {role_name}\")\n",
"\n",
"    # Delete inline policies\n",
"    try:\n",
"        policy_names = iam_client.list_role_policies(RoleName=role_name)['PolicyNames']\n",
"        for policy_name in policy_names:\n",
"            iam_client.delete_role_policy(RoleName=role_name, PolicyName=policy_name)\n",
"            print(f\" ✅ Deleted inline policy: {policy_name}\")\n",
"    except Exception as e:\n",
"        print(f\" ⚠️ Error deleting inline policies: {e}\")\n",
"\n",
"    # Detach managed policies\n",
"    try:\n",
"        attached_policies = iam_client.list_attached_role_policies(RoleName=role_name)['AttachedPolicies']\n",
"        for policy in attached_policies:\n",
"            iam_client.detach_role_policy(RoleName=role_name, PolicyArn=policy['PolicyArn'])\n",
"            print(f\" ✅ Detached managed policy: {policy['PolicyName']}\")\n",
"    except Exception as e:\n",
"        print(f\" ⚠️ Error detaching managed policies: {e}\")\n",
"\n",
"    # Remove from instance profiles\n",
"    try:\n",
"        instance_profiles = iam_client.list_instance_profiles_for_role(RoleName=role_name)['InstanceProfiles']\n",
"        for profile in instance_profiles:\n",
"            iam_client.remove_role_from_instance_profile(\n",
"                InstanceProfileName=profile['InstanceProfileName'],\n",
"                RoleName=role_name\n",
"            )\n",
"            print(f\" ✅ Removed from instance profile: {profile['InstanceProfileName']}\")\n",
"    except Exception as e:\n",
"        print(f\" ⚠️ Error removing from instance profiles: {e}\")\n",
"\n",
"    # Delete the role (IAM requires all policies and instance profiles to be removed first)\n",
"    iam_client.delete_role(RoleName=role_name)\n",
"    print(f\"\\n✅ Deleted IAM role: {role_name}\")\n",
"\n",
"except iam_client.exceptions.NoSuchEntityException:\n",
"    print(f\"⏭️ IAM role not found: {role_name}\")\n",
"except Exception as e:\n",
"    print(f\"❌ Error deleting IAM role: {e}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 4: Delete Interceptor Lambda (from notebook 04)\n",
"\n",
"Delete the Gateway Interceptor Lambda function."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(\"🗑️ Deleting Interceptor Lambda...\\n\")\n",
"\n",
"if config['interceptor_lambda_arn']:\n",
"    function_name = config['interceptor_lambda_arn'].split(':')[-1]\n",
"    try:\n",
"        lambda_client.delete_function(FunctionName=function_name)\n",
"        print(f\"✅ Deleted Lambda function: {function_name}\")\n",
"    except lambda_client.exceptions.ResourceNotFoundException:\n",
"        print(f\"⏭️ Lambda function not found (may have been deleted already)\")\n",
"    except Exception as e:\n",
"        print(f\"❌ Error deleting Lambda function: {e}\")\n",
"else:\n",
"    print(\"⏭️ No Interceptor Lambda found in configuration\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 5: Delete MCP Server Runtime (from notebook 03)\n",
"\n",
"Delete the MCP Server Runtime."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(\"🗑️ Deleting MCP Server Runtime...\\n\")\n",
"\n",
"if config['mcp_server_runtime_id']:\n",
"    try:\n",
"        bedrock_agent_client.delete_agent_runtime(\n",
"            agentRuntimeId=config['mcp_server_runtime_id']\n",
"        )\n",
"        print(f\"✅ Deleted MCP Server Runtime: {config['mcp_server_runtime_id']}\")\n",
"        print(\" Waiting for deletion to complete...\")\n",
"        time.sleep(10)\n",
"    except bedrock_agent_client.exceptions.ResourceNotFoundException:\n",
"        print(f\"⏭️ MCP Server Runtime not found (may have been deleted already)\")\n",
"    except Exception as e:\n",
"        print(f\"❌ Error deleting MCP Server Runtime: {e}\")\n",
"else:\n",
"    print(\"⏭️ No MCP Server Runtime found in configuration\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 5.1: Delete Local Configuration Files\n",
"\n",
"Delete `.bedrock_agentcore.yaml` configuration files created during runtime deployment."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"from pathlib import Path\n",
"\n",
"print(\"🗑️ Deleting local configuration files...\\n\")\n",
"\n",
"# Get notebook directory to build relative paths\n",
"notebook_dir = Path.cwd()\n",
"\n",
"# Files to clean up\n",
"config_files = [\n",
"    notebook_dir / \"deployment\" / \"mcp-lakehouse-server\" / \".bedrock_agentcore.yaml\",\n",
"    notebook_dir / \"deployment\" / \"lakehouse-agent\" / \".bedrock_agentcore.yaml\"\n",
"]\n",
"\n",
"deleted_count = 0\n",
"for file_path in config_files:\n",
"    if file_path.exists():\n",
"        try:\n",
"            file_path.unlink()\n",
"            print(f\"✅ Deleted: {file_path.relative_to(notebook_dir)}\")\n",
"            deleted_count += 1\n",
"        except Exception as e:\n",
"            print(f\"❌ Error deleting {file_path.relative_to(notebook_dir)}: {e}\")\n",
"    else:\n",
"        print(f\"⏭️ Not found: {file_path.relative_to(notebook_dir)}\")\n",
"\n",
"if deleted_count > 0:\n",
"    print(f\"\\n✅ Deleted {deleted_count} configuration file(s)\")\n",
"else:\n",
"    print(\"\\n⏭️ No configuration files to delete\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 6: Delete Cognito Resources (from notebook 02)\n",
"\n",
"Delete Cognito User Pool, domain, and all users."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(\"🗑️ Deleting Cognito resources...\\n\")\n",
"\n",
"if config['cognito_user_pool_id']:\n",
"    try:\n",
"        # First, try to get the user pool to check if domain exists\n",
"        try:\n",
"            pool_info = cognito_client.describe_user_pool(\n",
"                UserPoolId=config['cognito_user_pool_id']\n",
"            )\n",
"            pool_domain = pool_info['UserPool'].get('Domain')\n",
"\n",
"            # If domain exists in the pool, delete it\n",
"            if pool_domain:\n",
"                print(f\"Found domain in user pool: {pool_domain}\")\n",
"                try:\n",
"                    cognito_client.delete_user_pool_domain(\n",
"                        Domain=pool_domain,\n",
"                        UserPoolId=config['cognito_user_pool_id']\n",
"                    )\n",
"                    print(f\"✅ Deleted Cognito domain: {pool_domain}\")\n",
"                    time.sleep(5)  # Wait for domain deletion to complete\n",
"                except cognito_client.exceptions.InvalidParameterException as e:\n",
"                    if \"No such domain\" in str(e):\n",
"                        print(f\"⏭️ Domain already deleted\")\n",
"                    else:\n",
"                        print(f\"⚠️ Could not delete domain: {e}\")\n",
"                except Exception as e:\n",
"                    print(f\"⚠️ Could not delete domain: {e}\")\n",
"            else:\n",
"                print(\"No domain configured for this user pool\")\n",
"        except cognito_client.exceptions.ResourceNotFoundException:\n",
"            print(f\"⏭️ User pool not found, skipping domain deletion\")\n",
"\n",
"        # Now delete the user pool\n",
"        cognito_client.delete_user_pool(\n",
"            UserPoolId=config['cognito_user_pool_id']\n",
"        )\n",
"        print(f\"✅ Deleted Cognito User Pool: {config['cognito_user_pool_id']}\")\n",
"    except cognito_client.exceptions.ResourceNotFoundException:\n",
"        print(f\"⏭️ Cognito User Pool not found (may have been deleted already)\")\n",
"    except cognito_client.exceptions.InvalidParameterException as e:\n",
"        if \"domain configured\" in str(e):\n",
"            print(f\"❌ User pool still has a domain. Trying to find and delete it...\")\n",
"            # Try with the domain from config\n",
"            if config.get('cognito_domain'):\n",
"                try:\n",
"                    cognito_client.delete_user_pool_domain(\n",
"                        Domain=config['cognito_domain'],\n",
"                        UserPoolId=config['cognito_user_pool_id']\n",
"                    )\n",
"                    print(f\"✅ Deleted domain from config: {config['cognito_domain']}\")\n",
"                    time.sleep(5)\n",
"                    # Try deleting user pool again\n",
"                    cognito_client.delete_user_pool(\n",
"                        UserPoolId=config['cognito_user_pool_id']\n",
"                    )\n",
"                    print(f\"✅ Deleted Cognito User Pool: {config['cognito_user_pool_id']}\")\n",
"                except Exception as e2:\n",
"                    print(f\"❌ Still could not delete user pool: {e2}\")\n",
"            else:\n",
"                print(f\"❌ Error deleting Cognito User Pool: {e}\")\n",
"        else:\n",
"            print(f\"❌ Error deleting Cognito User Pool: {e}\")\n",
"    except Exception as e:\n",
"        print(f\"❌ Error deleting Cognito User Pool: {e}\")\n",
"else:\n",
"    print(\"⏭️ No Cognito User Pool found in configuration\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 7: Delete Athena Database and Tables (from notebook 01)\n",
"\n",
"Delete Athena database and all tables."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(\"🗑️ Deleting Athena database and tables...\\n\")\n",
"\n",
"if config['database_name']:\n",
"    try:\n",
"        # Get all tables in the database\n",
"        response = glue_client.get_tables(DatabaseName=config['database_name'])\n",
"        tables = response['TableList']\n",
"\n",
"        # Delete each table\n",
"        for table in tables:\n",
"            table_name = table['Name']\n",
"            try:\n",
"                glue_client.delete_table(\n",
"                    DatabaseName=config['database_name'],\n",
"                    Name=table_name\n",
"                )\n",
"                print(f\" ✅ Deleted table: {table_name}\")\n",
"            except Exception as e:\n",
"                print(f\" ⚠️ Could not delete table {table_name}: {e}\")\n",
"\n",
"        # Delete the database\n",
"        glue_client.delete_database(Name=config['database_name'])\n",
"        print(f\"\\n✅ Deleted database: {config['database_name']}\")\n",
"    except glue_client.exceptions.EntityNotFoundException:\n",
"        print(f\"⏭️ Database not found (may have been deleted already)\")\n",
"    except Exception as e:\n",
"        print(f\"❌ Error deleting database: {e}\")\n",
"else:\n",
"    print(\"⏭️ No Athena database found in configuration\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 8: Delete S3 Bucket (from notebook 00)\n",
"\n",
"**⚠️ WARNING: This will permanently delete all data in the S3 bucket!**\n",
"\n",
"Set `DELETE_S3_BUCKET = True` to enable S3 bucket deletion."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Set to True to delete the S3 bucket and all its contents\n",
"DELETE_S3_BUCKET = False  # Change to True to enable deletion\n",
"\n",
"print(\"🗑️ S3 Bucket cleanup...\\n\")\n",
"\n",
"if not DELETE_S3_BUCKET:\n",
"    print(\"⏭️ S3 bucket deletion is DISABLED\")\n",
"    print(\" Set DELETE_S3_BUCKET = True to enable\")\n",
"    print(f\" Bucket: {config['s3_bucket_name']}\")\n",
"elif config['s3_bucket_name']:\n",
"    try:\n",
"        bucket_name = config['s3_bucket_name']\n",
"\n",
"        # List and delete all objects\n",
"        print(f\"Deleting all objects in bucket: {bucket_name}\")\n",
"        paginator = s3_client.get_paginator('list_objects_v2')\n",
"        pages = paginator.paginate(Bucket=bucket_name)\n",
"\n",
"        delete_count = 0\n",
"        for page in pages:\n",
"            if 'Contents' in page:\n",
"                objects = [{'Key': obj['Key']} for obj in page['Contents']]\n",
"                s3_client.delete_objects(\n",
"                    Bucket=bucket_name,\n",
"                    Delete={'Objects': objects}\n",
"                )\n",
"                delete_count += len(objects)\n",
"\n",
"        print(f\" ✅ Deleted {delete_count} objects\")\n",
"\n",
"        # Delete the bucket\n",
"        s3_client.delete_bucket(Bucket=bucket_name)\n",
"        print(f\"\\n✅ Deleted S3 bucket: {bucket_name}\")\n",
"    except s3_client.exceptions.NoSuchBucket:\n",
"        print(f\"⏭️ S3 bucket not found (may have been deleted already)\")\n",
"    except Exception as e:\n",
"        print(f\"❌ Error deleting S3 bucket: {e}\")\n",
"else:\n",
"    print(\"⏭️ No S3 bucket found in configuration\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 9: Delete SSM Parameters\n",
"\n",
"Delete all SSM parameters created during deployment."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(\"🗑️ Deleting SSM parameters...\\n\")\n",
"\n",
"# Get all parameters with the lakehouse-agent prefix\n",
"try:\n",
"    paginator = ssm_client.get_paginator('describe_parameters')\n",
"    pages = paginator.paginate(\n",
"        ParameterFilters=[\n",
"            {\n",
"                'Key': 'Name',\n",
"                'Option': 'BeginsWith',\n",
"                'Values': ['/app/lakehouse-agent/']\n",
"            }\n",
"        ]\n",
"    )\n",
"\n",
"    parameters_to_delete = []\n",
"    for page in pages:\n",
"        for param in page['Parameters']:\n",
"            parameters_to_delete.append(param['Name'])\n",
"\n",
"    if parameters_to_delete:\n",
"        print(f\"Found {len(parameters_to_delete)} parameters to delete:\\n\")\n",
"\n",
"        # Delete parameters in batches of 10 (AWS limit)\n",
"        deleted_count = 0\n",
"        for i in range(0, len(parameters_to_delete), 10):\n",
"            batch = parameters_to_delete[i:i+10]\n",
"            try:\n",
"                result = ssm_client.delete_parameters(Names=batch)\n",
"                # Count only the names the API reports as deleted\n",
"                for param in result.get('DeletedParameters', []):\n",
"                    print(f\" ✅ Deleted: {param}\")\n",
"                    deleted_count += 1\n",
"                for param in result.get('InvalidParameters', []):\n",
"                    print(f\" ⚠️ Not deleted (invalid): {param}\")\n",
"            except Exception as e:\n",
"                print(f\" ❌ Error deleting batch: {e}\")\n",
"\n",
"        print(f\"\\n✅ Deleted {deleted_count} of {len(parameters_to_delete)} SSM parameters\")\n",
"    else:\n",
"        print(\"⏭️ No SSM parameters found with /app/lakehouse-agent/ prefix\")\n",
"\n",
"except Exception as e:\n",
"    print(f\"❌ Error deleting SSM parameters: {e}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Summary\n",
"\n",
"Review the cleanup results above."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(\"\\n\" + \"=\"*70)\n",
"print(\"🎉 CLEANUP COMPLETE\")\n",
"print(\"=\"*70)\n",
"\n",
"print(\"\\n✅ Resources cleaned up:\")\n",
"print(\" • Agent Runtime (notebook 05)\")\n",
"print(\" • Gateway & Targets (notebook 04)\")\n",
"print(\" • OAuth2 Credential Providers\")\n",
"print(\" • IAM Roles (Gateway)\")\n",
"print(\" • Interceptor Lambda (notebook 04)\")\n",
"print(\" • MCP Server Runtime (notebook 03)\")\n",
"print(\" • Local Configuration Files (.bedrock_agentcore.yaml)\")\n",
"print(\" • Cognito User Pool (notebook 02)\")\n",
"print(\" • Athena Database & Tables (notebook 01)\")\n",
"print(\" • SSM Parameters (notebook 00)\")\n",
"\n",
"if DELETE_S3_BUCKET:\n",
"    print(\" • S3 Bucket & Data (notebook 00)\")\n",
"else:\n",
"    print(\" ⏭️ S3 Bucket (skipped - set DELETE_S3_BUCKET=True to delete)\")\n",
"\n",
"print(\"\\n📋 Manual cleanup (if needed):\")\n",
"print(\" • CloudWatch Log Groups: /aws/bedrock-agentcore/runtime/*\")\n",
"print(\" • CloudWatch Log Groups: /aws/lambda/lakehouse-*\")\n",
"print(\" • ECR Repositories (if created)\")\n",
"\n",
"print(\"\\n\" + \"=\"*70)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": ".venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.2"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
@@ -0,0 +1,754 @@
# Lakehouse Agent with OAuth Authentication
A lakehouse data processing system demonstrating Amazon Bedrock AgentCore capabilities with end-to-end OAuth authentication, row-level security based on federated user identity, and conversational AI for data queries.
## Table of Contents
- [Overview](#overview)
- [Architecture](#architecture)
- [Key Features](#key-features)
- [Prerequisites](#prerequisites)
- [Quick Start](#quick-start)
- [Deployment Steps](#deployment-steps)
- [Testing](#testing)
- [Usage Examples](#usage-examples)
- [Troubleshooting](#troubleshooting)
- [File Structure](#file-structure)
- [Cost Estimate](#cost-estimate)
---
## Overview
This system showcases a lakehouse data processing application with:
- **Streamlit UI** with Cognito OAuth authentication
- **AI-Powered Lakehouse Agent** hosted on AgentCore Runtime using the Strands framework
- **AgentCore Gateway** with JWT token validation via interceptor Lambda
- **MCP Server** connecting to AWS Athena for data queries
- **OAuth credentials** propagated through the entire stack (UI → Agent → Gateway → MCP → Athena)
- **Row-Level Security** enforced through federated user identity
### What Makes This Production-Ready
- **End-to-End OAuth**: JWT bearer tokens validated at every layer
- **Row-Level Security**: The AgentCore Gateway interceptor Lambda resolves each user token to a user identity and passes it to the MCP server, which uses it to enforce row-level access control
- **Conversational AI**: Natural language interface for data queries
- **Scalable Architecture**: AgentCore Runtime and Gateway for production workloads
- **Full Audit Trail**: CloudTrail logs all data access with user identity
- **Secure by Design**: Token validation at multiple checkpoints
---
## Architecture
### High-Level Architecture
![Lakehouse Agent Architecture](Lakehouse-agent-architecture.png)
### Authentication Flow
```
┌─────────────────────────────────────────────────────────────────┐
│                           User Layer                            │
│  ┌────────────────┐                                             │
│  │ Streamlit UI   │  OAuth login via Cognito (USER CREDENTIALS) │
│  │ + Cognito Auth │  Client: lakehouse-client                   │
│  └────────┬───────┘                                             │
└───────────┼─────────────────────────────────────────────────────┘
            │ Bearer Token (JWT with user identity)
┌───────────▼─────────────────────────────────────────────────────┐
│                         AI Agent Layer                          │
│  ┌────────────────┐                                             │
│  │Lakehouse Agent │  Strands-based conversational agent         │
│  │   AgentCore    │  Natural language data processing           │
│  │    Runtime     │  JWT Authorizer validates USER token        │
│  └────────┬───────┘  Allowed: lakehouse-client (user auth)      │
└───────────┼─────────────────────────────────────────────────────┘
            │ Bearer Token + Tool Request
┌───────────▼─────────────────────────────────────────────────────┐
│                     Gateway & Policy Layer                      │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │ AgentCore Gateway + Interceptor Lambda                    │  │
│  │ - Validates JWT tokens (USER token from agent)            │  │
│  │ - Extracts user identity (email)                          │  │
│  │ - Enforces scope-based tool access                        │  │
│  │ - Adds user identity to request headers                   │  │
│  │ JWT Inbound: lakehouse-client (user auth)                 │  │
│  │                                                           │  │
│  │ OAuth Provider: lakehouse-mcp-m2m-oauth-provider          │  │
│  │ - Gateway obtains M2M token for MCP Runtime               │  │
│  │ - Client: lakehouse-m2m-client (M2M only)                 │  │
│  └────────┬──────────────────────────────────────────────────┘  │
└───────────┼─────────────────────────────────────────────────────┘
            │ M2M Token + User Identity + Tool Request
┌───────────▼─────────────────────────────────────────────────────┐
│                      Tool Execution Layer                       │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │ MCP Server (AgentCore Runtime)                            │  │
│  │ Athena connector for data queries                         │  │
│  │ JWT Authorizer validates M2M token                        │  │
│  │ Allowed: lakehouse-m2m-client (M2M only)                  │  │
│  │ - Receives user_id from Gateway (X-User-Principal)        │  │
│  │ - Executes Athena queries                                 │  │
│  │ - Returns query results                                   │  │
│  └────────┬──────────────────────────────────────────────────┘  │
└───────────┼─────────────────────────────────────────────────────┘
            │ Athena Query
┌───────────▼─────────────────────────────────────────────────────┐
│                           Data Layer                            │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │ AWS Athena + Glue Data Catalog                            │  │
│  │ • lakehouse_db database                                   │  │
│  │ • claims table                                            │  │
│  │ • users table (metadata)                                  │  │
│  │ • Executes queries and returns results                    │  │
│  │ • S3 backend for data storage                             │  │
│  └───────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
```
### Data Flow Example: User Query
```
1. User Login
   Streamlit UI → Cognito → Returns JWT with user identity
   JWT contains: {
     "email": "user001@example.com",
     "scope": "lakehouse-api/claims.query"
   }

2. Query Submission
   User: "What claims are pending?"
   UI → Agent Runtime
   POST /agent-runtime
   Headers:
     Authorization: Bearer <JWT_token>   ← Token in header for JWT validation
   Body:
     {
       "prompt": "What claims are pending?",
       "bearer_token": "<JWT_token>"     ← Token also in body for agent to use
     }

3. Agent Runtime Processing
   a) JWT Authorizer validates token (signature, expiration, audience)
   b) Agent code extracts token from payload (JWT authorizer consumes header)
   c) Agent creates MCP client to Gateway with bearer token
   d) Agent uses AI to decide which tools to call

   Agent → Gateway
   POST /gateway
   Headers:
     Authorization: Bearer <JWT_token>   ← Same token passed through
   Body:
     {"jsonrpc": "2.0", "method": "tools/call", "params": {...}}

4. Gateway Interception
   Interceptor Lambda:
   - Validates JWT signature ✓
   - Checks token expiration ✓
   - Extracts user identity: "user001@example.com"
   - Validates scope: "claims.query" ✓
   - Adds header: X-User-Principal: user001@example.com

   Gateway → MCP Server (with user context)

5. Tool Execution
   MCP Server:
   - Extracts user from X-User-Principal header
   - Executes Athena query
   - Query: SELECT * FROM claims WHERE status = 'pending'
   - Returns results

6. Athena Execution
   Athena executes query → Returns results

7. Response Flow
   Athena → MCP → Gateway → Agent → UI
   Agent formats results naturally
   User sees: "I found 3 pending claims..."

Key Points:
✅ Bearer token in Authorization header (for JWT validation at runtime entry)
✅ Bearer token also in payload (for agent code to use with Gateway)
   Note: JWT authorizer consumes Authorization header and doesn't pass it through
✅ Token validated at agent entry (JWT authorizer)
✅ Token validated at gateway entry (Interceptor Lambda)
✅ User identity propagated through entire chain
```
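The interception step in the flow above can be sketched in a few lines of Python. This is an illustration only, not the code in `lambda_function.py`: the function names `decode_jwt_claims` and `authorize_tool_call` are hypothetical, and the sketch deliberately skips signature and expiration checks, which a real interceptor must perform against the Cognito signing keys before trusting any claim.

```python
import base64
import json


def decode_jwt_claims(token: str) -> dict:
    """Return the payload claims of a JWT WITHOUT verifying its signature."""
    payload_b64 = token.split(".")[1]
    # JWT segments are base64url without padding; restore it before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))


def authorize_tool_call(claims: dict, required_scope: str, headers: dict) -> dict:
    """Check the scope claim, then return a copy of headers with the user principal."""
    granted_scopes = claims.get("scope", "").split()
    if required_scope not in granted_scopes:
        raise PermissionError(f"token lacks required scope: {required_scope}")
    enriched = dict(headers)
    enriched["X-User-Principal"] = claims["email"]
    return enriched


def _b64url(obj: dict) -> str:
    """Encode a dict as an unpadded base64url JSON segment (demo helper)."""
    return base64.urlsafe_b64encode(json.dumps(obj).encode()).rstrip(b"=").decode()


# Unsigned demo token carrying the claims from the example above.
demo_token = ".".join([
    _b64url({"alg": "none"}),
    _b64url({"email": "user001@example.com", "scope": "lakehouse-api/claims.query"}),
    "",
])

claims = decode_jwt_claims(demo_token)
print(authorize_tool_call(claims, "lakehouse-api/claims.query", {}))
```

Returning a copy of the headers, rather than mutating the incoming dict, keeps the original request untouched if the scope check fails.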
---
## Key Features
### Security Features
- **🔒 End-to-End OAuth**: JWT bearer tokens with multi-layer validation
- **Row-Level Security**: The AgentCore Gateway interceptor Lambda translates the federated user's JWT into a user principal
- **Fine-Grained Access Control**: JWT scopes determine which tools users can access
- **Token Propagation**: User identity flows through entire system
- **Full Audit Trail**: CloudTrail logs all data access with user identity
- **🛡️ Gateway Interceptor**: Policy-based tool access enforcement
### Application Features
- **🏥 Health Insurance Operations**: Query claims data conversationally
- **💬 Conversational AI**: Natural language interface for data queries
- **☁️ AWS Athena Integration**: Scalable data queries
- **🎯 Multi-User Support**: User identity tracked throughout request flow
---
## Prerequisites
### AWS Account Setup
1. **AWS Account**:
   - AWS Account ID (e.g., XXXXXXXXXXXX)
   - Region: us-east-1 (configurable)
2. **AWS Permissions**:
   ```
   - BedrockAgentCoreFullAccess
   - AmazonBedrockFullAccess
   - AmazonAthenaFullAccess
   - AmazonS3FullAccess
   - AWSLambda_FullAccess
   - AmazonCognitoPowerUser
   - AmazonSSMFullAccess
   ```
3. **AWS Services**:
   - Amazon Bedrock (with Claude Sonnet 4.5 access)
   - Amazon Bedrock AgentCore
   - AWS Lambda
   - Amazon Cognito
   - AWS Athena
   - AWS Glue
   - Amazon S3
   - AWS Systems Manager (SSM Parameter Store)
### Development Environment
```bash
# Python 3.10 or later
python --version
# Create virtual environment
python -m venv .venv
# Activate virtual environment
# On macOS/Linux:
source .venv/bin/activate
# On Windows:
# .venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
```
## Quick Start
The fastest way to deploy the complete system is through the provided Jupyter notebooks, run in sequence.
### Configure AWS Credentials
Ensure you have AWS credentials configured:
```bash
# Option 1: Using .env file (Recommended)
# Create a .env file in this directory with your AWS credentials:
# AWS_ACCESS_KEY_ID=your-access-key-id
# AWS_SECRET_ACCESS_KEY=your-secret-access-key
# AWS_SESSION_TOKEN=your-session-token # Optional, for STS credentials
# AWS_DEFAULT_REGION=us-east-1
# Option 2: If using SSO
export AWS_PROFILE=your-profile-name
aws sso login --profile your-profile-name
# Option 3: If using access keys
aws configure
```
### Deployment via Notebooks
Start Jupyter and run the notebooks in order:
```bash
jupyter notebook
```
**Notebook Sequence**:
1. **00-prerequisites-setup.ipynb** - Configure environment and create S3 bucket
2. **01-deploy-athena.ipynb** - Deploy Athena database with sample data
3. **02-deploy-cognito.ipynb** - Set up OAuth with Cognito user pool
4. **03-deploy-mcp-server.ipynb** - Deploy MCP server on AgentCore Runtime
5. **04-deploy-gateway.ipynb** - Deploy Gateway with JWT interceptor
6. **05-deploy-agent.ipynb** - Deploy conversational AI agent
7. **06-streamlit-ui-deployment.ipynb** - Test end-to-end flow with OAuth
8. **07-optional-cleanup.ipynb** - Clean up all deployed resources (optional)
**Total deployment time**: ~2-3 hours
**Credential Loading**: All notebooks use centralized credential loading that automatically detects and uses credentials from your `.env` file, environment variables, or AWS SSO (in that order of priority). No need to configure credentials separately in each notebook.
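The priority order described above can be sketched as follows. This is an assumption-laden illustration, not the contents of `aws_session_utils.py`: the helper names `parse_env_text` and `pick_credential_source` are hypothetical, and the sketch only decides which source would win rather than building a session.

```python
import os
from pathlib import Path

REQUIRED_KEYS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def parse_env_text(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blanks and comments (hypothetical helper)."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop a trailing inline comment such as "value # Optional".
        value = value.split(" #", 1)[0].strip().strip('"').strip("'")
        values[key.strip()] = value
    return values


def pick_credential_source(dotenv_values: dict, environ: dict) -> str:
    """Return which source wins: .env first, then environment, then SSO profile."""
    if all(dotenv_values.get(key) for key in REQUIRED_KEYS):
        return "dotenv"
    if all(environ.get(key) for key in REQUIRED_KEYS):
        return "environment"
    if environ.get("AWS_PROFILE"):
        return "sso-profile"
    return "none"


env_path = Path(".env")
dotenv_values = parse_env_text(env_path.read_text()) if env_path.is_file() else {}
print(pick_credential_source(dotenv_values, dict(os.environ)))
```

Separating the decision from session creation makes the ordering easy to check without touching AWS.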
Each notebook:
- Explains what it deploys
- Shows progress and outputs
- Saves configuration to SSM Parameter Store
- Can be re-run safely (idempotent where possible)
### What Gets Deployed
- **S3 Bucket**: Data storage for Athena
- **Athena Database**: `lakehouse_db` with `claims` and `users` tables
- **Cognito User Pool**: OAuth authentication with test users
- **MCP Server**: Tool execution layer on AgentCore Runtime
- **Gateway**: Request routing with JWT validation
- **Agent**: Conversational AI on AgentCore Runtime
- **Test Users**: user001@example.com, user002@example.com (password: TempPass123!)
### Cleanup
To remove all deployed resources, run **07-optional-cleanup.ipynb**. This notebook will:
- Delete all AgentCore Runtimes and Gateways
- Delete OAuth2 credential providers and the Gateway IAM role
- Delete Lambda functions
- Delete local `.bedrock_agentcore.yaml` configuration files
- Delete Cognito User Pool
- Delete Athena database and tables
- Optionally delete S3 bucket and data
- Delete all SSM parameters
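As an illustration of the last cleanup step, the sketch below deletes SSM parameters in batches of 10, the per-call limit of the `DeleteParameters` API. The function name `delete_parameters_in_batches` is hypothetical; `ssm_client` is assumed to be a boto3 SSM client (or any object exposing the same `delete_parameters` call). It reports what the API confirms as deleted instead of assuming every requested name succeeded.

```python
def delete_parameters_in_batches(ssm_client, names, batch_size=10):
    """Delete SSM parameters in batches and return (deleted, invalid) name lists."""
    deleted, invalid = [], []
    for start in range(0, len(names), batch_size):
        batch = names[start:start + batch_size]
        response = ssm_client.delete_parameters(Names=batch)
        # The API lists which names were removed and which it did not recognize.
        deleted.extend(response.get("DeletedParameters", []))
        invalid.extend(response.get("InvalidParameters", []))
    return deleted, invalid


# Usage with boto3 (requires AWS credentials, so it is left commented out):
# import boto3
# ssm = boto3.client("ssm")
# deleted, invalid = delete_parameters_in_batches(ssm, ["/app/lakehouse-agent/example"])
```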
### Quick Test
After deployment, test with:
```
# In notebook 06 or programmatically
Query: "Show me all claims"
Expected: Conversational response with claims data
```
### Manual Deployment (Alternative)
If you prefer command-line deployment instead of notebooks, see the [Deployment Steps](#deployment-steps) section below.
---
## Deployment Steps
This section provides manual command-line deployment instructions as an alternative to the notebooks.
### Complete Deployment Roadmap
| Phase | Component | Command | Duration |
|-------|-----------|---------|----------|
| 1 | Athena Database | `python setup_athena.py` | 5 min |
| 2 | Cognito User Pool | `python setup_cognito.py` | 5 min |
| 3 | MCP Server | `python deploy_runtime.py --yes` | 10 min |
| 4 | Gateway & Interceptor | `python deploy_interceptor.py` + `python create_gateway.py --yes` | 5 min |
| 5 | Lakehouse Agent | `python deploy_lakehouse_agent.py --yes` | 5 min |
| 6 | Streamlit UI | `streamlit run streamlit_app.py` | 5 min |
**Total Time**: ~2.5 hours
### Manual Deployment Commands
If deploying via command line instead of notebooks:
```bash
# Step 1: Deploy Athena
cd athena-setup
python setup_athena.py
# Step 2: Deploy Cognito
cd ../cognito-setup
python setup_cognito.py
# Step 3: Deploy MCP Server
cd ../mcp-lakehouse-server
python deploy_runtime.py --yes
# Step 4: Deploy Gateway & Interceptor
cd ../gateway-setup/interceptor
python deploy_interceptor.py
cd ..
python create_gateway.py --yes
# Step 5: Deploy Agent
cd ../lakehouse-agent
python deploy_lakehouse_agent.py --yes
# Step 6: Test
cd ..
streamlit run streamlit_app.py
```
### Key Configuration Features
**All scripts use**:
- ✅ **AWS session utility** (`aws_session_utils.py`) for SSO support
- ✅ **SSM Parameter Store** for sharing configuration
- ✅ **Automatic region detection** from AWS credentials
- ✅ **--yes flags** for notebook automation (skip interactive prompts)
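A minimal sketch of how a script might read the shared configuration back from SSM Parameter Store, assuming the `/app/lakehouse-agent/` prefix used by the cleanup notebook. The function name `load_shared_config` is hypothetical, and `ssm_client` is assumed to be a boto3 SSM client; the actual scripts may load parameters differently.

```python
def load_shared_config(ssm_client, prefix="/app/lakehouse-agent/"):
    """Return {short_name: value} for every parameter stored under prefix."""
    base_path = prefix.rstrip("/")
    config = {}
    paginator = ssm_client.get_paginator("get_parameters_by_path")
    for page in paginator.paginate(Path=base_path, Recursive=True, WithDecryption=True):
        for param in page["Parameters"]:
            # Strip the shared prefix so callers can use short keys.
            short_name = param["Name"].removeprefix(base_path + "/")
            config[short_name] = param["Value"]
    return config


# Usage with boto3 (requires AWS credentials, so it is left commented out):
# import boto3
# config = load_shared_config(boto3.client("ssm"))
```

Passing the client in as an argument keeps the function independent of how the session was created (for example, through an SSO profile).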
---
## Testing
**Test flow**:
1. Get OAuth token from Cognito
2. Call Agent Runtime with bearer token in header
3. Agent processes natural language query
4. Agent calls Gateway tools (validated by interceptor)
5. MCP Server executes Athena query
6. Results returned through chain
**Expected output**:
```
✅ Token obtained: eyJraWQiOiJxxx...
✅ Agent response received
✅ Tool calls: 1
📝 Agent output: "I found 9 claims in the database..."
```
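The request in step 2 of the test flow can be sketched with the standard library. The endpoint URL below is a placeholder and `build_agent_request` is a hypothetical helper; only the shape of the request (bearer token in the `Authorization` header and repeated in the JSON body) comes from the data flow described earlier in this README.

```python
import json
import urllib.request


def build_agent_request(endpoint_url: str, jwt_token: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the token in header and body."""
    body = json.dumps({"prompt": prompt, "bearer_token": jwt_token}).encode("utf-8")
    return urllib.request.Request(
        endpoint_url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {jwt_token}",
            "Content-Type": "application/json",
        },
    )


# Placeholder endpoint and token, for illustration only.
request = build_agent_request("https://example.invalid/agent-runtime", "<JWT_token>", "Show me all claims")
print(request.get_method(), request.full_url)
# Sending would be: urllib.request.urlopen(request)
```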
### Manual Test via Streamlit
```bash
cd streamlit-ui
streamlit run streamlit_app.py
```
Test queries:
- "Show me all claims"
- "Get claims summary"
- "What claims are pending?"
### User-Specific Data Access Demo
The lakehouse agent implements row-level security (RLS) through AgentCore Gateway interceptor Lambdas, so users see only the data they are authorized to access. The screenshots below show the same application returning different datasets for two logged-in users:
#### Test User 1 - Limited Access
![Test User 1 - Lakehouse Agent](screenshots/testuser1-lakehouseagent.png)
**User**: `testuser1` - Shows limited dataset access based on user permissions. This user can only see claims and data that they are authorized to view through row-level filters.
#### Test User 2 - Different Data Scope
![Test User 2 - Lakehouse Agent](screenshots/testuser2-lakehouseagent.png)
**User**: `testuser2` - Shows a different set of data based on their specific permissions. Notice how the same query returns different results depending on the authenticated user's access rights.
**Key Security Features Demonstrated**:
- ✅ **Row-Level Security**: Each user sees only their authorized data
- ✅ **OAuth Integration**: User identity flows from Cognito through the entire system
- ✅ **Consistent Experience**: Same interface, personalized data based on user context
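One way a tool could scope a query to the calling user is sketched below. This is a hedged illustration, not the logic in `athena_tools.py`: the function name `scope_query_to_user` and the `user_id` owner column are assumptions, and binding the returned parameter list to the query engine is left out because it depends on the client used.

```python
import re

_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def scope_query_to_user(table: str, user_principal: str, owner_column: str = "user_id"):
    """Return (sql, params) that restrict a table scan to rows owned by one user."""
    # Identifiers cannot be bound as parameters, so validate them strictly.
    for name in (table, owner_column):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unsafe SQL identifier: {name!r}")
    if not user_principal:
        raise PermissionError("request carries no user principal")
    # The user value travels as a bound parameter, never via string formatting.
    sql = f"SELECT * FROM {table} WHERE {owner_column} = ?"
    return sql, [user_principal]


sql, params = scope_query_to_user("claims", "user001@example.com")
print(sql, params)
```

Keeping the user value out of the SQL text means a crafted identity string cannot change the shape of the query.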
---
## Usage Examples
### Example 1: Query All Claims
```
User: "Show me all claims"
Agent Response:
"I found 9 claims in the database:
1. Claim CLM-001 - Amount: $500, Status: Approved
2. Claim CLM-002 - Amount: $1200, Status: Pending
3. Claim CLM-003 - Amount: $300, Status: Approved
...
Would you like details on any specific claim?"
```
### Example 2: Get Claims Summary
```
User: "Give me a summary of all claims"
Agent Response:
"Here's your claims summary:
Total Claims: 9
Total Amount: $8,900
By Status:
- Approved: 4 claims ($3,200)
- Pending: 3 claims ($2,700)
- Denied: 2 claims ($3,000)
By Type:
- Medical: 5 claims ($5,000)
- Prescription: 2 claims ($800)
- Hospital: 2 claims ($3,100)"
```
### Example 3: Check Claim Status
```
User: "What's the status of claim CLM-001?"
Agent Response:
"Claim CLM-001 details:
Status: Approved ✓
Amount: $500
Provider: City Hospital
Type: Medical Visit
Submitted Date: 2024-01-15
Processed Date: 2024-01-18"
```
---
## Troubleshooting
### Common Issues
| Issue | Cause | Solution |
|-------|-------|----------|
| **AWS credentials not found** | Missing .env file or invalid credentials | Create .env file with valid AWS credentials |
| **Token has expired** | STS credentials expired | Update .env with fresh credentials or use SSO |
| **No credentials** | AWS_PROFILE not set (SSO) | `export AWS_PROFILE=your-profile` |
| **Bearer token required** | No token in request | Ensure token in Authorization header |
| **Invalid token** | Token expired or wrong client | Get new token from Cognito |
| **Gateway timeout** | MCP server slow | Increase Lambda timeout to 300s |
| **Athena permission denied** | Missing IAM permissions | Check execution role has Athena access |
### Credential Troubleshooting
#### .env File Issues
**Error: "AWS credentials not found" or "No credentials configured"**
```bash
# Check if .env file exists
ls -la .env
# If missing, create it with your credentials:
cat > .env << EOF
AWS_ACCESS_KEY_ID=your-access-key-id
AWS_SECRET_ACCESS_KEY=your-secret-access-key
AWS_SESSION_TOKEN=your-session-token
AWS_DEFAULT_REGION=us-east-1
EOF
```
**Error: "Token has expired" (when using STS credentials)**
```bash
# Get new temporary credentials and update .env file
# Example using assume-role:
aws sts assume-role --role-arn arn:aws:iam::ACCOUNT:role/ROLE --role-session-name session
# Then update .env with the new credentials
```
**Error: "Environment variables not loaded"**
- Restart Jupyter kernel to reload environment variables
- Ensure `.env` file is in the correct directory (`02-use-cases/lakehouse-agent/`)
- Check file permissions: `chmod 600 .env`
### AWS SSO Troubleshooting
**Error: "Token has expired and refresh failed"**
```bash
# Solution: Re-login
aws sso logout
aws sso login --profile your-profile-name
```
**Error: "Profile not found"**
```bash
# Check profiles
aws configure list-profiles
# If missing, reconfigure
aws configure sso
```
**Error: "You must specify a region"**
```bash
# Set region in profile
aws configure set region us-east-1 --profile your-profile
# Or environment variable
export AWS_DEFAULT_REGION=us-east-1
```
### Debug Commands
```bash
# Check configuration in SSM
python test_ssm_validation.py
# Check agent status
python check_agent_status.py
# View CloudWatch logs (replace runtime-id)
aws logs tail /aws/bedrock-agentcore/runtime/runtime-id --follow
# View Gateway interceptor logs
aws logs tail /aws/lambda/lakehouse-gateway-interceptor --follow
# View MCP server logs
aws logs tail /aws/bedrock-agentcore/runtime/mcp-server-id --follow
# Test JWT token
python gateway-setup/test_cognito_login.py
```
### Logs to Check
**Agent Runtime logs**:
```bash
aws logs tail /aws/bedrock-agentcore/runtime/<runtime-id> --follow
```
Expected:
```
✅ Bearer token extracted from Authorization header
✅ Loaded 5 tools from Gateway
⏳ Processing request...
✅ Request processed
```
**Interceptor Lambda logs**:
```bash
aws logs tail /aws/lambda/lakehouse-gateway-interceptor --follow
```
Expected:
```
INFO Bearer token extracted from MCP gateway request
INFO Token validation successful
INFO User: user001@example.com
```
---
## File Structure
```
lakehouse-agent/
├── 📦 Utilities
│   └── aws_session_utils.py              # AWS SSO session management
├── 🗄️ Data Layer
│   └── athena-setup/
│       ├── setup_athena.py               # Athena database setup
│       ├── create_tables.sql             # Table definitions
│       └── sample_data.sql               # Sample data
├── 🔐 Identity Layer
│   └── cognito-setup/
│       └── setup_cognito.py              # Cognito OAuth setup
├── 🌐 Gateway Layer
│   └── gateway-setup/
│       ├── create_gateway.py             # Gateway creation
│       └── interceptor/
│           ├── lambda_function.py        # JWT validator
│           └── deploy_interceptor.py     # Deployment script
├── 🔧 Tool Layer
│   └── mcp-lakehouse-server/
│       ├── server.py                     # MCP server
│       ├── athena_tools.py               # Athena query tools
│       ├── requirements.txt              # Dependencies
│       └── deploy_runtime.py             # Deployment script
├── 🤖 Agent Layer
│   └── lakehouse-agent/
│       ├── lakehouse_agent.py            # Strands-based agent
│       ├── requirements.txt              # Dependencies
│       └── deploy_lakehouse_agent.py     # Deployment script
├── 🖥️ UI Layer
│   └── streamlit-ui/
│       ├── streamlit_app.py              # Main UI application
│       └── requirements.txt              # Dependencies
└── 🧪 Testing
    └── tests/
        ├── test_athena.py                # Database tests
        ├── test_cognito.py               # Auth tests
        ├── test_gateway.py               # Gateway tests
        ├── test_agent.py                 # Agent tests
        └── test_end_to_end.py            # Integration tests
```
---
## Cost Estimate
### Monthly Cost Breakdown (Approximate)
```
Component                          Monthly Cost
─────────────────────────────────────────────────────
S3 Storage (100GB)                 $2.30
Athena (1TB scanned/month)         $5.00
Lambda (1M invocations)            $0.20
Cognito (1000 users)               $0.00 (free tier)
AgentCore Runtime (2 runtimes)     $50-$100
Bedrock Claude API                 Variable (per token)
─────────────────────────────────────────────────────
Total (excluding Bedrock)          ~$60-$110/month
```
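The total in the table can be reproduced by summing the line items. The sketch below only restates the approximate figures from the table; the names `FIXED_COSTS`, `AGENTCORE_RANGE`, and `monthly_total_range` are illustrative, and none of the numbers are current AWS prices.

```python
# Approximate monthly figures copied from the table above (USD).
FIXED_COSTS = {
    "S3 Storage (100GB)": 2.30,
    "Athena (1TB scanned/month)": 5.00,
    "Lambda (1M invocations)": 0.20,
    "Cognito (1000 users)": 0.00,
}
# AgentCore Runtime is given as a range for two runtimes.
AGENTCORE_RANGE = (50.00, 100.00)


def monthly_total_range():
    """Return (low, high) monthly totals, excluding Bedrock token costs."""
    fixed = sum(FIXED_COSTS.values())
    low, high = AGENTCORE_RANGE
    return round(fixed + low, 2), round(fixed + high, 2)
```

The fixed items add up to $7.50, so the range works out to $57.50 to $107.50, which the table rounds to roughly $60 to $110. The AgentCore Runtime line dominates the estimate.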
### Cost Optimization Tips
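- Use Parquet format for S3 data (columnar storage with compression can substantially reduce the data Athena scans, and therefore cost; the actual saving depends on the queries and columns read)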
- Partition data by date (faster queries, lower costs)
- Cache frequent queries in application layer
- Monitor Bedrock token usage with CloudWatch
---
## OAuth Scopes
The system uses JWT scopes for fine-grained access control:
| Scope | Description | Allows |
|-------|-------------|--------|
| `lakehouse-api/claims.query` | Read claims | query_claims, get_claim_details, get_claims_summary |
| `lakehouse-api/claims.submit` | Submit claims | submit_claim |
| `lakehouse-api/claims.update` | Update claims | update_claim_status |
Scopes are validated in the Gateway interceptor Lambda.
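
The scope-to-tool mapping in the table can be expressed as a small lookup. This is a sketch of the idea, not the interceptor's actual code: `SCOPE_TOOLS`, `allowed_tools`, and `is_tool_allowed` are hypothetical names, and it assumes the token's `scope` claim is a space-separated string, as in OAuth 2.0 access tokens.

```python
from typing import Set

# Mapping taken from the scope table above.
SCOPE_TOOLS = {
    "lakehouse-api/claims.query": {"query_claims", "get_claim_details", "get_claims_summary"},
    "lakehouse-api/claims.submit": {"submit_claim"},
    "lakehouse-api/claims.update": {"update_claim_status"},
}


def allowed_tools(scope_claim: str) -> Set[str]:
    """Return every tool permitted by a space-separated scope string."""
    tools: Set[str] = set()
    for scope in scope_claim.split():
        # Unknown scopes grant nothing.
        tools |= SCOPE_TOOLS.get(scope, set())
    return tools


def is_tool_allowed(scope_claim: str, tool_name: str) -> bool:
    """True if any scope in the claim permits calling tool_name."""
    return tool_name in allowed_tools(scope_claim)
```

A token with only `lakehouse-api/claims.query` would therefore be allowed to call `query_claims` but not `submit_claim`.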
---
## Support & Resources
### AWS Documentation
- [Bedrock AgentCore](https://docs.aws.amazon.com/bedrock/latest/userguide/agents.html)
- [Amazon Athena](https://docs.aws.amazon.com/athena/)
- [Amazon Cognito](https://docs.aws.amazon.com/cognito/)
- [AWS Lambda](https://docs.aws.amazon.com/lambda/)
### Community
- AWS re:Post (successor to the AWS Forums): https://repost.aws/
- Stack Overflow tags: `amazon-bedrock`, `amazon-athena`, `aws-lambda`
---
## License
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
---
**Status**: Production-Ready ✅
**Authentication**: End-to-End OAuth with JWT
**Last Updated**: January 2025
@@ -0,0 +1,79 @@
-- Health Lakehouse Data Processing System - Athena Table Schema
-- This script creates the database and tables for storing health lakehouse data
-- Create database
CREATE DATABASE IF NOT EXISTS lakehouse_db;
-- Note: every statement below qualifies table names with the database
-- (lakehouse_db.<table>), so no separate database selection step is needed
-- Create claims table with row-level access control support
CREATE EXTERNAL TABLE IF NOT EXISTS lakehouse_db.claims (
claim_id STRING COMMENT 'Unique claim identifier',
user_id STRING COMMENT 'User/patient email for row-level access control',
patient_name STRING COMMENT 'Patient full name',
patient_dob DATE COMMENT 'Patient date of birth',
claim_date DATE COMMENT 'Date when claim occurred',
claim_amount DECIMAL(10,2) COMMENT 'Total claim amount in USD',
claim_type STRING COMMENT 'Type of claim: medical, prescription, hospital, emergency',
claim_status STRING COMMENT 'Current status: pending, approved, denied, in_review, requires_info',
provider_name STRING COMMENT 'Healthcare provider name',
provider_npi STRING COMMENT 'National Provider Identifier',
diagnosis_code STRING COMMENT 'ICD-10 diagnosis code',
procedure_code STRING COMMENT 'CPT procedure code',
submitted_date TIMESTAMP COMMENT 'When claim was submitted',
processed_date TIMESTAMP COMMENT 'When claim was processed',
approved_amount DECIMAL(10,2) COMMENT 'Approved claim amount',
denial_reason STRING COMMENT 'Reason for denial if applicable',
notes STRING COMMENT 'Additional notes or comments',
created_by STRING COMMENT 'User who created the claim',
last_modified_by STRING COMMENT 'User who last modified the claim',
last_modified_date TIMESTAMP COMMENT 'Last modification timestamp'
)
COMMENT 'Health lakehouse data table with row-level access control'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 's3://YOUR_BUCKET_NAME/lakehouse-data/claims/'
TBLPROPERTIES (
'skip.header.line.count'='1',
'classification'='csv'
);
-- Create users table for reference (optional, if you want to store user metadata)
CREATE EXTERNAL TABLE IF NOT EXISTS lakehouse_db.users (
user_id STRING COMMENT 'User email address',
user_name STRING COMMENT 'User full name',
user_role STRING COMMENT 'User role: patient, adjuster, admin',
department STRING COMMENT 'Department or region',
created_date TIMESTAMP COMMENT 'User creation date'
)
COMMENT 'Users table for reference'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 's3://YOUR_BUCKET_NAME/lakehouse-data/users/'
TBLPROPERTIES (
'skip.header.line.count'='1',
'classification'='csv'
);
-- Create audit log table for tracking access (optional)
CREATE EXTERNAL TABLE IF NOT EXISTS lakehouse_db.audit_log (
log_id STRING COMMENT 'Unique log identifier',
user_id STRING COMMENT 'User who performed the action',
action STRING COMMENT 'Action performed: query, insert, update, delete',
claim_id STRING COMMENT 'Claim ID affected',
timestamp TIMESTAMP COMMENT 'When action was performed',
ip_address STRING COMMENT 'User IP address',
details STRING COMMENT 'Additional details about the action'
)
COMMENT 'Audit log for compliance and security'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 's3://YOUR_BUCKET_NAME/lakehouse-data/audit/'
TBLPROPERTIES (
'skip.header.line.count'='1',
'classification'='csv'
);
@@ -0,0 +1,37 @@
-- Sample Health Lakehouse Data
-- This file contains realistic sample data for testing row-level access control
-- Sample data for claims table
-- Format: claim_id, user_id, patient_name, patient_dob, claim_date, claim_amount, claim_type, claim_status, provider_name, provider_npi, diagnosis_code, procedure_code, submitted_date, processed_date, approved_amount, denial_reason, notes, created_by, last_modified_by, last_modified_date
-- Claims for user001@example.com (Patient: John Doe)
INSERT INTO lakehouse_db.claims VALUES
('CLM-2024-001', 'user001@example.com', 'John Doe', DATE '1985-03-15', DATE '2024-01-10', 1250.00, 'medical', 'approved', 'City Medical Center', '1234567890', 'J06.9', '99213', TIMESTAMP '2024-01-11 09:30:00', TIMESTAMP '2024-01-15 14:20:00', 1000.00, NULL, 'Annual physical examination and lab work', 'user001@example.com', 'adjuster001@example.com', TIMESTAMP '2024-01-15 14:20:00'),
('CLM-2024-002', 'user001@example.com', 'John Doe', DATE '1985-03-15', DATE '2024-02-05', 85.50, 'prescription', 'approved', 'CVS Pharmacy', '9876543210', 'E11.9', '90670', TIMESTAMP '2024-02-05 16:45:00', TIMESTAMP '2024-02-06 10:15:00', 85.50, NULL, 'Diabetes medication - monthly refill', 'user001@example.com', 'adjuster001@example.com', TIMESTAMP '2024-02-06 10:15:00'),
('CLM-2024-003', 'user001@example.com', 'John Doe', DATE '1985-03-15', DATE '2024-02-20', 3500.00, 'hospital', 'in_review', 'General Hospital', '1122334455', 'M54.5', '22612', TIMESTAMP '2024-02-21 08:00:00', NULL, NULL, NULL, 'Emergency room visit for back pain, including X-rays', 'user001@example.com', 'user001@example.com', TIMESTAMP '2024-02-21 08:00:00'),
('CLM-2024-004', 'user001@example.com', 'John Doe', DATE '1985-03-15', DATE '2024-03-10', 450.00, 'medical', 'pending', 'Downtown Dental Clinic', '2233445566', 'K02.9', 'D0150', TIMESTAMP '2024-03-11 11:20:00', NULL, NULL, NULL, 'Dental examination and cleaning', 'user001@example.com', 'user001@example.com', TIMESTAMP '2024-03-11 11:20:00');
-- Claims for user002@example.com (Patient: Jane Smith)
INSERT INTO lakehouse_db.claims VALUES
('CLM-2024-005', 'user002@example.com', 'Jane Smith', DATE '1990-07-22', DATE '2024-01-15', 850.00, 'medical', 'approved', 'Women''s Health Center', '5544332211', 'Z00.00', '99395', TIMESTAMP '2024-01-16 10:00:00', TIMESTAMP '2024-01-18 15:30:00', 680.00, NULL, 'Annual gynecological exam and preventive care', 'user002@example.com', 'adjuster001@example.com', TIMESTAMP '2024-01-18 15:30:00'),
('CLM-2024-006', 'user002@example.com', 'Jane Smith', DATE '1990-07-22', DATE '2024-02-10', 125.00, 'prescription', 'approved', 'Walgreens Pharmacy', '6655443322', 'H10.9', '90680', TIMESTAMP '2024-02-10 13:15:00', TIMESTAMP '2024-02-11 09:00:00', 125.00, NULL, 'Antibiotic prescription for eye infection', 'user002@example.com', 'adjuster001@example.com', TIMESTAMP '2024-02-11 09:00:00'),
('CLM-2024-007', 'user002@example.com', 'Jane Smith', DATE '1990-07-22', DATE '2024-02-25', 12500.00, 'hospital', 'approved', 'St. Mary''s Hospital', '7766554433', 'O80', '59400', TIMESTAMP '2024-02-26 07:30:00', TIMESTAMP '2024-03-05 16:00:00', 10000.00, NULL, 'Childbirth and postpartum care', 'user002@example.com', 'adjuster002@example.com', TIMESTAMP '2024-03-05 16:00:00'),
('CLM-2024-008', 'user002@example.com', 'Jane Smith', DATE '1990-07-22', DATE '2024-03-15', 200.00, 'medical', 'denied', 'Cosmetic Surgery Center', '8877665544', 'Z41.1', '15780', TIMESTAMP '2024-03-16 14:00:00', TIMESTAMP '2024-03-20 11:00:00', 0.00, 'Cosmetic procedures not covered by policy', 'Facial cosmetic procedure', 'user002@example.com', 'adjuster002@example.com', TIMESTAMP '2024-03-20 11:00:00'),
('CLM-2024-009', 'user002@example.com', 'Jane Smith', DATE '1990-07-22', DATE '2024-03-25', 75.00, 'prescription', 'pending', 'Target Pharmacy', '9988776655', 'Z79.890', '90715', TIMESTAMP '2024-03-26 09:45:00', NULL, NULL, NULL, 'Vitamin supplements and prenatal care', 'user002@example.com', 'user002@example.com', TIMESTAMP '2024-03-26 09:45:00');
-- Claims for adjuster001@example.com (Staff member - testing cross-access)
INSERT INTO lakehouse_db.claims VALUES
('CLM-2024-010', 'adjuster001@example.com', 'Michael Johnson', DATE '1978-11-30', DATE '2024-01-20', 500.00, 'medical', 'approved', 'Quick Care Clinic', '1231231234', 'J20.9', '99214', TIMESTAMP '2024-01-21 08:00:00', TIMESTAMP '2024-01-22 10:00:00', 500.00, NULL, 'Urgent care visit for bronchitis', 'adjuster001@example.com', 'adjuster002@example.com', TIMESTAMP '2024-01-22 10:00:00'),
('CLM-2024-011', 'adjuster001@example.com', 'Michael Johnson', DATE '1978-11-30', DATE '2024-03-01', 2800.00, 'hospital', 'pending', 'Regional Medical Center', '4564564567', 'S52.501A', '25605', TIMESTAMP '2024-03-02 15:30:00', NULL, NULL, NULL, 'Fracture treatment - left wrist, includes casting', 'adjuster001@example.com', 'adjuster001@example.com', TIMESTAMP '2024-03-02 15:30:00');
-- Sample data for users table
INSERT INTO lakehouse_db.users VALUES
('user001@example.com', 'John Doe', 'patient', 'Individual', TIMESTAMP '2023-01-15 00:00:00'),
('user002@example.com', 'Jane Smith', 'patient', 'Individual', TIMESTAMP '2023-02-20 00:00:00'),
('adjuster001@example.com', 'Michael Johnson', 'adjuster', 'Claims Department', TIMESTAMP '2022-06-01 00:00:00'),
('adjuster002@example.com', 'Sarah Williams', 'adjuster', 'Claims Department', TIMESTAMP '2022-08-15 00:00:00'),
('admin@example.com', 'Admin User', 'admin', 'IT Department', TIMESTAMP '2022-01-01 00:00:00');
-- Note: The actual data insertion for Athena requires CSV files in S3
-- This SQL is for reference and documentation purposes
-- The setup_athena.py script will create proper CSV files and upload them to S3
@@ -0,0 +1,611 @@
#!/usr/bin/env python3
"""
Athena Setup Script for Health Lakehouse Data Processing System
This script:
1. Creates an S3 bucket for Athena data and query results
2. Uploads sample claims data to S3
3. Creates Athena database and tables
4. Verifies the setup by running test queries
Usage:
python setup_athena.py --bucket-name BUCKET_NAME
python setup_athena.py # Reads bucket name from SSM
Arguments:
--bucket-name: (Optional) Base name for S3 bucket. Will be prefixed with {account_id}-{region_name}-
If not provided, reads from SSM parameter /app/lakehouse-agent/s3-bucket-name
Example: --bucket-name my-lakehouse creates bucket XXXXXXXXXXXX-us-east-1-my-lakehouse
"""
import boto3
import csv
import io
import time
import sys
import argparse
from datetime import datetime, date
from decimal import Decimal
from typing import List, Dict, Any
class AthenaSetup:
    def __init__(self, bucket_base_name: str):
        """
        Initialize Athena setup with AWS region and S3 bucket name.

        Region is obtained from the boto3 session.
        Bucket name is constructed as: {account_id}-{region}-{bucket_base_name}

        Args:
            bucket_base_name: Base name for S3 bucket (will be prefixed with account_id and region)
        """
        # Get region from boto3 session
        session = boto3.Session()
        self.region = session.region_name
        # Fail early if no region is configured; otherwise the bucket name would contain "None"
        if not self.region:
            raise ValueError("No AWS region configured; set AWS_DEFAULT_REGION or a profile region")
        # Get account ID from STS
        sts_client = boto3.client('sts', region_name=self.region)
        account_id = sts_client.get_caller_identity()['Account']
        # Construct bucket name with prefix
        self.bucket_name = f"{account_id}-{self.region}-{bucket_base_name}"
        self.database_name = 'lakehouse_db'
        # Initialize AWS clients
        self.s3_client = boto3.client('s3', region_name=self.region)
        self.athena_client = boto3.client('athena', region_name=self.region)
        self.ssm_client = boto3.client('ssm', region_name=self.region)
        # S3 locations
        self.claims_prefix = 'lakehouse-data/claims/'
        self.users_prefix = 'lakehouse-data/users/'
        self.query_results_prefix = 'athena-results/'
    def create_s3_bucket(self):
        """Create S3 bucket if it doesn't exist."""
        print(f"\n📦 Checking S3 bucket: {self.bucket_name}")
        try:
            # Check if bucket exists
            self.s3_client.head_bucket(Bucket=self.bucket_name)
            print(f"✅ Bucket {self.bucket_name} already exists")
        except Exception:
            # head_bucket failed (bucket missing or not accessible), so try to create it.
            # Catching Exception instead of a bare except lets Ctrl-C and SystemExit propagate.
            try:
                if self.region == 'us-east-1':
                    # us-east-1 does not accept a LocationConstraint
                    self.s3_client.create_bucket(Bucket=self.bucket_name)
                else:
                    self.s3_client.create_bucket(
                        Bucket=self.bucket_name,
                        CreateBucketConfiguration={'LocationConstraint': self.region}
                    )
                print(f"✅ Created S3 bucket: {self.bucket_name}")
            except Exception as e:
                print(f"❌ Error creating bucket: {e}")
                sys.exit(1)
def get_sample_claims_data(self) -> List[Dict[str, Any]]:
"""Generate sample claims data."""
return [
# Claims for user001@example.com (John Doe)
{
'claim_id': 'CLM-2024-001',
'user_id': 'user001@example.com',
'patient_name': 'John Doe',
'patient_dob': '1985-03-15',
'claim_date': '2024-01-10',
'claim_amount': '1250.00',
'claim_type': 'medical',
'claim_status': 'approved',
'provider_name': 'City Medical Center',
'provider_npi': '1234567890',
'diagnosis_code': 'J06.9',
'procedure_code': '99213',
'submitted_date': '2024-01-11 09:30:00',
'processed_date': '2024-01-15 14:20:00',
'approved_amount': '1000.00',
'denial_reason': '',
'notes': 'Annual physical examination and lab work',
'created_by': 'user001@example.com',
'last_modified_by': 'adjuster001@example.com',
'last_modified_date': '2024-01-15 14:20:00'
},
{
'claim_id': 'CLM-2024-002',
'user_id': 'user001@example.com',
'patient_name': 'John Doe',
'patient_dob': '1985-03-15',
'claim_date': '2024-02-05',
'claim_amount': '85.50',
'claim_type': 'prescription',
'claim_status': 'approved',
'provider_name': 'CVS Pharmacy',
'provider_npi': '9876543210',
'diagnosis_code': 'E11.9',
'procedure_code': '90670',
'submitted_date': '2024-02-05 16:45:00',
'processed_date': '2024-02-06 10:15:00',
'approved_amount': '85.50',
'denial_reason': '',
'notes': 'Diabetes medication - monthly refill',
'created_by': 'user001@example.com',
'last_modified_by': 'adjuster001@example.com',
'last_modified_date': '2024-02-06 10:15:00'
},
{
'claim_id': 'CLM-2024-003',
'user_id': 'user001@example.com',
'patient_name': 'John Doe',
'patient_dob': '1985-03-15',
'claim_date': '2024-02-20',
'claim_amount': '3500.00',
'claim_type': 'hospital',
'claim_status': 'in_review',
'provider_name': 'General Hospital',
'provider_npi': '1122334455',
'diagnosis_code': 'M54.5',
'procedure_code': '22612',
'submitted_date': '2024-02-21 08:00:00',
'processed_date': '',
'approved_amount': '',
'denial_reason': '',
'notes': 'Emergency room visit for back pain, including X-rays',
'created_by': 'user001@example.com',
'last_modified_by': 'user001@example.com',
'last_modified_date': '2024-02-21 08:00:00'
},
{
'claim_id': 'CLM-2024-004',
'user_id': 'user001@example.com',
'patient_name': 'John Doe',
'patient_dob': '1985-03-15',
'claim_date': '2024-03-10',
'claim_amount': '450.00',
'claim_type': 'medical',
'claim_status': 'pending',
'provider_name': 'Downtown Dental Clinic',
'provider_npi': '2233445566',
'diagnosis_code': 'K02.9',
'procedure_code': 'D0150',
'submitted_date': '2024-03-11 11:20:00',
'processed_date': '',
'approved_amount': '',
'denial_reason': '',
'notes': 'Dental examination and cleaning',
'created_by': 'user001@example.com',
'last_modified_by': 'user001@example.com',
'last_modified_date': '2024-03-11 11:20:00'
},
# Claims for user002@example.com (Jane Smith)
{
'claim_id': 'CLM-2024-005',
'user_id': 'user002@example.com',
'patient_name': 'Jane Smith',
'patient_dob': '1990-07-22',
'claim_date': '2024-01-15',
'claim_amount': '850.00',
'claim_type': 'medical',
'claim_status': 'approved',
'provider_name': 'Womens Health Center',
'provider_npi': '5544332211',
'diagnosis_code': 'Z00.00',
'procedure_code': '99395',
'submitted_date': '2024-01-16 10:00:00',
'processed_date': '2024-01-18 15:30:00',
'approved_amount': '680.00',
'denial_reason': '',
'notes': 'Annual gynecological exam and preventive care',
'created_by': 'user002@example.com',
'last_modified_by': 'adjuster001@example.com',
'last_modified_date': '2024-01-18 15:30:00'
},
{
'claim_id': 'CLM-2024-006',
'user_id': 'user002@example.com',
'patient_name': 'Jane Smith',
'patient_dob': '1990-07-22',
'claim_date': '2024-02-10',
'claim_amount': '125.00',
'claim_type': 'prescription',
'claim_status': 'approved',
'provider_name': 'Walgreens Pharmacy',
'provider_npi': '6655443322',
'diagnosis_code': 'H10.9',
'procedure_code': '90680',
'submitted_date': '2024-02-10 13:15:00',
'processed_date': '2024-02-11 09:00:00',
'approved_amount': '125.00',
'denial_reason': '',
'notes': 'Antibiotic prescription for eye infection',
'created_by': 'user002@example.com',
'last_modified_by': 'adjuster001@example.com',
'last_modified_date': '2024-02-11 09:00:00'
},
{
'claim_id': 'CLM-2024-007',
'user_id': 'user002@example.com',
'patient_name': 'Jane Smith',
'patient_dob': '1990-07-22',
'claim_date': '2024-02-25',
'claim_amount': '12500.00',
'claim_type': 'hospital',
'claim_status': 'approved',
'provider_name': 'St. Marys Hospital',
'provider_npi': '7766554433',
'diagnosis_code': 'O80',
'procedure_code': '59400',
'submitted_date': '2024-02-26 07:30:00',
'processed_date': '2024-03-05 16:00:00',
'approved_amount': '10000.00',
'denial_reason': '',
'notes': 'Childbirth and postpartum care',
'created_by': 'user002@example.com',
'last_modified_by': 'adjuster002@example.com',
'last_modified_date': '2024-03-05 16:00:00'
},
{
'claim_id': 'CLM-2024-008',
'user_id': 'user002@example.com',
'patient_name': 'Jane Smith',
'patient_dob': '1990-07-22',
'claim_date': '2024-03-15',
'claim_amount': '2000.00',
'claim_type': 'medical',
'claim_status': 'denied',
'provider_name': 'Cosmetic Surgery Center',
'provider_npi': '8877665544',
'diagnosis_code': 'Z41.1',
'procedure_code': '15780',
'submitted_date': '2024-03-16 14:00:00',
'processed_date': '2024-03-20 11:00:00',
'approved_amount': '0.00',
'denial_reason': 'Cosmetic procedures not covered by policy',
'notes': 'Facial cosmetic procedure',
'created_by': 'user002@example.com',
'last_modified_by': 'adjuster002@example.com',
'last_modified_date': '2024-03-20 11:00:00'
},
# Claims for adjuster001@example.com
{
'claim_id': 'CLM-2024-009',
'user_id': 'adjuster001@example.com',
'patient_name': 'Michael Johnson',
'patient_dob': '1978-11-30',
'claim_date': '2024-01-20',
'claim_amount': '500.00',
'claim_type': 'medical',
'claim_status': 'approved',
'provider_name': 'Quick Care Clinic',
'provider_npi': '1231231234',
'diagnosis_code': 'J20.9',
'procedure_code': '99214',
'submitted_date': '2024-01-21 08:00:00',
'processed_date': '2024-01-22 10:00:00',
'approved_amount': '500.00',
'denial_reason': '',
'notes': 'Urgent care visit for bronchitis',
'created_by': 'adjuster001@example.com',
'last_modified_by': 'adjuster002@example.com',
'last_modified_date': '2024-01-22 10:00:00'
}
]
def get_sample_users_data(self) -> List[Dict[str, Any]]:
"""Generate sample users data."""
return [
{
'user_id': 'user001@example.com',
'user_name': 'John Doe',
'user_role': 'patient',
'department': 'Individual',
'created_date': '2023-01-15 00:00:00'
},
{
'user_id': 'user002@example.com',
'user_name': 'Jane Smith',
'user_role': 'patient',
'department': 'Individual',
'created_date': '2023-02-20 00:00:00'
},
{
'user_id': 'adjuster001@example.com',
'user_name': 'Michael Johnson',
'user_role': 'adjuster',
'department': 'Claims Department',
'created_date': '2022-06-01 00:00:00'
},
{
'user_id': 'adjuster002@example.com',
'user_name': 'Sarah Williams',
'user_role': 'adjuster',
'department': 'Claims Department',
'created_date': '2022-08-15 00:00:00'
},
{
'user_id': 'admin@example.com',
'user_name': 'Admin User',
'user_role': 'admin',
'department': 'IT Department',
'created_date': '2022-01-01 00:00:00'
}
]
    def upload_csv_to_s3(self, data: List[Dict[str, Any]], s3_key: str):
        """Upload data as CSV to S3."""
        if not data:
            print(f"⚠️ No data to upload for {s3_key}")
            return
        # Create CSV in memory
        output = io.StringIO()
        writer = csv.DictWriter(output, fieldnames=data[0].keys())
        writer.writeheader()
        writer.writerows(data)
        # Upload to S3
        try:
            self.s3_client.put_object(
                Bucket=self.bucket_name,
                Key=s3_key,
                Body=output.getvalue().encode('utf-8')
            )
            print(f"✅ Uploaded {s3_key} to S3")
        except Exception as e:
            print(f"❌ Error uploading {s3_key}: {e}")
            raise
    def run_athena_query(self, query: str, wait_for_results: bool = True) -> str:
        """
        Execute an Athena query and optionally wait for results.

        Args:
            query: SQL query to execute
            wait_for_results: Whether to wait for query completion

        Returns:
            Query execution ID
        """
        try:
            # Prepare query execution parameters
            query_params = {
                'QueryString': query,
                'ResultConfiguration': {
                    'OutputLocation': f's3://{self.bucket_name}/{self.query_results_prefix}'
                }
            }
            # Only add Database context if not creating a database
            if 'CREATE DATABASE' not in query.upper():
                query_params['QueryExecutionContext'] = {'Database': self.database_name}
            response = self.athena_client.start_query_execution(**query_params)
            query_execution_id = response['QueryExecutionId']
            if wait_for_results:
                # Poll once per second until the query reaches a terminal state
                while True:
                    status_response = self.athena_client.get_query_execution(
                        QueryExecutionId=query_execution_id
                    )
                    status = status_response['QueryExecution']['Status']['State']
                    if status in ['SUCCEEDED', 'FAILED', 'CANCELLED']:
                        if status == 'SUCCEEDED':
                            return query_execution_id
                        else:
                            error = status_response['QueryExecution']['Status'].get('StateChangeReason', 'Unknown error')
                            raise Exception(f"Query failed: {error}")
                    time.sleep(1)
            return query_execution_id
        except Exception as e:
            print(f"❌ Error executing query: {e}")
            raise
    def store_parameters_in_ssm(self):
        """Store S3 bucket name and database name in SSM Parameter Store."""
        print("\n💾 Storing configuration in SSM Parameter Store...")
        parameters = [
            {
                'name': '/app/lakehouse-agent/s3-bucket-name',
                'value': self.bucket_name,
                'description': 'S3 bucket name for lakehouse data storage'
            },
            {
                'name': '/app/lakehouse-agent/database-name',
                'value': self.database_name,
                'description': 'Athena/Glue database name for lakehouse'
            }
        ]
        for param in parameters:
            try:
                self.ssm_client.put_parameter(
                    Name=param['name'],
                    Value=param['value'],
                    Description=param['description'],
                    Type='String',
                    Overwrite=True
                )
                print(f"✅ Stored parameter: {param['name']} = {param['value']}")
            except Exception as e:
                print(f"❌ Error storing parameter {param['name']}: {e}")
                raise
def setup(self):
"""Run the complete Athena setup."""
print("\n🚀 Starting Athena Setup for Health Lakehouse Data Processing")
print(f" Region: {self.region}")
print(f" S3 Bucket: {self.bucket_name}")
# Step 1: Create S3 bucket
self.create_s3_bucket()
# Step 2: Upload sample data
print("\n📤 Uploading sample data to S3...")
claims_data = self.get_sample_claims_data()
users_data = self.get_sample_users_data()
self.upload_csv_to_s3(claims_data, f'{self.claims_prefix}claims.csv')
self.upload_csv_to_s3(users_data, f'{self.users_prefix}users.csv')
# Step 3: Create Athena database
print("\n🗄️ Creating Athena database...")
create_db_query = f"CREATE DATABASE IF NOT EXISTS {self.database_name}"
try:
self.run_athena_query(create_db_query)
print(f"✅ Database {self.database_name} created")
except Exception as e:
print(f"❌ Error creating database: {e}")
return
# Step 4: Create claims table
print("\n📊 Creating claims table...")
create_claims_table_query = f"""
CREATE EXTERNAL TABLE IF NOT EXISTS {self.database_name}.claims (
claim_id STRING,
user_id STRING,
patient_name STRING,
patient_dob STRING,
claim_date STRING,
claim_amount STRING,
claim_type STRING,
claim_status STRING,
provider_name STRING,
provider_npi STRING,
diagnosis_code STRING,
procedure_code STRING,
submitted_date STRING,
processed_date STRING,
approved_amount STRING,
denial_reason STRING,
notes STRING,
created_by STRING,
last_modified_by STRING,
last_modified_date STRING
)
        ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
STORED AS TEXTFILE
LOCATION 's3://{self.bucket_name}/{self.claims_prefix}'
TBLPROPERTIES ('skip.header.line.count'='1')
"""
try:
self.run_athena_query(create_claims_table_query)
print("✅ Claims table created")
except Exception as e:
print(f"❌ Error creating claims table: {e}")
return
# Step 5: Create users table
print("\n👥 Creating users table...")
create_users_table_query = f"""
CREATE EXTERNAL TABLE IF NOT EXISTS {self.database_name}.users (
user_id STRING,
user_name STRING,
user_role STRING,
department STRING,
created_date STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 's3://{self.bucket_name}/{self.users_prefix}'
TBLPROPERTIES ('skip.header.line.count'='1')
"""
try:
self.run_athena_query(create_users_table_query)
print("✅ Users table created")
except Exception as e:
print(f"❌ Error creating users table: {e}")
return
# Step 6: Verify setup with test query
print("\n🔍 Verifying setup with test queries...")
try:
# Count claims
count_query = f"SELECT COUNT(*) as total_claims FROM {self.database_name}.claims"
self.run_athena_query(count_query)
print("✅ Claims table verification successful")
# Query claims for user001
user_query = f"SELECT claim_id, claim_type, claim_status FROM {self.database_name}.claims WHERE user_id = 'user001@example.com' LIMIT 5"
self.run_athena_query(user_query)
print("✅ User-specific query successful")
except Exception as e:
print(f"⚠️ Verification query failed: {e}")
# Step 7: Store configuration in SSM Parameter Store
self.store_parameters_in_ssm()
print("\n✨ Athena setup completed successfully!")
print(f"\n Database name: {self.database_name}")
print(f"📁 S3 bucket: s3://{self.bucket_name}")
print(f"📊 Tables created: claims, users")
print(f"💾 SSM Parameters:")
print(f" - /app/lakehouse-agent/s3-bucket-name")
print(f" - /app/lakehouse-agent/database-name")
print(f"\n🔐 Row-level access control ready:")
print(f" - user001@example.com: {len([c for c in claims_data if c['user_id'] == 'user001@example.com'])} claims")
print(f" - user002@example.com: {len([c for c in claims_data if c['user_id'] == 'user002@example.com'])} claims")
print(f" - adjuster001@example.com: {len([c for c in claims_data if c['user_id'] == 'adjuster001@example.com'])} claims")
def main():
    parser = argparse.ArgumentParser(
        description='Setup Athena database and tables for health lakehouse data processing'
    )
    parser.add_argument(
        '--bucket-name',
        required=False,
        default=None,
        help='Base name for S3 bucket (will be prefixed with {account_id}-{region_name}-). '
             'If not provided, reads from SSM parameter /app/lakehouse-agent/s3-bucket-name. '
             'Example: my-lakehouse'
    )
    args = parser.parse_args()
    bucket_name = args.bucket_name
    # If bucket name not provided, try to read from SSM
    if not bucket_name:
        print("📋 No --bucket-name provided, reading from SSM Parameter Store...")
        try:
            session = boto3.Session()
            ssm = boto3.client('ssm', region_name=session.region_name)
            response = ssm.get_parameter(Name='/app/lakehouse-agent/s3-bucket-name')
            full_bucket_name = response['Parameter']['Value']
            print(f"✅ Found bucket name in SSM: {full_bucket_name}")
            # Extract the base name by removing the account-region prefix
            # Format is: {account_id}-{region}-{base_name}
            sts = boto3.client('sts')
            account_id = sts.get_caller_identity()['Account']
            region = session.region_name
            prefix = f"{account_id}-{region}-"
            if full_bucket_name.startswith(prefix):
                bucket_name = full_bucket_name[len(prefix):]
                print(f"   Extracted base name: {bucket_name}")
            else:
                # Use the full bucket name as-is (might be a custom name)
                bucket_name = full_bucket_name
                print(f"   Using full bucket name: {bucket_name}")
        except Exception as e:
            print(f"❌ Error reading bucket name from SSM: {e}")
            print("   Please provide --bucket-name argument or set SSM parameter /app/lakehouse-agent/s3-bucket-name")
            sys.exit(1)
    # Run setup
    setup = AthenaSetup(bucket_base_name=bucket_name)
    setup.setup()


if __name__ == '__main__':
    main()
@@ -0,0 +1,166 @@
#!/usr/bin/env python3
"""
Check M2M Client Configuration
This script checks the M2M client configuration and identifies any issues.
Usage:
python check_m2m_client.py
"""
import boto3
import json
import sys
def main():
session = boto3.Session()
region = session.region_name
cognito = boto3.client('cognito-idp', region_name=region)
ssm = boto3.client('ssm', region_name=region)
print("=" * 70)
print("Check M2M Client Configuration")
print("=" * 70)
# Get user pool ID and M2M client ID
try:
user_pool_id = ssm.get_parameter(Name='/app/lakehouse-agent/cognito-user-pool-id')['Parameter']['Value']
m2m_client_id = ssm.get_parameter(Name='/app/lakehouse-agent/cognito-m2m-client-id')['Parameter']['Value']
print(f"\n✅ Configuration found:")
print(f" User Pool ID: {user_pool_id}")
print(f" M2M Client ID: {m2m_client_id}")
except Exception as e:
print(f"\n❌ Error: Could not get configuration: {e}")
sys.exit(1)
# Get M2M client details
print(f"\n📋 M2M Client Configuration:")
print("=" * 70)
try:
response = cognito.describe_user_pool_client(
UserPoolId=user_pool_id,
ClientId=m2m_client_id
)
client = response['UserPoolClient']
print(f"\nClient Name: {client.get('ClientName', 'N/A')}")
print(f"\n🔑 Authentication Flows:")
print(f" ExplicitAuthFlows: {client.get('ExplicitAuthFlows', [])}")
print(f" AllowedOAuthFlows: {client.get('AllowedOAuthFlows', [])}")
print(f" AllowedOAuthFlowsUserPoolClient: {client.get('AllowedOAuthFlowsUserPoolClient', False)}")
print(f"\n🔐 OAuth Configuration:")
print(f" AllowedOAuthScopes: {client.get('AllowedOAuthScopes', [])}")
print(f" SupportedIdentityProviders: {client.get('SupportedIdentityProviders', [])}")
print(f" CallbackURLs: {client.get('CallbackURLs', [])}")
print(f" LogoutURLs: {client.get('LogoutURLs', [])}")
print(f"\n🔒 Security:")
print(f" PreventUserExistenceErrors: {client.get('PreventUserExistenceErrors', 'N/A')}")
# Check for issues
print(f"\n🔍 Validation:")
issues = []
if not client.get('AllowedOAuthFlowsUserPoolClient'):
issues.append("❌ AllowedOAuthFlowsUserPoolClient is False (must be True)")
else:
print(f" ✅ AllowedOAuthFlowsUserPoolClient is True")
if 'client_credentials' not in client.get('AllowedOAuthFlows', []):
issues.append("❌ client_credentials not in AllowedOAuthFlows")
else:
print(f" ✅ client_credentials flow is enabled")
if not client.get('AllowedOAuthScopes'):
issues.append("❌ No OAuth scopes configured")
else:
print(f" ✅ OAuth scopes configured: {len(client.get('AllowedOAuthScopes', []))} scopes")
if client.get('ExplicitAuthFlows'):
issues.append(f"⚠️ ExplicitAuthFlows should be empty for M2M-only client: {client.get('ExplicitAuthFlows')}")
else:
print(f" ✅ ExplicitAuthFlows is empty (M2M only)")
# Check for callback URLs (some Cognito configs require this)
if not client.get('CallbackURLs'):
issues.append("⚠️ No CallbackURLs configured (may cause invalid_grant error)")
print(f" ⚠️ CallbackURLs is empty (may need dummy URL)")
else:
print(f" ✅ CallbackURLs configured")
if issues:
print(f"\n❌ Issues Found:")
for issue in issues:
print(f" {issue}")
print(f"\n🔧 Recommended Fix:")
print(f" Run: python copy_m2m_client.py")
print(f" This will reconfigure the M2M client with correct settings")
else:
print(f"\n✅ M2M client configuration looks good!")
# Test token request
print(f"\n🧪 Testing Token Request:")
print("=" * 70)
try:
domain = ssm.get_parameter(Name='/app/lakehouse-agent/cognito-domain')['Parameter']['Value']
client_secret = ssm.get_parameter(Name='/app/lakehouse-agent/cognito-m2m-client-secret', WithDecryption=True)['Parameter']['Value']
import requests
import base64
token_endpoint = f"{domain}/oauth2/token"
credentials = f"{m2m_client_id}:{client_secret}"
encoded_credentials = base64.b64encode(credentials.encode()).decode()
print(f" Token Endpoint: {token_endpoint}")
print(f" Client ID: {m2m_client_id}")
print(f" Attempting token request...")
response = requests.post(
token_endpoint,
headers={
'Content-Type': 'application/x-www-form-urlencoded',
'Authorization': f'Basic {encoded_credentials}'
},
data={
'grant_type': 'client_credentials',
'scope': ' '.join(client.get('AllowedOAuthScopes', []))
},
timeout=30  # fail fast instead of hanging if the token endpoint is unreachable
)
if response.status_code == 200:
print(f"\n ✅ Token request successful!")
token_data = response.json()
print(f" Access Token: {token_data['access_token'][:50]}...")
print(f" Token Type: {token_data['token_type']}")
print(f" Expires In: {token_data['expires_in']} seconds")
else:
print(f"\n ❌ Token request failed!")
print(f" Status Code: {response.status_code}")
print(f" Response: {response.text}")
if response.status_code == 400 and 'invalid_grant' in response.text:
print(f"\n 💡 Possible causes of invalid_grant:")
print(f" 1. AllowedOAuthFlowsUserPoolClient not set to True")
print(f" 2. Missing CallbackURLs (add dummy URL)")
print(f" 3. Client not properly configured for client_credentials")
print(f"\n 🔧 Fix: python copy_m2m_client.py")
except Exception as e:
print(f"\n ❌ Error testing token request: {e}")
except Exception as e:
print(f"\n❌ Error: Could not get client details: {e}")
sys.exit(1)
print("\n" + "=" * 70)
if __name__ == '__main__':
main()
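The token test above follows the OAuth 2.0 client_credentials pattern: HTTP Basic credentials built from the client ID and secret, plus a form-encoded body carrying the grant type and scopes. The sketch below isolates that construction as a pure function so it can be inspected without calling Cognito. The function name `build_client_credentials_request` and the sample domain are assumptions made for illustration; they are not part of the script.

```python
import base64
from typing import Dict, List, Tuple
from urllib.parse import urlencode


def build_client_credentials_request(
    domain: str, client_id: str, client_secret: str, scopes: List[str]
) -> Tuple[str, Dict[str, str], str]:
    """Return (url, headers, body) for a client_credentials token request.

    Hypothetical helper: it only assembles the request, it does not send it.
    """
    # The script appends /oauth2/token to the Cognito domain URL.
    url = f"{domain.rstrip('/')}/oauth2/token"

    # HTTP Basic auth: base64("client_id:client_secret").
    raw = f"{client_id}:{client_secret}".encode()
    basic = base64.b64encode(raw).decode()

    headers = {
        "Content-Type": "application/x-www-form-urlencoded",
        "Authorization": f"Basic {basic}",
    }

    # Scopes are sent as a single space-separated string.
    body = urlencode({"grant_type": "client_credentials", "scope": " ".join(scopes)})
    return url, headers, body
```

The returned tuple could be passed to an HTTP client of your choice, for example `requests.post(url, headers=headers, data=body, timeout=30)`.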
@@ -0,0 +1,131 @@
#!/usr/bin/env python3
"""
Decode JWT Token to inspect claims
This script gets a token from the M2M client and decodes it to see what's inside.
"""
import boto3
import json
import base64
import requests
def decode_jwt(token):
"""Decode JWT token without verification (for inspection only)."""
parts = token.split('.')
if len(parts) != 3:
print("Invalid JWT token format")
return None
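# Decode header (base64url segments omit '=' padding, so restore it to a multiple of 4)
header = json.loads(base64.urlsafe_b64decode(parts[0] + '=' * (-len(parts[0]) % 4)))
# Decode payload
payload = json.loads(base64.urlsafe_b64decode(parts[1] + '=' * (-len(parts[1]) % 4)))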
return header, payload
def main():
session = boto3.Session()
region = session.region_name
ssm = boto3.client('ssm', region_name=region)
print("=" * 70)
print("Decode M2M JWT Token")
print("=" * 70)
# Get M2M client credentials
try:
client_id = ssm.get_parameter(Name='/app/lakehouse-agent/cognito-m2m-client-id')['Parameter']['Value']
client_secret = ssm.get_parameter(Name='/app/lakehouse-agent/cognito-m2m-client-secret', WithDecryption=True)['Parameter']['Value']
domain = ssm.get_parameter(Name='/app/lakehouse-agent/cognito-domain')['Parameter']['Value']
user_pool_id = ssm.get_parameter(Name='/app/lakehouse-agent/cognito-user-pool-id')['Parameter']['Value']
print(f"\n✅ Configuration loaded:")
print(f" Client ID: {client_id}")
print(f" Domain: {domain}")
print(f" User Pool ID: {user_pool_id}")
except Exception as e:
print(f"\n❌ Error: {e}")
return
# Get token
print(f"\n🔐 Requesting token...")
token_endpoint = f"{domain}/oauth2/token"
credentials = f"{client_id}:{client_secret}"
encoded_credentials = base64.b64encode(credentials.encode()).decode()
# Get the configured scopes
cognito = boto3.client('cognito-idp', region_name=region)
client_details = cognito.describe_user_pool_client(
UserPoolId=user_pool_id,
ClientId=client_id
)
allowed_scopes = client_details['UserPoolClient'].get('AllowedOAuthScopes', [])
scope_string = ' '.join(allowed_scopes)
response = requests.post(
token_endpoint,
headers={
'Content-Type': 'application/x-www-form-urlencoded',
'Authorization': f'Basic {encoded_credentials}'
},
data={
'grant_type': 'client_credentials',
'scope': scope_string
},
timeout=30  # fail fast instead of hanging if the token endpoint is unreachable
)
if response.status_code != 200:
print(f"❌ Token request failed: {response.status_code}")
print(f" Response: {response.text}")
return
token_data = response.json()
access_token = token_data['access_token']
print(f"✅ Token received")
# Decode token
print(f"\n📋 Decoding JWT Token...")
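decoded = decode_jwt(access_token)
if decoded is None:
# decode_jwt() returns None for a malformed token; unpacking None would raise TypeError
return
header, payload = decoded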
print(f"\n🔑 JWT Header:")
print(json.dumps(header, indent=2))
print(f"\n📦 JWT Payload:")
print(json.dumps(payload, indent=2))
# Check important claims
print(f"\n🔍 Important Claims:")
print(f" Issuer (iss): {payload.get('iss', 'N/A')}")
print(f" Client ID (client_id): {payload.get('client_id', 'N/A')}")
print(f" Audience (aud): {payload.get('aud', 'N/A')}")
print(f" Scope: {payload.get('scope', 'N/A')}")
print(f" Token Use: {payload.get('token_use', 'N/A')}")
print(f" Expires (exp): {payload.get('exp', 'N/A')}")
# Check if issuer matches expected format
expected_issuer = f"https://cognito-idp.{region}.amazonaws.com/{user_pool_id}"
if payload.get('iss') == expected_issuer:
print(f"\n ✅ Issuer matches expected format")
else:
print(f"\n ⚠️ Issuer mismatch!")
print(f" Expected: {expected_issuer}")
print(f" Got: {payload.get('iss')}")
# Check client_id
if payload.get('client_id') == client_id:
print(f" ✅ Client ID matches")
else:
print(f" ⚠️ Client ID mismatch!")
print(f" Expected: {client_id}")
print(f" Got: {payload.get('client_id')}")
print("\n" + "=" * 70)
if __name__ == '__main__':
main()
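The decoding step in this script can be sketched on its own with only the standard library. The version below is a minimal illustration, not the script's implementation: it restores base64url padding explicitly and raises `ValueError` on a malformed token instead of returning `None`. The names `b64url_decode`, `decode_jwt_unverified`, and `encode_segment` are hypothetical, and like the original it performs no signature verification, so it is suitable for inspection only.

```python
import base64
import json
from typing import Any, Dict, Tuple


def b64url_decode(segment: str) -> bytes:
    """Decode a base64url segment whose '=' padding was stripped."""
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def decode_jwt_unverified(token: str) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Return (header, payload) WITHOUT verifying the signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected three dot-separated JWT segments")
    header = json.loads(b64url_decode(parts[0]))
    payload = json.loads(b64url_decode(parts[1]))
    return header, payload


def encode_segment(obj: Dict[str, Any]) -> str:
    """Hypothetical helper used only to build a sample token for the demo."""
    raw = json.dumps(obj).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


if __name__ == "__main__":
    # Unsigned sample token; the third segment is a placeholder signature.
    sample = ".".join(
        [
            encode_segment({"alg": "none"}),
            encode_segment({"client_id": "abc", "token_use": "access"}),
            "signature",
        ]
    )
    print(decode_jwt_unverified(sample))
```

Raising on a malformed token lets the caller decide how to report the problem, rather than failing later when the result is unpacked.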
@@ -0,0 +1,515 @@
#!/usr/bin/env python3
"""
Cognito Setup for Health Lakehouse Data
Creates User Pool, App Client, Resource Server, and test users with OAuth scopes
Writes configuration to SSM Parameter Store
Usage:
python setup_cognito.py
"""
import boto3
import json
import os
import re
from pathlib import Path
from typing import Dict, Optional
class CognitoSetup:
def __init__(self):
"""Initialize Cognito setup with region from boto3 session."""
# Get region from boto3 session
session = boto3.Session()
self.region = session.region_name
self.cognito = boto3.client('cognito-idp', region_name=self.region)
self.ssm = boto3.client('ssm', region_name=self.region)
self.sts = boto3.client('sts', region_name=self.region)
self.env_file = Path(__file__).parent.parent / '.env'
print(f"Initialized Cognito setup for region: {self.region}")
def find_existing_user_pool(self, pool_name: str) -> Optional[str]:
"""Find existing user pool by name."""
try:
paginator = self.cognito.get_paginator('list_user_pools')
for page in paginator.paginate(MaxResults=60):
for pool in page.get('UserPools', []):
if pool['Name'] == pool_name:
print(f"ℹ️ Found existing User Pool: {pool['Id']}")
return pool['Id']
except Exception as e:
print(f"⚠️ Error searching for user pool: {e}")
return None
def get_user_pool_client(self, user_pool_id: str, client_name: str) -> Optional[Dict]:
"""Get existing app client by name."""
try:
paginator = self.cognito.get_paginator('list_user_pool_clients')
for page in paginator.paginate(UserPoolId=user_pool_id, MaxResults=60):
for client in page.get('UserPoolClients', []):
if client['ClientName'] == client_name:
# Get full client details including secret
full_client = self.cognito.describe_user_pool_client(
UserPoolId=user_pool_id,
ClientId=client['ClientId']
)
print(f"ℹ️ Found existing App Client: {client['ClientId']}")
return full_client['UserPoolClient']
except Exception as e:
print(f"⚠️ Error searching for app client: {e}")
return None
def get_user_pool_domain(self, user_pool_id: str) -> Optional[str]:
"""Get existing domain for user pool."""
try:
response = self.cognito.describe_user_pool(UserPoolId=user_pool_id)
domain = response['UserPool'].get('Domain')
if domain:
domain_url = f'https://{domain}.auth.{self.region}.amazoncognito.com'
print(f"ℹ️ Found existing domain: {domain_url}")
return domain_url
except Exception as e:
print(f"⚠️ Error getting domain: {e}")
return None
def store_parameters_in_ssm(self, config: Dict):
"""
Store Cognito configuration in SSM Parameter Store.
Args:
config: Dictionary with user_pool_id, client_id, domain, m2m_client_id, m2m_client_secret, and optionally client_secret
"""
print("\n💾 Storing configuration in SSM Parameter Store...")
# Get account ID for constructing ARN
account_id = self.sts.get_caller_identity()['Account']
user_pool_arn = f"arn:aws:cognito-idp:{self.region}:{account_id}:userpool/{config['user_pool_id']}"
parameters = [
{
'name': '/app/lakehouse-agent/cognito-user-pool-id',
'value': config['user_pool_id'],
'description': 'Cognito User Pool ID for authentication'
},
{
'name': '/app/lakehouse-agent/cognito-user-pool-arn',
'value': user_pool_arn,
'description': 'Cognito User Pool ARN'
},
{
'name': '/app/lakehouse-agent/cognito-app-client-id',
'value': config['client_id'],
'description': 'Cognito App Client ID (supports user auth and M2M)'
},
{
'name': '/app/lakehouse-agent/cognito-domain',
'value': config['domain'],
'description': 'Cognito domain URL for OAuth'
},
{
'name': '/app/lakehouse-agent/cognito-resource-server-id',
'value': 'lakehouse-api',
'description': 'Cognito Resource Server identifier'
},
{
'name': '/app/lakehouse-agent/cognito-region',
'value': self.region,
'description': 'AWS region for Cognito'
},
{
'name': '/app/lakehouse-agent/cognito-m2m-client-id',
'value': config['m2m_client_id'],
'description': 'Cognito M2M-only App Client ID (client_credentials only)'
}
]
# Store client secret as SecureString if available
if 'client_secret' in config and config['client_secret']:
try:
self.ssm.put_parameter(
Name='/app/lakehouse-agent/cognito-app-client-secret',
Value=config['client_secret'],
Description='Cognito App Client Secret (SecureString)',
Type='SecureString',
Overwrite=True
)
print(f"✅ Stored parameter (SecureString): /app/lakehouse-agent/cognito-app-client-secret")
except Exception as e:
print(f"❌ Error storing client secret: {e}")
raise
# Store M2M client secret as SecureString
if 'm2m_client_secret' in config and config['m2m_client_secret']:
try:
self.ssm.put_parameter(
Name='/app/lakehouse-agent/cognito-m2m-client-secret',
Value=config['m2m_client_secret'],
Description='Cognito M2M App Client Secret (SecureString)',
Type='SecureString',
Overwrite=True
)
print(f"✅ Stored parameter (SecureString): /app/lakehouse-agent/cognito-m2m-client-secret")
except Exception as e:
print(f"❌ Error storing M2M client secret: {e}")
raise
# Store other parameters as String type
for param in parameters:
try:
self.ssm.put_parameter(
Name=param['name'],
Value=param['value'],
Description=param['description'],
Type='String',
Overwrite=True
)
print(f"✅ Stored parameter: {param['name']} = {param['value']}")
except Exception as e:
print(f"❌ Error storing parameter {param['name']}: {e}")
raise
def write_to_env(self, config: Dict):
"""
Write configuration to .env file.
Note: This function is deprecated and will be removed in a future version.
Configuration should be managed through SSM Parameter Store.
This is kept temporarily for backward compatibility during migration.
"""
print(f"⚠️ Warning: .env file updates are deprecated. Please migrate to SSM Parameter Store.")
print(f" Run: python ../ssm_migrate.py --migrate")
try:
# Read existing .env file
env_content = {}
if self.env_file.exists():
with open(self.env_file, 'r') as f:
for line in f:
line = line.strip()
if line and not line.startswith('#') and '=' in line:
key, value = line.split('=', 1)
env_content[key.strip()] = value.strip()
# Update with new values
env_content['COGNITO_USER_POOL_ID'] = config['user_pool_id']
env_content['COGNITO_APP_CLIENT_ID'] = config['client_id']
if 'client_secret' in config:
env_content['COGNITO_APP_CLIENT_SECRET'] = config['client_secret']
env_content['COGNITO_DOMAIN'] = config['domain']
env_content['COGNITO_RESOURCE_SERVER_ID'] = 'lakehouse-api'
# Construct User Pool ARN
user_pool_arn = f"arn:aws:cognito-idp:{self.region}:{self.sts.get_caller_identity()['Account']}:userpool/{config['user_pool_id']}"
env_content['COGNITO_USER_POOL_ARN'] = user_pool_arn
# Write back to .env file
with open(self.env_file, 'w') as f:
for key, value in sorted(env_content.items()):
f.write(f"{key}={value}\n")
print(f"\n✅ Configuration written to {self.env_file} (for backward compatibility)")
except Exception as e:
print(f"❌ Error writing to .env file: {e}")
raise
def setup(self, pool_name: str = 'lakehouse-pool') -> Dict:
# Check for existing User Pool
user_pool_id = self.find_existing_user_pool(pool_name)
if not user_pool_id:
# Create User Pool with username-password authentication enabled
# NOTE: NOT using UsernameAttributes to allow email to be the actual username
pool_response = self.cognito.create_user_pool(
PoolName=pool_name,
Policies={
'PasswordPolicy': {
'MinimumLength': 8,
'RequireUppercase': True,
'RequireLowercase': True,
'RequireNumbers': True,
'RequireSymbols': True
}
},
AutoVerifiedAttributes=['email'],
# NOT setting UsernameAttributes - this allows email to be used as username directly
Schema=[
{
'Name': 'email',
'Required': True,
'Mutable': True
}
],
AdminCreateUserConfig={
'AllowAdminCreateUserOnly': False # Allow users to sign up
}
)
user_pool_id = pool_response['UserPool']['Id']
print(f"✅ User Pool created: {user_pool_id}")
print(f" Note: Email will be used as username (not as alias)")
else:
print(f"✅ Using existing User Pool: {user_pool_id}")
print(f" ⚠️ Warning: If this pool was created with UsernameAttributes=['email'],")
print(f" users will have UUID usernames. Delete the pool and recreate, or")
print(f" run cleanup_test_users.py to delete old users.")
# Create Resource Server with scopes (if not exists)
# Note: Scope names cannot contain '/' - using '.' instead
try:
resource_server = self.cognito.create_resource_server(
UserPoolId=user_pool_id,
Identifier='lakehouse-api',
Name='Lakehouse Data API',
Scopes=[
{'ScopeName': 'claims.query', 'ScopeDescription': 'Query claims'},
{'ScopeName': 'claims.submit', 'ScopeDescription': 'Submit claims'},
{'ScopeName': 'claims.update', 'ScopeDescription': 'Update claims'},
{'ScopeName': 'claims.approve', 'ScopeDescription': 'Approve/deny claims'}
]
)
print("✅ Resource Server created with scopes")
except self.cognito.exceptions.ResourceNotFoundException:
# Raised when the user pool cannot be found, not when the resource server already exists
raise
except Exception as e:
if 'already exists' in str(e).lower():
print("ℹ️ Resource Server already exists")
else:
raise
# Check for existing App Client
existing_client = self.get_user_pool_client(user_pool_id, 'lakehouse-client')
if existing_client:
client_id = existing_client['ClientId']
client_secret = existing_client.get('ClientSecret')
print(f"ℹ️ App Client exists: {client_id}")
print(f" Updating to support both client_credentials and user authentication...")
# Update existing client to support both flows
self.cognito.update_user_pool_client(
UserPoolId=user_pool_id,
ClientId=client_id,
ClientName='lakehouse-client',
ExplicitAuthFlows=[
'ALLOW_USER_SRP_AUTH', # Secure Remote Password (SRP) auth
'ALLOW_ADMIN_USER_PASSWORD_AUTH', # Admin user password auth (for testing)
'ALLOW_REFRESH_TOKEN_AUTH' # Refresh token auth
],
AllowedOAuthFlows=['client_credentials'], # Machine-to-machine authentication
AllowedOAuthScopes=[
'lakehouse-api/claims.query',
'lakehouse-api/claims.submit',
'lakehouse-api/claims.update',
'lakehouse-api/claims.approve'
],
AllowedOAuthFlowsUserPoolClient=True,
PreventUserExistenceErrors='ENABLED' # Security best practice
)
print(f"✅ App Client updated to support user authentication and client_credentials")
else:
# Create App Client supporting both user auth and client credentials
client_response = self.cognito.create_user_pool_client(
UserPoolId=user_pool_id,
ClientName='lakehouse-client',
GenerateSecret=True,
ExplicitAuthFlows=[
'ALLOW_USER_SRP_AUTH', # Secure Remote Password (SRP) auth
'ALLOW_ADMIN_USER_PASSWORD_AUTH', # Admin user password auth (for testing)
'ALLOW_REFRESH_TOKEN_AUTH' # Refresh token auth
],
AllowedOAuthFlows=['client_credentials'], # Machine-to-machine authentication
AllowedOAuthScopes=[
'lakehouse-api/claims.query',
'lakehouse-api/claims.submit',
'lakehouse-api/claims.update',
'lakehouse-api/claims.approve'
],
AllowedOAuthFlowsUserPoolClient=True,
PreventUserExistenceErrors='ENABLED' # Security best practice
)
client_id = client_response['UserPoolClient']['ClientId']
client_secret = client_response['UserPoolClient'].get('ClientSecret')
print(f"✅ App Client created: {client_id}")
# Check for existing domain or create new one
domain_url = self.get_user_pool_domain(user_pool_id)
if not domain_url:
# Create domain
# Domain names can only contain lowercase letters, numbers, and hyphens
# Extract only alphanumeric characters from pool ID and convert to lowercase
pool_id_clean = re.sub(r'[^a-zA-Z0-9]', '', user_pool_id).lower()[:8]
domain_name = f'lakehouse-{pool_id_clean}'
try:
self.cognito.create_user_pool_domain(Domain=domain_name, UserPoolId=user_pool_id)
domain_url = f'https://{domain_name}.auth.{self.region}.amazoncognito.com'
print(f"✅ Domain created: {domain_url}")
except Exception as e:
if 'already exists' in str(e).lower() or 'domain' in str(e).lower():
domain_url = f'https://{domain_name}.auth.{self.region}.amazoncognito.com'
print(f"ℹ️ Domain already exists: {domain_url}")
else:
raise
else:
print(f"✅ Using existing domain: {domain_url}")
# Create test users with email as username (skip if already exist)
test_users = [
{'email': 'user001@example.com', 'name': 'User 001'},
{'email': 'user002@example.com', 'name': 'User 002'},
{'email': 'adjuster001@example.com', 'name': 'Adjuster 001'}
]
for user in test_users:
email = user['email']
try:
# Create user with email as username
self.cognito.admin_create_user(
UserPoolId=user_pool_id,
Username=email, # Username is the email address
UserAttributes=[
{'Name': 'email', 'Value': email},
{'Name': 'email_verified', 'Value': 'true'}
],
TemporaryPassword='TempPass123!',
MessageAction='SUPPRESS' # Don't send welcome email
)
print(f"✅ Test user created: {email} (username: {email})")
except self.cognito.exceptions.UsernameExistsException:
print(f"ℹ️ Test user already exists: {email}")
except Exception as e:
if 'already exists' in str(e).lower():
print(f"ℹ️ Test user already exists: {email}")
else:
print(f"⚠️ Error creating user {email}: {e}")
result = {
'user_pool_id': user_pool_id,
'client_id': client_id,
'domain': domain_url
}
if client_secret:  # assigned in both branches above, so no locals() check is needed
result['client_secret'] = client_secret
# Create M2M-only app client
m2m_client = self.create_m2m_client(user_pool_id)
result['m2m_client_id'] = m2m_client['client_id']
result['m2m_client_secret'] = m2m_client['client_secret']
# Store configuration in SSM Parameter Store
self.store_parameters_in_ssm(result)
# Write to .env file (deprecated, for backward compatibility)
self.write_to_env(result)
return result
def create_m2m_client(self, user_pool_id: str) -> Dict:
"""
Create M2M-only app client with client_credentials OAuth flow.
Args:
user_pool_id: Cognito User Pool ID
Returns:
Dictionary with client_id and client_secret
"""
print(f"\n🤖 Creating M2M-only app client...")
# Check for existing M2M client
existing_m2m_client = self.get_user_pool_client(user_pool_id, 'lakehouse-m2m-client')
# M2M client configuration (client_credentials flow only)
client_config = {
'UserPoolId': user_pool_id,
'ClientName': 'lakehouse-m2m-client',
'GenerateSecret': True,
'ExplicitAuthFlows': [], # No user auth flows for M2M
'AllowedOAuthFlows': ['client_credentials'], # Only client_credentials
'AllowedOAuthScopes': [
'lakehouse-api/claims.query',
'lakehouse-api/claims.submit',
'lakehouse-api/claims.update',
'lakehouse-api/claims.approve'
],
'AllowedOAuthFlowsUserPoolClient': True,
'SupportedIdentityProviders': [], # No identity providers for M2M
'CallbackURLs': ['https://localhost'], # Dummy URL for M2M
'PreventUserExistenceErrors': 'ENABLED'
}
if existing_m2m_client:
client_id = existing_m2m_client['ClientId']
print(f"ℹ️ M2M App Client exists: {client_id}")
print(f" Updating configuration...")
# Update with M2M configuration
# Remove GenerateSecret as it's not valid for update_user_pool_client
update_config = {k: v for k, v in client_config.items() if k != 'GenerateSecret'}
update_config['ClientId'] = client_id
self.cognito.update_user_pool_client(**update_config)
# Get updated client to retrieve secret
updated_client = self.cognito.describe_user_pool_client(
UserPoolId=user_pool_id,
ClientId=client_id
)
client_secret = updated_client['UserPoolClient'].get('ClientSecret')
print(f"✅ M2M App Client updated with client_credentials flow")
else:
# Create with M2M configuration
client_response = self.cognito.create_user_pool_client(**client_config)
client_id = client_response['UserPoolClient']['ClientId']
client_secret = client_response['UserPoolClient'].get('ClientSecret')
print(f"✅ M2M App Client created: {client_id}")
print(f" Configuration: client_credentials flow only")
return {
'client_id': client_id,
'client_secret': client_secret
}
if __name__ == '__main__':
setup = CognitoSetup()
result = setup.setup()
print(f"\n📝 Configuration:\n{json.dumps({k: v for k, v in result.items() if 'secret' not in k}, indent=2)}")
if 'client_secret' in result:
print(f"\n🔐 User App Client Secret: {result['client_secret']}")
print(f" (Also stored securely in SSM Parameter Store)")
if 'm2m_client_secret' in result:
print(f"\n🤖 M2M App Client Secret: {result['m2m_client_secret']}")
print(f" (Also stored securely in SSM Parameter Store)")
print(f"\n💾 SSM Parameters Stored:")
print(f" • /app/lakehouse-agent/cognito-user-pool-id")
print(f" • /app/lakehouse-agent/cognito-user-pool-arn")
print(f" • /app/lakehouse-agent/cognito-app-client-id (user auth + M2M)")
print(f" • /app/lakehouse-agent/cognito-app-client-secret (SecureString)")
print(f" • /app/lakehouse-agent/cognito-m2m-client-id (M2M only)")
print(f" • /app/lakehouse-agent/cognito-m2m-client-secret (SecureString)")
print(f" • /app/lakehouse-agent/cognito-domain")
print(f" • /app/lakehouse-agent/cognito-resource-server-id")
print(f" • /app/lakehouse-agent/cognito-region")
print(f"\n👥 Test Users Created:")
print(f" • user001@example.com (username: user001@example.com)")
print(f" • user002@example.com (username: user002@example.com)")
print(f" • adjuster001@example.com (username: adjuster001@example.com)")
print(f" Default password: TempPass123!")
print(f" Note: Users will be prompted to change password on first login")
print(f"\n🔑 App Clients:")
print(f" 1. lakehouse-client (ID: {result['client_id']})")
print(f" - Supports: User authentication (SRP, Admin Password) + M2M")
print(f" - Use for: Streamlit UI, user-facing applications")
print(f" 2. lakehouse-m2m-client (ID: {result['m2m_client_id']})")
print(f" - Supports: M2M only (client_credentials)")
print(f" - Use for: Gateway-to-Runtime, service-to-service, test scripts")
print(f"\n⚠️ If you see UUID usernames instead of emails:")
print(f" 1. Run: python cleanup_test_users.py")
print(f" 2. Delete the User Pool from AWS Console")
print(f" 3. Run this script again to recreate with correct settings")
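The domain name in `setup()` is derived by stripping non-alphanumeric characters from the pool ID, lowercasing, and keeping the first 8 characters. The sketch below reproduces that rule as a pure function and, as an assumption-labeled alternative, shows a variant that uses only the part after the underscore. Both function names and the sample pool ID are hypothetical. The variant is not what the script does; it is shown because the first 8 cleaned characters come mostly from the region portion of the ID, so they carry little of the pool-specific suffix.

```python
import re


def domain_prefix_from_pool_id(user_pool_id: str, length: int = 8) -> str:
    """Mirror the rule used in setup(): clean, lowercase, truncate."""
    cleaned = re.sub(r"[^a-zA-Z0-9]", "", user_pool_id).lower()
    return f"lakehouse-{cleaned[:length]}"


def domain_prefix_from_pool_suffix(user_pool_id: str, length: int = 8) -> str:
    """Hypothetical variant: use only the part after the first underscore."""
    suffix = user_pool_id.split("_", 1)[-1]
    cleaned = re.sub(r"[^a-zA-Z0-9]", "", suffix).lower()
    return f"lakehouse-{cleaned[:length]}"


if __name__ == "__main__":
    pool_id = "us-east-1_AbCdEfGhI"  # made-up pool ID for illustration
    print(domain_prefix_from_pool_id(pool_id))
    print(domain_prefix_from_pool_suffix(pool_id))
```

For the made-up ID above, the script's rule yields `lakehouse-useast1a`, while the suffix-based variant yields `lakehouse-abcdefgh`. Whether the extra uniqueness matters depends on how many pools share a region and account.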
@@ -0,0 +1,122 @@
#!/usr/bin/env python3
"""
Check Cognito Users
Quick script to check user status and usernames.
Usage:
python check_users.py
"""
import boto3
import sys
def check_users():
"""Check Cognito users."""
# Get region and SSM client
session = boto3.Session()
region = session.region_name
ssm = boto3.client('ssm', region_name=region)
cognito = boto3.client('cognito-idp', region_name=region)
print("=" * 70)
print("Cognito Users Check")
print("=" * 70)
# Get User Pool ID from SSM
try:
user_pool_id = ssm.get_parameter(Name='/app/lakehouse-agent/cognito-user-pool-id')['Parameter']['Value']
print(f"\n✅ User Pool ID: {user_pool_id}")
except Exception as e:
print(f"❌ Error loading User Pool ID from SSM: {e}")
sys.exit(1)
# Check User Pool configuration
print(f"\n📋 User Pool Configuration:")
try:
pool_response = cognito.describe_user_pool(UserPoolId=user_pool_id)
pool = pool_response['UserPool']
username_attrs = pool.get('UsernameAttributes', [])
alias_attrs = pool.get('AliasAttributes', [])
print(f" Username Attributes: {username_attrs if username_attrs else 'None (email can be username)'}")
print(f" Alias Attributes: {alias_attrs if alias_attrs else 'None'}")
if username_attrs and 'email' in username_attrs:
print(f"\n ⚠️ WARNING: UsernameAttributes includes 'email'")
print(f" This means users have UUID usernames, not email addresses!")
print(f" You need to delete the User Pool and recreate it without UsernameAttributes")
except Exception as e:
print(f"❌ Error describing User Pool: {e}")
# List users
print(f"\n👥 Users in User Pool:")
try:
response = cognito.list_users(UserPoolId=user_pool_id, Limit=20)
users = response.get('Users', [])
if not users:
print(f" No users found")
print(f"\n Run setup_cognito.py to create test users")
else:
for i, user in enumerate(users, 1):
username = user['Username']
status = user['UserStatus']
enabled = user.get('Enabled', True)
email = None
email_verified = None
for attr in user.get('Attributes', []):
if attr['Name'] == 'email':
email = attr['Value']
elif attr['Name'] == 'email_verified':
email_verified = attr['Value']
print(f"\n User {i}:")
print(f" ├─ Username: {username}")
print(f" ├─ Email: {email}")
print(f" ├─ Status: {status}")
print(f" ├─ Enabled: {enabled}")
print(f" └─ Email Verified: {email_verified}")
# Check if username is UUID (indicates UsernameAttributes was set)
if len(username) == 36 and username.count('-') == 4:
print(f" ⚠️ Username is UUID - User Pool has UsernameAttributes=['email']")
print(f" ⚠️ Login with email won't work!")
if status == 'FORCE_CHANGE_PASSWORD':
print(f" ⚠️ User must change temporary password on first login")
except Exception as e:
print(f"❌ Error listing users: {e}")
print("\n" + "=" * 70)
print("Recommendations:")
print("=" * 70)
# Check if any user has UUID username
try:
response = cognito.list_users(UserPoolId=user_pool_id, Limit=1)
if response.get('Users'):
username = response['Users'][0]['Username']
if len(username) == 36 and username.count('-') == 4:
print("\n❌ ISSUE FOUND: Users have UUID usernames")
print("\n Solution:")
print(" 1. Run: python cleanup_test_users.py")
print(" 2. Delete the User Pool from AWS Console")
print(" 3. Run: python setup_cognito.py")
print("\n This will create a new User Pool where email IS the username")
else:
print("\n✅ Users have email as username - configuration is correct")
print("\n If login still fails:")
print(" 1. Run: python test_cognito_login.py")
print(" 2. Try the default password: TempPass123!")
print(" 3. You may need to change password on first login")
except Exception as e:
print(f"\n⚠️ Could not generate recommendations: {e}")
print()
if __name__ == '__main__':
check_users()
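The UUID check in this script uses a length-and-hyphen heuristic (36 characters, 4 hyphens). A stricter test can be sketched with the standard `uuid` module, which also rejects strings that have the right shape but contain non-hexadecimal characters. The function name `looks_like_uuid` is an assumption for illustration and is not part of the script.

```python
import uuid


def looks_like_uuid(username: str) -> bool:
    """Return True only for a canonical hyphenated UUID string."""
    try:
        parsed = uuid.UUID(username)
    except ValueError:
        return False
    # uuid.UUID also accepts braces, 'urn:uuid:' prefixes, and unhyphenated
    # hex, so compare against the canonical form to exclude those.
    return str(parsed) == username.lower()


if __name__ == "__main__":
    for name in (
        "123e4567-e89b-12d3-a456-426614174000",
        "user001@example.com",
        "aaaaaaaa-bbbb-cccc-dddd-zzzzzzzzzzzz",
    ):
        print(name, looks_like_uuid(name))
```

The third sample passes the length-and-hyphen heuristic but is not valid hexadecimal, so the stricter check rejects it.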
@@ -0,0 +1,92 @@
#!/usr/bin/env python3
"""
Cleanup Test Users in Cognito User Pool
This script deletes existing test users so they can be recreated with proper usernames.
Usage:
python cleanup_test_users.py
"""
import boto3
import sys
def cleanup_users():
"""Delete test users from Cognito User Pool."""
# Get region and SSM client
session = boto3.Session()
region = session.region_name
ssm = boto3.client('ssm', region_name=region)
cognito = boto3.client('cognito-idp', region_name=region)
# Get User Pool ID from SSM
try:
response = ssm.get_parameter(Name='/app/lakehouse-agent/cognito-user-pool-id')
user_pool_id = response['Parameter']['Value']
print(f"✅ Found User Pool ID: {user_pool_id}")
except Exception as e:
print(f"❌ Error: Could not find User Pool ID in SSM: {e}")
print(" Please run setup_cognito.py first")
sys.exit(1)
# Test user emails to clean up
test_emails = [
'user001@example.com',
'user002@example.com',
'adjuster001@example.com'
]
print(f"\n🔍 Searching for test users in User Pool...")
# List all users and find test users
deleted_count = 0
try:
paginator = cognito.get_paginator('list_users')
for page in paginator.paginate(UserPoolId=user_pool_id):
for user in page.get('Users', []):
username = user['Username']
# Check if user has one of the test emails
user_email = None
for attr in user.get('Attributes', []):
if attr['Name'] == 'email':
user_email = attr['Value']
break
if user_email in test_emails:
print(f" Found test user: {user_email} (username: {username})")
# Delete the user
try:
cognito.admin_delete_user(
UserPoolId=user_pool_id,
Username=username
)
print(f" ✅ Deleted user: {username}")
deleted_count += 1
except Exception as e:
print(f" ❌ Error deleting user {username}: {e}")
if deleted_count == 0:
print(" ℹ️ No test users found to delete")
else:
print(f"\n✅ Deleted {deleted_count} test user(s)")
print(f"\n📝 Next step: Run setup_cognito.py to recreate users with proper usernames")
print(f" cd gateway-setup")
print(f" python setup_cognito.py")
except Exception as e:
print(f"❌ Error listing users: {e}")
sys.exit(1)
if __name__ == '__main__':
print("=" * 70)
print("Cognito Test Users Cleanup")
print("=" * 70)
response = input("\n⚠️ This will delete test users. Continue? (yes/no): ")
if response.lower() not in ['yes', 'y']:
print("Cleanup cancelled")
sys.exit(0)
cleanup_users()
@@ -0,0 +1,752 @@
#!/usr/bin/env python3
"""
Create AgentCore Gateway for Health Lakehouse Data
This script creates and configures an AgentCore Gateway that:
1. Connects to the MCP Athena server (running on AgentCore Runtime)
2. Uses the Gateway interceptor for JWT validation
3. Enforces fine-grained access control
4. Propagates user identity to the MCP server
Prerequisites:
- MCP server deployed to AgentCore Runtime
- Interceptor Lambda function deployed
- Cognito User Pool configured
- Configuration in SSM Parameter Store
Usage:
python create_gateway.py
"""
import boto3
import sys
import json
from typing import Dict, Any
class SSMConfig:
"""Load configuration from SSM Parameter Store."""
def __init__(self):
"""Initialize and load configuration from SSM."""
# Get region from boto3 session
session = boto3.Session()
self.region = session.region_name
self.ssm = boto3.client('ssm', region_name=self.region)
self.sts = boto3.client('sts', region_name=self.region)
# Get account ID
self.account_id = self.sts.get_caller_identity()['Account']
print(f"✅ Using AWS configuration")
print(f" Region: {self.region}")
print(f" Account: {self.account_id}")
# Load configuration from SSM
print(f"\n🔍 Loading configuration from SSM Parameter Store...")
self.mcp_server_runtime_arn = self._get_parameter('/app/lakehouse-agent/mcp-server-runtime-arn')
self.interceptor_lambda_arn = self._get_parameter('/app/lakehouse-agent/interceptor-lambda-arn')
self.cognito_user_pool_arn = self._get_parameter('/app/lakehouse-agent/cognito-user-pool-arn')
self.cognito_app_client_id = self._get_parameter('/app/lakehouse-agent/cognito-app-client-id')
self.cognito_app_client_secret = self._get_parameter('/app/lakehouse-agent/cognito-app-client-secret', secure=True)
self.cognito_domain = self._get_parameter('/app/lakehouse-agent/cognito-domain')
# Load M2M client credentials for Gateway-to-Runtime authentication
try:
self.cognito_m2m_client_id = self._get_parameter('/app/lakehouse-agent/cognito-m2m-client-id')
self.cognito_m2m_client_secret = self._get_parameter('/app/lakehouse-agent/cognito-m2m-client-secret', secure=True)
print(f" ✅ M2M Client ID: {self.cognito_m2m_client_id}")
print(f" ✅ M2M Client Secret: ****** (loaded)")
self.has_m2m_client = True
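except (Exception, SystemExit):
# _get_parameter() calls sys.exit(1) when a parameter is missing, so SystemExit
# must be caught here for the hybrid-client fallback to run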
print(f" ⚠️ M2M client not found, will use hybrid client for Gateway-to-Runtime auth")
self.cognito_m2m_client_id = self.cognito_app_client_id
self.cognito_m2m_client_secret = self.cognito_app_client_secret
self.has_m2m_client = False
print(f" ✅ MCP Server Runtime ARN: {self.mcp_server_runtime_arn}")
print(f" ✅ Interceptor Lambda ARN: {self.interceptor_lambda_arn}")
print(f" ✅ Cognito User Pool ARN: {self.cognito_user_pool_arn}")
print(f" ✅ Cognito App Client ID: {self.cognito_app_client_id}")
print(f" ✅ Cognito Client Secret: ****** (loaded)")
print(f" ✅ Cognito Domain: {self.cognito_domain}")
def _get_parameter(self, parameter_name: str, secure: bool = False) -> str:
"""Get parameter value from SSM Parameter Store."""
try:
response = self.ssm.get_parameter(Name=parameter_name, WithDecryption=secure)
return response['Parameter']['Value']
except self.ssm.exceptions.ParameterNotFound:
print(f"❌ SSM parameter {parameter_name} not found")
print(f" Please run the setup scripts first")
sys.exit(1)
except Exception as e:
print(f"❌ Error retrieving parameter {parameter_name}: {e}")
sys.exit(1)
def store_gateway_parameters(self, gateway_id: str, gateway_arn: str, gateway_url: str, gateway_name: str):
"""Store Gateway information in SSM Parameter Store."""
print("\n💾 Storing gateway configuration in SSM Parameter Store...")
parameters = [
{
'name': '/app/lakehouse-agent/gateway-id',
'value': gateway_id,
'description': 'AgentCore Gateway ID'
},
{
'name': '/app/lakehouse-agent/gateway-arn',
'value': gateway_arn,
'description': 'AgentCore Gateway ARN'
},
{
'name': '/app/lakehouse-agent/gateway-url',
'value': gateway_url,
'description': 'AgentCore Gateway URL'
},
{
'name': '/app/lakehouse-agent/gateway-name',
'value': gateway_name,
'description': 'AgentCore Gateway Name'
}
]
for param in parameters:
try:
self.ssm.put_parameter(
Name=param['name'],
Value=param['value'],
Description=param['description'],
Type='String',
Overwrite=True
)
print(f"✅ Stored parameter: {param['name']} = {param['value']}")
except Exception as e:
print(f"❌ Error storing parameter {param['name']}: {e}")
raise
class GatewaySetup:
def __init__(self, config: SSMConfig):
"""
Initialize Gateway setup.
Args:
config: SSM configuration object
"""
self.config = config
self.client = boto3.client('bedrock-agentcore-control', region_name=config.region)
def create_gateway_role(self, gateway_name: str) -> str:
"""
Create IAM role for Gateway.
Args:
gateway_name: Name for the gateway
Returns:
Role ARN
"""
iam = boto3.client('iam', region_name=self.config.region)
role_name = f'agentcore-{gateway_name}-role'
# Trust policy for AgentCore Gateway
trust_policy = {
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Service": "bedrock-agentcore.amazonaws.com"
},
"Action": "sts:AssumeRole"
}
]
}
# Policy document with all required permissions
policy_document = {
"Version": "2012-10-17",
"Statement": [
{
"Sid": "InvokeRuntimeTarget",
"Effect": "Allow",
"Action": [
"bedrock-agentcore:InvokeAgentRuntime",
"bedrock-agentcore:InvokeAgentRuntimeForUser",
"bedrock-agentcore:InvokeGateway"
],
"Resource": "*"
},
{
"Sid": "InvokeLambda",
"Effect": "Allow",
"Action": [
"lambda:InvokeFunction"
],
"Resource": f"arn:aws:lambda:{self.config.region}:{self.config.account_id}:function:*"
},
{
"Sid": "WorkloadIdentity",
"Effect": "Allow",
"Action": [
"bedrock-agentcore:GetWorkloadAccessToken",
"bedrock-agentcore:GetWorkloadAccessTokenForJWT",
"bedrock-agentcore:GetWorkloadAccessTokenForUserId",
"bedrock-agentcore:CreateWorkloadIdentity"
],
"Resource": [
f"arn:aws:bedrock-agentcore:{self.config.region}:{self.config.account_id}:workload-identity-directory/default",
f"arn:aws:bedrock-agentcore:{self.config.region}:{self.config.account_id}:workload-identity-directory/default/workload-identity/*"
]
},
{
"Sid": "OAuth2Credentials",
"Effect": "Allow",
"Action": [
"bedrock-agentcore:GetResourceOauth2Token"
],
"Resource": [
f"arn:aws:bedrock-agentcore:{self.config.region}:{self.config.account_id}:token-vault/default",
f"arn:aws:bedrock-agentcore:{self.config.region}:{self.config.account_id}:token-vault/*/oauth2credentialprovider/*",
f"arn:aws:bedrock-agentcore:{self.config.region}:{self.config.account_id}:workload-identity-directory/default",
f"arn:aws:bedrock-agentcore:{self.config.region}:{self.config.account_id}:workload-identity-directory/default/workload-identity/*"
]
},
{
"Sid": "SecretsManagerAccess",
"Effect": "Allow",
"Action": [
"secretsmanager:GetSecretValue"
],
"Resource": f"arn:aws:secretsmanager:{self.config.region}:{self.config.account_id}:secret:*"
}
]
}
try:
print(f"🔑 Creating IAM role: {role_name}")
response = iam.create_role(
RoleName=role_name,
AssumeRolePolicyDocument=json.dumps(trust_policy),
Description='IAM role for AgentCore Gateway'
)
role_arn = response['Role']['Arn']
print(f"✅ Created IAM role: {role_arn}")
# Attach policy
iam.put_role_policy(
RoleName=role_name,
PolicyName='GatewayExecutionPolicy',
PolicyDocument=json.dumps(policy_document)
)
print(f"✅ Attached execution policy to role")
return role_arn
except iam.exceptions.EntityAlreadyExistsException:
print(f"ℹ️ Role {role_name} already exists, deleting and recreating...")
# Delete inline policies
try:
policy_names = iam.list_role_policies(RoleName=role_name)['PolicyNames']
for policy_name in policy_names:
print(f" Deleting inline policy: {policy_name}")
iam.delete_role_policy(RoleName=role_name, PolicyName=policy_name)
except Exception as e:
print(f" ⚠️ Error deleting inline policies: {e}")
# Detach managed policies
try:
attached_policies = iam.list_attached_role_policies(RoleName=role_name)['AttachedPolicies']
for policy in attached_policies:
print(f" Detaching managed policy: {policy['PolicyArn']}")
iam.detach_role_policy(RoleName=role_name, PolicyArn=policy['PolicyArn'])
except Exception as e:
print(f" ⚠️ Error detaching managed policies: {e}")
# Remove from instance profiles
try:
instance_profiles = iam.list_instance_profiles_for_role(RoleName=role_name)['InstanceProfiles']
for profile in instance_profiles:
print(f" Removing from instance profile: {profile['InstanceProfileName']}")
iam.remove_role_from_instance_profile(
InstanceProfileName=profile['InstanceProfileName'],
RoleName=role_name
)
except Exception as e:
print(f" ⚠️ Error removing from instance profiles: {e}")
# Delete the role
try:
iam.delete_role(RoleName=role_name)
print(f" ✅ Deleted existing role")
except Exception as e:
print(f" ❌ Error deleting role: {e}")
raise
# Wait for IAM propagation
import time
time.sleep(2)
# Recreate the role
print(f" Creating new role: {role_name}")
response = iam.create_role(
RoleName=role_name,
AssumeRolePolicyDocument=json.dumps(trust_policy),
Description='IAM role for AgentCore Gateway'
)
role_arn = response['Role']['Arn']
# Attach policy
iam.put_role_policy(
RoleName=role_name,
PolicyName='GatewayExecutionPolicy',
PolicyDocument=json.dumps(policy_document)
)
print(f"✅ Recreated IAM role: {role_arn}")
return role_arn
except Exception as e:
print(f"❌ Error creating role: {e}")
raise
def create_gateway(self, gateway_name: str = 'lakehouse-gateway') -> Dict[str, Any]:
"""
Create an AgentCore Gateway with JWT authentication.
Args:
gateway_name: Name for the gateway
Returns:
Gateway creation response
"""
try:
print(f"\n🔧 Creating AgentCore Gateway: {gateway_name}")
# Create IAM role for gateway
role_arn = self.create_gateway_role(gateway_name)
# Extract user pool ID from ARN
user_pool_id = self.config.cognito_user_pool_arn.split('/')[-1]
issuer = f'https://cognito-idp.{self.config.region}.amazonaws.com/{user_pool_id}'
# JWT authorizer configuration
# Cognito OIDC discovery URL
discovery_url = f'{issuer}/.well-known/openid-configuration'
auth_config = {
"customJWTAuthorizer": {
"discoveryUrl": discovery_url,
"allowedClients": [self.config.cognito_app_client_id]
# Note: Not using allowedAudience because Cognito access tokens
# don't include 'aud' claim. We validate via client_id instead.
}
}
# Interceptor configuration for request processing
interceptor_config = [
{
"interceptor": {
"lambda": {
"arn": self.config.interceptor_lambda_arn
}
},
"interceptionPoints": ["REQUEST"],
"inputConfiguration": {
"passRequestHeaders": True
}
}
]
# Create gateway
response = self.client.create_gateway(
name=gateway_name,
roleArn=role_arn,
protocolType='MCP',
protocolConfiguration={
'mcp': {
'supportedVersions': ['2025-03-26', '2025-06-18'],
'searchType': 'SEMANTIC'
}
},
authorizerType='CUSTOM_JWT',
authorizerConfiguration=auth_config,
interceptorConfigurations=interceptor_config,
description='Gateway for Lakehouse Data MCP Server with OAuth-based access control'
)
gateway_id = response['gatewayId']
gateway_url = response['gatewayUrl']
gateway_arn = f"arn:aws:bedrock-agentcore:{self.config.region}:{self.config.account_id}:gateway/{gateway_id}"
print(f"✅ Gateway created successfully!")
print(f" Gateway ID: {gateway_id}")
print(f" Gateway URL: {gateway_url}")
print(f" Gateway ARN: {gateway_arn}")
return {
'gatewayId': gateway_id,
'gatewayUrl': gateway_url,
'gatewayArn': gateway_arn,
'gatewayName': gateway_name
}
except Exception as e:
if "already exists" in str(e):
print(f"ℹ️ Gateway {gateway_name} already exists, retrieving details...")
response = self.client.list_gateways()
for gateway in response.get('items', []):
if gateway['name'] == gateway_name:
gateway_id = gateway['gatewayId']
response = self.client.get_gateway(gatewayIdentifier=gateway_id)
gateway_url = response['gatewayUrl']
gateway_arn = f"arn:aws:bedrock-agentcore:{self.config.region}:{self.config.account_id}:gateway/{gateway_id}"
print(f"✅ Using existing gateway: {gateway_id}")
return {
'gatewayId': gateway_id,
'gatewayUrl': gateway_url,
'gatewayArn': gateway_arn,
'gatewayName': gateway_name
}
print(f"❌ Error creating gateway: {str(e)}")
raise
def create_oauth_provider(
self,
provider_name: str,
cognito_client_id: str,
cognito_client_secret: str,
cognito_token_endpoint: str,
cognito_issuer: str
) -> str:
"""
Create an OAuth2 credential provider in AgentCore Identity for Cognito.
This provider is used by the Gateway to authenticate to the MCP server on Runtime.
The Gateway uses client_credentials flow (M2M) to obtain tokens from Cognito,
then includes those tokens when invoking the MCP server.
Authentication Flow:
1. User → Gateway: User's JWT token (from Cognito user authentication)
2. Gateway validates user's token with JWT authorizer
3. Gateway → Cognito: Request M2M token using client_credentials
4. Cognito → Gateway: M2M access token
5. Gateway → MCP Runtime: MCP request with M2M token in Authorization header
6. MCP Runtime validates M2M token with its JWT authorizer
7. MCP Runtime → Gateway: MCP response
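Steps 3 and 4 can be sketched as below. This is an illustration only: the helper name (build_m2m_token_request), the endpoint, and the credentials are placeholders, and the request is built but never sent. In the deployed system the Gateway performs this exchange itself through the credential provider.

```python
import urllib.parse
import urllib.request


def build_m2m_token_request(token_endpoint, client_id, client_secret, scopes=()):
    # Hypothetical helper: builds (but does not send) a client_credentials request
    # with the client credentials in the form body (client_secret_post).
    form = {
        'grant_type': 'client_credentials',
        'client_id': client_id,
        'client_secret': client_secret,
    }
    if scopes:
        form['scope'] = ' '.join(scopes)
    body = urllib.parse.urlencode(form).encode('ascii')
    return urllib.request.Request(
        token_endpoint,
        data=body,
        headers={'Content-Type': 'application/x-www-form-urlencoded'},
        method='POST',
    )


request = build_m2m_token_request(
    'https://example.auth.us-east-1.amazoncognito.com/oauth2/token',  # placeholder
    'example-client-id',      # placeholder
    'example-client-secret',  # placeholder
)
```

The access_token field of the JSON response would then be sent as the bearer token in step 5.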
Args:
provider_name: Name for the OAuth provider
cognito_client_id: Cognito App Client ID (M2M client preferred)
cognito_client_secret: Cognito App Client Secret
cognito_token_endpoint: Cognito token endpoint URL
cognito_issuer: Cognito issuer URL
Returns:
OAuth provider ARN
"""
try:
print(f"\n🔐 Creating OAuth2 credential provider: {provider_name}")
# For Cognito, we use the CustomOauth2 vendor and supply the authorization
# server metadata (issuer, authorization endpoint, token endpoint) directly
# instead of relying on OIDC discovery
response = self.client.create_oauth2_credential_provider(
name=provider_name,
credentialProviderVendor='CustomOauth2',
oauth2ProviderConfigInput={
'customOauth2ProviderConfig': {
'oauthDiscovery': {
'authorizationServerMetadata': {
'issuer': cognito_issuer,
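# The authorize endpoint lives on the Cognito domain next to the token endpoint, not on the issuer URL
'authorizationEndpoint': cognito_token_endpoint.replace('/oauth2/token', '/oauth2/authorize'),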
'tokenEndpoint': cognito_token_endpoint,
'tokenEndpointAuthMethods': ['client_secret_post']
}
},
'clientId': cognito_client_id,
'clientSecret': cognito_client_secret
}
}
)
# Debug: print response to see actual structure
print(f" Debug - Response keys: {list(response.keys())}")
# Try different possible key names
if 'oauth2CredentialProviderArn' in response:
provider_arn = response['oauth2CredentialProviderArn']
elif 'arn' in response:
provider_arn = response['arn']
elif 'credentialProviderArn' in response:
provider_arn = response['credentialProviderArn']
else:
print(f" Debug - Full response: {response}")
raise KeyError(f"Could not find ARN in response. Available keys: {list(response.keys())}")
print(f"✅ OAuth2 provider created: {provider_arn}")
return provider_arn
except Exception as e:
if "already exists" in str(e).lower() or "AlreadyExistsException" in str(e):
print(f"ℹ️ OAuth2 provider {provider_name} already exists, retrieving ARN...")
try:
# List providers and find the one with matching name
response = self.client.list_oauth2_credential_providers()
print(f" Debug - List response keys: {list(response.keys())}")
# Try different possible key names for the list
providers = response.get('oauth2CredentialProviders',
response.get('credentialProviders',
response.get('items', [])))
for provider in providers:
if provider.get('name') == provider_name:
# Try different possible ARN key names
provider_arn = (provider.get('oauth2CredentialProviderArn') or
provider.get('arn') or
provider.get('credentialProviderArn'))
if provider_arn:
print(f"✅ Using existing provider: {provider_arn}")
return provider_arn
print(f" ⚠️ Provider {provider_name} not found in list")
except Exception as list_error:
print(f"❌ Error listing providers: {list_error}")
print(f"❌ Error creating OAuth2 provider: {e}")
raise
def create_gateway_target(
self,
gateway_id: str,
target_name: str,
mcp_server_url: str,
oauth_provider_arn: str
) -> Dict[str, Any]:
"""
Create a gateway target pointing to the MCP server runtime with OAuth authentication.
Args:
gateway_id: Gateway ID
target_name: Name for the target
mcp_server_url: URL of the MCP server runtime
oauth_provider_arn: ARN of the OAuth credential provider
Returns:
Target creation response
"""
try:
print(f"\n🎯 Creating gateway target: {target_name}")
print(f" MCP Server URL: {mcp_server_url}")
print(f" Authentication: OAuth2 Client Credentials")
print(f" Provider ARN: {oauth_provider_arn}")
response = self.client.create_gateway_target(
name=target_name,
gatewayIdentifier=gateway_id,
targetConfiguration={
'mcp': {
'mcpServer': {
'endpoint': mcp_server_url
}
}
},
credentialProviderConfigurations=[
{
'credentialProviderType': 'OAUTH',
'credentialProvider': {
'oauthCredentialProvider': {
'providerArn': oauth_provider_arn,
'scopes': [] # Empty scopes for Client Credentials flow
}
}
}
]
)
print(f"✅ Gateway target created successfully with OAuth2 authentication!")
return response
except Exception as e:
if "already exists" in str(e):
print(f"ℹ️ Target {target_name} already exists")
return {}
print(f"❌ Error creating gateway target: {str(e)}")
raise
def wait_for_gateway_active(self, gateway_id: str, max_wait_seconds: int = 300) -> bool:
"""
Wait for gateway to be in ACTIVE or READY status.
Args:
gateway_id: Gateway ID
max_wait_seconds: Maximum time to wait in seconds
Returns:
True if gateway is active/ready, False if timeout
"""
import time
print(f"\n⏳ Checking gateway status...")
start_time = time.time()
while time.time() - start_time < max_wait_seconds:
try:
response = self.client.get_gateway(gatewayIdentifier=gateway_id)
status = response.get('status', 'UNKNOWN').strip().upper()
print(f" Status: {status}")
if status in ['ACTIVE', 'READY']:
print(f"✅ Gateway is ready (status: {status})!")
return True
elif status in ['FAILED', 'DELETING', 'DELETED']:
print(f"❌ Gateway is in {status} status")
return False
print(f" Waiting for gateway to be ready...")
time.sleep(10)
except Exception as e:
print(f"⚠️ Error checking gateway status: {e}")
time.sleep(10)
print(f"⚠️ Timeout waiting for gateway to be active")
return False
def get_runtime_mcp_url(runtime_arn: str, region: str) -> str:
"""
Get the MCP endpoint URL for an AgentCore Runtime.
For MCP servers deployed on AgentCore Runtime, the endpoint is:
https://bedrock-agentcore.{region}.amazonaws.com/runtimes/{encoded-arn}/invocations?qualifier=DEFAULT
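The {encoded-arn} segment is the runtime ARN with every reserved character percent-encoded. A minimal sketch using the standard library follows; the helper name (encode_runtime_arn) and the sample ARN are placeholders for illustration.

```python
from urllib.parse import quote


def encode_runtime_arn(runtime_arn):
    # Hypothetical helper: safe='' makes quote() encode '/' as well as ':'.
    return quote(runtime_arn, safe='')


sample_arn = 'arn:aws:bedrock-agentcore:us-east-1:123456789012:runtime/example-runtime'  # placeholder
encoded = encode_runtime_arn(sample_arn)
url = f'https://bedrock-agentcore.us-east-1.amazonaws.com/runtimes/{encoded}/invocations?qualifier=DEFAULT'
```

For ARNs made of letters, digits, hyphens, colons, and slashes, this gives the same result as replacing ':' and '/' by hand, and it also covers any other reserved character.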
Args:
runtime_arn: Runtime ARN
region: AWS region
Returns:
MCP endpoint URL
"""
try:
# Encode the runtime ARN (replace : with %3A and / with %2F)
encoded_arn = runtime_arn.replace(':', '%3A').replace('/', '%2F')
# Construct the MCP endpoint URL
mcp_url = f"https://bedrock-agentcore.{region}.amazonaws.com/runtimes/{encoded_arn}/invocations?qualifier=DEFAULT"
print(f"✅ MCP Endpoint URL: {mcp_url}")
return mcp_url
except Exception as e:
print(f"❌ Error constructing MCP URL: {e}")
return ''
def main():
"""Main gateway creation function."""
print("=" * 70)
print("AgentCore Gateway Setup")
print("=" * 70)
# Load configuration from SSM
config = SSMConfig()
# Create gateway setup instance
setup = GatewaySetup(config)
# Gateway name
gateway_name = 'lakehouse-gateway'
print(f"\n📋 Configuration:")
print(f" Gateway Name: {gateway_name}")
print(f" MCP Server: {config.mcp_server_runtime_arn}")
print(f" Interceptor: {config.interceptor_lambda_arn}")
print(f" Cognito User Pool: {config.cognito_user_pool_arn}")
print(f" Client ID: {config.cognito_app_client_id}")
try:
# Create gateway
gateway_response = setup.create_gateway(gateway_name=gateway_name)
# Wait for gateway to be active before creating target
if setup.wait_for_gateway_active(gateway_response['gatewayId']):
# Get MCP endpoint URL for the runtime
mcp_url = get_runtime_mcp_url(config.mcp_server_runtime_arn, config.region)
# Build Cognito endpoints
user_pool_id = config.cognito_user_pool_arn.split('/')[-1]
cognito_issuer = f"https://cognito-idp.{config.region}.amazonaws.com/{user_pool_id}"
cognito_token_endpoint = f"{config.cognito_domain}/oauth2/token"
# Determine which client to use for Gateway-to-Runtime authentication
if config.has_m2m_client:
print(f"\n🔐 Using M2M client for Gateway-to-Runtime authentication")
auth_client_id = config.cognito_m2m_client_id
auth_client_secret = config.cognito_m2m_client_secret
provider_name = 'lakehouse-mcp-m2m-oauth-provider'
else:
print(f"\n🔐 Using hybrid client for Gateway-to-Runtime authentication")
auth_client_id = config.cognito_app_client_id
auth_client_secret = config.cognito_app_client_secret
provider_name = 'lakehouse-mcp-oauth-provider'
# Create OAuth credential provider
oauth_provider_arn = setup.create_oauth_provider(
provider_name=provider_name,
cognito_client_id=auth_client_id,
cognito_client_secret=auth_client_secret,
cognito_token_endpoint=cognito_token_endpoint,
cognito_issuer=cognito_issuer
)
# Create gateway target with OAuth authentication
if mcp_url and oauth_provider_arn:
setup.create_gateway_target(
gateway_id=gateway_response['gatewayId'],
target_name='lakehouse-mcp-target',
mcp_server_url=mcp_url,
oauth_provider_arn=oauth_provider_arn
)
else:
print(f"\n⚠️ Gateway not active yet. You can create the target later.")
# Store gateway configuration in SSM
config.store_gateway_parameters(
gateway_response['gatewayId'],
gateway_response['gatewayArn'],
gateway_response['gatewayUrl'],
gateway_response['gatewayName']
)
print(f"\n" + "=" * 70)
print("Gateway Setup Complete!")
print("=" * 70)
print(f"\n✅ Gateway configuration stored in SSM Parameter Store:")
print(f" /app/lakehouse-agent/gateway-id")
print(f" /app/lakehouse-agent/gateway-arn")
print(f" /app/lakehouse-agent/gateway-url")
print(f" /app/lakehouse-agent/gateway-name")
print(f"\n📋 Next Steps:")
print(f" 1. Deploy the Lakehouse Agent (Step 8)")
print(f" 2. Test the system end-to-end")
print("\n" + "=" * 70)
except Exception as e:
print(f"\n❌ Gateway creation failed: {str(e)}")
import traceback
traceback.print_exc()
sys.exit(1)
if __name__ == '__main__':
main()
@@ -0,0 +1,92 @@
#!/usr/bin/env python3
"""
Create IAM role for Gateway Interceptor Lambda
"""
import boto3
import json
import sys
def create_lambda_role():
"""Create IAM role for Lambda execution."""
session = boto3.Session()
region = session.region_name
iam = boto3.client('iam', region_name=region)
sts = boto3.client('sts', region_name=region)
ssm = boto3.client('ssm', region_name=region)
account_id = sts.get_caller_identity()['Account']
role_name = 'InsuranceClaimsGatewayInterceptorRole'
# Trust policy for Lambda
trust_policy = {
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Service": "lambda.amazonaws.com"
},
"Action": "sts:AssumeRole"
}
]
}
try:
# Create role
print(f"Creating IAM role: {role_name}")
response = iam.create_role(
RoleName=role_name,
AssumeRolePolicyDocument=json.dumps(trust_policy),
Description='Lambda execution role for Gateway Interceptor'
)
role_arn = response['Role']['Arn']
print(f"✅ Created IAM role: {role_arn}")
# Attach basic Lambda execution policy
iam.attach_role_policy(
RoleName=role_name,
PolicyArn='arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole'
)
print(f"✅ Attached AWSLambdaBasicExecutionRole policy")
# Store role ARN in SSM Parameter Store
print(f"💾 Storing role ARN in SSM Parameter Store...")
ssm.put_parameter(
Name='/app/lakehouse-agent/interceptor-lambda-role-arn',
Value=role_arn,
Description='IAM role ARN for Gateway Interceptor Lambda',
Type='String',
Overwrite=True
)
print(f"✅ Stored parameter: /app/lakehouse-agent/interceptor-lambda-role-arn")
return role_arn
except iam.exceptions.EntityAlreadyExistsException:
print(f"ℹ️ Role {role_name} already exists, retrieving ARN")
response = iam.get_role(RoleName=role_name)
role_arn = response['Role']['Arn']
print(f"✅ Using existing role: {role_arn}")
# Store role ARN in SSM Parameter Store
print(f"💾 Storing role ARN in SSM Parameter Store...")
ssm.put_parameter(
Name='/app/lakehouse-agent/interceptor-lambda-role-arn',
Value=role_arn,
Description='IAM role ARN for Gateway Interceptor Lambda',
Type='String',
Overwrite=True
)
print(f"✅ Stored parameter: /app/lakehouse-agent/interceptor-lambda-role-arn")
return role_arn
except Exception as e:
print(f"❌ Error creating role: {e}")
sys.exit(1)
if __name__ == '__main__':
role_arn = create_lambda_role()
print(f"\n✅ Lambda Role ARN stored in SSM Parameter Store")
print(f" /app/lakehouse-agent/interceptor-lambda-role-arn = {role_arn}")
@@ -0,0 +1,237 @@
#!/usr/bin/env python3
"""
Decode User JWT Token and Check Gateway Configuration
This script:
1. Authenticates a user and gets their JWT token
2. Decodes and displays all token claims
3. Checks Gateway JWT authorizer configuration
4. Compares token claims with Gateway expectations
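Step 2 works because a JWT is three base64url segments joined by dots. A minimal sketch of decoding the claims without verifying the signature follows; the helper names and the sample token are made up for illustration, and an unverified payload must never be trusted for authorization.

```python
import base64
import json


def decode_segment(segment):
    # Restore the padding that JWT base64url encoding strips, then parse the JSON.
    padded = segment + '=' * (-len(segment) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def encode_segment(obj):
    # Hypothetical helper used only to build a sample token for this sketch.
    raw = json.dumps(obj).encode('utf-8')
    return base64.urlsafe_b64encode(raw).rstrip(b'=').decode('ascii')


sample_token = '.'.join([
    encode_segment({'alg': 'none'}),
    encode_segment({'token_use': 'access', 'client_id': 'example-client-id'}),
    'signature-placeholder',
])
header_part, payload_part, _ = sample_token.split('.')
claims = decode_segment(payload_part)
```

The client_id and token_use claims read this way are the values compared against the Gateway authorizer settings in step 4.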
Usage:
python decode_user_token.py --username <username> --password <password>
"""
import boto3
import json
import base64
import argparse
import sys
def decode_jwt(token):
"""Decode JWT token without verification."""
parts = token.split('.')
if len(parts) != 3:
return None, None
# Decode header
header = json.loads(base64.urlsafe_b64decode(parts[0] + '=='))
# Decode payload
payload = json.loads(base64.urlsafe_b64decode(parts[1] + '=='))
return header, payload
def get_user_tokens(username, password):
"""Get user tokens from Cognito."""
session = boto3.Session()
region = session.region_name
ssm = boto3.client('ssm', region_name=region)
cognito = boto3.client('cognito-idp', region_name=region)
print("=" * 70)
print("Step 1: Authenticate User and Get Tokens")
print("=" * 70)
# Get Cognito configuration
print("\n📋 Loading Cognito configuration...")
client_id = ssm.get_parameter(Name='/app/lakehouse-agent/cognito-app-client-id')['Parameter']['Value']
client_secret = ssm.get_parameter(Name='/app/lakehouse-agent/cognito-app-client-secret', WithDecryption=True)['Parameter']['Value']
user_pool_id = ssm.get_parameter(Name='/app/lakehouse-agent/cognito-user-pool-id')['Parameter']['Value']
print(f" Client ID: {client_id}")
print(f" User Pool: {user_pool_id}")
# Authenticate user
print(f"\n🔐 Authenticating user: {username}")
import hmac
import hashlib
message = username + client_id
secret_hash = base64.b64encode(
hmac.new(client_secret.encode(), message.encode(), hashlib.sha256).digest()
).decode()
try:
response = cognito.admin_initiate_auth(
UserPoolId=user_pool_id,
ClientId=client_id,
AuthFlow='ADMIN_USER_PASSWORD_AUTH',
AuthParameters={
'USERNAME': username,
'PASSWORD': password,
'SECRET_HASH': secret_hash
}
)
access_token = response['AuthenticationResult']['AccessToken']
id_token = response['AuthenticationResult']['IdToken']
print(f"✅ User authenticated successfully!")
return access_token, id_token, region, client_id, user_pool_id
except Exception as e:
print(f"❌ Authentication failed: {e}")
return None, None, None, None, None
def check_gateway_config(region):
"""Check Gateway JWT authorizer configuration."""
print("\n" + "=" * 70)
print("Step 2: Check Gateway JWT Authorizer Configuration")
print("=" * 70)
ssm = boto3.client('ssm', region_name=region)
agentcore = boto3.client('bedrock-agentcore-control', region_name=region)
try:
gateway_id = ssm.get_parameter(Name='/app/lakehouse-agent/gateway-id')['Parameter']['Value']
print(f"\n📦 Gateway ID: {gateway_id}")
gateway_details = agentcore.get_gateway(gatewayIdentifier=gateway_id)
auth_config = gateway_details.get('authorizerConfiguration', {})
if 'customJWTAuthorizer' in auth_config:
jwt_config = auth_config['customJWTAuthorizer']
print(f"\n🔐 JWT Authorizer Configuration:")
print(f" Discovery URL: {jwt_config.get('discoveryUrl', 'N/A')}")
print(f" Allowed Audience: {jwt_config.get('allowedAudience', [])}")
print(f" Allowed Clients: {jwt_config.get('allowedClients', [])}")
return jwt_config
else:
print(f"\n⚠️ No JWT authorizer configured")
return None
except Exception as e:
print(f"\n⚠️ Could not get Gateway configuration: {e}")
return None
def main():
parser = argparse.ArgumentParser(description='Decode user JWT token and check Gateway config')
parser.add_argument('--username', required=True, help='Cognito username')
parser.add_argument('--password', required=True, help='User password')
args = parser.parse_args()
# Get user tokens
access_token, id_token, region, client_id, user_pool_id = get_user_tokens(args.username, args.password)
if not access_token:
sys.exit(1)
# Decode tokens
print("\n" + "=" * 70)
print("Step 3: Decode and Inspect Tokens")
print("=" * 70)
print("\n📄 ID TOKEN:")
print("=" * 70)
id_header, id_payload = decode_jwt(id_token)
print(json.dumps(id_payload, indent=2))
print("\n📄 ACCESS TOKEN:")
print("=" * 70)
access_header, access_payload = decode_jwt(access_token)
print(json.dumps(access_payload, indent=2))
# Check Gateway configuration
gateway_config = check_gateway_config(region)
# Compare token with Gateway expectations
print("\n" + "=" * 70)
print("Step 4: Validate Token Against Gateway Configuration")
print("=" * 70)
if gateway_config:
print("\n🔍 Checking token compatibility...")
# Check issuer
expected_issuer = f"https://cognito-idp.{region}.amazonaws.com/{user_pool_id}"
token_issuer = access_payload.get('iss', '')
print(f"\n1. Issuer (iss):")
print(f" Expected: {expected_issuer}")
print(f" Token: {token_issuer}")
if token_issuer == expected_issuer:
print(f" ✅ Match")
else:
print(f" ❌ Mismatch")
# Check client_id
allowed_clients = gateway_config.get('allowedClients', [])
token_client_id = access_payload.get('client_id', '')
print(f"\n2. Client ID:")
print(f" Allowed: {allowed_clients}")
print(f" Token: {token_client_id}")
if token_client_id in allowed_clients:
print(f" ✅ Match")
else:
print(f" ❌ Not in allowed clients")
# Check audience (if configured)
allowed_audience = gateway_config.get('allowedAudience', [])
token_aud = access_payload.get('aud', '')
print(f"\n3. Audience (aud):")
print(f" Allowed: {allowed_audience}")
print(f" Token: {token_aud}")
if not allowed_audience:
print(f" ℹ️ No audience restriction configured")
elif token_aud in allowed_audience:
print(f" ✅ Match")
else:
print(f" ❌ Not in allowed audience")
# Check token_use
token_use = access_payload.get('token_use', '')
print(f"\n4. Token Use:")
print(f" Token: {token_use}")
if token_use == 'access':
print(f" ✅ Correct (should be 'access' for API calls)")
else:
print(f" ⚠️ Unexpected token_use value")
# Summary
print("\n" + "=" * 70)
print("Summary")
print("=" * 70)
issues = []
if token_issuer != expected_issuer:
issues.append("❌ Issuer mismatch")
if token_client_id not in allowed_clients:
issues.append("❌ Client ID not in allowed clients")
if allowed_audience and token_aud not in allowed_audience:
issues.append("❌ Audience not in allowed audience")
if issues:
print("\n❌ Token validation issues found:")
for issue in issues:
print(f" {issue}")
print("\n💡 Possible solutions:")
print(" 1. Redeploy Gateway with correct client ID in allowedClients")
print(" 2. Check that user authenticated with correct Cognito client")
print(" 3. Verify Gateway JWT authorizer configuration")
else:
print("\n✅ Token should be accepted by Gateway!")
print(" All claims match Gateway configuration")
print("\n" + "=" * 70)
if __name__ == '__main__':
main()
@@ -0,0 +1 @@
interceptor-lambda.zip
@@ -0,0 +1,162 @@
#!/bin/bash
# Deploy Gateway Interceptor Lambda Function
set -e
echo "🚀 Deploying Gateway Interceptor Lambda"
# Get AWS region from default configuration
AWS_REGION=$(aws configure get region)
if [ -z "$AWS_REGION" ]; then
echo "❌ Error: AWS region not configured"
echo " Please run: aws configure set region <your-region>"
exit 1
fi
echo " Region: $AWS_REGION"
# Read configuration from SSM Parameter Store
echo ""
echo "🔍 Loading configuration from SSM Parameter Store..."
# Temporarily disable exit on error to capture SSM errors
set +e
COGNITO_USER_POOL_ID=$(aws ssm get-parameter --name /app/lakehouse-agent/cognito-user-pool-id --query 'Parameter.Value' --output text 2>&1)
COGNITO_RESULT=$?
COGNITO_APP_CLIENT_ID=$(aws ssm get-parameter --name /app/lakehouse-agent/cognito-app-client-id --query 'Parameter.Value' --output text 2>&1)
CLIENT_RESULT=$?
set -e
if [ $COGNITO_RESULT -ne 0 ] || [ $CLIENT_RESULT -ne 0 ]; then
echo "❌ Error: Required SSM parameters not found"
echo ""
if [ $COGNITO_RESULT -ne 0 ]; then
echo " Missing: /app/lakehouse-agent/cognito-user-pool-id"
echo " Error: $COGNITO_USER_POOL_ID"
fi
if [ $CLIENT_RESULT -ne 0 ]; then
echo " Missing: /app/lakehouse-agent/cognito-app-client-id"
echo " Error: $COGNITO_APP_CLIENT_ID"
fi
echo ""
echo " Please run setup_cognito.py first:"
echo " cd gateway-setup"
echo " python setup_cognito.py"
exit 1
fi
echo "✅ Configuration loaded from SSM"
echo " Cognito User Pool ID: $COGNITO_USER_POOL_ID"
echo " Cognito App Client ID: $COGNITO_APP_CLIENT_ID"
# Package Lambda function
echo ""
echo "📦 Packaging Lambda function..."
mkdir -p dist
pip install -r requirements.txt -t dist/ --platform manylinux2014_x86_64 --only-binary=:all:
cp lambda_function.py dist/
cd dist
zip -r ../interceptor-lambda.zip .
cd ..
echo "✅ Package created: interceptor-lambda.zip"
# Create Lambda role using Python script
echo ""
echo "🔑 Creating Lambda execution role..."
cd ..
python create_lambda_role.py
cd interceptor
# Get the role ARN from SSM Parameter Store (stored by create_lambda_role.py)
# '|| true' keeps 'set -e' from aborting the script when a lookup fails,
# so the fallback and the explicit error message below can run
LAMBDA_ROLE_ARN=$(aws ssm get-parameter --name /app/lakehouse-agent/interceptor-lambda-role-arn --query 'Parameter.Value' --output text 2>/dev/null || true)
# Fallback to direct IAM query if not in SSM yet
if [ -z "$LAMBDA_ROLE_ARN" ]; then
echo " Retrieving role ARN from IAM..."
LAMBDA_ROLE_ARN=$(aws iam get-role --role-name InsuranceClaimsGatewayInterceptorRole --query 'Role.Arn' --output text 2>/dev/null || true)
fi
if [ -z "$LAMBDA_ROLE_ARN" ]; then
echo "❌ Failed to retrieve Lambda role ARN"
exit 1
fi
echo "✅ Lambda role ready: $LAMBDA_ROLE_ARN"
# Wait for IAM role to propagate (required for new roles)
echo "⏳ Waiting for IAM role to propagate (10 seconds)..."
sleep 10
# Check if Lambda function already exists
echo ""
echo "🔍 Checking if Lambda function exists..."
if aws lambda get-function --function-name lakehouse-gateway-interceptor --region "$AWS_REGION" >/dev/null 2>&1; then
echo "📝 Updating existing Lambda function..."
aws lambda update-function-code \
--function-name lakehouse-gateway-interceptor \
--zip-file fileb://interceptor-lambda.zip \
--region $AWS_REGION
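# Wait for the code update to finish; Lambda rejects configuration updates while another update is in progress
aws lambda wait function-updated \
--function-name lakehouse-gateway-interceptor \
--region $AWS_REGION
echo "⚙️ Updating Lambda configuration..."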
aws lambda update-function-configuration \
--function-name lakehouse-gateway-interceptor \
--environment "Variables={COGNITO_REGION=$AWS_REGION,COGNITO_USER_POOL_ID=$COGNITO_USER_POOL_ID,COGNITO_APP_CLIENT_ID=$COGNITO_APP_CLIENT_ID}" \
--region $AWS_REGION
echo "✅ Lambda function updated!"
else
echo "📝 Creating new Lambda function..."
# Retry logic for role propagation
MAX_RETRIES=3
RETRY_COUNT=0
while [ $RETRY_COUNT -lt $MAX_RETRIES ]; do
if aws lambda create-function \
--function-name lakehouse-gateway-interceptor \
--runtime python3.11 \
--role "$LAMBDA_ROLE_ARN" \
--handler lambda_function.lambda_handler \
--zip-file fileb://interceptor-lambda.zip \
--timeout 30 \
--memory-size 256 \
--environment "Variables={COGNITO_REGION=$AWS_REGION,COGNITO_USER_POOL_ID=$COGNITO_USER_POOL_ID,COGNITO_APP_CLIENT_ID=$COGNITO_APP_CLIENT_ID}" \
--region $AWS_REGION 2>/dev/null; then
echo "✅ Lambda function created!"
break
else
RETRY_COUNT=$((RETRY_COUNT + 1))
if [ $RETRY_COUNT -lt $MAX_RETRIES ]; then
echo "⏳ Role not ready yet, waiting 5 seconds (attempt $RETRY_COUNT/$MAX_RETRIES)..."
sleep 5
else
echo "❌ Failed to create Lambda function after $MAX_RETRIES attempts"
echo " The IAM role may need more time to propagate"
exit 1
fi
fi
done
fi
# Store Lambda function ARN in SSM Parameter Store
echo ""
echo "💾 Storing Lambda function ARN in SSM Parameter Store..."
LAMBDA_FUNCTION_ARN=$(aws lambda get-function --function-name lakehouse-gateway-interceptor --region $AWS_REGION --query 'Configuration.FunctionArn' --output text)
aws ssm put-parameter \
--name /app/lakehouse-agent/interceptor-lambda-arn \
--value "$LAMBDA_FUNCTION_ARN" \
--type String \
--overwrite \
--region $AWS_REGION
echo "✅ Stored parameter: /app/lakehouse-agent/interceptor-lambda-arn"
echo ""
echo "✨ Deployment complete!"
echo ""
echo "📝 Lambda Function ARN: $LAMBDA_FUNCTION_ARN"
@@ -0,0 +1,426 @@
"""
AgentCore Gateway Interceptor for Health Lakehouse Data
This Lambda function acts as a Gateway Interceptor following the AgentCore MCP protocol:
1. Extracts JWT bearer tokens from MCP gateway request structure
2. Validates JWT tokens against Cognito
3. Extracts user principal (email/username) from JWT claims
4. Adds user identity to request headers for downstream MCP server
5. Returns responses in proper MCP interceptor format
Reference: https://github.com/awslabs/amazon-bedrock-agentcore-samples/blob/main/01-tutorials/02-AgentCore-gateway/14-token-exchange-at-request-interceptor/
OAuth Flow:
Streamlit → lakehouse-agent → Gateway (this interceptor) → MCP server
The interceptor extracts the principal from the JWT token and passes it to the MCP server
for Lake Formation row-level security enforcement.
"""
import json
import logging
import os
import boto3
from typing import Dict, Any, Optional
import urllib.request
import base64
from jose import jwt, JWTError
# Configure logging
logger = logging.getLogger()
logger.setLevel(logging.INFO)
# Cache for configuration and keys
_config = None
_jwks = None
def get_config() -> Dict[str, str]:
"""
Get Cognito configuration from environment variables or SSM.
Returns:
Dictionary with Cognito configuration
"""
global _config
if _config is not None:
return _config
# First try environment variables
region = os.environ.get('COGNITO_REGION') or os.environ.get('AWS_REGION', 'us-west-2')
user_pool_id = os.environ.get('COGNITO_USER_POOL_ID', '')
app_client_id = os.environ.get('COGNITO_APP_CLIENT_ID', '')
# If not set, try SSM Parameter Store
if not user_pool_id or not app_client_id:
logger.info("Loading Cognito configuration from SSM Parameter Store...")
try:
ssm = boto3.client('ssm', region_name=region)
if not user_pool_id:
response = ssm.get_parameter(Name='/app/lakehouse-agent/cognito-user-pool-id')
user_pool_id = response['Parameter']['Value']
logger.info(f"Loaded user_pool_id from SSM: {user_pool_id}")
if not app_client_id:
response = ssm.get_parameter(Name='/app/lakehouse-agent/cognito-app-client-id')
app_client_id = response['Parameter']['Value']
logger.info(f"Loaded app_client_id from SSM: {app_client_id}")
except Exception as e:
logger.error(f"Error loading configuration from SSM: {e}")
raise
_config = {
'region': region,
'user_pool_id': user_pool_id,
'app_client_id': app_client_id,
'issuer': f'https://cognito-idp.{region}.amazonaws.com/{user_pool_id}'
}
logger.info(f"Cognito configuration loaded: region={region}, user_pool_id={user_pool_id}")
return _config
def get_cognito_public_keys() -> Dict[str, Any]:
"""
Fetch Cognito public keys for JWT validation.
Returns:
Dictionary of public keys
"""
global _jwks
if _jwks is not None:
return _jwks
try:
config = get_config()
jwks_url = f"{config['issuer']}/.well-known/jwks.json"
logger.info(f"Fetching JWKS from: {jwks_url}")
# Bound the JWKS fetch so a slow endpoint cannot consume the whole Lambda timeout
with urllib.request.urlopen(jwks_url, timeout=5) as response:
_jwks = json.loads(response.read())
logger.info("Successfully fetched Cognito public keys")
return _jwks
except Exception as e:
logger.error(f"Error fetching Cognito public keys: {str(e)}")
raise
def validate_and_decode_jwt(token: str) -> Optional[Dict[str, Any]]:
"""
Validate JWT token and decode claims.
Args:
token: JWT bearer token
Returns:
Decoded JWT claims or None if invalid
"""
try:
config = get_config()
# Get Cognito public keys
jwks = get_cognito_public_keys()
# Decode token header to get key ID
unverified_headers = jwt.get_unverified_header(token)
kid = unverified_headers.get('kid')
# Find the correct public key
key = None
for k in jwks.get('keys', []):
if k.get('kid') == kid:
key = k
break
if not key:
logger.error("Public key not found for token")
return None
# Validate signature, issuer, and expiry, then check the client binding manually.
# Cognito ID tokens carry the app client ID in 'aud'; access tokens have no 'aud'
# claim and carry it in 'client_id' instead. python-jose does not reject a token
# that simply lacks 'aud', so passing audience= alone would not bind access
# tokens to this app client.
claims = jwt.decode(
token,
key,
algorithms=['RS256'],
issuer=config['issuer'],
options={'verify_aud': False}
)
if claims.get('token_use') == 'id':
token_client_id = claims.get('aud')
else:
token_client_id = claims.get('client_id')
if token_client_id != config['app_client_id']:
logger.error(f"Client ID mismatch: {token_client_id} != {config['app_client_id']}")
return None
logger.info(f"Successfully validated JWT for user: {claims.get('username', claims.get('sub'))}")
return claims
except JWTError as e:
logger.error(f"JWT validation error: {str(e)}")
return None
except Exception as e:
logger.error(f"Error validating JWT: {str(e)}")
return None
def extract_bearer_token_from_mcp(event: Dict[str, Any]) -> Optional[str]:
"""
Extract bearer token from MCP gateway request structure.
Following AgentCore Gateway MCP protocol, the event structure is:
{
"mcp": {
"gatewayRequest": {
"headers": {"Authorization": "Bearer <token>"},
"body": {...}
}
}
}
Args:
event: Lambda event with MCP structure
Returns:
Bearer token (without 'Bearer ' prefix) or None if not found
"""
try:
# Extract from MCP structure
mcp_data = event.get('mcp', {})
gateway_request = mcp_data.get('gatewayRequest', {})
headers = gateway_request.get('headers', {})
# Look up the Authorization header case-insensitively
auth_header = next(
(value for name, value in headers.items() if name.lower() == 'authorization'),
None
)
if auth_header:
# Strip the 'Bearer' scheme prefix (matched case-insensitively) if present
scheme, _, remainder = auth_header.partition(' ')
token = remainder.strip() if scheme.lower() == 'bearer' else auth_header
logger.info("✅ Bearer token extracted from MCP gateway request")
return token
logger.warning("⚠️ Bearer token not found in MCP gateway request headers")
return None
except Exception as e:
logger.error(f"❌ Error extracting bearer token from MCP structure: {str(e)}")
return None
def extract_user_principal(claims: Dict[str, Any]) -> Optional[str]:
"""
Extract user principal (identity) from JWT claims.
The principal is used for Lake Formation row-level security.
Priority order:
1. email (preferred for user identification)
2. username
3. cognito:username
4. sub (user ID as fallback)
Args:
claims: Decoded JWT claims
Returns:
User principal (email/username) or None
"""
# Try multiple claim fields in priority order
principal = (
claims.get('email') or
claims.get('username') or
claims.get('cognito:username') or
claims.get('sub')
)
if principal:
logger.info(f"✅ Extracted user principal: {principal}")
return principal
logger.warning("⚠️ User principal not found in JWT claims")
return None
def get_user_scopes(claims: Dict[str, Any]) -> list:
"""
Extract OAuth scopes from JWT claims for logging and context.
Args:
claims: Decoded JWT claims
Returns:
List of scopes
"""
# Scopes can be in 'scope' claim (space-separated) or 'cognito:groups'
scope_string = claims.get('scope', '')
scopes = scope_string.split() if scope_string else []
# Add groups as scopes
groups = claims.get('cognito:groups', [])
if isinstance(groups, list):
scopes.extend(groups)
return scopes
def lambda_handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
"""
Main Lambda handler for AgentCore Gateway interceptor.
Follows the MCP protocol for request interception:
1. Extracts JWT token from MCP gateway request structure
2. Validates JWT and extracts user principal
3. Adds user identity to request for downstream MCP server
4. Returns transformed request in MCP format
Event Structure (Input):
{
"mcp": {
"gatewayRequest": {
"headers": {"Authorization": "Bearer <token>"},
"body": {...}
}
}
}
Response Structure (Output):
{
"interceptorOutputVersion": "1.0",
"mcp": {
"transformedGatewayRequest": {
"headers": {...},
"body": {...}
}
}
}
Args:
event: Lambda event with MCP structure
context: Lambda context
Returns:
Transformed request in MCP format or error response
"""
logger.info("🔍 Gateway interceptor invoked")
# Log only the top-level keys; the full event contains the Authorization header (bearer token)
logger.info(f"📦 Event keys: {list(event.keys())}")
try:
# Extract MCP gateway request
mcp_data = event.get('mcp', {})
gateway_request = mcp_data.get('gatewayRequest', {})
headers = gateway_request.get('headers', {})
body = gateway_request.get('body', {})
logger.info(f"📋 Headers present: {list(headers.keys())}")
logger.info(f"📋 Body keys: {list(body.keys())}")
# Extract bearer token from MCP structure
token = extract_bearer_token_from_mcp(event)
if not token:
logger.error("❌ No bearer token found in request")
return {
'statusCode': 401,
'body': json.dumps({
'error': 'Unauthorized',
'message': 'Bearer token required in Authorization header'
})
}
# Validate and decode JWT
claims = validate_and_decode_jwt(token)
if not claims:
logger.error("❌ JWT validation failed")
return {
'statusCode': 401,
'body': json.dumps({
'error': 'Unauthorized',
'message': 'Invalid or expired JWT token'
})
}
# Extract user principal from JWT claims
user_principal = extract_user_principal(claims)
if not user_principal:
logger.error("❌ User principal not found in JWT claims")
return {
'statusCode': 401,
'body': json.dumps({
'error': 'Unauthorized',
'message': 'User principal not found in token claims'
})
}
# Get user scopes for logging
scopes = get_user_scopes(claims)
logger.info(f"👤 User: {user_principal}, Scopes: {scopes}")
# Add user identity to headers for downstream MCP server
# The MCP server will use X-User-Identity for Lake Formation RLS
transformed_headers = {
'Accept': 'application/json',
'Content-Type': 'application/json',
'X-User-Identity': user_principal,
'X-User-Scopes': ','.join(scopes) if scopes else ''
}
# Also add user context to body if it has params/arguments
# This ensures the MCP server can access user identity
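# Deep-copy so the nested params/arguments edits below do not mutate the incoming event
import copy
transformed_body = copy.deepcopy(body)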
if 'params' in transformed_body and 'arguments' in transformed_body['params']:
if 'context' not in transformed_body['params']['arguments']:
transformed_body['params']['arguments']['context'] = {}
transformed_body['params']['arguments']['context']['user_id'] = user_principal
transformed_body['params']['arguments']['context']['scopes'] = scopes
# Return transformed request in MCP format
response = {
"interceptorOutputVersion": "1.0",
"mcp": {
"transformedGatewayRequest": {
"headers": transformed_headers,
"body": transformed_body
}
}
}
logger.info(f"✅ Request authorized for user: {user_principal}")
logger.info(f"📤 Returning transformed request")
return response
except Exception as e:
logger.error(f"❌ Error in gateway interceptor: {str(e)}")
import traceback
logger.error(f"Stack trace: {traceback.format_exc()}")
return {
'statusCode': 500,
'body': json.dumps({
'error': 'Internal Server Error',
'message': f'Error processing request: {str(e)}'
})
}
@@ -0,0 +1,2 @@
python-jose[cryptography]>=3.4.0
cryptography>=41.0.0
@@ -0,0 +1,411 @@
#!/usr/bin/env python3
"""
Test AgentCore Gateway with User Authentication
This script tests the complete authentication flow:
1. User authenticates with Cognito (gets user JWT token)
2. User sends request to Gateway with JWT token
3. Gateway validates user JWT token
4. Gateway gets M2M token from Cognito (automatic via OAuth provider)
5. Gateway forwards request to MCP Runtime with M2M token
6. Runtime validates M2M token and processes request
7. Gateway returns response to user
Usage:
python test_gateway.py --username <username> --password <password>
python test_gateway.py --username testuser --password TestPass123!
"""
import boto3
import requests
import json
import base64
import argparse
import sys
def get_user_token(username: str, password: str):
"""
Authenticate user with Cognito and get JWT token.
Args:
username: Cognito username
password: User password
Returns:
Tuple of (access_token, id_token, region)
"""
print("=" * 70)
print("Step 1: User Authentication with Cognito")
print("=" * 70)
session = boto3.Session()
region = session.region_name
ssm = boto3.client('ssm', region_name=region)
cognito = boto3.client('cognito-idp', region_name=region)
# Get Cognito configuration
print("\n📋 Loading Cognito configuration...")
try:
client_id = ssm.get_parameter(Name='/app/lakehouse-agent/cognito-app-client-id')['Parameter']['Value']
client_secret = ssm.get_parameter(Name='/app/lakehouse-agent/cognito-app-client-secret', WithDecryption=True)['Parameter']['Value']
user_pool_id = ssm.get_parameter(Name='/app/lakehouse-agent/cognito-user-pool-id')['Parameter']['Value']
print(f" Client ID: {client_id}")
print(f" User Pool: {user_pool_id}")
except Exception as e:
print(f"❌ Error loading configuration: {e}")
return None, None, region
# Authenticate user using ADMIN_USER_PASSWORD_AUTH
# This flow doesn't require USER_PASSWORD_AUTH on the app client, but it does require
# ALLOW_ADMIN_USER_PASSWORD_AUTH and IAM credentials allowed to call AdminInitiateAuth
print(f"\n🔐 Authenticating user: {username}")
try:
# Calculate SECRET_HASH
import hmac
import hashlib
message = username + client_id
secret_hash = base64.b64encode(
hmac.new(
client_secret.encode(),
message.encode(),
hashlib.sha256
).digest()
).decode()
response = cognito.admin_initiate_auth(
UserPoolId=user_pool_id,
ClientId=client_id,
AuthFlow='ADMIN_USER_PASSWORD_AUTH',
AuthParameters={
'USERNAME': username,
'PASSWORD': password,
'SECRET_HASH': secret_hash
}
)
access_token = response['AuthenticationResult']['AccessToken']
id_token = response['AuthenticationResult']['IdToken']
print(f"✅ User authenticated successfully!")
print(f" Token type: Bearer")
print(f" Expires in: {response['AuthenticationResult']['ExpiresIn']} seconds")
# Tokens are credentials: do not print them in full (a truncated preview is shown below)
# Decode and print token claims
print(f"\n📄 User Token Claims:")
parts = id_token.split('.')
if len(parts) == 3:
payload = json.loads(base64.urlsafe_b64decode(parts[1] + '=='))
print(f" Username: {payload.get('cognito:username', 'N/A')}")
print(f" Email: {payload.get('email', 'N/A')}")
print(f" Token Use: {payload.get('token_use', 'N/A')}")
print(f" Audience (aud): {payload.get('aud', 'N/A')}")
print(f" Issuer (iss): {payload.get('iss', 'N/A')}")
# Also decode access token to see its claims
print(f"\n📄 Access Token Claims:")
parts = access_token.split('.')
if len(parts) == 3:
payload = json.loads(base64.urlsafe_b64decode(parts[1] + '=='))
print(f" Client ID: {payload.get('client_id', 'N/A')}")
print(f" Token Use: {payload.get('token_use', 'N/A')}")
print(f" Scope: {payload.get('scope', 'N/A')}")
print(f" Username: {payload.get('username', 'N/A')}")
print(f" Issuer (iss): {payload.get('iss', 'N/A')}")
print(f"\n🔑 Access Token (first 100 chars):")
print(f" {access_token[:100]}...")
return access_token, id_token, region
except cognito.exceptions.NotAuthorizedException:
print(f"❌ Authentication failed: Invalid username or password")
return None, None, region
except cognito.exceptions.UserNotFoundException:
print(f"❌ User not found: {username}")
return None, None, region
except Exception as e:
print(f"❌ Authentication error: {e}")
return None, None, region
def test_gateway(access_token: str, region: str):
"""
Test Gateway by sending MCP requests with user token.
Args:
access_token: User's access token from Cognito
region: AWS region
"""
print("\n" + "=" * 70)
print("Step 2: Test Gateway with User Token")
print("=" * 70)
ssm = boto3.client('ssm', region_name=region)
# Get Gateway URL
print("\n📦 Loading Gateway configuration...")
try:
gateway_url = ssm.get_parameter(Name='/app/lakehouse-agent/gateway-url')['Parameter']['Value']
gateway_id = ssm.get_parameter(Name='/app/lakehouse-agent/gateway-id')['Parameter']['Value']
print(f" Gateway URL: {gateway_url}")
print(f" Gateway ID: {gateway_id}")
# Get Gateway configuration to check JWT authorizer settings
print(f"\n🔍 Checking Gateway JWT authorizer configuration...")
agentcore = boto3.client('bedrock-agentcore-control', region_name=region)
try:
gateway_details = agentcore.get_gateway(gatewayIdentifier=gateway_id)
auth_config = gateway_details.get('authorizerConfiguration', {})
if 'customJWTAuthorizer' in auth_config:
jwt_config = auth_config['customJWTAuthorizer']
print(f" Discovery URL: {jwt_config.get('discoveryUrl', 'N/A')}")
print(f" Allowed Audience: {jwt_config.get('allowedAudience', [])}")
print(f" Allowed Clients: {jwt_config.get('allowedClients', [])}")
except Exception as e:
print(f" ⚠️ Could not get Gateway details: {e}")
except ssm.exceptions.ParameterNotFound:
print(f" ❌ Gateway URL not found in SSM")
print(f" Please deploy the Gateway first:")
print(f" cd gateway-setup && python create_gateway.py")
return
except Exception as e:
print(f" ❌ Error loading Gateway URL: {e}")
return
# Prepare headers with user token
headers = {
"Authorization": f"Bearer {access_token}",
"Content-Type": "application/json",
"Accept": "application/json, text/event-stream"
}
# Test 1: Initialize MCP session
print("\n📤 Test 1: Initialize MCP session through Gateway")
init_request = {
"jsonrpc": "2.0",
"id": 1,
"method": "initialize",
"params": {
"protocolVersion": "2024-11-05",
"capabilities": {},
"clientInfo": {
"name": "gateway-test-client",
"version": "1.0.0"
}
}
}
try:
response = requests.post(gateway_url, headers=headers, json=init_request, timeout=30)
print(f" Status: {response.status_code}")
print(f" Response length: {len(response.text)} bytes")
print(f" Content-Type: {response.headers.get('Content-Type', 'N/A')}")
if response.status_code == 200:
if response.text:
# Parse SSE format
if 'text/event-stream' in response.headers.get('Content-Type', ''):
print(f" ✅ Received SSE response")
lines = response.text.split('\n')
for line in lines:
if line.startswith('data: '):
json_str = line[6:]
try:
data = json.loads(json_str)
print(f" ✅ Initialize successful!")
if 'result' in data:
server_info = data['result'].get('serverInfo', {})
print(f" Server Name: {server_info.get('name', 'N/A')}")
print(f" Server Version: {server_info.get('version', 'N/A')}")
print(f" Protocol Version: {data['result'].get('protocolVersion', 'N/A')}")
break
except json.JSONDecodeError:
continue
else:
try:
data = response.json()
print(f" ✅ Initialize successful!")
if 'result' in data:
server_info = data['result'].get('serverInfo', {})
print(f" Server Name: {server_info.get('name', 'N/A')}")
print(f" Server Version: {server_info.get('version', 'N/A')}")
except json.JSONDecodeError:
print(f" ⚠️ Response is not valid JSON")
print(f" Raw response: {response.text[:200]}")
else:
print(f" ⚠️ Response body is empty")
return
elif response.status_code == 401:
print(f" ❌ Unauthorized - User token validation failed")
print(f" Response: {response.text[:500]}")
return
elif response.status_code == 403:
print(f" ❌ Forbidden - User not authorized")
print(f" Response: {response.text[:500]}")
return
else:
print(f" ❌ Initialize failed")
print(f" Response: {response.text[:500]}")
return
except requests.exceptions.Timeout:
print(f" ❌ Request timed out")
print(f" This may indicate Gateway-to-Runtime authentication issues")
return
except Exception as e:
print(f" ❌ Error: {e}")
return
# Test 2: Get tool list
print("\n📤 Test 2: Get tool list through Gateway")
tools_request = {
"jsonrpc": "2.0",
"id": 2,
"method": "tools/list",
"params": {}
}
try:
response = requests.post(gateway_url, headers=headers, json=tools_request, timeout=30)
print(f" Status: {response.status_code}")
print(f" Response length: {len(response.text)} bytes")
if response.status_code == 200:
if response.text:
# Parse SSE format
if 'text/event-stream' in response.headers.get('Content-Type', ''):
print(f" ✅ Received SSE response")
lines = response.text.split('\n')
for line in lines:
if line.startswith('data: '):
json_str = line[6:]
try:
data = json.loads(json_str)
print(f" ✅ Tool list retrieved!")
if 'result' in data and 'tools' in data['result']:
tools = data['result']['tools']
print(f"\n 📋 Available Tools ({len(tools)}):")
print(" " + "=" * 66)
for i, tool in enumerate(tools, 1):
print(f"\n {i}. {tool.get('name', 'N/A')}")
print(f" Description: {tool.get('description', 'N/A')}")
if 'inputSchema' in tool and 'properties' in tool['inputSchema']:
props = tool['inputSchema']['properties']
if props:
print(f" Parameters: {', '.join(props.keys())}")
break
except json.JSONDecodeError:
continue
else:
try:
data = response.json()
print(f" ✅ Tool list retrieved!")
if 'result' in data and 'tools' in data['result']:
tools = data['result']['tools']
print(f"\n 📋 Available Tools ({len(tools)}):")
for i, tool in enumerate(tools, 1):
print(f" {i}. {tool.get('name', 'N/A')}")
except json.JSONDecodeError:
print(f" ⚠️ Response is not valid JSON")
else:
print(f" ⚠️ Response body is empty")
else:
print(f" ❌ Tool list failed")
print(f" Response: {response.text[:500]}")
except requests.exceptions.Timeout:
print(f" ❌ Request timed out")
except Exception as e:
print(f" ❌ Error: {e}")
# Test 3: Query claims (if available)
print("\n📤 Test 3: Query claims (user-specific data)")
query_request = {
"jsonrpc": "2.0",
"id": 3,
"method": "tools/call",
"params": {
"name": "get_claims_summary",
"arguments": {}
}
}
try:
response = requests.post(gateway_url, headers=headers, json=query_request, timeout=30)
print(f" Status: {response.status_code}")
if response.status_code == 200:
if response.text:
# Parse SSE format
if 'text/event-stream' in response.headers.get('Content-Type', ''):
lines = response.text.split('\n')
for line in lines:
if line.startswith('data: '):
json_str = line[6:]
try:
data = json.loads(json_str)
if 'result' in data:
print(f" ✅ Query successful!")
# Try to parse the content
if 'content' in data['result']:
for content in data['result']['content']:
if content.get('type') == 'text':
try:
result_data = json.loads(content['text'])
if result_data.get('success'):
summary = result_data.get('summary', {})
print(f" Total Claims: {summary.get('total_claims', 0)}")
print(f" Total Amount: ${summary.get('total_amount', 0):,.2f}")
except (ValueError, AttributeError, TypeError):
print(f" Response: {content['text'][:200]}")
break
except json.JSONDecodeError:
continue
else:
print(f" ⚠️ Query failed or tool not available")
except Exception as e:
print(f" ⚠️ Query test skipped: {e}")
def main():
parser = argparse.ArgumentParser(description='Test AgentCore Gateway with user authentication')
parser.add_argument('--username', required=True, help='Cognito username')
parser.add_argument('--password', required=True, help='User password')
args = parser.parse_args()
print("\n" + "=" * 70)
print("AgentCore Gateway Test with User Authentication")
print("=" * 70 + "\n")
# Step 1: Authenticate user and get token
access_token, id_token, region = get_user_token(args.username, args.password)
if not access_token:
print("\n❌ Failed to authenticate user. Exiting.")
sys.exit(1)
# Step 2: Test Gateway with user token
test_gateway(access_token, region)
print("\n" + "=" * 70)
print("Test Complete")
print("=" * 70)
print("\nAuthentication flow exercised (check the output above for any failed step):")
print(" 1. User authenticates with Cognito")
print(" 2. User token is sent to Gateway")
print(" 3. Gateway validates user token")
print(" 4. Gateway obtains M2M token (automatic)")
print(" 5. Gateway forwards request to Runtime")
print(" 6. Runtime validates M2M token")
print(" 7. Response is returned to user")
print("\n" + "=" * 70 + "\n")
if __name__ == '__main__':
main()
@@ -0,0 +1,103 @@
#!/usr/bin/env python3
"""
Update Interceptor Lambda Environment Variables
This script updates the interceptor Lambda function's environment variables
to use the correct Cognito configuration from SSM Parameter Store.
Usage:
python update_interceptor_env.py
"""
import boto3
import sys
def main():
print("=" * 70)
print("Update Interceptor Lambda Environment Variables")
print("=" * 70)
session = boto3.Session()
region = session.region_name
ssm = boto3.client('ssm', region_name=region)
lambda_client = boto3.client('lambda', region_name=region)
# Get correct Cognito configuration from SSM
print("\n📋 Loading correct Cognito configuration from SSM...")
try:
user_pool_id = ssm.get_parameter(Name='/app/lakehouse-agent/cognito-user-pool-id')['Parameter']['Value']
app_client_id = ssm.get_parameter(Name='/app/lakehouse-agent/cognito-app-client-id')['Parameter']['Value']
print(f" User Pool ID: {user_pool_id}")
print(f" App Client ID: {app_client_id}")
print(f" Region: {region}")
except Exception as e:
print(f"❌ Error loading Cognito configuration: {e}")
sys.exit(1)
# Get interceptor Lambda ARN
print("\n🔍 Finding interceptor Lambda function...")
try:
interceptor_arn = ssm.get_parameter(Name='/app/lakehouse-agent/interceptor-lambda-arn')['Parameter']['Value']
# Extract function name from ARN (arn:aws:lambda:<region>:<account>:function:<name>[:<qualifier>])
function_name = interceptor_arn.split(':')[6]
print(f" Lambda ARN: {interceptor_arn}")
print(f" Function Name: {function_name}")
except ssm.exceptions.ParameterNotFound:
print(" ⚠️ Interceptor Lambda ARN not found in SSM")
print(" Please enter the Lambda function name manually:")
function_name = input(" Lambda function name: ").strip()
if not function_name:
print("❌ No function name provided")
sys.exit(1)
except Exception as e:
print(f"❌ Error: {e}")
sys.exit(1)
# Get current Lambda configuration
print(f"\n🔍 Getting current Lambda configuration...")
try:
response = lambda_client.get_function_configuration(FunctionName=function_name)
current_env = response.get('Environment', {}).get('Variables', {})
print(f" Current environment variables:")
for key, value in current_env.items():
print(f" {key}: {value}")
except Exception as e:
print(f"❌ Error getting Lambda configuration: {e}")
sys.exit(1)
# Update environment variables
print(f"\n🔧 Updating Lambda environment variables...")
new_env = current_env.copy()
new_env['COGNITO_REGION'] = region
new_env['COGNITO_USER_POOL_ID'] = user_pool_id
new_env['COGNITO_APP_CLIENT_ID'] = app_client_id
try:
lambda_client.update_function_configuration(
FunctionName=function_name,
Environment={'Variables': new_env}
)
print(f"✅ Lambda environment variables updated!")
print(f" New configuration:")
print(f" COGNITO_REGION: {region}")
print(f" COGNITO_USER_POOL_ID: {user_pool_id}")
print(f" COGNITO_APP_CLIENT_ID: {app_client_id}")
except Exception as e:
print(f"❌ Error updating Lambda: {e}")
sys.exit(1)
print("\n" + "=" * 70)
print("✅ Update Complete")
print("=" * 70)
print("\nYou can now test the Gateway:")
print(" python test_gateway.py --username <username> --password <password>")
print("\n" + "=" * 70)
if __name__ == '__main__':
main()
@@ -0,0 +1,347 @@
# IAM Policy Templates for SSM Parameter Store Access
This directory contains IAM policy templates for managing access to lakehouse-agent SSM parameters.
## Policy Files
### 1. lakehouse-ssm-read-policy.json
**Purpose**: Read-only access to SSM parameters for application runtime
**Use Cases**:
- Lambda function execution roles
- ECS task execution roles
- EC2 instance profiles
- AgentCore Runtime roles
- Any service that needs to read configuration
**Permissions Granted**:
- `ssm:GetParameter` - Read individual parameters
- `ssm:GetParametersByPath` - Bulk read parameters with lh_ prefix
- `kms:Decrypt` - Decrypt SecureString parameters
- `sts:GetCallerIdentity` - Get AWS account ID for auto-detection
**Security Features**:
- Restricted to `lh_*` parameters only
- KMS decrypt only via SSM service
- Region-restricted (uses ${AWS_REGION} placeholder)
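
At runtime, a role carrying this policy typically reads a parameter with `ssm:GetParameter` and, for SecureString values, relies on `kms:Decrypt` through the `WithDecryption` flag. The following is a minimal sketch, assuming a boto3 SSM client is passed in; the helper name `read_parameter` and the parameter name `lh_example_setting` used below are hypothetical and not part of this repository.

```python
from typing import Any


def read_parameter(ssm_client: Any, name: str) -> str:
    """Read one SSM parameter, decrypting it if it is a SecureString."""
    # WithDecryption is ignored for plain String parameters; for SecureString
    # parameters it is the call that exercises the kms:Decrypt permission.
    response = ssm_client.get_parameter(Name=name, WithDecryption=True)
    return response["Parameter"]["Value"]
```

With real credentials this could be called as `read_parameter(boto3.client("ssm"), "lh_example_setting")`. Passing the client in keeps the helper easy to exercise with a stub.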
### 2. lakehouse-ssm-admin-policy.json
**Purpose**: Full management access for DevOps and migration operations
**Use Cases**:
- DevOps engineers managing configuration
- CI/CD pipelines deploying infrastructure
- Migration utility execution
- Parameter backup and restore operations
**Permissions Granted**:
- All read permissions from read-only policy
- `ssm:PutParameter` - Create/update parameters
- `ssm:DeleteParameter` - Remove parameters
- `ssm:GetParameterHistory` - View parameter versions
- `ssm:AddTagsToResource` - Tag parameters
- `ssm:DescribeParameters` - List all parameters
**Security Features**:
- Restricted to `lh_*` parameters only
- Region-restricted (uses ${AWS_REGION} placeholder)
- Includes tagging permissions for organization
## Usage Instructions
### Creating Policies in AWS
**Option 1: Using AWS CLI**
```bash
# Set your AWS region
export AWS_REGION=us-east-1
# Create read-only policy
aws iam create-policy \
--policy-name LakehouseSSMReadPolicy \
--policy-document file://lakehouse-ssm-read-policy.json \
--description "Read-only access to lakehouse-agent SSM parameters"
# Create admin policy
aws iam create-policy \
--policy-name LakehouseSSMAdminPolicy \
--policy-document file://lakehouse-ssm-admin-policy.json \
--description "Full management access to lakehouse-agent SSM parameters"
```
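
The `file://` argument hands the policy document to IAM as written, so the `${AWS_REGION}` placeholder most likely needs to be substituted before the policy is created. The following is a minimal sketch of that rendering step, assuming the template uses only the `${AWS_REGION}` placeholder; the helper name `render_policy` and the sample template are illustrative and may not match the real policy files.

```python
import json
from string import Template


def render_policy(template_text: str, region: str) -> dict:
    """Substitute ${AWS_REGION} and parse the result as JSON."""
    # safe_substitute leaves any other ${...} token (for example an IAM policy
    # variable such as ${aws:username}) untouched instead of raising.
    rendered = Template(template_text).safe_substitute(AWS_REGION=region)
    return json.loads(rendered)


# Hypothetical template fragment; the shipped policy files may differ.
sample_template = """
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": ["ssm:GetParameter"],
    "Resource": "arn:aws:ssm:${AWS_REGION}:*:parameter/lh_*"
  }]
}
"""

policy = render_policy(sample_template, "us-east-1")
print(policy["Statement"][0]["Resource"])
```

The rendered dictionary can be written back to a file with `json.dump` and that file passed to `aws iam create-policy`.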
**Option 2: Using AWS Console**
1. Navigate to IAM → Policies → Create policy
2. Click "JSON" tab
3. Copy contents of policy file
4. Replace `${AWS_REGION}` with your region (e.g., `us-east-1`)
5. Click "Next: Tags"
6. Add tags (optional):
- Key: `Application`, Value: `lakehouse-agent`
- Key: `Environment`, Value: `production`
7. Click "Next: Review"
8. Enter policy name and description
9. Click "Create policy"
### Attaching Policies to Roles
**Attach to Lambda Execution Role**:
```bash
# For application runtime (read-only)
aws iam attach-role-policy \
--role-name lakehouse-mcp-server-role \
--policy-arn arn:aws:iam::XXXXXXXXXXXX:policy/LakehouseSSMReadPolicy
# For migration utility (admin)
aws iam attach-role-policy \
--role-name lakehouse-admin-role \
--policy-arn arn:aws:iam::XXXXXXXXXXXX:policy/LakehouseSSMAdminPolicy
```
**Attach to IAM User**:
```bash
# For DevOps engineer
aws iam attach-user-policy \
--user-name devops-engineer \
--policy-arn arn:aws:iam::XXXXXXXXXXXX:policy/LakehouseSSMAdminPolicy
```
**Attach to IAM Group**:
```bash
# Create group for lakehouse admins
aws iam create-group --group-name lakehouse-admins
# Attach policy to group
aws iam attach-group-policy \
--group-name lakehouse-admins \
--policy-arn arn:aws:iam::XXXXXXXXXXXX:policy/LakehouseSSMAdminPolicy
# Add users to group
aws iam add-user-to-group \
--group-name lakehouse-admins \
--user-name devops-engineer
```
### Verifying Policy Attachment
```bash
# List policies attached to a role
aws iam list-attached-role-policies --role-name lakehouse-mcp-server-role
# List policies attached to a user
aws iam list-attached-user-policies --user-name devops-engineer
# Get policy details
aws iam get-policy \
--policy-arn arn:aws:iam::XXXXXXXXXXXX:policy/LakehouseSSMReadPolicy
# Get policy version (to see actual permissions)
aws iam get-policy-version \
--policy-arn arn:aws:iam::XXXXXXXXXXXX:policy/LakehouseSSMReadPolicy \
--version-id v1
```
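The same check can be scripted, which is convenient in deployment pipelines. The sketch below assumes a boto3 IAM client is passed in; `role_has_policy` is a hypothetical helper name, and only the `list_attached_role_policies` call and its `AttachedPolicies`, `IsTruncated`, and `Marker` response fields are relied on.

```python
def role_has_policy(iam_client, role_name: str, policy_name: str) -> bool:
    """Return True if a managed policy with this name is attached to the role."""
    kwargs = {"RoleName": role_name}
    while True:
        page = iam_client.list_attached_role_policies(**kwargs)
        attached = page.get("AttachedPolicies", [])
        if any(policy["PolicyName"] == policy_name for policy in attached):
            return True
        if not page.get("IsTruncated"):
            return False
        # More results exist: continue from the marker returned by IAM
        kwargs["Marker"] = page["Marker"]
```

For example, `role_has_policy(boto3.client("iam"), "lakehouse-mcp-server-role", "LakehouseSSMReadPolicy")` would be expected to return `True` after the attach commands above.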
## Policy Customization
### Restricting to Specific Region
Replace `${AWS_REGION}` placeholder with your specific region:
```json
"Condition": {
"StringEquals": {
"aws:RequestedRegion": "us-east-1"
}
}
```
### Adding MFA Requirement
Add MFA requirement for admin operations:
```json
{
"Sid": "SSMParameterManagement",
"Effect": "Allow",
"Action": ["ssm:PutParameter", "ssm:DeleteParameter"],
"Resource": "arn:aws:ssm:*:*:parameter/lh_*",
"Condition": {
"Bool": {
"aws:MultiFactorAuthPresent": "true"
}
}
}
```
### Restricting by Environment Tag
Allow access only to parameters tagged with specific environment:
```json
{
"Sid": "SSMParameterRead",
"Effect": "Allow",
"Action": ["ssm:GetParameter"],
"Resource": "arn:aws:ssm:*:*:parameter/lh_*",
"Condition": {
"StringEquals": {
"ssm:ResourceTag/Environment": "production"
}
}
}
```
### Time-Based Access
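Restrict access to a fixed time window. IAM date conditions compare against absolute timestamps, so this example allows writes only between the two instants shown; it does not express recurring daily business hours: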
```json
{
"Sid": "SSMParameterManagement",
"Effect": "Allow",
"Action": ["ssm:PutParameter"],
"Resource": "arn:aws:ssm:*:*:parameter/lh_*",
"Condition": {
"DateGreaterThan": {"aws:CurrentTime": "2024-01-01T09:00:00Z"},
"DateLessThan": {"aws:CurrentTime": "2024-12-31T17:00:00Z"}
}
}
```
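Since `aws:CurrentTime` is matched against absolute timestamps, one way to approximate a daily window is to regenerate the condition for each day. A minimal sketch, assuming UTC and a 09:00 to 17:00 window; `time_window_condition` is a hypothetical helper, not part of the repository.

```python
from datetime import date, datetime, time, timezone


def time_window_condition(day: date, start_hour: int = 9, end_hour: int = 17) -> dict:
    """Build an IAM Condition block that allows access within one UTC window on `day`."""
    start = datetime.combine(day, time(start_hour), tzinfo=timezone.utc)
    end = datetime.combine(day, time(end_hour), tzinfo=timezone.utc)
    fmt = "%Y-%m-%dT%H:%M:%SZ"  # ISO 8601 form used in the policy example
    return {
        "DateGreaterThan": {"aws:CurrentTime": start.strftime(fmt)},
        "DateLessThan": {"aws:CurrentTime": end.strftime(fmt)},
    }
```

The returned dictionary can be placed under the `"Condition"` key of the statement before the policy is serialized with `json.dumps`.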
## Testing Policies
### Test Read Access
```bash
# Assume role with read-only policy
aws sts assume-role \
--role-arn arn:aws:iam::XXXXXXXXXXXX:role/lakehouse-mcp-server-role \
--role-session-name test-session
# Export credentials
export AWS_ACCESS_KEY_ID=...
export AWS_SECRET_ACCESS_KEY=...
export AWS_SESSION_TOKEN=...
# Test reading parameter
aws ssm get-parameter --name lh_s3_bucket_name
# Test reading all parameters
aws ssm get-parameters-by-path --path /lh_ --recursive
# Test writing (should fail)
aws ssm put-parameter \
--name lh_test \
--value "test" \
--type String
# Expected: AccessDeniedException
```
### Test Admin Access
```bash
# Assume role with admin policy
aws sts assume-role \
--role-arn arn:aws:iam::XXXXXXXXXXXX:role/lakehouse-admin-role \
--role-session-name admin-session
# Test creating parameter
aws ssm put-parameter \
--name lh_test_param \
--value "test-value" \
--type String
# Test updating parameter
aws ssm put-parameter \
--name lh_test_param \
--value "updated-value" \
--type String \
--overwrite
# Test deleting parameter
aws ssm delete-parameter --name lh_test_param
```
## Troubleshooting
### AccessDeniedException
**Error**: `User: arn:aws:iam::XXXXXXXXXXXX:role/MyRole is not authorized to perform: ssm:GetParameter`
**Solutions**:
1. Verify policy is attached to role:
```bash
aws iam list-attached-role-policies --role-name MyRole
```
2. Check policy document has correct permissions:
```bash
aws iam get-policy-version \
--policy-arn arn:aws:iam::XXXXXXXXXXXX:policy/LakehouseSSMReadPolicy \
--version-id v1
```
3. Verify parameter name starts with `lh_`:
```bash
aws ssm describe-parameters --parameter-filters "Key=Name,Option=BeginsWith,Values=lh_"
```
### KMS Decrypt Error
**Error**: `User is not authorized to perform: kms:Decrypt`
**Solutions**:
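1. Verify the KMS statement in the policy includes the `kms:ViaService` condition. Use `StringLike`, because `StringEquals` does not expand the `*` wildcard:
```json
"Condition": {
"StringLike": {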
"kms:ViaService": "ssm.*.amazonaws.com"
}
}
```
2. Check KMS key policy allows your role:
```bash
aws kms get-key-policy \
--key-id alias/aws/ssm \
--policy-name default
```
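The condition operator matters for the `kms:ViaService` check: `StringEquals` compares literally, so a value such as `ssm.*.amazonaws.com` only matches a real service endpoint under `StringLike`. The sketch below models the two operators in plain Python to make the difference visible. It is an approximation for illustration, not IAM's evaluation engine, and the function names are hypothetical.

```python
from fnmatch import fnmatchcase


def string_equals(value: str, expected: str) -> bool:
    """Model of IAM StringEquals: exact, case-sensitive; '*' is a literal character."""
    return value == expected


def string_like(value: str, pattern: str) -> bool:
    """Model of IAM StringLike: case-sensitive, with '*' and '?' as wildcards.

    Approximated with fnmatchcase, which additionally treats '[...]' as a
    character class; IAM does not.
    """
    return fnmatchcase(value, pattern)
```

With the pattern `ssm.*.amazonaws.com`, `string_equals("ssm.us-east-1.amazonaws.com", ...)` is false while `string_like` is true, which mirrors why a wildcard condition written with `StringEquals` would never grant `kms:Decrypt`.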
### Region Mismatch
**Error**: Parameters not found or access denied
**Solutions**:
1. Verify you're in the correct region:
```bash
aws configure get region
```
2. Check parameter exists in that region:
```bash
aws ssm get-parameter --name lh_s3_bucket_name --region us-east-1
```
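The agent code in this change resolves its region through a fallback chain: the boto3 session region, then `AWS_REGION`, then `AWS_DEFAULT_REGION`, then `us-east-1`. A sketch of that order as a pure function can help check which region a given environment would pick; `resolve_region` is a hypothetical name used only for illustration.

```python
from typing import Mapping, Optional


def resolve_region(
    session_region: Optional[str],
    env: Mapping[str, str],
    default: str = "us-east-1",
) -> str:
    """Return the first configured region, following the agent's fallback order."""
    return (
        session_region
        or env.get("AWS_REGION")
        or env.get("AWS_DEFAULT_REGION")
        or default
    )
```

A typical call would be `resolve_region(boto3.Session().region_name, os.environ)`. If the result differs from the region where the `lh_*` parameters were created, lookups fail even though the policy is correct.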
## Best Practices
1. **Use Read-Only Policy by Default**: Grant admin access only when necessary
2. **Implement Least Privilege**: Start with minimal permissions and add as needed
3. **Use IAM Groups**: Manage permissions via groups, not individual users
4. **Enable MFA**: Require MFA for admin operations
5. **Regular Audits**: Review policy attachments quarterly
6. **Tag Resources**: Use tags for organization and conditional access
7. **Monitor Access**: Set up CloudWatch alarms for unauthorized access
8. **Document Changes**: Keep change log for policy modifications
9. **Test Policies**: Always test in non-production first
10. **Version Control**: Store policy files in git for change tracking
## Related Documentation
- [SSM Configuration Guide](../README.md#configuration-management-with-aws-systems-manager-ssm)
- [Security Setup](../SECURITY_SETUP.md#ssm-parameter-store-security)
- [Migration Guide](../README.md#migration-from-env-to-ssm)
- [AWS IAM Best Practices](https://docs.aws.amazon.com/IAM/latest/UserGuide/best-practices.html)
- [AWS SSM Parameter Store](https://docs.aws.amazon.com/systems-manager/latest/userguide/systems-manager-parameter-store.html)
@@ -0,0 +1,56 @@
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "SSMParameterManagement",
"Effect": "Allow",
"Action": [
"ssm:PutParameter",
"ssm:GetParameter",
"ssm:GetParametersByPath",
"ssm:DescribeParameters",
"ssm:DeleteParameter",
"ssm:GetParameterHistory",
"ssm:AddTagsToResource",
"ssm:RemoveTagsFromResource",
"ssm:ListTagsForResource"
],
"Resource": "arn:aws:ssm:*:*:parameter/lh_*",
"Condition": {
"StringEquals": {
"aws:RequestedRegion": "${AWS_REGION}"
}
}
},
{
"Sid": "KMSDecryptForSSM",
"Effect": "Allow",
"Action": [
"kms:Decrypt",
"kms:DescribeKey"
],
"Resource": "arn:aws:kms:*:*:key/*",
"Condition": {
"StringLike": {
"kms:ViaService": "ssm.*.amazonaws.com"
}
}
},
{
"Sid": "STSGetCallerIdentity",
"Effect": "Allow",
"Action": [
"sts:GetCallerIdentity"
],
"Resource": "*"
},
{
"Sid": "SSMListParameters",
"Effect": "Allow",
"Action": [
"ssm:DescribeParameters"
],
"Resource": "*"
}
]
}
@@ -0,0 +1,40 @@
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "SSMParameterRead",
"Effect": "Allow",
"Action": [
"ssm:GetParameter",
"ssm:GetParametersByPath"
],
"Resource": "arn:aws:ssm:*:*:parameter/lh_*",
"Condition": {
"StringEquals": {
"aws:RequestedRegion": "${AWS_REGION}"
}
}
},
{
"Sid": "KMSDecryptForSSM",
"Effect": "Allow",
"Action": [
"kms:Decrypt"
],
"Resource": "arn:aws:kms:*:*:key/*",
"Condition": {
"StringLike": {
"kms:ViaService": "ssm.*.amazonaws.com"
}
}
},
{
"Sid": "STSGetCallerIdentity",
"Effect": "Allow",
"Action": [
"sts:GetCallerIdentity"
],
"Resource": "*"
}
]
}
@@ -0,0 +1,72 @@
# Build artifacts
build/
dist/
*.egg-info/
*.egg
# Python cache
__pycache__/
__pycache__*
*.py[cod]
*$py.class
*.so
.Python
# Virtual environments
.venv/
.env
venv/
env/
ENV/
# Testing
.pytest_cache/
.coverage
.coverage*
htmlcov/
.tox/
*.cover
.hypothesis/
.mypy_cache/
.ruff_cache/
# Development
*.log
*.bak
*.swp
*.swo
*~
.DS_Store
# IDEs
.vscode/
.idea/
# Version control
.git/
.gitignore
.gitattributes
# Documentation
docs/
# CI/CD
.github/
.gitlab-ci.yml
.travis.yml
# Project specific
tests/
# Bedrock AgentCore specific - keep config but exclude runtime files
.bedrock_agentcore.yaml
.dockerignore
.bedrock_agentcore/
# Keep wheelhouse for offline installations
# wheelhouse/
# Monorepo directories
cdk/
terraform/
mcp/lambda/
@@ -0,0 +1,41 @@
FROM ghcr.io/astral-sh/uv:python3.12-bookworm-slim
WORKDIR /app
# All environment variables in one layer
ENV UV_SYSTEM_PYTHON=1 \
UV_COMPILE_BYTECODE=1 \
UV_NO_PROGRESS=1 \
PYTHONUNBUFFERED=1 \
DOCKER_CONTAINER=1 \
AWS_REGION=us-east-1 \
AWS_DEFAULT_REGION=us-east-1
COPY requirements.txt requirements.txt
# Install from requirements file
RUN uv pip install -r requirements.txt
RUN uv pip install aws-opentelemetry-distro==0.12.2
# DOCKER_CONTAINER=1 is already set in the ENV layer above; it signals Docker to the host binding logic
# Create non-root user
RUN useradd -m -u 1000 bedrock_agentcore
USER bedrock_agentcore
EXPOSE 9000
EXPOSE 8000
EXPOSE 8080
# Copy entire project (respecting .dockerignore)
COPY . .
# Use the full module path
CMD ["opentelemetry-instrument", "python", "-m", "lakehouse_agent"]
@@ -0,0 +1,406 @@
#!/usr/bin/env python3
"""
Deploy Lakehouse Agent to AgentCore Runtime
This script deploys the health lakehouse data agent to Amazon Bedrock AgentCore Runtime
using the Bedrock AgentCore Starter Toolkit.
Prerequisites:
- AWS credentials configured
- Docker running
- Gateway configured (run create_gateway.py)
- Configuration in SSM Parameter Store (see README.md)
- bedrock-agentcore-starter-toolkit installed
Usage:
python deploy_lakehouse_agent.py
"""
import sys
import boto3
import json
try:
from bedrock_agentcore_starter_toolkit import Runtime
except ImportError:
print("\n❌ Error: bedrock-agentcore-starter-toolkit not installed")
print(" Please install it with: pip install bedrock-agentcore-starter-toolkit")
sys.exit(1)
class SSMConfig:
"""Load configuration from SSM Parameter Store."""
def __init__(self):
"""Initialize and load configuration from SSM."""
# Get region from boto3 session
session = boto3.Session()
self.region = session.region_name
self.ssm = boto3.client('ssm', region_name=self.region)
self.sts = boto3.client('sts', region_name=self.region)
# Get account ID
self.account_id = self.sts.get_caller_identity()['Account']
print(f"✅ Using AWS configuration")
print(f" Region: {self.region}")
print(f" Account: {self.account_id}")
# Load configuration from SSM
print(f"\n🔍 Loading configuration from SSM Parameter Store...")
self.gateway_arn = self._get_parameter('/app/lakehouse-agent/gateway-arn', required=False)
self.cognito_user_pool_id = self._get_parameter('/app/lakehouse-agent/cognito-user-pool-id', required=False)
self.cognito_app_client_id = self._get_parameter('/app/lakehouse-agent/cognito-app-client-id', required=False)
if self.gateway_arn:
print(f" ✅ Gateway ARN: {self.gateway_arn}")
else:
print(f" ⚠️ Gateway ARN not configured")
if self.cognito_user_pool_id and self.cognito_app_client_id:
print(f" ✅ Cognito configured")
else:
print(f" ⚠️ Cognito not configured - will use IAM authentication")
def _get_parameter(self, parameter_name: str, required: bool = True) -> str:
"""Get parameter value from SSM Parameter Store."""
try:
response = self.ssm.get_parameter(Name=parameter_name)
return response['Parameter']['Value']
except self.ssm.exceptions.ParameterNotFound:
if required:
print(f"❌ SSM parameter {parameter_name} not found")
print(f" Please run the setup scripts first")
sys.exit(1)
return None
except Exception as e:
if required:
print(f"❌ Error retrieving parameter {parameter_name}: {e}")
sys.exit(1)
return None
def store_agent_parameters(self, runtime_arn: str, runtime_id: str):
"""Store Lakehouse Agent runtime information in SSM Parameter Store."""
print("\n💾 Storing agent configuration in SSM Parameter Store...")
parameters = [
{
'name': '/app/lakehouse-agent/agent-runtime-arn',
'value': runtime_arn,
'description': 'Lakehouse Agent runtime ARN on AgentCore'
},
{
'name': '/app/lakehouse-agent/agent-runtime-id',
'value': runtime_id,
'description': 'Lakehouse Agent runtime ID on AgentCore'
},
{
'name': '/app/lakehouse-agent/agent-name',
'value': 'lakehouse_agent',
'description': 'Lakehouse Agent name'
}
]
for param in parameters:
try:
self.ssm.put_parameter(
Name=param['name'],
Value=param['value'],
Description=param['description'],
Type='String',
Overwrite=True
)
print(f"✅ Stored parameter: {param['name']} = {param['value']}")
except Exception as e:
print(f"❌ Error storing parameter {param['name']}: {e}")
raise
def create_agent_role(config: SSMConfig):
"""Create IAM role for Lakehouse Agent Runtime execution."""
iam = boto3.client('iam', region_name=config.region)
role_name = 'AgentCoreRuntimeRole-lakehouse-agent'
# Trust policy for AgentCore Runtime
trust_policy = {
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Service": "bedrock-agentcore.amazonaws.com"
},
"Action": "sts:AssumeRole"
}
]
}
# Permissions policy
permissions_policy = {
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"bedrock:InvokeModel",
"bedrock:InvokeModelWithResponseStream"
],
"Resource": "*"
},
{
"Effect": "Allow",
"Action": [
"bedrock-agentcore:InvokeGateway",
"bedrock-agentcore:GetGateway"
],
"Resource": f"arn:aws:bedrock-agentcore:{config.region}:{config.account_id}:gateway/*"
},
{
"Effect": "Allow",
"Action": [
"logs:*",
"xray:*"
],
"Resource": [
"*"
]
},
{
"Effect": "Allow",
"Action": [
"ecr:GetAuthorizationToken",
"ecr:BatchCheckLayerAvailability",
"ecr:GetDownloadUrlForLayer",
"ecr:BatchGetImage"
],
"Resource": "*"
},
{
"Effect": "Allow",
"Action": [
"ssm:GetParameter",
"ssm:GetParameters"
],
"Resource": f"arn:aws:ssm:{config.region}:{config.account_id}:parameter/app/lakehouse-agent/*"
}
]
}
try:
# Create role
print(f"Creating IAM role: {role_name}")
response = iam.create_role(
RoleName=role_name,
AssumeRolePolicyDocument=json.dumps(trust_policy),
Description='AgentCore Runtime execution role for lakehouse data agent'
)
role_arn = response['Role']['Arn']
# Attach inline policy
iam.put_role_policy(
RoleName=role_name,
PolicyName='AgentCoreRuntimePermissions',
PolicyDocument=json.dumps(permissions_policy)
)
print(f"✅ Created IAM role: {role_arn}")
return role_arn
except iam.exceptions.EntityAlreadyExistsException:
print(f"ℹ️ Role {role_name} already exists, retrieving ARN")
response = iam.get_role(RoleName=role_name)
role_arn = response['Role']['Arn']
# Update the role policy to ensure it has all required permissions
print(f" Updating role policy with latest permissions...")
iam.put_role_policy(
RoleName=role_name,
PolicyName='AgentCoreRuntimePermissions',
PolicyDocument=json.dumps(permissions_policy)
)
print(f" ✅ Role policy updated")
return role_arn
def deploy_to_runtime(config: SSMConfig, role_arn: str):
"""Deploy lakehouse agent to AgentCore Runtime using starter toolkit."""
runtime_name = 'lakehouse_agent' # Must use underscores, not hyphens
try:
print(f"\n🚀 Deploying Lakehouse Agent to AgentCore Runtime...")
print(f" Name: {runtime_name}")
print(f" Region: {config.region}")
print(f" This will build a Docker container and deploy it...")
# Build environment variables
env_vars = {
'AWS_REGION': config.region
}
if config.gateway_arn:
env_vars['GATEWAY_ARN'] = config.gateway_arn
print(f"\n📋 Environment variables:")
for key, value in env_vars.items():
print(f" {key}: {value}")
# Initialize Runtime from starter toolkit
agentcore_runtime = Runtime()
# Configure the runtime
print(f"\n🔧 Configuring AgentCore Runtime...")
# Extract role name from ARN (format: arn:aws:iam::account:role/RoleName)
role_name = role_arn.split('/')[-1]
# Build configuration parameters
config_params = {
'entrypoint': "lakehouse_agent.py",
'execution_role': role_name, # Use role name, not ARN
'auto_create_ecr': True,
'requirements_file': "requirements.txt",
'region': config.region,
# Note: Not specifying protocol - will use default HTTP protocol for JWT auth
'agent_name': runtime_name
}
# Add JWT authentication configuration if Cognito is configured
if config.cognito_user_pool_id and config.cognito_app_client_id:
print(f" Configuring JWT authentication...")
issuer = f'https://cognito-idp.{config.region}.amazonaws.com/{config.cognito_user_pool_id}'
discovery_url = f'{issuer}/.well-known/openid-configuration'
print(f" Discovery URL: {discovery_url}")
print(f" Allowed Clients: {config.cognito_app_client_id}")
config_params['authorizer_configuration'] = {
'customJWTAuthorizer': {
'allowedClients': [config.cognito_app_client_id],
'discoveryUrl': discovery_url
}
}
# Add Authorization header to allowlist for OAuth token propagation
config_params['request_header_configuration'] = {
'requestHeaderAllowlist': ['Authorization']
}
print(f"✅ JWT authentication will be configured")
else:
print(f"⚠️ Cognito not configured - runtime will use IAM authentication")
agentcore_runtime.configure(**config_params)
print(f"✅ Configuration complete")
# Launch the runtime (builds Docker image and deploys)
print(f"\n🚀 Launching to AgentCore Runtime...")
print(f" This may take several minutes...")
launch_result = agentcore_runtime.launch()
runtime_arn = launch_result.agent_arn
runtime_id = launch_result.agent_id
print(f"\n✅ Lakehouse Agent deployed successfully!")
print(f" Runtime ARN: {runtime_arn}")
print(f" Runtime ID: {runtime_id}")
return {
'runtime_arn': runtime_arn,
'runtime_id': runtime_id,
'role_arn': role_arn
}
except Exception as e:
print(f"\n❌ Error deploying runtime: {str(e)}")
import traceback
traceback.print_exc()
raise
def main():
"""Main deployment function."""
print("=" * 70)
print("Lakehouse Data Agent Deployment to AgentCore Runtime")
print("=" * 70)
# Load configuration from SSM
config = SSMConfig()
# Validate configuration
print("\n🔍 Validating configuration...")
if not config.gateway_arn:
print("\n⚠️ Warning: /app/lakehouse-agent/gateway-arn not set in SSM Parameter Store")
print(" The agent will not be able to access Gateway tools")
response = input("\nProceed anyway? (yes/no): ")
if response.lower() not in ['yes', 'y']:
print("Deployment cancelled")
sys.exit(0)
print("✅ Configuration validated")
# Print configuration summary
print(f"\n📋 Configuration:")
print(f" Region: {config.region}")
print(f" Gateway ARN: {config.gateway_arn or 'Not configured'}")
try:
# Step 1: Create IAM role
print("\n" + "=" * 70)
print("Step 1: Creating IAM Role")
print("=" * 70)
role_arn = create_agent_role(config)
# Step 2: Deploy to runtime
print("\n" + "=" * 70)
print("Step 2: Deploying to AgentCore Runtime")
print("=" * 70)
result = deploy_to_runtime(config, role_arn)
# Step 3: Store agent parameters in SSM
print("\n" + "=" * 70)
print("Step 3: Storing Agent Configuration")
print("=" * 70)
config.store_agent_parameters(result['runtime_arn'], result['runtime_id'])
# Print summary
print("\n" + "=" * 70)
print("Deployment Complete!")
print("=" * 70)
print("\n✅ Agent configuration stored in SSM Parameter Store:")
print(f" /app/lakehouse-agent/agent-runtime-arn")
print(f" /app/lakehouse-agent/agent-runtime-id")
print(f" /app/lakehouse-agent/agent-name")
# Print JWT configuration status
if config.cognito_user_pool_id and config.cognito_app_client_id:
print("\n✅ JWT Authentication Configured:")
print(f" Discovery URL: https://cognito-idp.{config.region}.amazonaws.com/{config.cognito_user_pool_id}/.well-known/openid-configuration")
print(f" Allowed Clients: {config.cognito_app_client_id}")
print(f" Authorization header: Enabled for OAuth token propagation")
else:
print("\n⚠️ JWT Authentication Not Configured:")
print(" Runtime deployed with IAM authentication")
print(" To enable JWT auth, set /app/lakehouse-agent/cognito-user-pool-id and /app/lakehouse-agent/cognito-app-client-id in SSM and redeploy")
print("\n📋 Next Steps:")
print(" 1. Test the agent: python ../test_agent_simple.py")
print(" 2. Test E2E flow: python ../test_e2e_flow.py")
print(" 3. Deploy the Streamlit UI: cd ../streamlit-ui && streamlit run streamlit_app.py")
print("\n" + "=" * 70)
except Exception as e:
print(f"\n❌ Deployment failed: {str(e)}")
import traceback
traceback.print_exc()
sys.exit(1)
if __name__ == '__main__':
main()
@@ -0,0 +1,191 @@
#!/usr/bin/env python3
"""
Health Lakehouse Data Agent using Strands and AgentCore Gateway
Connects to Gateway tools for querying and managing lakehouse data with OAuth-based access control
"""
import os
import logging
from strands import Agent
from strands.models import BedrockModel
from strands.tools.mcp import MCPClient
from mcp.client.streamable_http import streamablehttp_client
from bedrock_agentcore import BedrockAgentCoreApp
from typing import Dict, Any, Optional
import boto3
# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
# Bypass tool consent for AgentCore deployment
os.environ["BYPASS_TOOL_CONSENT"] = "true"
# Initialize AgentCore App
app = BedrockAgentCoreApp()
# System prompt for lakehouse data agent
CLAIMS_SYSTEM_PROMPT = """
You are a helpful lakehouse data assistant that provides tools to help users query and update data in the lakehouse.
**Technical Context**:
You have access to tools that query an Athena database with row-level security
Users can only see and manage their own claims
**Communication Guidelines**:
Be professional, empathetic, and clear
Explain insurance terms in simple language
When helping with claims, gather all necessary information before submission
**DO NOT MAKE UP ANSWERS. YOUR RESPONSES SHOULD BE BASED ON SOLID FACTS ONLY. DO NOT ANSWER WHEN YOU DO NOT KNOW**
"""
# Default model ID
MODEL_ID = "global.anthropic.claude-sonnet-4-5-20250929-v1:0"
def get_config() -> Dict[str, Optional[str]]:
"""
Load configuration from environment variables and SSM Parameter Store.
Priority:
1. Environment variables (set by AgentCore Runtime)
2. SSM Parameter Store
3. Defaults
Returns:
Dictionary with configuration values
"""
config = {}
# Get region from boto3 session with proper fallback
try:
session = boto3.Session()
config['region'] = (
session.region_name or
os.environ.get('AWS_REGION') or
os.environ.get('AWS_DEFAULT_REGION') or
'us-east-1'
)
if not session.region_name:
logger.warning("⚠️ No region in AWS config, using fallback")
logger.info(f"✅ Region: {config['region']}")
except Exception as e:
logger.warning(f"⚠️ Could not detect region: {e}")
config['region'] = 'us-east-1'
# Try to get Gateway ARN from environment variable first
config['gateway_arn'] = os.environ.get('GATEWAY_ARN')
# If not in environment, try SSM Parameter Store
if not config['gateway_arn']:
try:
ssm = boto3.client('ssm', region_name=config['region'])
response = ssm.get_parameter(Name='/app/lakehouse-agent/gateway-arn')
config['gateway_arn'] = response['Parameter']['Value']
logger.info(f"✅ Gateway ARN from SSM: {config['gateway_arn']}")
except Exception as e:
logger.warning(f"⚠️ Gateway ARN not found in SSM: {e}")
config['gateway_arn'] = None
else:
logger.info(f"✅ Gateway ARN from environment: {config['gateway_arn']}")
return config
def get_gateway_url(gateway_arn: str, region: str) -> str:
"""Convert Gateway ARN to URL using AgentCore API."""
try:
# Extract gateway ID from ARN
# Format: arn:aws:bedrock-agentcore:region:account:gateway/gateway-id
gateway_id = gateway_arn.split('/')[-1]
# Get gateway details
agentcore_client = boto3.client('bedrock-agentcore-control', region_name=region)
response = agentcore_client.get_gateway(gatewayIdentifier=gateway_id)
gateway_url = response['gatewayUrl']
logger.info(f"✅ Gateway URL: {gateway_url}")
return gateway_url
except Exception as e:
logger.error(f"❌ Error getting gateway URL: {e}")
return ''
@app.entrypoint
def handle_request(payload: Dict[str, Any]) -> Dict[str, Any]:
    """
    Handle requests to the lakehouse agent.

    Args:
        payload: Request with prompt and bearer token
    Returns:
        Agent response
    """
    user_prompt = payload.get('prompt', 'Hello')
    bearer_token = payload.get('bearer_token', '')
    logger.info(f"📥 Received request: {user_prompt[:100]}...")
    # Log only whether a token is present; never write the token itself to logs
    logger.info(f"🔑 Bearer token present: {bool(bearer_token)}")
    # Load configuration
    config = get_config()
    gateway_arn = config['gateway_arn']
    region = config['region']
    # Get tools from Gateway if configured
    gateway_url = get_gateway_url(gateway_arn, region) if gateway_arn else ''
    if gateway_url:
        logger.info(f"🔗 Connecting to Gateway: {gateway_arn}")
        # Create auth headers with bearer token
        auth_headers = {'Authorization': f'Bearer {bearer_token}'}
        # Create MCP client with authentication
        mcp_client = MCPClient(
            lambda: streamablehttp_client(gateway_url, headers=auth_headers),
            prefix="claims"
        )
        # Keep the MCP connection open while the agent runs, then close it
        with mcp_client:
            tools = mcp_client.list_tools_sync()
            logger.info(f"✅ Loaded {len(tools)} tools from Gateway")
            response = _run_agent(user_prompt, region, tools)
    else:
        logger.warning("⚠️ Gateway not available, continuing without Gateway tools")
        response = _run_agent(user_prompt, region, [])
    # Extract response content
    response_text = ""
    if hasattr(response, 'message') and 'content' in response.message:
        for content in response.message['content']:
            if isinstance(content, dict) and 'text' in content:
                response_text += content['text']
    else:
        response_text = str(response)
    return {
        "content": response_text,
        "tool_calls": len(response.tool_calls) if hasattr(response, 'tool_calls') else 0
    }


def _run_agent(user_prompt: str, region: str, tools: list):
    """Create the Bedrock-backed agent with the given tools and process the prompt."""
    model = BedrockModel(
        model_id=MODEL_ID,
        region_name=region
    )
    agent = Agent(
        model=model,
        tools=tools,
        system_prompt=CLAIMS_SYSTEM_PROMPT
    )
    logger.info("⏳ Processing request...")
    response = agent(user_prompt)
    logger.info("✅ Request processed")
    return response
if __name__ == "__main__":
app.run()
@@ -0,0 +1,4 @@
bedrock-agentcore>=1.0.0
strands-agents>=1.0.0
boto3>=1.34.0
mcp>=1.0.0
@@ -0,0 +1,72 @@
# Build artifacts
build/
dist/
*.egg-info/
*.egg
# Python cache
__pycache__/
__pycache__*
*.py[cod]
*$py.class
*.so
.Python
# Virtual environments
.venv/
.env
venv/
env/
ENV/
# Testing
.pytest_cache/
.coverage
.coverage*
htmlcov/
.tox/
*.cover
.hypothesis/
.mypy_cache/
.ruff_cache/
# Development
*.log
*.bak
*.swp
*.swo
*~
.DS_Store
# IDEs
.vscode/
.idea/
# Version control
.git/
.gitignore
.gitattributes
# Documentation
docs/
# CI/CD
.github/
.gitlab-ci.yml
.travis.yml
# Project specific
tests/
# Bedrock AgentCore specific - keep config but exclude runtime files
.bedrock_agentcore.yaml
.dockerignore
.bedrock_agentcore/
# Keep wheelhouse for offline installations
# wheelhouse/
# Monorepo directories
cdk/
terraform/
mcp/lambda/
@@ -0,0 +1 @@
Dockerfile
@@ -0,0 +1,109 @@
# M2M Authentication Test Results
## Test Script: simple_mcp_test.py
### ✅ SUCCESS: M2M Token Acquisition
The test successfully obtains an M2M access token from Cognito:
```
Client ID: 1o7qt3g8mc071403me6sn99nho
Domain: https://lakehouse-uswest2f.auth.us-west-2.amazoncognito.com
User Pool: us-west-2_F9ClCY8Bk
Scopes: default-m2m-resource-server-vccgqz/read
Token obtained successfully!
Token type: Bearer
Expires in: 3600 seconds
Token Claims:
Issuer: https://cognito-idp.us-west-2.amazonaws.com/us-west-2_F9ClCY8Bk
Client ID: 1o7qt3g8mc071403me6sn99nho
Scope: default-m2m-resource-server-vccgqz/read
Token Use: access
```
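Claims like these can be inspected locally by decoding the token payload. The sketch below uses only the standard library and does not verify the signature, so it is suitable for debugging output but not for authorization decisions; `decode_jwt_claims` is a hypothetical helper name.

```python
import base64
import json


def decode_jwt_claims(token: str) -> dict:
    """Return the payload claims of a JWT WITHOUT verifying its signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected a token of the form header.payload.signature")
    payload = parts[1]
    payload += "=" * (-len(payload) % 4)  # restore the base64url padding JWTs omit
    return json.loads(base64.urlsafe_b64decode(payload))
```

For a Cognito M2M access token, the resulting dictionary would be expected to contain keys such as `iss`, `client_id`, `scope`, and `token_use`, matching the listing above.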
### ✅ SUCCESS: JWT Authentication
The JWT token is accepted by AgentCore Runtime:
- No 401 Unauthorized errors
- JWT authorizer validates the token
- Client ID matches the allowedClients configuration
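
The checks listed above can be approximated offline when a 401 does appear. The following sketch compares already-decoded claims against the expected issuer and the `allowedClients` list. It is an illustration only: the real authorizer also verifies the signature and expiry, and `check_access_token_claims` is a hypothetical name.

```python
from typing import Iterable, List, Mapping


def check_access_token_claims(
    claims: Mapping[str, str],
    issuer: str,
    allowed_clients: Iterable[str],
) -> List[str]:
    """Return a list of problems; an empty list means the claims look acceptable."""
    problems = []
    if claims.get("iss") != issuer:
        problems.append("issuer mismatch")
    if claims.get("client_id") not in set(allowed_clients):
        problems.append("client_id not in allowedClients")
    if claims.get("token_use") != "access":
        problems.append("token_use is not 'access'")
    return problems
```

An empty result for the M2M token suggests that a later failure lies beyond token validation, which is consistent with the handshake issue described next.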
### ❌ ISSUE: MCP Protocol Handshake
The MCP initialize request times out:
- Request doesn't reach the MCP server (no logs)
- AgentCore Runtime expects streaming protocol
- Simple HTTP POST requests don't work for MCP protocol
## Root Cause
AgentCore Runtime's MCP implementation requires:
1. **Bidirectional streaming** - Not supported by simple HTTP POST
2. **Server-Sent Events (SSE)** or **WebSocket** - For MCP protocol messages
3. **MCP client library** - Which has compatibility issues
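
For reference when debugging, the MCP Streamable HTTP transport sends `initialize` as a JSON-RPC POST whose `Accept` header lists both JSON and SSE. The sketch below only builds the headers and body; it makes no network call, and whether AgentCore Runtime accepts this exact shape is not confirmed by these tests. The protocol version and client name are assumptions, and `build_initialize_request` is a hypothetical helper.

```python
import json
from typing import Dict, Tuple


def build_initialize_request(bearer_token: str, request_id: int = 1) -> Tuple[Dict[str, str], str]:
    """Build headers and JSON body for an MCP initialize call (no request is sent)."""
    headers = {
        "Authorization": f"Bearer {bearer_token}",
        "Content-Type": "application/json",
        # The client declares it can read a plain JSON reply or an SSE stream
        "Accept": "application/json, text/event-stream",
    }
    body = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "initialize",
        "params": {
            "protocolVersion": "2025-03-26",  # assumed version, adjust to the server
            "capabilities": {},
            "clientInfo": {"name": "simple-mcp-test", "version": "0.1.0"},  # assumed
        },
    }
    return headers, json.dumps(body)
```

Comparing these headers with what a failing test script sends may help separate header problems from genuine streaming requirements.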
## Conclusion
**M2M Authentication: 100% Working**
- Token acquisition: ✅
- JWT validation: ✅
- Authorization: ✅
**MCP Protocol: Not Working with HTTP POST**
- Requires streaming connection
- Simple HTTP requests insufficient
- Need proper MCP client or Gateway
## Recommendations
### Option 1: Use AgentCore Gateway (RECOMMENDED)
Deploy the MCP server behind AgentCore Gateway:
```bash
cd gateway-setup
python create_gateway.py
```
Benefits:
- Gateway handles MCP protocol complexity
- Production-ready pattern
- Supports OAuth/JWT authentication
- Agent connects to Gateway, not directly to Runtime
### Option 2: Use Lakehouse Agent
The lakehouse agent has proper MCP client implementation:
```bash
cd lakehouse-agent
python deploy_lakehouse_agent.py
```
The agent's MCP client may handle the streaming protocol correctly.
### Option 3: Debug MCP Client Library
Investigate why the Python MCP client library (`mcp` package) hangs:
- Check library version compatibility
- Review streaming implementation
- Test with different configurations
## Files
- **simple_mcp_test.py** - Working test for M2M token + JWT auth
- **test_http_endpoint.py** - HTTP endpoint test (shows 406 with correct headers)
- **test_mcp_server.py** - Full MCP test (hangs during initialization)
## Usage
To test M2M authentication:
```bash
cd mcp-lakehouse-server
AWS_REGION=us-west-2 python simple_mcp_test.py
```
Expected output:
- ✅ Token obtained
- ✅ Token claims validated
- ❌ MCP initialize times out (expected - requires streaming)
@@ -0,0 +1,370 @@
"""
Secure Athena Tools
This implementation uses user based filtering for row-level security:
- User identity passed as session tags when assuming IAM role
- NO application-level SQL manipulation
- NO SQL injection risk
Security Flow:
1. Gateway interceptor extracts user_id from JWT
2. MCP server receives user_id in headers
3. MCP server assumes IAM role WITH session tag: user_id=<actual_user>
4. Athena queries use those credentials
"""
import boto3
import time
from typing import List, Dict, Any, Optional
from botocore.exceptions import ClientError
class SecureAthenaClaimsTools:
"""
Secure tools for querying health lakehouse data with Lake Formation RLS.
"""
def __init__(
self,
region: str,
database_name: str,
s3_output_location: str,
rls_role_arn: str
):
"""
Initialize secure Athena tools.
Args:
region: AWS region
database_name: Athena database name
s3_output_location: S3 location for query results
rls_role_arn: IAM role ARN with Lake Formation data filter permissions if setup
"""
self.region = region
self.database_name = database_name
self.s3_output_location = s3_output_location
self.rls_role_arn = rls_role_arn
self.sts_client = boto3.client('sts', region_name=region)
def _get_credentials_with_session_tag(self, user_id: str) -> Dict[str, str]:
"""
Assume IAM role with session tag containing user identity.
This is the KEY security mechanism:
- User identity is passed as a session tag
- Lake Formation uses this tag to filter data
- Filtering happens at AWS query engine, not application
Args:
user_id: User email/ID from OAuth token
Returns:
Temporary AWS credentials with session tag
"""
try:
# Assume role with session tags
response = self.sts_client.assume_role(
RoleArn=self.rls_role_arn,
RoleSessionName=f"claims-query-{user_id.replace('@', '-').replace('.', '-')}"[:64],  # STS limits session names to 64 characters
Tags=[
{
'Key': 'user_id',
'Value': user_id
}
],
DurationSeconds=3600 # 1 hour
)
credentials = response['Credentials']
return {
'aws_access_key_id': credentials['AccessKeyId'],
'aws_secret_access_key': credentials['SecretAccessKey'],
'aws_session_token': credentials['SessionToken']
}
except ClientError as e:
raise Exception(f"Error assuming role with session tags: {str(e)}")
def _get_athena_client(self, user_id: str):
"""
Get Athena client with user-specific credentials (session tags).
Args:
user_id: User email/ID
Returns:
Athena client with scoped credentials
"""
if self.rls_role_arn:
credentials = self._get_credentials_with_session_tag(user_id)
else:
credentials = {}
return boto3.client(
'athena',
region_name=self.region,
**credentials
)
def _execute_query(
self,
user_id: str,
query: str,
wait_for_results: bool = True
) -> Optional[List[Dict[str, Any]]]:
"""
Execute Athena query with user-scoped credentials.
IMPORTANT: This query does NOT include user_id filter in SQL!
The filtering is applied by Lake Formation based on session tags.
Args:
user_id: User email/ID (for session tag)
query: SQL query WITHOUT user filtering
wait_for_results: Whether to wait for completion
Returns:
Query results
"""
try:
# Get Athena client with user credentials
athena_client = self._get_athena_client(user_id)
# Execute query - Lake Formation will automatically apply row filter
response = athena_client.start_query_execution(
QueryString=query,
QueryExecutionContext={'Database': self.database_name},
ResultConfiguration={'OutputLocation': self.s3_output_location}
)
query_execution_id = response['QueryExecutionId']
if not wait_for_results:
return None
# Wait for query completion
max_wait_time = 30
start_time = time.time()
while time.time() - start_time < max_wait_time:
status_response = athena_client.get_query_execution(
QueryExecutionId=query_execution_id
)
status = status_response['QueryExecution']['Status']['State']
if status == 'SUCCEEDED':
break
elif status in ['FAILED', 'CANCELLED']:
error = status_response['QueryExecution']['Status'].get(
'StateChangeReason', 'Unknown error'
)
raise Exception(f"Query failed: {error}")
time.sleep(0.5)
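# Fail explicitly if the loop ended without reaching a terminal state
if status != 'SUCCEEDED':
raise Exception(f"Query timed out after {max_wait_time} seconds")
# Get results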
results_response = athena_client.get_query_results(
QueryExecutionId=query_execution_id,
MaxResults=100
)
# Parse results
rows = results_response['ResultSet']['Rows']
if len(rows) == 0:
return []
columns = [col['VarCharValue'] for col in rows[0]['Data']]
data = []
for row in rows[1:]:
row_data = {}
for i, col in enumerate(row['Data']):
row_data[columns[i]] = col.get('VarCharValue', '')
data.append(row_data)
return data
except Exception as e:
raise Exception(f"Error executing secure Athena query: {str(e)}")
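The polling loop in `_execute_query` waits up to 30 seconds for a terminal state. A possible way to isolate that logic, so that a timeout is reported distinctly from a failed query, is sketched below. `wait_for_query` and its `get_state` callable are hypothetical; in this file `get_state` would wrap the `get_query_execution` call and return the `State` string.

```python
import time
from typing import Callable

TERMINAL_FAILURES = {"FAILED", "CANCELLED"}


def wait_for_query(
    get_state: Callable[[], str],
    timeout_s: float = 30.0,
    poll_interval_s: float = 0.5,
) -> None:
    """Block until get_state() returns SUCCEEDED.

    Raises RuntimeError if the query fails or is cancelled, and
    TimeoutError if no terminal state is reached before the deadline.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        state = get_state()
        if state == "SUCCEEDED":
            return
        if state in TERMINAL_FAILURES:
            raise RuntimeError(f"Query ended in state {state}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Query still {state} after {timeout_s} seconds")
        time.sleep(poll_interval_s)
```

Injecting the state lookup as a callable keeps the helper free of AWS calls, so it can be exercised with a plain iterator in tests.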
# TODO: Lake Formation does not currently support dynamic query filters. See https://docs.aws.amazon.com/lake-formation/latest/dg/data-filtering-notes.html
# and https://repost.aws/questions/QUjGeTaN2US8mjiON0nzDJzw/dynamic-filter-on-lake-formation
# The mechanism below can be used for static filters in the query if required. This method is retained for future use.
def query_claims(
self,
user_id: str,
filters: Optional[Dict[str, Any]] = None
) -> Dict[str, Any]:
"""
Query claims for a user.
NOTE: The user_id predicate is part of the WHERE clause; Lake Formation does not add it.
Args:
user_id: User email (passed as session tag and used in the WHERE clause)
filters: Optional additional filters
Returns:
User's claims
"""
try:
# Query restricted to the calling user via the user_id predicate
query = f"""
SELECT
claim_id,
patient_name,
claim_date,
claim_amount,
claim_type,
claim_status,
provider_name,
diagnosis_code,
submitted_date,
approved_amount,
notes
FROM {self.database_name}.claims
WHERE 1=1
AND user_id='{user_id}'
"""
# Add optional filters
if filters:
if 'claim_status' in filters and filters['claim_status']:
# NOTE: filter values are interpolated into the SQL string; validate or escape them before calling
query += f" AND claim_status = '{filters['claim_status']}'"
if 'claim_type' in filters and filters['claim_type']:
query += f" AND claim_type = '{filters['claim_type']}'"
query += " ORDER BY submitted_date DESC LIMIT 50"
# Execute with user-scoped credentials
# The user_id predicate above restricts rows to this user
results = self._execute_query(user_id, query)
return {
"success": True,
"user_id": user_id,
"claims": results or [],
"count": len(results) if results else 0,
"message": f"Found {len(results) if results else 0} claims",
"security": "Row-level filtering enforced by AWS Lake Formation"
}
except Exception as e:
return {
"success": False,
"error": str(e),
"message": f"Error querying claims: {str(e)}"
}
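`query_claims` appends filter values to the SQL text directly. One way to reduce the injection risk, sketched here under stated assumptions, is to validate each value before it reaches the query. `build_filter_clauses` is a hypothetical helper; the status values are the ones used in the summary query in this file, while the `claim_type` pattern is an assumption because the valid types are not listed in the source.

```python
import re
from typing import Dict, List, Optional

# Status values taken from the summary query in this file.
ALLOWED_STATUSES = {"pending", "approved", "denied"}
# Assumption: claim types are short alphanumeric tokens.
SAFE_TOKEN = re.compile(r"[A-Za-z0-9_-]{1,64}")


def build_filter_clauses(filters: Optional[Dict[str, str]]) -> List[str]:
    """Return validated SQL fragments for the optional claim filters.

    Raises ValueError for any value that is not on the allow-list or
    does not match the conservative token pattern.
    """
    clauses: List[str] = []
    if not filters:
        return clauses
    status = filters.get("claim_status")
    if status:
        if status not in ALLOWED_STATUSES:
            raise ValueError(f"Unsupported claim_status: {status!r}")
        clauses.append(f"AND claim_status = '{status}'")
    claim_type = filters.get("claim_type")
    if claim_type:
        if not SAFE_TOKEN.fullmatch(claim_type):
            raise ValueError(f"Unsupported claim_type: {claim_type!r}")
        clauses.append(f"AND claim_type = '{claim_type}'")
    return clauses
```

Allow-listing is a stopgap rather than a full answer; if the query engine's parameterized execution is available, passing values as parameters would be the stronger choice.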
def get_claim_details(self, user_id: str, claim_id: str) -> Dict[str, Any]:
"""
Get claim details. The user_id predicate ensures users can only see their own claims.
Args:
user_id: User email (for session tag)
claim_id: Claim ID
Returns:
Claim details (only if user owns it)
"""
try:
# Query by claim_id, restricted to the calling user
query = f"""
SELECT *
FROM {self.database_name}.claims
WHERE claim_id = '{claim_id}'
AND user_id='{user_id}'
"""
results = self._execute_query(user_id, query)
if results and len(results) > 0:
return {
"success": True,
"claim": results[0],
"message": f"Retrieved claim {claim_id}",
"security": "Access validated by AWS Lake Formation"
}
else:
return {
"success": False,
"message": f"Claim {claim_id} not found or access denied",
"security": "Lake Formation filtered this claim (not owned by user)"
}
except Exception as e:
return {
"success": False,
"error": str(e),
"message": f"Error retrieving claim: {str(e)}"
}
def get_claims_summary(self, user_id: str) -> Dict[str, Any]:
"""
Get claims summary, scoped to the user by the user_id predicate.
Args:
user_id: User email
Returns:
Summary statistics (only for user's claims)
"""
try:
# Summary query restricted to the calling user
query = f"""
SELECT
COUNT(*) as total_claims,
SUM(CAST(claim_amount AS DECIMAL(10,2))) as total_amount,
SUM(CASE WHEN approved_amount != ''
THEN CAST(approved_amount AS DECIMAL(10,2))
ELSE 0 END) as total_approved,
COUNT(CASE WHEN claim_status = 'pending' THEN 1 END) as pending_claims,
COUNT(CASE WHEN claim_status = 'approved' THEN 1 END) as approved_claims,
COUNT(CASE WHEN claim_status = 'denied' THEN 1 END) as denied_claims
FROM {self.database_name}.claims
WHERE 1=1
AND user_id='{user_id}'
"""
results = self._execute_query(user_id, query)
if results and len(results) > 0:
summary = results[0]
return {
"success": True,
"user_id": user_id,
"summary": {
"total_claims": int(summary.get('total_claims', 0)),
"total_amount_claimed": float(summary.get('total_amount', 0) or 0),
"total_amount_approved": float(summary.get('total_approved', 0) or 0),
"pending_claims": int(summary.get('pending_claims', 0)),
"approved_claims": int(summary.get('approved_claims', 0)),
"denied_claims": int(summary.get('denied_claims', 0))
},
"message": "Claims summary retrieved successfully",
"security": "Automatically scoped to user by Lake Formation"
}
return {
"success": True,
"user_id": user_id,
"summary": {
"total_claims": 0,
"total_amount_claimed": 0.0,
"total_amount_approved": 0.0,
"pending_claims": 0,
"approved_claims": 0,
"denied_claims": 0
},
"message": "No claims found",
"security": "Lake Formation enforced row-level security"
}
except Exception as e:
return {
"success": False,
"error": str(e),
"message": f"Error retrieving summary: {str(e)}"
}
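The summary code above converts result cells with `int(...)` and `float(...)`. Since `_execute_query` returns every cell as a string and substitutes an empty string for missing values, a defensive conversion can avoid a `ValueError` on empty cells. The helpers below are a hypothetical sketch, not part of the original module.

```python
from typing import Any


def to_int(value: Any) -> int:
    """Convert a result cell to int, treating empty or missing cells as 0."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return 0


def to_float(value: Any) -> float:
    """Convert a result cell to float, treating empty or missing cells as 0.0."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return 0.0
```

With these helpers, an expression such as `to_int(summary.get('total_claims'))` would yield 0 for an empty cell instead of raising.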
@@ -0,0 +1,48 @@
#!/bin/bash
# Check CloudWatch Logs for MCP Runtime
#
# Usage: ./check_logs.sh [--follow]
set -e
# Get runtime ARN from SSM
echo "📋 Getting runtime ARN from SSM..."
RUNTIME_ARN=$(aws ssm get-parameter --name /app/lakehouse-agent/mcp-server-runtime-arn --query 'Parameter.Value' --output text || true)
if [ -z "$RUNTIME_ARN" ]; then
echo "❌ Error: Could not get runtime ARN from SSM"
echo " Parameter: /app/lakehouse-agent/mcp-server-runtime-arn"
exit 1
fi
# Extract runtime ID from ARN
RUNTIME_ID=$(echo "$RUNTIME_ARN" | cut -d'/' -f2)
LOG_GROUP="/aws/bedrock-agentcore/runtime/$RUNTIME_ID"
echo "✅ Runtime ARN: $RUNTIME_ARN"
echo "✅ Runtime ID: $RUNTIME_ID"
echo "✅ Log Group: $LOG_GROUP"
echo ""
# Check if log group exists
if ! aws logs describe-log-groups --log-group-name-prefix "$LOG_GROUP" --query 'logGroups[0].logGroupName' --output text 2>/dev/null | grep -q "$LOG_GROUP"; then
echo "⚠️ Warning: Log group does not exist yet"
echo " This is normal if the runtime hasn't been invoked yet"
echo " Try invoking the runtime first, then check logs again"
exit 0
fi
echo "📊 Log group exists!"
echo ""
# Check for --follow flag
if [ "$1" = "--follow" ]; then
echo "🔄 Following logs (press Ctrl+C to stop)..."
echo ""
aws logs tail "$LOG_GROUP" --follow
else
echo "📜 Recent logs (last 50 lines):"
echo " Use './check_logs.sh --follow' to follow logs in real-time"
echo ""
aws logs tail "$LOG_GROUP" --since 1h | tail -50
fi
@@ -0,0 +1,143 @@
#!/usr/bin/env python3
"""
Check AgentCore Runtime CloudWatch Logs
This script retrieves recent logs from the MCP server runtime.
"""
import boto3
import sys
from datetime import datetime, timedelta
def get_logs_from_group(logs, log_group_name, minutes=10, limit=50):
"""Get recent logs from a log group."""
print(f"\n📋 Getting recent log streams from: {log_group_name}")
try:
response = logs.describe_log_streams(
logGroupName=log_group_name,
orderBy='LastEventTime',
descending=True,
limit=5
)
if not response['logStreams']:
print(f" ⚠️ No log streams found")
return
print(f" Found {len(response['logStreams'])} recent streams")
# Get logs from the most recent stream
stream_name = response['logStreams'][0]['logStreamName']
print(f"\n📄 Latest log stream: {stream_name}")
# Get logs from last N minutes
start_time = int((datetime.now() - timedelta(minutes=minutes)).timestamp() * 1000)
log_response = logs.get_log_events(
logGroupName=log_group_name,
logStreamName=stream_name,
startTime=start_time,
limit=limit
)
events = log_response['events']
if not events:
print(f"\n ⚠️ No recent log events in last {minutes} minutes")
else:
print(f"\n📝 Recent Log Events ({len(events)} events):")
print("=" * 70)
for event in events:
timestamp = datetime.fromtimestamp(event['timestamp'] / 1000)
message = event['message'].strip()
print(f"[{timestamp.strftime('%H:%M:%S')}] {message}")
except Exception as e:
print(f" ❌ Error: {e}")
def main():
session = boto3.Session()
region = session.region_name
ssm = boto3.client('ssm', region_name=region)
logs = boto3.client('logs', region_name=region)
print("=" * 70)
print("Check AgentCore Runtime Logs")
print("=" * 70)
# Get runtime info from SSM
print("\n🔍 Loading runtime configuration from SSM...")
try:
runtime_arn = ssm.get_parameter(Name='/app/lakehouse-agent/mcp-server-runtime-arn')['Parameter']['Value']
runtime_id = ssm.get_parameter(Name='/app/lakehouse-agent/mcp-server-runtime-id')['Parameter']['Value']
print(f" MCP Server Runtime ARN: {runtime_arn}")
print(f" MCP Server Runtime ID: {runtime_id}")
except Exception as e:
print(f" ⚠️ MCP Server runtime not found: {e}")
runtime_id = None
# List ALL AgentCore log groups
print(f"\n🔍 Listing all AgentCore log groups...")
try:
response = logs.describe_log_groups(logGroupNamePrefix="/aws/bedrock-agentcore")
if response['logGroups']:
print(f"\n Available log groups ({len(response['logGroups'])}):")
for lg in response['logGroups']:
name = lg['logGroupName']
# Highlight MCP server log groups
if 'lakehouse_mcp_server' in name.lower() or 'mcp' in name.lower():
print(f" 🎯 {name} <-- MCP SERVER")
elif 'lakehouse_agent' in name.lower():
print(f" 🤖 {name} <-- AGENT")
else:
print(f" - {name}")
else:
print(f" No AgentCore log groups found")
sys.exit(1)
except Exception as e:
print(f" ❌ Error listing log groups: {e}")
sys.exit(1)
# Find MCP server log group
mcp_log_group = None
agent_log_group = None
for lg in response['logGroups']:
name = lg['logGroupName']
if 'lakehouse_mcp_server' in name.lower():
mcp_log_group = name
elif 'lakehouse_agent' in name.lower():
agent_log_group = name
# Show MCP server logs
if mcp_log_group:
print("\n" + "=" * 70)
print("🎯 MCP SERVER LOGS")
print("=" * 70)
get_logs_from_group(logs, mcp_log_group, minutes=15, limit=100)
else:
print("\n⚠️ MCP Server log group not found!")
print(" Expected pattern: /aws/bedrock-agentcore/runtimes/lakehouse_mcp_server-*")
# Optionally show agent logs
if agent_log_group:
print("\n" + "=" * 70)
print("🤖 AGENT LOGS (for comparison)")
print("=" * 70)
get_logs_from_group(logs, agent_log_group, minutes=5, limit=20)
print("\n" + "=" * 70)
print("💡 TIP: If MCP server logs don't show tool invocations, check:")
print(" 1. Gateway is routing to the correct MCP server runtime")
print(" 2. Gateway target configuration points to MCP server")
print(" 3. M2M authentication between Gateway and MCP server")
print("=" * 70)
if __name__ == '__main__':
main()
@@ -0,0 +1,56 @@
#!/bin/bash
# Deploy MCP Athena Server to AWS Lambda or AgentCore Runtime
set -e
echo "🚀 Deploying Health Lakehouse Data MCP Server"
# Check environment variables
if [ -z "$AWS_REGION" ]; then
AWS_REGION="us-east-1"
fi
if [ -z "$S3_OUTPUT_BUCKET" ]; then
echo "❌ Error: S3_OUTPUT_BUCKET environment variable is required"
echo " Set it to the bucket name for Athena query results"
exit 1
fi
echo " Region: $AWS_REGION"
echo " S3 Bucket: $S3_OUTPUT_BUCKET"
# Option 1: Deploy as AgentCore Gateway Target (MCP Server via Lambda)
echo ""
echo "📦 Packaging MCP server..."
# Create deployment package
mkdir -p dist
pip install -r requirements.txt -t dist/
cp server.py dist/
cp athena_tools_secure.py dist/
cd dist
zip -r ../mcp-server.zip .
cd ..
echo "✅ Package created: mcp-server.zip"
# Option 2: Deploy using agentcore CLI (if using Runtime)
echo ""
echo "To deploy using agentcore CLI:"
echo " agentcore configure -e server.py"
echo " agentcore launch"
echo ""
echo "To deploy as Lambda function:"
echo " aws lambda create-function \\"
echo " --function-name lakehouse-mcp-server \\"
echo " --runtime python3.11 \\"
echo " --role YOUR_LAMBDA_ROLE_ARN \\"
echo " --handler server.handle_request \\"
echo " --zip-file fileb://mcp-server.zip \\"
echo " --environment Variables={ATHENA_DATABASE_NAME=lakehouse_db,S3_BUCKET_NAME=$S3_OUTPUT_BUCKET} \\"
echo " --timeout 60 \\"
echo " --memory-size 512"
echo ""
echo "✨ Deployment package ready!"
@@ -0,0 +1,494 @@
#!/usr/bin/env python3
"""
Deploy MCP Athena Server to AgentCore Runtime
This script deploys the MCP server to Amazon Bedrock AgentCore Runtime using
the Bedrock AgentCore Starter Toolkit. The server provides secure Athena query
tools with Lake Formation RLS.
Prerequisites:
- AWS credentials configured
- Docker running
- Lake Formation RLS configured (run setup_lake_formation.py)
- Configuration in SSM Parameter Store
- bedrock-agentcore-starter-toolkit installed
Usage:
python deploy_runtime.py
"""
import boto3
import json
import sys
try:
from bedrock_agentcore_starter_toolkit import Runtime
except ImportError:
print("\n❌ Error: bedrock-agentcore-starter-toolkit not installed")
print(" Please install it with: pip install bedrock-agentcore-starter-toolkit")
sys.exit(1)
class SSMConfig:
"""Load configuration from SSM Parameter Store."""
def __init__(self):
"""Initialize and load configuration from SSM."""
# Get region from boto3 session
session = boto3.Session()
self.region = session.region_name
self.ssm = boto3.client('ssm', region_name=self.region)
self.sts = boto3.client('sts', region_name=self.region)
# Get account ID
self.account_id = self.sts.get_caller_identity()['Account']
# Load configuration from SSM
self.s3_bucket_name = self._get_parameter('/app/lakehouse-agent/s3-bucket-name')
self.database_name = self._get_parameter('/app/lakehouse-agent/database-name')
self.cognito_user_pool_arn = self._get_parameter('/app/lakehouse-agent/cognito-user-pool-arn')
self.rls_role_arn = self._get_parameter('/app/lakehouse-agent/rls-role-arn', required=False)
if not self.rls_role_arn:
print("⚠️ Deploying without LakeFormation RLS.")
self.rls_role_arn = None
# Constants
self.security_mode = 'lakeformation'
self.log_level = 'DEBUG'
print(f"✅ Configuration loaded from SSM Parameter Store")
print(f" Region: {self.region}")
print(f" Account: {self.account_id}")
def _get_parameter(self, parameter_name: str, required: bool = True) -> str:
"""Get parameter value from SSM Parameter Store."""
try:
response = self.ssm.get_parameter(Name=parameter_name)
return response['Parameter']['Value']
except self.ssm.exceptions.ParameterNotFound:
if required:
print(f"❌ SSM parameter {parameter_name} not found")
print(f" Please run the setup scripts first")
sys.exit(1)
return None
except Exception as e:
if required:
print(f"❌ Error retrieving parameter {parameter_name}: {e}")
sys.exit(1)
return None
def is_valid(self) -> bool:
"""Check if all required configuration is present."""
return all([
self.s3_bucket_name,
self.database_name,
self.region,
self.account_id
])
def print_status(self):
"""Print configuration status."""
print(f"\n📋 Configuration Status:")
print(f" AWS Account: {self.account_id}")
print(f" Region: {self.region}")
print(f" S3 Bucket: {self.s3_bucket_name}")
print(f" Database: {self.database_name}")
print(f" RLS Role ARN: {self.rls_role_arn}")
print(f" Cognito User Pool ARN: {self.cognito_user_pool_arn}")
print(f" Security Mode: {self.security_mode}")
print(f" Log Level: {self.log_level}")
def store_runtime_parameters(self, runtime_arn: str, runtime_id: str):
"""Store MCP server runtime information in SSM Parameter Store."""
print("\n💾 Storing runtime configuration in SSM Parameter Store...")
parameters = [
{
'name': '/app/lakehouse-agent/mcp-server-runtime-arn',
'value': runtime_arn,
'description': 'MCP Athena Server runtime ARN on AgentCore'
},
{
'name': '/app/lakehouse-agent/mcp-server-runtime-id',
'value': runtime_id,
'description': 'MCP Athena Server runtime ID on AgentCore'
}
]
for param in parameters:
try:
self.ssm.put_parameter(
Name=param['name'],
Value=param['value'],
Description=param['description'],
Type='String',
Overwrite=True
)
print(f"✅ Stored parameter: {param['name']} = {param['value']}")
except Exception as e:
print(f"❌ Error storing parameter {param['name']}: {e}")
raise
def create_runtime_role(config: SSMConfig):
"""Create IAM role for AgentCore Runtime execution."""
iam = boto3.client('iam', region_name=config.region)
role_name = 'AgentCoreRuntimeRole-lakehouse-mcp'
# Trust policy for AgentCore Runtime
trust_policy = {
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Service": "bedrock-agentcore.amazonaws.com"
},
"Action": "sts:AssumeRole"
}
]
}
# Permissions policy - base statements
statements = [
{
"Effect": "Allow",
"Action": [
"athena:StartQueryExecution",
"athena:GetQueryExecution",
"athena:GetQueryResults",
"athena:StopQueryExecution",
"athena:GetWorkGroup"
],
"Resource": "*"
},
{
"Effect": "Allow",
"Action": [
"glue:GetDatabase",
"glue:GetTable",
"glue:GetTables",
"glue:GetPartition",
"glue:GetPartitions"
],
"Resource": "*"
},
{
"Effect": "Allow",
"Action": [
"s3:GetObject",
"s3:ListBucket",
"s3:PutObject",
"s3:GetBucketLocation"
],
"Resource": [
f"arn:aws:s3:::{config.s3_bucket_name}/*",
f"arn:aws:s3:::{config.s3_bucket_name}"
]
},
{
"Effect": "Allow",
"Action": [
"lakeformation:GetDataAccess"
],
"Resource": "*"
},
{
"Effect": "Allow",
"Action": [
"logs:*"
],
"Resource": [
f"arn:aws:logs:{config.region}:{config.account_id}:log-group:/aws/bedrock-agentcore/*",
f"arn:aws:logs:{config.region}:{config.account_id}:log-group:/aws/agentcore/*"
]
},
{
"Effect": "Allow",
"Action": [
"ecr:GetAuthorizationToken",
"ecr:BatchCheckLayerAvailability",
"ecr:GetDownloadUrlForLayer",
"ecr:BatchGetImage"
],
"Resource": "*"
},
{
"Effect": "Allow",
"Action": [
"ssm:GetParameter",
"ssm:GetParameters"
],
"Resource": f"arn:aws:ssm:{config.region}:{config.account_id}:parameter/app/lakehouse-agent/*"
}
]
# Add STS AssumeRole permission only if rls_role_arn is set
if config.rls_role_arn:
statements.append({
"Effect": "Allow",
"Action": [
"sts:AssumeRole",
"sts:TagSession"
],
"Resource": config.rls_role_arn
})
permissions_policy = {
"Version": "2012-10-17",
"Statement": statements
}
try:
# Create role
print(f"Creating IAM role: {role_name}")
response = iam.create_role(
RoleName=role_name,
AssumeRolePolicyDocument=json.dumps(trust_policy),
Description='AgentCore Runtime execution role for lakehouse data MCP server'
)
role_arn = response['Role']['Arn']
print(json.dumps(permissions_policy))
# Attach inline policy
iam.put_role_policy(
RoleName=role_name,
PolicyName='AgentCoreRuntimePermissions',
PolicyDocument=json.dumps(permissions_policy)
)
print(f"✅ Created IAM role: {role_arn}")
return role_arn
except iam.exceptions.EntityAlreadyExistsException:
print(f"⚠️ Role {role_name} already exists, deleting and recreating...")
# Delete inline policies
try:
policy_names = iam.list_role_policies(RoleName=role_name)['PolicyNames']
for policy_name in policy_names:
print(f" Deleting inline policy: {policy_name}")
iam.delete_role_policy(RoleName=role_name, PolicyName=policy_name)
except Exception as e:
print(f" ⚠️ Error deleting inline policies: {e}")
# Detach managed policies
try:
attached_policies = iam.list_attached_role_policies(RoleName=role_name)['AttachedPolicies']
for policy in attached_policies:
print(f" Detaching managed policy: {policy['PolicyArn']}")
iam.detach_role_policy(RoleName=role_name, PolicyArn=policy['PolicyArn'])
except Exception as e:
print(f" ⚠️ Error detaching managed policies: {e}")
# Remove from instance profiles
try:
instance_profiles = iam.list_instance_profiles_for_role(RoleName=role_name)['InstanceProfiles']
for profile in instance_profiles:
print(f" Removing from instance profile: {profile['InstanceProfileName']}")
iam.remove_role_from_instance_profile(
InstanceProfileName=profile['InstanceProfileName'],
RoleName=role_name
)
except Exception as e:
print(f" ⚠️ Error removing from instance profiles: {e}")
# Delete the role
try:
iam.delete_role(RoleName=role_name)
print(f" ✅ Deleted existing role")
except Exception as e:
print(f" ❌ Error deleting role: {e}")
raise
# Wait a moment for IAM to propagate
import time
time.sleep(2)
# Recreate the role
print(f" Creating new role: {role_name}")
response = iam.create_role(
RoleName=role_name,
AssumeRolePolicyDocument=json.dumps(trust_policy),
Description='AgentCore Runtime execution role for lakehouse data MCP server'
)
role_arn = response['Role']['Arn']
# Attach inline policy
iam.put_role_policy(
RoleName=role_name,
PolicyName='AgentCoreRuntimePermissions',
PolicyDocument=json.dumps(permissions_policy)
)
print(f"✅ Recreated IAM role: {role_arn}")
return role_arn
def deploy_to_runtime(config: SSMConfig, role_arn: str):
"""Deploy MCP server to AgentCore Runtime using starter toolkit."""
runtime_name = 'lakehouse_mcp_server' # Must use underscores, not hyphens
try:
print(f"\n🚀 Deploying MCP server to AgentCore Runtime...")
print(f" Name: {runtime_name}")
print(f" Region: {config.region}")
print(f" This will build a Docker container and deploy it...")
# Build environment variables
env_vars = {
'AWS_REGION': config.region,
'S3_BUCKET_NAME': config.s3_bucket_name,
'ATHENA_DATABASE_NAME': config.database_name,
'RLS_ROLE_ARN': config.rls_role_arn,
'SECURITY_MODE': config.security_mode,
'LOG_LEVEL': config.log_level
}
print(f"\n📋 Environment variables:")
for key, value in env_vars.items():
print(f" {key}: {value}")
# Initialize Runtime from starter toolkit
agentcore_runtime = Runtime()
# Configure the runtime
print(f"\n🔧 Configuring AgentCore Runtime...")
# Extract role name from ARN (format: arn:aws:iam::account:role/RoleName)
role_name = role_arn.split('/')[-1]
# Extract user pool ID from ARN and build JWT configuration
user_pool_id = config.cognito_user_pool_arn.split('/')[-1]
issuer = f"https://cognito-idp.{config.region}.amazonaws.com/{user_pool_id}"
discovery_url = f"{issuer}/.well-known/openid-configuration"
# Get M2M client ID
response = config.ssm.get_parameter(Name='/app/lakehouse-agent/cognito-m2m-client-id')
cognito_m2m_client_id = response['Parameter']['Value']
allowed_clients = [cognito_m2m_client_id]
print(f"\n🔐 JWT Authentication Configuration:")
print(f" Discovery URL: {discovery_url}")
print(f" Allowed Clients:")
print(f" - {cognito_m2m_client_id} (M2M only)")
auth_config = {
"customJWTAuthorizer": {
"allowedClients": allowed_clients,
"discoveryUrl": discovery_url
}
}
# Note: Environment variables are read from SSM Parameter Store by the MCP server
# The starter toolkit will package the entire directory
agentcore_runtime.configure(
entrypoint="server.py",
execution_role=role_name, # Use role name, not ARN
auto_create_ecr=True,
requirements_file="requirements.txt",
region=config.region,
protocol="MCP",
agent_name=runtime_name,
authorizer_configuration=auth_config
)
print(f"✅ Configuration complete with JWT authentication")
# Launch the runtime (builds Docker image and deploys)
print(f"\n🚀 Launching to AgentCore Runtime...")
print(f" This may take several minutes...")
launch_result = agentcore_runtime.launch()
runtime_arn = launch_result.agent_arn
runtime_id = launch_result.agent_id
print(f"\n✅ MCP Server deployed successfully!")
print(f" Runtime ARN: {runtime_arn}")
print(f" Runtime ID: {runtime_id}")
# Note about JWT authentication
print(f"\n⚠️ Important: Configure JWT Authentication")
print(f" The runtime is deployed but needs JWT authentication configured.")
print(f" Run the configuration script:")
print(f" cd mcp-lakehouse-server")
print(f" python configure_runtime_auth.py")
return {
'runtime_arn': runtime_arn,
'runtime_id': runtime_id,
'role_arn': role_arn
}
except Exception as e:
print(f"\n❌ Error deploying runtime: {str(e)}")
import traceback
traceback.print_exc()
raise
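`deploy_to_runtime` builds the JWT discovery URL from the Cognito user pool ARN. The derivation can be sketched as a standalone function. `cognito_discovery_url` is a hypothetical name, and unlike the code above, which uses `config.region`, this sketch reads the region from the ARN itself; it assumes the ARN follows the `arn:aws:cognito-idp:<region>:<account>:userpool/<pool-id>` layout.

```python
def cognito_discovery_url(user_pool_arn: str) -> str:
    """Build the OpenID Connect discovery URL for a Cognito user pool ARN.

    Assumes the ARN layout
    arn:aws:cognito-idp:<region>:<account>:userpool/<pool-id>.
    """
    parts = user_pool_arn.split(":", 5)
    if (
        len(parts) != 6
        or parts[2] != "cognito-idp"
        or not parts[5].startswith("userpool/")
    ):
        raise ValueError(f"Not a Cognito user pool ARN: {user_pool_arn!r}")
    region = parts[3]
    pool_id = parts[5].split("/", 1)[1]
    issuer = f"https://cognito-idp.{region}.amazonaws.com/{pool_id}"
    return f"{issuer}/.well-known/openid-configuration"
```

Validating the ARN shape up front gives a clear error when the SSM parameter holds something other than a user pool ARN, instead of producing a malformed URL.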
def main():
"""Main deployment function."""
print("=" * 70)
print("MCP Athena Server Deployment to AgentCore Runtime")
print("=" * 70)
# Load configuration from SSM
print("\n🔍 Loading configuration from SSM Parameter Store...")
config = SSMConfig()
# Validate configuration
if not config.is_valid():
print("\n❌ Configuration is invalid!")
config.print_status()
print("\n📝 Please run the setup scripts first.")
sys.exit(1)
print("✅ Configuration validated")
# Print configuration summary
config.print_status()
try:
# Step 1: Create IAM role
print("\n" + "=" * 70)
print("Step 1: Creating IAM Role")
print("=" * 70)
role_arn = create_runtime_role(config)
# Step 2: Deploy to runtime
print("\n" + "=" * 70)
print("Step 2: Deploying to AgentCore Runtime")
print("=" * 70)
result = deploy_to_runtime(config, role_arn)
# Step 3: Store runtime parameters in SSM
print("\n" + "=" * 70)
print("Step 3: Storing Runtime Configuration")
print("=" * 70)
config.store_runtime_parameters(result['runtime_arn'], result['runtime_id'])
# Print summary
print("\n" + "=" * 70)
print("Deployment Complete!")
print("=" * 70)
print("\n✅ Runtime configuration stored in SSM Parameter Store:")
print(f" /app/lakehouse-agent/mcp-server-runtime-arn")
print(f" /app/lakehouse-agent/mcp-server-runtime-id")
print("\n📋 Next Steps:")
print(" 1. Deploy the Gateway and Interceptor (Step 7)")
print(" 2. Deploy the Lakehouse Agent (Step 8)")
print(" 3. Test the system end-to-end")
print("\n" + "=" * 70)
except Exception as e:
print(f"\n❌ Deployment failed: {str(e)}")
import traceback
traceback.print_exc()
sys.exit(1)
if __name__ == '__main__':
main()
@@ -0,0 +1,5 @@
bedrock-agentcore>=1.0.0
bedrock-agentcore-starter-toolkit
boto3>=1.34.0
mcp>=1.9.0
requests>=2.31.0
@@ -0,0 +1,361 @@
"""
MCP Server for Health Lakehouse Data - Production Security with Lake Formation
This MCP server provides tools for querying and managing health lakehouse data
with enterprise-grade row-level security enforced by AWS Lake Formation.
Security Architecture:
- OAuth authentication (Cognito JWT tokens)
- User identity extraction from Gateway interceptor
- Lake Formation session tag-based row-level security
- Per-user filtering through a user_id predicate in each query
IMPORTANT: This server ONLY supports the 'lakeformation' security mode.
Lake Formation does not yet support dynamic row filters, so the filter is applied in SQL.
Configuration:
- Reads from SSM Parameter Store
- Auto-detects region from boto3 session
- Requires SECURITY_MODE=lakeformation
- RLS_ROLE_ARN is optional
"""
import sys
import os
from typing import Any, Dict, Optional
import boto3
from mcp.server.fastmcp import FastMCP
# Initialize MCP server
mcp = FastMCP(host="0.0.0.0", stateless_http=True)
# PRODUCTION ONLY: Use Lake Formation row-level security
from athena_tools_secure import SecureAthenaClaimsTools as AthenaTools
print("🔒 Using Lake Formation row-level security (production mode)")
# Global Athena tools instance
athena_tools = None
# Configuration cache
_config_cache = None
def get_config() -> Dict[str, Optional[str]]:
"""
Load configuration from environment variables and SSM Parameter Store.
"""
global _config_cache
if _config_cache is not None:
return _config_cache
config = {}
# Get region from boto3 session with proper fallback
try:
session = boto3.Session()
config['region'] = (
session.region_name or
os.environ.get('AWS_REGION') or
os.environ.get('AWS_DEFAULT_REGION') or
'us-east-1'
)
if not session.region_name:
print("⚠️ No region in AWS config, using fallback")
print(f"✅ Region: {config['region']}")
except Exception as e:
print(f"⚠️ Could not detect region: {e}")
config['region'] = 'us-east-1'
# Get account ID
try:
sts = boto3.client('sts', region_name=config['region'])
config['account_id'] = sts.get_caller_identity()['Account']
except Exception as e:
print(f"⚠️ Could not get account ID: {e}")
config['account_id'] = None
ssm = boto3.client('ssm', region_name=config['region'])
def get_param(name: str, env_var: Optional[str] = None, default: Optional[str] = None) -> Optional[str]:
if env_var and env_var in os.environ:
value = os.environ[env_var]
print(f"{name} from environment: {value}")
return value
try:
response = ssm.get_parameter(Name=f'/app/lakehouse-agent/{name}')
value = response['Parameter']['Value']
print(f"{name} from SSM: {value}")
return value
except ssm.exceptions.ParameterNotFound:
if default:
print(f"{name} using default: {default}")
return default
print(f"⚠️ {name} not found")
return None
except Exception as e:
print(f"❌ Error getting {name}: {e}")
return default
config['s3_bucket_name'] = get_param('s3-bucket-name', 'S3_BUCKET_NAME')
config['database_name'] = get_param('database-name', 'ATHENA_DATABASE_NAME')
config['rls_role_arn'] = get_param('rls-role-arn', None)
config['security_mode'] = get_param('security-mode', 'SECURITY_MODE', 'lakeformation')
config['log_level'] = os.environ.get('LOG_LEVEL', 'INFO')
if config['s3_bucket_name']:
config['s3_output_location'] = f"s3://{config['s3_bucket_name']}/athena-results/"
else:
config['s3_output_location'] = None
config['test_user'] = os.environ.get('TEST_USER_1', 'user001@example.com')
config['local_development'] = os.environ.get('LOCAL_DEVELOPMENT', 'false').lower() == 'true'
_config_cache = config
return config
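`get_config` resolves the region from the boto3 session first, then from two environment variables, and finally from a hard-coded default. The same precedence can be expressed as a small pure function, which makes the order easy to test. `resolve_region` is a hypothetical helper; the fallback value mirrors the one used above.

```python
from typing import Mapping, Optional

DEFAULT_REGION = "us-east-1"  # same fallback as get_config above


def resolve_region(session_region: Optional[str], env: Mapping[str, str]) -> str:
    """Pick a region using the precedence from get_config.

    Order: boto3 session region, AWS_REGION, AWS_DEFAULT_REGION, default.
    """
    return (
        session_region
        or env.get("AWS_REGION")
        or env.get("AWS_DEFAULT_REGION")
        or DEFAULT_REGION
    )
```

A caller would pass `boto3.Session().region_name` and `os.environ`; taking the environment as a mapping keeps the function independent of process state.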
def validate_config(config: Dict[str, Optional[str]]) -> bool:
required_params = [
('region', 'AWS Region'),
('s3_bucket_name', 'S3 Bucket Name'),
('database_name', 'Athena Database Name'),
('security_mode', 'Security Mode')
]
missing = []
for param, display_name in required_params:
if not config.get(param):
missing.append(display_name)
if missing:
print(f"❌ Missing required configuration: {', '.join(missing)}")
return False
if config['security_mode'] != 'lakeformation':
print(f"❌ Invalid security mode: {config['security_mode']}")
print(" Only 'lakeformation' is supported")
return False
return True
def get_athena_tools():
global athena_tools
if athena_tools is None:
config = get_config()
print("Initializing Athena tools with Lake Formation RLS...")
print(f" Region: {config['region']}")
print(f" Database: {config['database_name']}")
print(f" S3 Output: {config['s3_output_location']}")
print(f" RLS Role: {config['rls_role_arn']}")
athena_tools = AthenaTools(
region=config['region'],
database_name=config['database_name'],
s3_output_location=config['s3_output_location'],
rls_role_arn=config['rls_role_arn']
)
print("✅ Athena tools initialized with Lake Formation RLS")
return athena_tools
def get_user_id_with_fallback(context_arg: Optional[Dict[str, Any]] = None) -> Optional[str]:
"""Get user ID from context argument or fallback to test user."""
config = get_config()
user_id = None
if context_arg:
print(f"📋 Context argument received: {context_arg}")
user_id = context_arg.get('user_id')
if user_id:
print(f" Got user_id from context argument: {user_id}")
return user_id
if config['local_development']:
user_id = config['test_user']
print(f"⚠️ Using test user for local development: {user_id}")
return user_id
print("❌ User identity not found in request")
return None
@mcp.tool(
name="query_claims",
description="Query health lakehouse data for the authenticated user with optional filters"
)
def query_claims(
claim_status: str = None,
claim_type: str = None,
start_date: str = None,
end_date: str = None,
context: Dict[str, Any] = None
) -> Dict[str, Any]:
"""Query lakehouse data for the authenticated user."""
print("=" * 60)
print("🔧 TOOL INVOKED: query_claims")
print("=" * 60)
print("📥 INPUT PARAMETERS:")
print(f" claim_status: {claim_status}")
print(f" claim_type: {claim_type}")
print(f" start_date: {start_date}")
print(f" end_date: {end_date}")
print(f" context: {context}")
try:
user_id = get_user_id_with_fallback(context)
print(f"👤 USER ID: {user_id}")
if not user_id:
return {"success": False, "error": "User identity not found in request"}
filters = {k: v for k, v in {
'claim_status': claim_status,
'claim_type': claim_type,
'start_date': start_date,
'end_date': end_date
}.items() if v is not None}
print(f"🔍 FILTERS: {filters}")
tools = get_athena_tools()
result = tools.query_claims(user_id, filters if filters else None)
print("📤 OUTPUT:")
print(f" success: {result.get('success', 'N/A')}")
if result.get('success'):
claims_count = len(result.get('claims', []))
print(f" claims_count: {claims_count}")
else:
print(f" error: {result.get('error', 'N/A')}")
print("=" * 60)
return result
except Exception as e:
print(f"❌ ERROR in query_claims: {str(e)}")
import traceback
print(f" Stack trace: {traceback.format_exc()}")
print("=" * 60)
return {"success": False, "error": str(e)}
@mcp.tool(
name="get_claim_details",
description="Get detailed information about a specific claim by ID"
)
def get_claim_details(claim_id: str, context: Dict[str, Any] = None) -> Dict[str, Any]:
"""Get details of a specific claim."""
print("=" * 60)
print("🔧 TOOL INVOKED: get_claim_details")
print("=" * 60)
print("📥 INPUT PARAMETERS:")
print(f" claim_id: {claim_id}")
print(f" context: {context}")
try:
user_id = get_user_id_with_fallback(context)
print(f"👤 USER ID: {user_id}")
if not user_id:
return {"success": False, "error": "User identity not found in request"}
tools = get_athena_tools()
result = tools.get_claim_details(user_id, claim_id)
print("📤 OUTPUT:")
print(f" success: {result.get('success', 'N/A')}")
if result.get('success'):
claim_data = result.get('claim', {})
print(f" claim_id: {claim_data.get('claim_id', 'N/A')}")
print(f" claim_status: {claim_data.get('claim_status', 'N/A')}")
else:
print(f" error: {result.get('error', 'N/A')}")
print("=" * 60)
return result
except Exception as e:
print(f"❌ ERROR in get_claim_details: {str(e)}")
import traceback
print(f" Stack trace: {traceback.format_exc()}")
print("=" * 60)
return {"success": False, "error": str(e)}
@mcp.tool(
name="get_claims_summary",
description="Get summary statistics of all claims for the authenticated user"
)
def get_claims_summary(context: Dict[str, Any] = None) -> Dict[str, Any]:
"""Get claims summary for the user."""
print("=" * 60)
print("🔧 TOOL INVOKED: get_claims_summary")
print("=" * 60)
print("📥 INPUT PARAMETERS:")
print(f" context: {context}")
try:
user_id = get_user_id_with_fallback(context)
print(f"👤 USER ID: {user_id}")
if not user_id:
return {"success": False, "error": "User identity not found in request"}
tools = get_athena_tools()
result = tools.get_claims_summary(user_id)
print("📤 OUTPUT:")
print(f" success: {result.get('success', 'N/A')}")
if result.get('success'):
summary = result.get('summary', {})
print(f" total_claims: {summary.get('total_claims', 'N/A')}")
print(f" total_amount: {summary.get('total_amount', 'N/A')}")
print(f" by_status: {summary.get('by_status', 'N/A')}")
else:
print(f" error: {result.get('error', 'N/A')}")
print("=" * 60)
return result
except Exception as e:
print(f"❌ ERROR in get_claims_summary: {str(e)}")
import traceback
print(f" Stack trace: {traceback.format_exc()}")
print("=" * 60)
return {"success": False, "error": str(e)}
if __name__ == "__main__":
print("\n🔍 Validating configuration...")
config = get_config()
if config['security_mode'] != 'lakeformation':
print("\n❌ Error: Only Lake Formation security mode is supported!")
print(f" Current SECURITY_MODE: {config['security_mode']}")
sys.exit(1)
if not validate_config(config):
print("\n❌ Configuration is invalid!")
sys.exit(1)
print("✅ Configuration validated")
print("🔒 Lake Formation row-level security enabled")
print("Starting MCP Server with Lake Formation RLS:")
print(f" Region: {config['region']}")
print(f" Database: {config['database_name']}")
print(f" S3 Output: {config['s3_output_location']}")
print(f" RLS Role: {config['rls_role_arn']}")
mcp.run(transport="streamable-http")
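The identity fallback and filter construction used by the three tools above can be exercised in isolation. The following is a minimal sketch, not the server's actual code: `resolve_user_id` and `build_filters` are hypothetical helper names, and a plain dict stands in for the result of `get_config()`.

```python
from typing import Any, Dict, Optional


def resolve_user_id(context: Optional[Dict[str, Any]], config: Dict[str, Any]) -> Optional[str]:
    """Prefer the caller-supplied identity; use the test user only in local development."""
    if context and context.get("user_id"):
        return context["user_id"]
    if config.get("local_development"):
        # Assumption: config carries 'local_development' and 'test_user' keys, as in the tools above.
        return config.get("test_user")
    return None


def build_filters(**candidates: Optional[str]) -> Optional[Dict[str, str]]:
    """Keep only the filters that were actually supplied; return None when there are none."""
    filters = {name: value for name, value in candidates.items() if value is not None}
    return filters or None
```

Returning `Optional[str]` makes the "no identity" case explicit in the signature, so each tool can reject the request before any query runs.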
@@ -0,0 +1,307 @@
#!/usr/bin/env python3
"""
Simple MCP Server Test
This script:
1. Gets M2M token from Cognito
2. Prints token for validation
3. Uses token to invoke MCP server and get tool list
"""
import boto3
import requests
import json
import base64
def get_m2m_token():
"""Get M2M access token from Cognito."""
print("=" * 70)
print("Step 1: Get M2M Token from Cognito")
print("=" * 70)
try:
session = boto3.Session()
region = session.region_name
ssm = boto3.client('ssm', region_name=region)
cognito = boto3.client('cognito-idp', region_name=region)
# Get M2M client credentials from SSM
print("\n📋 Loading M2M client credentials from SSM...")
client_id = ssm.get_parameter(Name='/app/lakehouse-agent/cognito-m2m-client-id')['Parameter']['Value']
client_secret = ssm.get_parameter(Name='/app/lakehouse-agent/cognito-m2m-client-secret', WithDecryption=True)['Parameter']['Value']
domain = ssm.get_parameter(Name='/app/lakehouse-agent/cognito-domain')['Parameter']['Value']
user_pool_id = ssm.get_parameter(Name='/app/lakehouse-agent/cognito-user-pool-id')['Parameter']['Value']
except Exception as e:
print(f"❌ Error loading configuration: {e}")
print(" Please check your AWS credentials and SSM parameters")
return None, None
print(f" Client ID: {client_id}")
print(f" Domain: {domain}")
print(f" User Pool: {user_pool_id}")
# Get configured OAuth scopes
print("\n🔍 Getting OAuth scopes from client configuration...")
client_details = cognito.describe_user_pool_client(
UserPoolId=user_pool_id,
ClientId=client_id
)
allowed_scopes = client_details['UserPoolClient'].get('AllowedOAuthScopes', [])
scope_string = ' '.join(allowed_scopes)
print(f" Scopes: {scope_string}")
# Request token
print("\n🔐 Requesting access token...")
token_endpoint = f"{domain}/oauth2/token"
credentials = f"{client_id}:{client_secret}"
encoded_credentials = base64.b64encode(credentials.encode()).decode()
response = requests.post(
token_endpoint,
headers={
'Content-Type': 'application/x-www-form-urlencoded',
'Authorization': f'Basic {encoded_credentials}'
},
data={
'grant_type': 'client_credentials',
'scope': scope_string
},
timeout=10
)
if response.status_code != 200:
print(f"❌ Token request failed: {response.status_code}")
print(f" Response: {response.text}")
return None, region
token_data = response.json()
access_token = token_data['access_token']
print(f"✅ Token obtained successfully!")
print(f" Token type: {token_data.get('token_type', 'N/A')}")
print(f" Expires in: {token_data.get('expires_in', 'N/A')} seconds")
# Decode and print token claims
print("\n📄 Token Claims:")
parts = access_token.split('.')
if len(parts) == 3:
payload = json.loads(base64.urlsafe_b64decode(parts[1] + '=='))
print(f" Issuer: {payload.get('iss', 'N/A')}")
print(f" Client ID: {payload.get('client_id', 'N/A')}")
print(f" Scope: {payload.get('scope', 'N/A')}")
print(f" Token Use: {payload.get('token_use', 'N/A')}")
print(f"\n🔑 Access Token (first 100 chars):")
print(f" {access_token[:100]}...")
return access_token, region
def test_mcp_server(access_token, region):
"""Test MCP server by getting tool list."""
print("\n" + "=" * 70)
print("Step 2: Invoke MCP Server to Get Tool List")
print("=" * 70)
ssm = boto3.client('ssm', region_name=region)
# Get runtime ARN
print("\n📦 Loading runtime configuration...")
try:
runtime_arn = ssm.get_parameter(Name='/app/lakehouse-agent/mcp-server-runtime-arn')['Parameter']['Value']
print(f" Runtime ARN: {runtime_arn}")
except ssm.exceptions.ParameterNotFound:
print(f" ❌ Runtime ARN not found in region {region}")
print(f" Please deploy the MCP server first or check your region")
return
except Exception as e:
print(f" ❌ Error loading runtime ARN: {e}")
return
# Build MCP endpoint URL
encoded_arn = runtime_arn.replace(':', '%3A').replace('/', '%2F')
url = f"https://bedrock-agentcore.{region}.amazonaws.com/runtimes/{encoded_arn}/invocations?qualifier=DEFAULT"
print(f" Endpoint: {url}")
# Prepare headers
headers = {
"Authorization": f"Bearer {access_token}",
"Content-Type": "application/json",
"Accept": "application/json, text/event-stream"
}
# Test 1: Initialize MCP session
print("\n📤 Test 1: Initialize MCP session")
init_request = {
"jsonrpc": "2.0",
"id": 1,
"method": "initialize",
"params": {
"protocolVersion": "2024-11-05",
"capabilities": {},
"clientInfo": {
"name": "simple-test-client",
"version": "1.0.0"
}
}
}
try:
response = requests.post(url, headers=headers, json=init_request, timeout=10)
print(f" Status: {response.status_code}")
print(f" Response length: {len(response.text)} bytes")
print(f" Content-Type: {response.headers.get('Content-Type', 'N/A')}")
if response.status_code == 200:
if response.text:
# Parse SSE format
if 'text/event-stream' in response.headers.get('Content-Type', ''):
print(f" ✅ Received SSE response")
# Extract JSON from SSE format
lines = response.text.split('\n')
for line in lines:
if line.startswith('data: '):
json_str = line[6:] # Remove 'data: ' prefix
try:
data = json.loads(json_str)
print(f" ✅ Initialize successful!")
if 'result' in data:
server_info = data['result'].get('serverInfo', {})
print(f" Server Name: {server_info.get('name', 'N/A')}")
print(f" Server Version: {server_info.get('version', 'N/A')}")
print(f" Protocol Version: {data['result'].get('protocolVersion', 'N/A')}")
break
except json.JSONDecodeError:
continue
else:
try:
data = response.json()
print(f" ✅ Initialize successful!")
if 'result' in data:
server_info = data['result'].get('serverInfo', {})
print(f" Server Name: {server_info.get('name', 'N/A')}")
print(f" Server Version: {server_info.get('version', 'N/A')}")
print(f" Protocol Version: {data['result'].get('protocolVersion', 'N/A')}")
except json.JSONDecodeError as je:
print(f" ⚠️ Response is not valid JSON: {je}")
print(f" Raw response (first 500 chars): {response.text[:500]}")
else:
print(f" ⚠️ Response body is empty")
return
else:
print(f" ❌ Initialize failed")
if response.text:
print(f" Response: {response.text[:500]}")
else:
print(f" Response body is empty")
return
except requests.exceptions.Timeout:
print(f" ❌ Request timed out")
return
except Exception as e:
print(f" ❌ Error: {e}")
return
# Test 2: Get tool list
print("\n📤 Test 2: Get tool list")
tools_request = {
"jsonrpc": "2.0",
"id": 2,
"method": "tools/list",
"params": {}
}
try:
response = requests.post(url, headers=headers, json=tools_request, timeout=10)
print(f" Status: {response.status_code}")
print(f" Response length: {len(response.text)} bytes")
if response.status_code == 200:
if response.text:
# Parse SSE format
if 'text/event-stream' in response.headers.get('Content-Type', ''):
print(f" ✅ Received SSE response")
# Extract JSON from SSE format
lines = response.text.split('\n')
for line in lines:
if line.startswith('data: '):
json_str = line[6:] # Remove 'data: ' prefix
try:
data = json.loads(json_str)
print(f" ✅ Tool list retrieved!")
if 'result' in data and 'tools' in data['result']:
tools = data['result']['tools']
print(f"\n 📋 Available Tools ({len(tools)}):")
print(" " + "=" * 66)
for i, tool in enumerate(tools, 1):
print(f"\n {i}. {tool.get('name', 'N/A')}")
print(f" Description: {tool.get('description', 'N/A')}")
if 'inputSchema' in tool and 'properties' in tool['inputSchema']:
props = tool['inputSchema']['properties']
if props:
print(f" Parameters: {', '.join(props.keys())}")
else:
print(f" Response: {json.dumps(data, indent=2)}")
break
except json.JSONDecodeError:
continue
else:
try:
data = response.json()
print(f" ✅ Tool list retrieved!")
if 'result' in data and 'tools' in data['result']:
tools = data['result']['tools']
print(f"\n 📋 Available Tools ({len(tools)}):")
print(" " + "=" * 66)
for i, tool in enumerate(tools, 1):
print(f"\n {i}. {tool.get('name', 'N/A')}")
print(f" Description: {tool.get('description', 'N/A')}")
if 'inputSchema' in tool and 'properties' in tool['inputSchema']:
props = tool['inputSchema']['properties']
if props:
print(f" Parameters: {', '.join(props.keys())}")
else:
print(f" Response: {json.dumps(data, indent=2)}")
except json.JSONDecodeError:
print(f" ⚠️ Response is not valid JSON")
print(f" Raw response: {response.text[:200]}")
else:
print(f" ⚠️ Response body is empty")
else:
print(f" ❌ Tool list failed")
if response.text:
print(f" Response: {response.text[:500]}")
else:
print(f" Response body is empty")
except requests.exceptions.Timeout:
print(f" ❌ Request timed out")
except Exception as e:
print(f" ❌ Error: {e}")
def main():
print("\n" + "=" * 70)
print("Simple MCP Server Test with M2M Authentication")
print("=" * 70 + "\n")
# Step 1: Get token
access_token, region = get_m2m_token()
if not access_token:
print("\n❌ Failed to get access token. Exiting.")
return
# Step 2: Test MCP server
test_mcp_server(access_token, region)
print("\n" + "=" * 70)
print("Test Complete")
print("=" * 70 + "\n")
if __name__ == '__main__':
main()
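The test script decodes the token payload and extracts JSON from an SSE body inline, twice for the SSE case. A minimal sketch of both steps as standalone helpers follows; `decode_jwt_payload` and `first_sse_json` are hypothetical names, the payload is decoded without signature verification (inspection only), and the SSE handling assumes each event carries its JSON on a single `data:` line.

```python
import base64
import json
from typing import Any, Dict, Optional


def decode_jwt_payload(token: str) -> Dict[str, Any]:
    """Decode the payload segment of a JWT without verifying its signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected three dot-separated JWT segments")
    segment = parts[1]
    # base64url segments drop '=' padding; restore it to a multiple of 4.
    segment += "=" * (-len(segment) % 4)
    return json.loads(base64.urlsafe_b64decode(segment))


def first_sse_json(body: str) -> Optional[Dict[str, Any]]:
    """Return the first JSON value found on a 'data:' line of an SSE body."""
    for line in body.splitlines():
        if not line.startswith("data:"):
            continue
        try:
            return json.loads(line[len("data:"):].strip())
        except json.JSONDecodeError:
            continue  # skip non-JSON data lines and keep looking
    return None
```

Computing the exact padding avoids relying on the decoder tolerating surplus `=` characters, and a single SSE helper keeps the initialize and tools/list branches from drifting apart.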
@@ -0,0 +1,26 @@
# Health Lakehouse Agent - Main Requirements
# Core dependencies
boto3>=1.34.0
bedrock-agentcore>=1.0.0
bedrock-agentcore-starter-toolkit>=0.2.6
strands-agents>=1.0.0
# Web UI
streamlit>=1.30.0
requests>=2.31.0
# MCP
mcp>=1.9.0
# JWT handling
python-jose[cryptography]>=3.4.0
cryptography>=41.0.0
# Development
jupyter>=1.0.0
ipykernel>=6.29.0
# Testing
pytest>=7.4.0
hypothesis>=6.92.0
Binary file not shown (image, 351 KiB).

Binary file not shown (image, 348 KiB).

@@ -0,0 +1,3 @@
streamlit>=1.30.0
requests>=2.31.0
boto3>=1.28.0
@@ -0,0 +1,444 @@
"""
Streamlit UI for Health Lakehouse Data Agent with Cognito User Authentication
"""
import streamlit as st
import requests
import json
import uuid
import boto3
import os
from typing import Optional
st.set_page_config(page_title="Lakehouse Data Assistant", page_icon="🏥", layout="wide")
# Initialize session state
if "messages" not in st.session_state:
st.session_state.messages = []
if "session_id" not in st.session_state:
st.session_state.session_id = str(uuid.uuid4())
if "access_token" not in st.session_state:
st.session_state.access_token = None
if "id_token" not in st.session_state:
st.session_state.id_token = None
if "user_email" not in st.session_state:
st.session_state.user_email = None
if "runtime_arn" not in st.session_state:
st.session_state.runtime_arn = ""
if "cognito_config" not in st.session_state:
st.session_state.cognito_config = {}
def load_config_from_ssm():
"""Load configuration from SSM Parameter Store"""
try:
session = boto3.Session()
# Get region with proper fallback
region = (
session.region_name or
os.environ.get('AWS_REGION') or
os.environ.get('AWS_DEFAULT_REGION') or
'us-east-1'
)
ssm = boto3.client('ssm', region_name=region)
config = {}
params = {
'runtime_arn': '/app/lakehouse-agent/agent-runtime-arn',
'cognito_user_pool_id': '/app/lakehouse-agent/cognito-user-pool-id',
'cognito_app_client_id': '/app/lakehouse-agent/cognito-app-client-id',
'cognito_domain': '/app/lakehouse-agent/cognito-domain',
'cognito_region': '/app/lakehouse-agent/cognito-region'
}
for key, param_name in params.items():
try:
response = ssm.get_parameter(Name=param_name)
config[key] = response['Parameter']['Value']
except Exception:
config[key] = None
config['region'] = region
return config
except Exception as e:
st.error(f"Failed to load config from SSM: {e}")
# Return config with at least region set
session = boto3.Session()
return {
'region': (
session.region_name or
os.environ.get('AWS_REGION') or
os.environ.get('AWS_DEFAULT_REGION') or
'us-east-1'
)
}
def authenticate_user(username: str, password: str, user_pool_id: str, client_id: str, region: str) -> Optional[dict]:
"""Authenticate user with Cognito using the ADMIN_NO_SRP_AUTH flow (admin_initiate_auth)"""
try:
client = boto3.client('cognito-idp', region_name=region)
# Get client secret from SSM
ssm = boto3.client('ssm', region_name=region)
try:
client_secret = ssm.get_parameter(
Name='/app/lakehouse-agent/cognito-app-client-secret',
WithDecryption=True
)['Parameter']['Value']
except Exception:
st.error("❌ Could not retrieve client secret from SSM")
return None
# Calculate SECRET_HASH
import hmac
import hashlib
import base64
message = bytes(username + client_id, 'utf-8')
secret = bytes(client_secret, 'utf-8')
secret_hash = base64.b64encode(hmac.new(secret, message, digestmod=hashlib.sha256).digest()).decode()
response = client.admin_initiate_auth(
UserPoolId=user_pool_id,
ClientId=client_id,
AuthFlow='ADMIN_NO_SRP_AUTH',
AuthParameters={
'USERNAME': username,
'PASSWORD': password,
'SECRET_HASH': secret_hash
}
)
if 'ChallengeName' in response:
if response['ChallengeName'] == 'NEW_PASSWORD_REQUIRED':
return {
'challenge': 'NEW_PASSWORD_REQUIRED',
'session': response['Session']
}
if 'AuthenticationResult' in response:
return {
'access_token': response['AuthenticationResult']['AccessToken'],
'id_token': response['AuthenticationResult']['IdToken'],
'refresh_token': response['AuthenticationResult'].get('RefreshToken')
}
return None
except client.exceptions.NotAuthorizedException:
st.error("❌ Invalid username or password")
return None
except Exception as e:
st.error(f"❌ Authentication failed: {e}")
return None
def set_new_password(username: str, new_password: str, session: str, user_pool_id: str, client_id: str, region: str) -> Optional[dict]:
"""Set new password for user with NEW_PASSWORD_REQUIRED challenge"""
try:
client = boto3.client('cognito-idp', region_name=region)
# Get client secret from SSM
ssm = boto3.client('ssm', region_name=region)
try:
client_secret = ssm.get_parameter(
Name='/app/lakehouse-agent/cognito-app-client-secret',
WithDecryption=True
)['Parameter']['Value']
except Exception:
st.error("❌ Could not retrieve client secret from SSM")
return None
# Calculate SECRET_HASH
import hmac
import hashlib
import base64
message = bytes(username + client_id, 'utf-8')
secret = bytes(client_secret, 'utf-8')
secret_hash = base64.b64encode(hmac.new(secret, message, digestmod=hashlib.sha256).digest()).decode()
response = client.admin_respond_to_auth_challenge(
UserPoolId=user_pool_id,
ClientId=client_id,
ChallengeName='NEW_PASSWORD_REQUIRED',
ChallengeResponses={
'USERNAME': username,
'NEW_PASSWORD': new_password,
'SECRET_HASH': secret_hash
},
Session=session
)
if 'AuthenticationResult' in response:
return {
'access_token': response['AuthenticationResult']['AccessToken'],
'id_token': response['AuthenticationResult']['IdToken'],
'refresh_token': response['AuthenticationResult'].get('RefreshToken')
}
return None
except Exception as e:
st.error(f"❌ Failed to set new password: {e}")
return None
def invoke_agent(runtime_arn: str, prompt: str, access_token: str, id_token: str, region: str) -> str:
"""Invoke AgentCore Runtime with OAuth bearer token via HTTPS"""
try:
import urllib.parse
# URL encode the agent ARN
escaped_agent_arn = urllib.parse.quote(runtime_arn, safe='')
# Construct the AWS API endpoint URL
url = f"https://bedrock-agentcore.{region}.amazonaws.com/runtimes/{escaped_agent_arn}/invocations?qualifier=DEFAULT"
# Set up headers with bearer token
headers = {
"Authorization": f"Bearer {access_token}",
"Content-Type": "application/json",
"X-Amzn-Bedrock-AgentCore-Runtime-Session-Id": st.session_state.session_id
}
# Prepare payload
payload = {"prompt": prompt, "bearer_token": access_token, "id_token": id_token}
st.info(f"🔗 Invoking AgentCore Runtime with OAuth")
# Make HTTPS request
response = requests.post(
url,
headers=headers,
json=payload,
timeout=60
)
# Check for errors
if response.status_code != 200:
error_msg = f"HTTP {response.status_code}"
try:
error_detail = response.json()
error_msg += f": {error_detail}"
except Exception:
error_msg += f": {response.text}"
return f"❌ Error: {error_msg}"
# Handle streaming response (text/event-stream)
content_type = response.headers.get('Content-Type', '')
if 'text/event-stream' in content_type:
# Parse SSE (Server-Sent Events) format
content = []
for line in response.text.split('\n'):
if line.startswith('data: '):
data_str = line[6:].strip()
if data_str:
try:
data = json.loads(data_str)
# Extract content from various possible formats
if isinstance(data, dict):
if 'content' in data:
content.append(str(data['content']))
elif 'response' in data:
content.append(str(data['response']))
elif 'result' in data:
content.append(str(data['result']))
else:
content.append(str(data))
else:
content.append(str(data))
except json.JSONDecodeError:
content.append(data_str)
return '\n'.join(content) if content else "⚠️ No response received"
else:
# Handle JSON response
try:
result = response.json()
if isinstance(result, dict):
if 'content' in result:
return result['content']
elif 'response' in result:
return result['response']
elif 'result' in result:
return result['result']
return str(result)
except json.JSONDecodeError:
return response.text
except requests.exceptions.RequestException as e:
return f"❌ Request error: {str(e)}"
except Exception as e:
return f"❌ Error: {str(e)}"
# Load configuration from SSM on first run
if not st.session_state.cognito_config:
with st.spinner("Loading configuration from SSM..."):
st.session_state.cognito_config = load_config_from_ssm()
if st.session_state.cognito_config.get('runtime_arn'):
st.session_state.runtime_arn = st.session_state.cognito_config['runtime_arn']
# Sidebar configuration
with st.sidebar:
st.title("🏥 Claims Assistant")
st.markdown("---")
# Login section
if not st.session_state.access_token:
with st.expander("🔐 User Login", expanded=True):
st.markdown("*Default password: TempPass123!*")
st.markdown("---")
# Test users dropdown
test_users = [
"user001@example.com",
"user002@example.com",
"adjuster001@example.com"
]
username = st.selectbox("Email", options=test_users, index=0)
password = st.text_input("Password", type="password", placeholder="TempPass123!")
config = st.session_state.cognito_config
if st.button("🔑 Login", use_container_width=True):
if username and password:
if not config.get('cognito_user_pool_id') or not config.get('cognito_app_client_id'):
st.error("❌ Cognito not configured. Please run setup_cognito.py first.")
else:
with st.spinner("Authenticating..."):
result = authenticate_user(
username,
password,
config['cognito_user_pool_id'],
config['cognito_app_client_id'],
config.get('cognito_region') or config.get('region')
)
if result:
if result.get('challenge') == 'NEW_PASSWORD_REQUIRED':
st.session_state.password_challenge = {
'username': username,
'session': result['session']
}
st.warning("⚠️ You must set a new password")
st.rerun()
else:
st.session_state.access_token = result['access_token']
st.session_state.id_token = result['id_token']
st.session_state.user_email = username
st.success(f"✅ Logged in as {username}")
st.rerun()
else:
st.warning("⚠️ Please enter username and password")
# Handle password change challenge
if 'password_challenge' in st.session_state:
with st.expander("🔒 Set New Password", expanded=True):
st.info("First time login - please set a new password")
new_password = st.text_input("New Password", type="password", key="new_pwd")
confirm_password = st.text_input("Confirm Password", type="password", key="confirm_pwd")
if st.button("Set Password", use_container_width=True):
if new_password and new_password == confirm_password:
config = st.session_state.cognito_config
challenge = st.session_state.password_challenge
with st.spinner("Setting new password..."):
result = set_new_password(
challenge['username'],
new_password,
challenge['session'],
config['cognito_user_pool_id'],
config['cognito_app_client_id'],
config.get('cognito_region') or config.get('region')
)
if result:
st.session_state.access_token = result['access_token']
st.session_state.id_token = result['id_token']
st.session_state.user_email = challenge['username']
del st.session_state.password_challenge
st.success(f"✅ Password set! Logged in as {challenge['username']}")
st.rerun()
else:
st.error("❌ Passwords don't match or are empty")
else:
st.success(f"🔓 Logged in as: {st.session_state.user_email}")
if st.button("🚪 Logout", use_container_width=True):
st.session_state.access_token = None
st.session_state.id_token = None
st.session_state.user_email = None
st.session_state.messages = []
st.session_state.session_id = str(uuid.uuid4())
st.rerun()
st.markdown("---")
with st.expander("⚙️ Runtime Configuration", expanded=False):
runtime_arn = st.text_input("Runtime ARN", value=st.session_state.runtime_arn)
st.session_state.runtime_arn = runtime_arn
config = st.session_state.cognito_config
region = st.text_input("AWS Region", value=config.get('region', 'us-east-1'))
if st.button("🔄 Reload from SSM", use_container_width=True):
st.session_state.cognito_config = load_config_from_ssm()
if st.session_state.cognito_config.get('runtime_arn'):
st.session_state.runtime_arn = st.session_state.cognito_config['runtime_arn']
st.success("✅ Configuration reloaded")
st.rerun()
st.markdown("---")
st.markdown("### 💡 Example Queries")
examples = [
"Show me all my claims",
"What's the status of CLM-2024-001?",
"Get my claims summary",
"Show pending claims"
]
for ex in examples:
if st.button(ex, key=f"ex_{ex[:15]}", use_container_width=True):
st.session_state.example_prompt = ex
# Main interface
st.title("🏥 Health Lakehouse Data Assistant")
st.markdown(f"Ask me about your lakehouse data! *Logged in as: {st.session_state.user_email or 'Not logged in'}*")
if not st.session_state.access_token:
st.warning("⚠️ Please log in using the sidebar first!")
st.stop()
if not st.session_state.runtime_arn:
st.warning("⚠️ Runtime ARN not configured. Please check SSM Parameter Store or enter manually in the sidebar.")
st.stop()
# Display chat history
for msg in st.session_state.messages:
with st.chat_message(msg["role"]):
st.markdown(msg["content"])
# Handle input
prompt = st.session_state.pop("example_prompt", None) or st.chat_input("Ask about your claims...")
if prompt:
with st.chat_message("user"):
st.markdown(prompt)
st.session_state.messages.append({"role": "user", "content": prompt})
with st.chat_message("assistant"):
config = st.session_state.cognito_config
# Config should always have region from load_config_from_ssm
response = invoke_agent(
st.session_state.runtime_arn,
prompt,
st.session_state.access_token,
st.session_state.id_token,
config.get('region') # Should always be set from load_config_from_ssm
)
try:
data = json.loads(response)
response = data.get("content", response)
except Exception:
pass
st.markdown(response)
st.session_state.messages.append({"role": "assistant", "content": response})
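The SECRET_HASH computation appears twice in the app above (login and new-password challenge), and the invocation URL is assembled inline. A minimal sketch of both as reusable helpers follows; `cognito_secret_hash` and `runtime_invocation_url` are hypothetical names, and the URL shape is copied from `invoke_agent` rather than from any other source.

```python
import base64
import hashlib
import hmac
import urllib.parse


def cognito_secret_hash(username: str, client_id: str, client_secret: str) -> str:
    """Base64-encoded HMAC-SHA256 of username + client_id, keyed with the client secret."""
    digest = hmac.new(
        client_secret.encode("utf-8"),
        (username + client_id).encode("utf-8"),
        digestmod=hashlib.sha256,
    ).digest()
    return base64.b64encode(digest).decode()


def runtime_invocation_url(runtime_arn: str, region: str, qualifier: str = "DEFAULT") -> str:
    """Build the invocation URL, percent-encoding the ARN so ':' and '/' stay in one path segment."""
    escaped_arn = urllib.parse.quote(runtime_arn, safe="")
    return (
        f"https://bedrock-agentcore.{region}.amazonaws.com"
        f"/runtimes/{escaped_arn}/invocations?qualifier={qualifier}"
    )
```

Both `authenticate_user` and `set_new_password` could call the first helper with the secret they already fetch from SSM, which would remove the duplicated imports and hashing code.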
@@ -0,0 +1,243 @@
#!/usr/bin/env python3
"""
Check Agent Runtime Status and CloudWatch Logs
This script checks the status of the lakehouse agent runtime and helps you
find its CloudWatch logs.
"""
import boto3
import sys
from datetime import datetime, timedelta
def main():
print("=" * 80)
print("Agent Runtime Status and Logs Checker")
print("=" * 80)
session = boto3.Session()
region = session.region_name
print(f"\n📍 Region: {region}")
# Check if agent runtime ARN exists in SSM
print("\n🔍 Checking SSM Parameter Store...")
ssm = boto3.client('ssm', region_name=region)
try:
runtime_arn = ssm.get_parameter(Name='/app/lakehouse-agent/agent-runtime-arn')['Parameter']['Value']
print(f" ✅ Agent Runtime ARN: {runtime_arn}")
except ssm.exceptions.ParameterNotFound:
print(f" ❌ Agent runtime ARN not found in SSM")
print(f"\n💡 Solution:")
print(f" The agent hasn't been deployed yet.")
print(f" Run: python lakehouse-agent/deploy_lakehouse_agent.py")
return
# Get agent runtime details
print(f"\n🔍 Checking Agent Runtime Status...")
try:
client = boto3.client('bedrock-agentcore-control', region_name=region)
response = client.get_agent_runtime(agentRuntimeArn=runtime_arn)
runtime = response['agentRuntime']
status = runtime.get('status', 'UNKNOWN')
name = runtime.get('agentRuntimeName', 'unknown')
created = runtime.get('createdAt', 'unknown')
updated = runtime.get('updatedAt', 'unknown')
print(f" Name: {name}")
print(f" Status: {status}")
print(f" Created: {created}")
print(f" Updated: {updated}")
if status != 'ACTIVE':
print(f"\n ⚠️ Agent is not ACTIVE!")
print(f" Current status: {status}")
if status == 'CREATING':
print(f" Agent is still being created. Wait a few minutes.")
elif status == 'FAILED':
print(f" ❌ Agent creation failed. Check CloudWatch logs for errors.")
elif status == 'UPDATING':
print(f" Agent is being updated. Wait a few minutes.")
else:
print(f" ✅ Agent is ACTIVE and ready to receive requests")
# Check authorizer configuration
if 'authorizerConfiguration' in runtime:
auth_config = runtime['authorizerConfiguration']
if 'customJWTAuthorizer' in auth_config:
jwt_config = auth_config['customJWTAuthorizer']
print(f"\n 🔐 JWT Authentication:")
print(f" Discovery URL: {jwt_config.get('discoveryUrl')}")
print(f" Allowed Clients: {jwt_config.get('allowedClients')}")
else:
print(f"\n 🔐 Authentication: IAM SigV4")
else:
print(f"\n 🔐 Authentication: IAM SigV4 (default)")
except Exception as e:
print(f" ❌ Error getting agent runtime: {e}")
return
# Find CloudWatch log groups
print(f"\n🔍 Searching for CloudWatch Log Groups...")
logs = boto3.client('logs', region_name=region)
# Extract runtime ID from ARN
# ARN format: arn:aws:bedrock-agentcore:region:account:runtime/runtime-id
runtime_id = runtime_arn.split('/')[-1]
# Common log group patterns for AgentCore Runtime
patterns = [
f"/aws/bedrock-agentcore/runtime/{runtime_id}",
f"/aws/bedrock-agentcore/runtime/{name}",
f"/aws/bedrock-agentcore/{runtime_id}",
f"/aws/agentcore/runtime/{runtime_id}",
f"/aws/agentcore/{name}",
f"/aws/bedrock/agentcore/{runtime_id}",
]
found_log_groups = []
# Search for log groups
try:
# Get all log groups with bedrock-agentcore prefix
paginator = logs.get_paginator('describe_log_groups')
for page in paginator.paginate(logGroupNamePrefix='/aws/bedrock-agentcore'):
for log_group in page.get('logGroups', []):
log_group_name = log_group['logGroupName']
found_log_groups.append({
'name': log_group_name,
'created': log_group.get('creationTime'),
'size': log_group.get('storedBytes', 0)
})
# Also try /aws/agentcore prefix
for page in paginator.paginate(logGroupNamePrefix='/aws/agentcore'):
for log_group in page.get('logGroups', []):
log_group_name = log_group['logGroupName']
if log_group_name not in [lg['name'] for lg in found_log_groups]:
found_log_groups.append({
'name': log_group_name,
'created': log_group.get('creationTime'),
'size': log_group.get('storedBytes', 0)
})
except Exception as e:
print(f" ⚠️ Error searching log groups: {e}")
if found_log_groups:
print(f" ✅ Found {len(found_log_groups)} AgentCore log group(s):")
for lg in found_log_groups:
created_date = datetime.fromtimestamp(lg['created'] / 1000).strftime('%Y-%m-%d %H:%M:%S')
size_mb = lg['size'] / (1024 * 1024)
print(f"\n 📁 {lg['name']}")
print(f" Created: {created_date}")
print(f" Size: {size_mb:.2f} MB")
# Check for recent log streams
try:
streams_response = logs.describe_log_streams(
logGroupName=lg['name'],
orderBy='LastEventTime',
descending=True,
limit=5
)
streams = streams_response.get('logStreams', [])
if streams:
print(f" Recent log streams:")
for stream in streams[:3]:
stream_name = stream['logStreamName']
last_event = stream.get('lastEventTimestamp')
if last_event:
last_event_date = datetime.fromtimestamp(last_event / 1000).strftime('%Y-%m-%d %H:%M:%S')
print(f" - {stream_name} (last: {last_event_date})")
else:
print(f" - {stream_name} (no events)")
else:
print(f" ⚠️ No log streams found (agent hasn't been invoked yet)")
except Exception as e:
print(f" ⚠️ Error checking log streams: {e}")
else:
print(f" ⚠️ No AgentCore log groups found")
print(f"\n This could mean:")
print(f" 1. The agent hasn't been invoked yet (logs created on first invocation)")
print(f" 2. CloudWatch logging isn't enabled")
print(f" 3. The log group uses a different naming pattern")
# Provide instructions for viewing logs
print(f"\n📋 How to View Logs:")
print(f"\n Option 1: AWS Console")
print(f" 1. Go to CloudWatch Console")
print(f" 2. Click 'Log groups' in the left sidebar")
print(f" 3. Search for: /aws/bedrock-agentcore")
print(f" 4. Look for log groups containing: {runtime_id}")
print(f"\n Option 2: AWS CLI")
if found_log_groups:
log_group_name = found_log_groups[0]['name']
print(f" # List log streams")
print(f" aws logs describe-log-streams \\")
print(f" --log-group-name '{log_group_name}' \\")
print(f" --order-by LastEventTime \\")
print(f" --descending \\")
print(f" --max-items 10")
print(f"\n # Tail logs (last 10 minutes)")
print(f" aws logs tail '{log_group_name}' --follow --since 10m")
else:
print(f" # Search for log groups")
print(f" aws logs describe-log-groups \\")
print(f" --log-group-name-prefix '/aws/bedrock-agentcore'")
print(f"\n Option 3: Python Script")
print(f" python check_recent_logs.py")
# Check if agent has been invoked
print(f"\n🔍 Checking Invocation History...")
if found_log_groups and any(lg['size'] > 0 for lg in found_log_groups):
print(f" ✅ Agent has been invoked (logs exist)")
# Try to get recent log events
for lg in found_log_groups:
if lg['size'] > 0:
try:
# Get recent log events
end_time = int(datetime.now().timestamp() * 1000)
start_time = int((datetime.now() - timedelta(hours=1)).timestamp() * 1000)
events_response = logs.filter_log_events(
logGroupName=lg['name'],
startTime=start_time,
endTime=end_time,
limit=10
)
events = events_response.get('events', [])
if events:
print(f"\n 📄 Recent log events from {lg['name']}:")
for event in events[:5]:
timestamp = datetime.fromtimestamp(event['timestamp'] / 1000).strftime('%H:%M:%S')
message = event['message'][:100]
print(f" [{timestamp}] {message}")
if len(events) > 5:
print(f" ... and {len(events) - 5} more events")
except Exception as e:
print(f" ⚠️ Error reading log events: {e}")
else:
print(f" ⚠️ No invocations detected")
print(f"\n 💡 To generate logs:")
print(f" 1. Invoke the agent using the Streamlit UI")
print(f" 2. Or run: python test_agent_invocation.py")
print(f" 3. Then check CloudWatch logs again")
print(f"\n" + "=" * 80)
if __name__ == '__main__':
main()
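Reviewer note: the script above builds its one-hour query window and its event timestamps from `datetime.now()`, which depends on the local time zone of the machine running it. The following is a minimal sketch of timezone-aware helpers that could serve the same purpose. The names `epoch_ms_window` and `format_event` are hypothetical and are not part of this change.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple


def epoch_ms_window(hours: float, now: Optional[datetime] = None) -> Tuple[int, int]:
    """Return (start_ms, end_ms) covering the last `hours` hours, in epoch milliseconds."""
    end = now or datetime.now(timezone.utc)
    start = end - timedelta(hours=hours)
    return int(start.timestamp() * 1000), int(end.timestamp() * 1000)


def format_event(event: dict, max_len: int = 100) -> str:
    """Render one log event dict as '[HH:MM:SS] message', with the time in UTC."""
    ts = datetime.fromtimestamp(event["timestamp"] / 1000, tz=timezone.utc)
    message = event["message"].rstrip()[:max_len]
    return f"[{ts:%H:%M:%S}] {message}"
```

The two integers returned by `epoch_ms_window(1)` could be passed as `startTime` and `endTime` to `logs.filter_log_events`, as the script already does with its own values.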
@@ -0,0 +1,45 @@
#!/bin/bash
# Check CloudWatch logs for the agent runtime
RUNTIME_ID="lakehouse_agent-Hhb3lX6y7M"
REGION="us-east-1"
echo "🔍 Checking CloudWatch logs for runtime: $RUNTIME_ID"
echo ""
# Try different log group patterns
LOG_GROUPS=(
"/aws/bedrock-agentcore/runtime/$RUNTIME_ID"
"/aws/bedrock/agentcore/runtime/$RUNTIME_ID"
"/aws/bedrock-agentcore/$RUNTIME_ID"
)
for LOG_GROUP in "${LOG_GROUPS[@]}"; do
echo "Checking log group: $LOG_GROUP"
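# Capture the output first so the exit status tested below is that of
# `aws`, not of `head` at the end of a pipeline.
STREAMS_OUTPUT=$(aws logs describe-log-streams \
--log-group-name "$LOG_GROUP" \
--region "$REGION" \
--max-items 5 \
2>&1)
STATUS=$?
echo "$STREAMS_OUTPUT" | head -20
if [ "$STATUS" -eq 0 ]; then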
echo ""
echo "✅ Found log group: $LOG_GROUP"
echo ""
echo "📋 Recent logs:"
aws logs tail "$LOG_GROUP" \
--region "$REGION" \
--since 1h \
--format short \
2>&1 | head -50
break
fi
echo ""
done
echo ""
echo "🔍 Searching for any bedrock-agentcore log groups..."
aws logs describe-log-groups \
--region "$REGION" \
--log-group-name-prefix "/aws/bedrock" \
2>&1 | grep -i "logGroupName" | head -20
@@ -0,0 +1,118 @@
#!/usr/bin/env python3
"""
Simple test of agent runtime WITHOUT gateway tools - just basic conversation
"""
import json
import uuid
import requests
from config import config
def get_cognito_token():
"""Get Cognito bearer token using client_credentials flow"""
print("🔑 Getting Cognito bearer token...")
cognito_domain = config.COGNITO_DOMAIN
client_id = config.COGNITO_APP_CLIENT_ID
client_secret = config.COGNITO_APP_CLIENT_SECRET
scope = config.COGNITO_SCOPE_QUERY
# Normalize the scope: rewrite '/claims/<action>' as '/claims.<action>' (only this segment, not every slash)
scope = scope.replace('/claims/', '/claims.')
token_url = f"{cognito_domain}/oauth2/token"
data = {
"grant_type": "client_credentials",
"client_id": client_id,
"client_secret": client_secret,
"scope": scope
}
try:
response = requests.post(token_url, data=data, timeout=10)
response.raise_for_status()
token = response.json().get("access_token")
print(f"✅ Token obtained: {token[:20]}...")
return token
except Exception as e:
print(f"❌ Failed to get token: {e}")
return None
def test_agent_no_gateway():
"""Test agent with a simple prompt that doesn't require gateway tools"""
print("=" * 60)
print("🧪 Testing Agent Runtime (No Gateway)")
print("=" * 60)
runtime_arn = config.RUNTIME_ARN
region = config.AWS_REGION
print(f"\nRuntime ARN: {runtime_arn}")
print(f"Region: {region}")
# Get Cognito token
bearer_token = get_cognito_token()
if not bearer_token:
print("\n❌ Cannot proceed without token")
return False
try:
# Build runtime endpoint URL
escaped_arn = requests.utils.quote(runtime_arn, safe='')
base_url = f"https://bedrock-agentcore.{region}.amazonaws.com"
runtime_url = f"{base_url}/runtimes/{escaped_arn}/invocations"
# Simple payload that doesn't require tools
# Just ask the agent to introduce itself
payload = {
"prompt": "Hello! Please introduce yourself and tell me what you can help with. Don't try to query any data, just explain your capabilities.",
"bearer_token": bearer_token
}
# Generate session ID
session_id = f"test-session-{uuid.uuid4().hex}"
print(f"\n📤 Sending request...")
print(f" URL: {runtime_url}")
print(f" Prompt: {payload['prompt'][:80]}...")
print(f" Session ID: {session_id}")
# Make direct HTTPS request with OAuth bearer token
headers = {
'Authorization': f'Bearer {bearer_token}',
'Content-Type': 'application/json',
'X-Amzn-Bedrock-AgentCore-Runtime-Session-Id': session_id
}
print(f"\n⏳ Waiting for response (this may take 30-60 seconds)...")
response = requests.post(
runtime_url,
headers=headers,
params={'qualifier': 'DEFAULT'},
json=payload,
timeout=90 # Longer timeout for first request
)
response.raise_for_status()
response_data = response.json()
print(f"\n✅ Agent response:")
print(json.dumps(response_data, indent=2))
return True
except Exception as e:
print(f"\n❌ Error: {e}")
if hasattr(e, 'response') and e.response is not None:
print(f" Status: {e.response.status_code}")
print(f" Response: {e.response.text}")
import traceback
traceback.print_exc()
return False
if __name__ == "__main__":
success = test_agent_no_gateway()
print("\n" + "=" * 60)
if success:
print("✅ Test passed!")
else:
print("❌ Test failed")
print("=" * 60)
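Reviewer note: the scope rewrite and the client_credentials form body are repeated in each of the token helpers in this change. A minimal sketch of how they might be factored out is shown below. The function names are hypothetical, and the raw scope value `lakehouse-api/claims/query` is an assumed example of what `config.COGNITO_SCOPE_QUERY` could contain; only the normalized form `lakehouse-api/claims.query` appears elsewhere in this change.

```python
def normalize_scope(scope: str) -> str:
    """Rewrite '<server>/claims/<action>' as '<server>/claims.<action>'.

    Only the '/claims/' segment is rewritten; an already normalized
    scope is returned unchanged.
    """
    return scope.replace("/claims/", "/claims.")


def client_credentials_form(client_id: str, client_secret: str, scope: str) -> dict:
    """Build the form body the test scripts post to the /oauth2/token endpoint."""
    return {
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": normalize_scope(scope),
    }
```

The returned dictionary could be passed as `data=` to `requests.post`, matching what `get_cognito_token()` sends today.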
@@ -0,0 +1,121 @@
#!/usr/bin/env python3
"""
Simple test of agent runtime with JWT authentication
"""
import json
import uuid
import requests
from config import config
def get_cognito_token():
"""Get Cognito bearer token using client_credentials flow"""
print("🔑 Getting Cognito bearer token...")
cognito_domain = config.COGNITO_DOMAIN
client_id = config.COGNITO_APP_CLIENT_ID
client_secret = config.COGNITO_APP_CLIENT_SECRET
scope = config.COGNITO_SCOPE_QUERY
# Normalize the scope: rewrite '/claims/<action>' as '/claims.<action>' (only this segment, not every slash)
scope = scope.replace('/claims/', '/claims.')
token_url = f"{cognito_domain}/oauth2/token"
data = {
"grant_type": "client_credentials",
"client_id": client_id,
"client_secret": client_secret,
"scope": scope
}
try:
response = requests.post(token_url, data=data, timeout=10)
response.raise_for_status()
token = response.json().get("access_token")
print(f"✅ Token obtained: {token[:20]}...")
return token
except Exception as e:
print(f"❌ Failed to get token: {e}")
return None
def test_agent_simple():
"""Test agent with a simple prompt using JWT authentication"""
print("=" * 60)
print("🧪 Testing Agent Runtime (Simple with JWT)")
print("=" * 60)
runtime_arn = config.RUNTIME_ARN
runtime_id = config.RUNTIME_ID
region = config.AWS_REGION
print(f"\nRuntime ARN: {runtime_arn}")
print(f"Runtime ID: {runtime_id}")
print(f"Region: {region}")
# Get Cognito token
bearer_token = get_cognito_token()
if not bearer_token:
print("\n❌ Cannot proceed without token")
return False
try:
# Build runtime endpoint URL using the correct format
# Format: https://bedrock-agentcore.{region}.amazonaws.com/runtimes/{escaped_arn}/invocations
escaped_arn = requests.utils.quote(runtime_arn, safe='')
base_url = f"https://bedrock-agentcore.{region}.amazonaws.com"
runtime_url = f"{base_url}/runtimes/{escaped_arn}/invocations"
# Payload with bearer token for Gateway calls
payload = {
"prompt": "Hello, can you introduce yourself?",
"bearer_token": bearer_token # Pass token in payload for agent to use with Gateway
}
# Generate session ID
session_id = f"test-session-{uuid.uuid4().hex}"
print(f"\n📤 Sending request...")
print(f" URL: {runtime_url}")
print(f" Prompt: {payload['prompt']}")
print(f" Session ID: {session_id}")
print(f" Bearer token: {bearer_token[:20]}...")
# Make direct HTTPS request with OAuth bearer token
# The runtime is configured for JWT auth, so we must use Authorization header
headers = {
'Authorization': f'Bearer {bearer_token}',
'Content-Type': 'application/json',
'X-Amzn-Bedrock-AgentCore-Runtime-Session-Id': session_id
}
response = requests.post(
runtime_url,
headers=headers,
params={'qualifier': 'DEFAULT'},
json=payload,
timeout=60 # Increased timeout for agent processing
)
response.raise_for_status()
response_data = response.json()
print(f"\n✅ Agent response:")
print(json.dumps(response_data, indent=2))
return True
except Exception as e:
print(f"\n❌ Error: {e}")
if hasattr(e, 'response') and e.response is not None:
print(f" Status: {e.response.status_code}")
print(f" Response: {e.response.text}")
import traceback
traceback.print_exc()
return False
if __name__ == "__main__":
success = test_agent_simple()
print("\n" + "=" * 60)
if success:
print("✅ Simple test passed!")
else:
print("❌ Simple test failed")
print("=" * 60)
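Reviewer note: `get_cognito_token()` reads the token with `.get("access_token")` and then slices it for display. If the response body has no `access_token` field, the slice fails on `None` and the message printed by the broad `except` is hard to interpret. A minimal sketch of a more explicit check follows; `extract_access_token` and `redact` are hypothetical helper names, not functions from this change.

```python
def extract_access_token(body: dict) -> str:
    """Return the access token from a token-endpoint response body.

    Raises ValueError with a readable message when the field is
    missing or empty, instead of failing later on a None value.
    """
    token = body.get("access_token")
    if not isinstance(token, str) or not token:
        raise ValueError("token response did not include an access_token")
    return token


def redact(token: str, visible: int = 20) -> str:
    """Return only the first `visible` characters of a token, for log output."""
    return f"{token[:visible]}..."
```

In the test scripts this could replace the two lines that read and print the token, for example `token = extract_access_token(response.json())` followed by `print(redact(token))`.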
@@ -0,0 +1,132 @@
#!/usr/bin/env python3
"""
Test end-to-end flow: Cognito Token → Runtime → Agent → Gateway → MCP Server
"""
import json
import requests
from config import config
def get_cognito_token():
"""Get Cognito bearer token using client_credentials flow"""
print("🔑 Getting Cognito bearer token...")
cognito_domain = config.COGNITO_DOMAIN
client_id = config.COGNITO_APP_CLIENT_ID
client_secret = config.COGNITO_APP_CLIENT_SECRET
scope = config.COGNITO_SCOPE_QUERY
# Normalize the scope: rewrite '/claims/<action>' as '/claims.<action>' (only this segment, not every slash)
scope = scope.replace('/claims/', '/claims.')
print(f" Domain: {cognito_domain}")
print(f" Client ID: {client_id}")
print(f" Scope: {scope}")
token_url = f"{cognito_domain}/oauth2/token"
data = {
"grant_type": "client_credentials",
"client_id": client_id,
"client_secret": client_secret,
"scope": scope
}
try:
response = requests.post(token_url, data=data, timeout=10)
response.raise_for_status()
token = response.json().get("access_token")
print(f"✅ Token obtained: {token[:20]}...")
return token
except Exception as e:
print(f"❌ Failed to get token: {e}")
# Print response details for debugging
if hasattr(e, 'response') and e.response is not None:
print(f" Response status: {e.response.status_code}")
print(f" Response body: {e.response.text}")
return None
def invoke_agent_runtime(bearer_token: str, prompt: str = "Show me all my claims"):
"""Invoke the lakehouse agent runtime with bearer token via JWT authentication"""
print(f"\n🤖 Invoking agent runtime...")
print(f" Prompt: {prompt}")
runtime_id = config.RUNTIME_ID
region = config.AWS_REGION
try:
# Build runtime endpoint URL using the correct format
# Format: https://bedrock-agentcore.{region}.amazonaws.com/runtimes/{escaped_arn}/invocations
runtime_arn = config.RUNTIME_ARN
escaped_arn = requests.utils.quote(runtime_arn, safe='')
base_url = f"https://bedrock-agentcore.{region}.amazonaws.com"
runtime_url = f"{base_url}/runtimes/{escaped_arn}/invocations"
# Payload with bearer token for Gateway calls
payload = {
"prompt": prompt,
"bearer_token": bearer_token # Pass token in payload for agent to use with Gateway
}
print(f" Runtime URL: {runtime_url}")
print(f" Bearer token: {bearer_token[:20]}...")
# Generate a session ID that meets the minimum length requirement (33 chars)
import uuid
session_id = f"test-session-{uuid.uuid4().hex}"
# Make direct HTTPS request with OAuth bearer token
# The runtime is configured for JWT auth, so we must use Authorization header
headers = {
'Authorization': f'Bearer {bearer_token}',
'Content-Type': 'application/json',
'X-Amzn-Bedrock-AgentCore-Runtime-Session-Id': session_id
}
response = requests.post(
runtime_url,
headers=headers,
params={'qualifier': 'DEFAULT'},
json=payload,
timeout=90  # Gateway tool calls can take longer than a plain prompt
)
response.raise_for_status()
response_data = response.json()
print(f"\n✅ Agent response:")
print(json.dumps(response_data, indent=2))
return response_data
except Exception as e:
print(f"\n❌ Error invoking agent: {e}")
if hasattr(e, 'response') and e.response is not None:
print(f" Status: {e.response.status_code}")
print(f" Response: {e.response.text}")
import traceback
traceback.print_exc()
return None
def main():
print("=" * 60)
print("🧪 Testing End-to-End Flow")
print("=" * 60)
# Step 1: Get Cognito token
token = get_cognito_token()
if not token:
print("\n❌ Cannot proceed without token")
return
# Step 2: Invoke agent runtime with token
response = invoke_agent_runtime(token)
if response:
print("\n" + "=" * 60)
print("✅ End-to-end test completed!")
print("=" * 60)
else:
print("\n" + "=" * 60)
print("❌ End-to-end test failed")
print("=" * 60)
if __name__ == "__main__":
main()
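Reviewer note: three of the test scripts build the same invocation URL and session ID inline. The sketch below gathers that logic in one place, following the URL format given in the comments above and the 33-character minimum mentioned for the session ID. The helper names and the constant are hypothetical, and the account ID in the usage example is a placeholder.

```python
import uuid
from urllib.parse import quote

# Minimum session ID length, as stated in the end-to-end test's comment.
SESSION_ID_MIN_LENGTH = 33


def build_runtime_url(region: str, runtime_arn: str) -> str:
    """Build the invocation URL; the ARN is fully percent-encoded (':' and '/' included)."""
    escaped_arn = quote(runtime_arn, safe="")
    return f"https://bedrock-agentcore.{region}.amazonaws.com/runtimes/{escaped_arn}/invocations"


def new_session_id(prefix: str = "test-session-") -> str:
    """Return prefix + 32 hex characters, checking the minimum length."""
    session_id = f"{prefix}{uuid.uuid4().hex}"
    if len(session_id) < SESSION_ID_MIN_LENGTH:
        raise ValueError(f"session ID must be at least {SESSION_ID_MIN_LENGTH} characters")
    return session_id


if __name__ == "__main__":
    arn = "arn:aws:bedrock-agentcore:us-east-1:123456789012:runtime/demo-abc"
    print(build_runtime_url("us-east-1", arn))
    print(new_session_id())
```

Passing `safe=""` matters here because the default for `quote` leaves `/` unescaped, which would split the ARN across path segments.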
@@ -0,0 +1,177 @@
#!/usr/bin/env python3
"""
End-to-End Test with User Authentication for RLS
This test uses actual user credentials (not client_credentials) to test
row-level security with proper user identity.
"""
import sys
import requests
from pathlib import Path
sys.path.insert(0, str(Path(__file__).parent))
from aws_session_utils import get_aws_session
def main():
session, region, account_id = get_aws_session()
ssm = session.client('ssm', region_name=region)
print('='*70)
print('E2E TEST WITH USER AUTHENTICATION')
print('='*70)
print()
# Get configuration
print('Loading configuration from SSM...')
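# Despite the variable name, this parameter stores the runtime ID; the full ARN is built from it before invocation.
runtime_arn = ssm.get_parameter(Name='/app/lakehouse-agent/agent-runtime-id')['Parameter']['Value']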
cognito_domain = ssm.get_parameter(Name='/app/lakehouse-agent/cognito-domain')['Parameter']['Value']
client_id = ssm.get_parameter(Name='/app/lakehouse-agent/cognito-app-client-id')['Parameter']['Value']
client_secret = ssm.get_parameter(Name='/app/lakehouse-agent/cognito-app-client-secret', WithDecryption=True)['Parameter']['Value']
# Get test user credentials
test_user = ssm.get_parameter(Name='/app/lakehouse-agent/test-user-3')['Parameter']['Value']
test_password = ssm.get_parameter(Name='/app/lakehouse-agent/test-password', WithDecryption=True)['Parameter']['Value']
print(f'✅ Runtime: {runtime_arn}')
print(f'✅ Test User: {test_user}')
print()
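# Get user token using the Resource Owner Password Credentials (ROPC) grant.
# Note: the Cognito /oauth2/token endpoint documents only the authorization_code,
# refresh_token and client_credentials grants, so expect this request to be
# rejected and the client_credentials fallback below to run.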
print('🔑 Getting user token (ROPC flow)...')
token_url = f'{cognito_domain}/oauth2/token'
try:
response = requests.post(
token_url,
auth=(client_id, client_secret),
data={
'grant_type': 'password',
'username': test_user,
'password': test_password,
'scope': 'lakehouse-api/claims.query openid email profile'
},
headers={'Content-Type': 'application/x-www-form-urlencoded'},
timeout=10
)
if response.status_code == 200:
token_data = response.json()
access_token = token_data['access_token']
id_token = token_data.get('id_token')
print(f'✅ Access token obtained')
if id_token:
print(f'✅ ID token obtained')
# Decode token to show user identity
import base64
import json
def decode_jwt(token):
parts = token.split('.')
if len(parts) == 3:
payload = parts[1]
payload += '=' * (-len(payload) % 4)  # pad to a multiple of 4; adds nothing when already aligned
decoded = base64.urlsafe_b64decode(payload)
return json.loads(decoded)
return {}
access_claims = decode_jwt(access_token)
print(f'\n🔍 Access Token Claims:')
print(f' Username: {access_claims.get("username", "N/A")}')
print(f' Email: {access_claims.get("email", "N/A")}')
print(f' Scope: {access_claims.get("scope", "N/A")}')
if id_token:
id_claims = decode_jwt(id_token)
print(f'\n🔍 ID Token Claims:')
print(f' Email: {id_claims.get("email", "N/A")}')
print(f' Email Verified: {id_claims.get("email_verified", "N/A")}')
# Use ID token if available (contains more user info), otherwise access token
bearer_token = id_token if id_token else access_token
else:
print(f'❌ Failed to get token: HTTP {response.status_code}')
print(f' Response: {response.text}')
# Try client_credentials as fallback
print(f'\n⚠️ Falling back to client_credentials flow...')
response = requests.post(
token_url,
data={
'grant_type': 'client_credentials',
'client_id': client_id,
'client_secret': client_secret,
'scope': 'lakehouse-api/claims.query'
},
timeout=10
)
if response.status_code == 200:
bearer_token = response.json()['access_token']
print(f'✅ Got client_credentials token (no user identity for RLS)')
else:
print(f'❌ Failed: {response.text}')
return False
except Exception as e:
print(f'❌ Error getting token: {e}')
import traceback
traceback.print_exc()
return False
# Invoke agent
print(f'\n🤖 Invoking agent with user token...')
import urllib.parse
encoded_arn = urllib.parse.quote(f'arn:aws:bedrock-agentcore:{region}:{account_id}:runtime/{runtime_arn}', safe='')
runtime_url = f'https://bedrock-agentcore.{region}.amazonaws.com/runtimes/{encoded_arn}/invocations'
try:
response = requests.post(
runtime_url,
json={
'input': 'Show me all my claims',
'sessionId': f'test-session-{test_user.replace("@", "-at-")}'
},
headers={
'Authorization': f'Bearer {bearer_token}',
'Content-Type': 'application/json'
},
timeout=60
)
if response.status_code == 200:
result = response.json()
print(f'✅ Agent response received')
print(f'\nResponse:')
print(f' Content: {result.get("content", "N/A")[:200]}...')
print(f' Tool Calls: {result.get("tool_calls", 0)}')
if result.get('tool_calls', 0) > 0:
print(f'\n✅✅✅ SUCCESS: Tools were invoked!')
print(f'\nWith user identity: {test_user}')
print(f'RLS should be applied based on this user')
else:
print(f'\n❌ FAIL: No tools invoked')
print(f' Check MCP server logs for errors')
return result.get('tool_calls', 0) > 0
else:
print(f'❌ Agent invocation failed: HTTP {response.status_code}')
print(f' Response: {response.text}')
return False
except Exception as e:
print(f'❌ Error invoking agent: {e}')
import traceback
traceback.print_exc()
return False
if __name__ == '__main__':
success = main()
sys.exit(0 if success else 1)
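Reviewer note: since the hosted token endpoint is unlikely to accept `grant_type=password`, one alternative for obtaining a real user token in a test is the Cognito `InitiateAuth` API with the `USER_PASSWORD_AUTH` flow. The sketch below prepares that request and decodes the resulting JWT claims for display. It assumes the app client has `USER_PASSWORD_AUTH` enabled, which this change does not confirm, and tokens issued this way may not carry the custom `lakehouse-api/claims.query` scope. All function names are hypothetical.

```python
import base64
import hashlib
import hmac
import json


def cognito_secret_hash(username: str, client_id: str, client_secret: str) -> str:
    """Base64(HMAC-SHA256(client_secret, username + client_id)), required for clients with a secret."""
    digest = hmac.new(
        client_secret.encode("utf-8"),
        (username + client_id).encode("utf-8"),
        hashlib.sha256,
    ).digest()
    return base64.b64encode(digest).decode("ascii")


def user_password_auth_request(username: str, password: str,
                               client_id: str, client_secret: str) -> dict:
    """Keyword arguments for the boto3 cognito-idp `initiate_auth` call."""
    return {
        "ClientId": client_id,
        "AuthFlow": "USER_PASSWORD_AUTH",
        "AuthParameters": {
            "USERNAME": username,
            "PASSWORD": password,
            "SECRET_HASH": cognito_secret_hash(username, client_id, client_secret),
        },
    }


def decode_jwt_claims(token: str) -> dict:
    """Decode the payload of a JWT WITHOUT verifying its signature (display only)."""
    parts = token.split(".")
    if len(parts) != 3:
        return {}
    payload = parts[1]
    payload += "=" * (-len(payload) % 4)  # restore the stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))


# Not executed here (needs AWS credentials and a configured user pool):
#   idp = session.client("cognito-idp", region_name=region)
#   result = idp.initiate_auth(**user_password_auth_request(user, pw, cid, secret))
#   access_token = result["AuthenticationResult"]["AccessToken"]
```

Because `decode_jwt_claims` skips signature verification, it is suitable only for printing claims in a test, not for making authorization decisions.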
@@ -0,0 +1,628 @@
#!/usr/bin/env python3
"""
Comprehensive validation test for SSM migration implementation.
This script validates all aspects of the SSM migration:
1. Migration utility functionality (dry-run)
2. SSM parameter creation and retrieval
3. Application startup with SSM configuration
4. Error handling when SSM unavailable
5. Sensitive parameter encryption
6. Parameter substitution for ARNs
7. IAM permissions validation
Requirements: 8.1, 8.2, 8.3, 8.4, 8.5
"""
import sys
import os
import tempfile
from pathlib import Path
from typing import Dict, List, Tuple
import boto3
from botocore.exceptions import ClientError, NoCredentialsError
# Add current directory to path
sys.path.insert(0, str(Path(__file__).parent))
from ssm_config import SSMConfigLoader
from ssm_migrate import SSMMigrationUtility, MigrationResult
from config import Config
class ValidationTest:
"""Comprehensive validation test suite for SSM migration."""
def __init__(self):
self.results: List[Tuple[str, bool, str]] = []
self.test_prefix = 'lh_test_'
self.cleanup_params: List[str] = []
def log_result(self, test_name: str, passed: bool, message: str = ""):
"""Log a test result."""
status = "✅ PASS" if passed else "❌ FAIL"
self.results.append((test_name, passed, message))
print(f"{status}: {test_name}")
if message:
print(f" {message}")
def print_summary(self):
"""Print test summary."""
print("\n" + "=" * 70)
print("VALIDATION TEST SUMMARY")
print("=" * 70)
passed = sum(1 for _, p, _ in self.results if p)
total = len(self.results)
print(f"\nTotal Tests: {total}")
print(f"Passed: {passed}")
print(f"Failed: {total - passed}")
success_rate = (passed / total * 100) if total else 0.0
print(f"Success Rate: {success_rate:.1f}%\n")
if total - passed > 0:
print("Failed Tests:")
# Use a distinct loop variable so the `passed` count above is not overwritten.
for name, ok, msg in self.results:
if not ok:
print(f"{name}")
if msg:
print(f" {msg}")
print("=" * 70 + "\n")
return passed == total
def test_1_ssm_connectivity(self) -> bool:
"""Test 1: Verify SSM connectivity and IAM permissions."""
print("\n" + "=" * 70)
print("TEST 1: SSM Connectivity and IAM Permissions")
print("=" * 70 + "\n")
try:
loader = SSMConfigLoader()
# Test SSM availability
is_available = loader.is_available()
self.log_result(
"SSM Parameter Store is accessible",
is_available,
"Check AWS credentials and IAM permissions" if not is_available else ""
)
if not is_available:
return False
# Test region detection
region = loader.get_region()
self.log_result(
"AWS region auto-detection",
bool(region),
f"Detected region: {region}"
)
# Test account ID detection
account_id = loader.get_account_id()
self.log_result(
"AWS account ID auto-detection",
bool(account_id) and account_id.isdigit(),
f"Detected account ID: {account_id}"
)
return True
except Exception as e:
self.log_result("SSM connectivity test", False, str(e))
return False
def test_2_migration_utility_dry_run(self) -> bool:
"""Test 2: Run migration utility in dry-run mode."""
print("\n" + "=" * 70)
print("TEST 2: Migration Utility (Dry-Run)")
print("=" * 70 + "\n")
try:
# Create a temporary .env file for testing
with tempfile.NamedTemporaryFile(mode='w', suffix='.env', delete=False) as f:
env_file = Path(f.name)
f.write("# Test configuration\n")
f.write("TEST_PARAM_1=value1\n")
f.write("TEST_PARAM_2=value2\n")
f.write("TEST_SECRET_KEY=secret123\n")
f.write("AWS_REGION=us-east-1\n") # Should be skipped
f.write("AWS_ACCOUNT_ID=XXXXXXXXXXXX\n") # Should be skipped
try:
utility = SSMMigrationUtility(prefix=self.test_prefix)
# Run dry-run migration
result = utility.migrate_env_to_ssm(
env_file=env_file,
overwrite=False,
dry_run=True
)
# Verify dry-run results
self.log_result(
"Dry-run migration executed",
True,
f"Would create/update {len(result.created) + len(result.updated)} parameters"
)
# Verify AWS_REGION and AWS_ACCOUNT_ID are skipped
skipped_auto = any('auto-detected' in s for s in result.skipped)
self.log_result(
"Auto-detected parameters skipped",
skipped_auto,
f"Skipped: {[s for s in result.skipped if 'auto-detected' in s]}"
)
# Verify no failures in dry-run
self.log_result(
"No failures in dry-run",
len(result.failed) == 0,
f"Failures: {result.failed}" if result.failed else ""
)
return True
finally:
# Clean up temp file
env_file.unlink()
except Exception as e:
self.log_result("Migration utility dry-run", False, str(e))
return False
def test_3_parameter_creation(self) -> bool:
"""Test 3: Create test parameters and verify they exist."""
print("\n" + "=" * 70)
print("TEST 3: SSM Parameter Creation and Retrieval")
print("=" * 70 + "\n")
try:
loader = SSMConfigLoader(prefix=self.test_prefix)
ssm_client = loader._ssm_client
# Create test parameters
test_params = {
'TEST_STRING_PARAM': ('test_value', 'String'),
'TEST_SECRET_KEY': ('secret_value', 'SecureString'),
'TEST_ARN_PARAM': ('arn:aws:service:${AWS_REGION}:${AWS_ACCOUNT_ID}:resource', 'String'),
}
for key, (value, param_type) in test_params.items():
ssm_name = loader._config_key_to_ssm_name(key)
self.cleanup_params.append(ssm_name)
try:
ssm_client.put_parameter(
Name=ssm_name,
Value=value,
Type=param_type,
Overwrite=True,
Description=f"Test parameter: {key}"
)
self.log_result(
f"Created parameter: {key}",
True,
f"SSM name: {ssm_name}, Type: {param_type}"
)
except Exception as e:
self.log_result(f"Create parameter: {key}", False, str(e))
return False
# Verify parameters can be retrieved
loader.clear_cache() # Clear cache to force SSM retrieval
for key, (expected_value, _) in test_params.items():
retrieved_value = loader.get_parameter(key)
matches = retrieved_value == expected_value
self.log_result(
f"Retrieved parameter: {key}",
matches,
f"Expected: {expected_value}, Got: {retrieved_value}" if not matches else ""
)
return True
except Exception as e:
self.log_result("Parameter creation test", False, str(e))
return False
def test_4_sensitive_parameter_encryption(self) -> bool:
"""Test 4: Verify sensitive parameters use SecureString type."""
print("\n" + "=" * 70)
print("TEST 4: Sensitive Parameter Encryption")
print("=" * 70 + "\n")
try:
loader = SSMConfigLoader(prefix=self.test_prefix)
ssm_client = loader._ssm_client
# Check if TEST_SECRET_KEY is SecureString
ssm_name = loader._config_key_to_ssm_name('TEST_SECRET_KEY')
response = ssm_client.get_parameter(Name=ssm_name, WithDecryption=False)
param_type = response['Parameter']['Type']
is_secure = param_type == 'SecureString'
self.log_result(
"Sensitive parameter uses SecureString",
is_secure,
f"Parameter type: {param_type}"
)
# Test sensitive parameter detection
test_cases = [
('MY_SECRET_KEY', True),
('DATABASE_PASSWORD', True),
('API_KEY', True),
('AUTH_TOKEN', True),
('S3_BUCKET_NAME', False),
('DATABASE_NAME', False),
]
for key, should_be_sensitive in test_cases:
is_sensitive = loader._is_sensitive(key)
matches = is_sensitive == should_be_sensitive
self.log_result(
f"Sensitive detection: {key}",
matches,
f"Expected: {should_be_sensitive}, Got: {is_sensitive}" if not matches else ""
)
return True
except Exception as e:
self.log_result("Sensitive parameter encryption test", False, str(e))
return False
def test_5_parameter_substitution(self) -> bool:
"""Test 5: Verify parameter substitution for ARNs."""
print("\n" + "=" * 70)
print("TEST 5: Parameter Substitution")
print("=" * 70 + "\n")
try:
loader = SSMConfigLoader(prefix=self.test_prefix)
# Get the ARN parameter with placeholders
arn_value = loader.get_parameter('TEST_ARN_PARAM')
# Create a minimal config-like object for substitution
class TestConfig:
def __init__(self):
self.AWS_REGION = loader.get_region()
self.AWS_ACCOUNT_ID = loader.get_account_id()
def _substitute_variables(self, value: str) -> str:
if '${AWS_ACCOUNT_ID}' in value:
value = value.replace('${AWS_ACCOUNT_ID}', self.AWS_ACCOUNT_ID)
if '${AWS_REGION}' in value:
value = value.replace('${AWS_REGION}', self.AWS_REGION)
return value
test_config = TestConfig()
substituted = test_config._substitute_variables(arn_value)
# Verify substitution occurred
has_placeholders = '${' in substituted
self.log_result(
"Parameter substitution removes placeholders",
not has_placeholders,
f"Result: {substituted}"
)
# Verify correct values were substituted
contains_region = test_config.AWS_REGION in substituted
contains_account = test_config.AWS_ACCOUNT_ID in substituted
self.log_result(
"Substitution includes AWS_REGION",
contains_region,
f"Region: {test_config.AWS_REGION}"
)
self.log_result(
"Substitution includes AWS_ACCOUNT_ID",
contains_account,
f"Account ID: {test_config.AWS_ACCOUNT_ID}"
)
return True
except Exception as e:
self.log_result("Parameter substitution test", False, str(e))
return False
def test_6_config_initialization(self) -> bool:
"""Test 6: Test application startup with SSM configuration."""
print("\n" + "=" * 70)
print("TEST 6: Application Configuration Initialization")
print("=" * 70 + "\n")
try:
# Note: This will use the actual lh_ prefix, not test prefix
# We're testing that the Config class can initialize
config = Config()
# Verify config loaded
self.log_result(
"Config class initialized",
config._loaded,
"Configuration loaded from SSM"
)
# Verify AWS credentials auto-detected
has_region = bool(config.AWS_REGION)
has_account = bool(config.AWS_ACCOUNT_ID)
self.log_result(
"AWS_REGION auto-detected",
has_region,
f"Region: {config.AWS_REGION}"
)
self.log_result(
"AWS_ACCOUNT_ID auto-detected",
has_account,
f"Account ID: {config.AWS_ACCOUNT_ID}"
)
# Test get() method
region_via_get = config.get('AWS_REGION')
self.log_result(
"Config.get() method works",
region_via_get == config.AWS_REGION,
f"Retrieved: {region_via_get}"
)
return True
except Exception as e:
self.log_result("Config initialization test", False, str(e))
return False
def test_7_error_handling(self) -> bool:
"""Test 7: Test error handling when SSM unavailable."""
print("\n" + "=" * 70)
print("TEST 7: Error Handling")
print("=" * 70 + "\n")
try:
loader = SSMConfigLoader(prefix=self.test_prefix)
# Test getting non-existent parameter with default
default_value = "default_value"
result = loader.get_parameter('NONEXISTENT_PARAM', default=default_value)
self.log_result(
"Non-existent parameter returns default",
result == default_value,
f"Expected: {default_value}, Got: {result}"
)
# Test parameter name conversion
test_key = "MY_TEST_PARAMETER"
expected_ssm_name = f"{self.test_prefix}my_test_parameter"
actual_ssm_name = loader._config_key_to_ssm_name(test_key)
self.log_result(
"Parameter name conversion",
actual_ssm_name == expected_ssm_name,
f"Expected: {expected_ssm_name}, Got: {actual_ssm_name}"
)
# Test cache functionality
loader.clear_cache()
self.log_result(
"Cache clear functionality",
len(loader._cache) == 0,
"Cache cleared successfully"
)
return True
except Exception as e:
self.log_result("Error handling test", False, str(e))
return False
def test_8_export_functionality(self) -> bool:
"""Test 8: Test export functionality."""
print("\n" + "=" * 70)
print("TEST 8: Export Functionality")
print("=" * 70 + "\n")
try:
utility = SSMMigrationUtility(prefix=self.test_prefix)
# Export to temporary file
with tempfile.NamedTemporaryFile(mode='w', suffix='.env', delete=False) as f:
output_file = Path(f.name)
try:
# Export without secrets
count = utility.export_ssm_to_env(
output_file=output_file,
include_secrets=False
)
self.log_result(
"Export executed successfully",
count > 0,
f"Exported {count} parameters"
)
# Verify file was created
exists = output_file.exists()
self.log_result(
"Export file created",
exists,
f"File: {output_file}"
)
if exists:
# Read and verify content
content = output_file.read_text()
# Should contain AWS_REGION and AWS_ACCOUNT_ID
has_region = 'AWS_REGION=' in content
has_account = 'AWS_ACCOUNT_ID=' in content
self.log_result(
"Export includes AWS_REGION",
has_region
)
self.log_result(
"Export includes AWS_ACCOUNT_ID",
has_account
)
# Should mask secrets
has_masked = '***MASKED***' in content
self.log_result(
"Sensitive values masked in export",
has_masked,
"Secrets are properly masked"
)
return True
finally:
# Clean up
if output_file.exists():
output_file.unlink()
except Exception as e:
self.log_result("Export functionality test", False, str(e))
return False
def test_9_validation_utility(self) -> bool:
"""Test 9: Test validation utility."""
print("\n" + "=" * 70)
print("TEST 9: Validation Utility")
print("=" * 70 + "\n")
try:
utility = SSMMigrationUtility(prefix=self.test_prefix)
# Run validation
results = utility.validate_ssm_parameters(verbose=False)
self.log_result(
"Validation utility executed",
isinstance(results, dict),
f"Checked {len(results)} parameters"
)
# Note: Validation may fail if required parameters don't exist
# This is expected for test prefix
self.log_result(
"Validation returns results dictionary",
True,
"Validation completed"
)
return True
except Exception as e:
self.log_result("Validation utility test", False, str(e))
return False
def cleanup(self):
"""Clean up test parameters."""
print("\n" + "=" * 70)
print("CLEANUP: Removing Test Parameters")
print("=" * 70 + "\n")
if not self.cleanup_params:
print("No parameters to clean up")
return
try:
loader = SSMConfigLoader(prefix=self.test_prefix)
ssm_client = loader._ssm_client
for param_name in self.cleanup_params:
try:
ssm_client.delete_parameter(Name=param_name)
print(f"✅ Deleted: {param_name}")
except ClientError as e:
if e.response.get('Error', {}).get('Code') == 'ParameterNotFound':
print(f"⏭️ Already deleted: {param_name}")
else:
print(f"❌ Failed to delete: {param_name} - {e}")
except Exception as e:
print(f"❌ Cleanup error: {e}")
def run_all_tests(self) -> bool:
"""Run all validation tests."""
print("\n" + "=" * 70)
print("SSM MIGRATION VALIDATION TEST SUITE")
print("=" * 70)
print("\nThis test suite validates all requirements for task 10:")
print(" - Migration utility functionality")
print(" - SSM parameter creation and retrieval")
print(" - Application startup with SSM configuration")
print(" - Error handling")
print(" - Sensitive parameter encryption")
print(" - Parameter substitution")
print(" - IAM permissions validation")
print("\n")
try:
# Run tests in sequence
self.test_1_ssm_connectivity()
self.test_2_migration_utility_dry_run()
self.test_3_parameter_creation()
self.test_4_sensitive_parameter_encryption()
self.test_5_parameter_substitution()
self.test_6_config_initialization()
self.test_7_error_handling()
self.test_8_export_functionality()
self.test_9_validation_utility()
# Print summary
all_passed = self.print_summary()
return all_passed
finally:
# Always cleanup
self.cleanup()
def main():
"""Main entry point."""
validator = ValidationTest()
try:
all_passed = validator.run_all_tests()
if all_passed:
print("\n🎉 All validation tests passed!")
print("The SSM migration implementation is ready for production use.")
sys.exit(0)
else:
print("\n⚠️ Some validation tests failed.")
print("Please review the failures above and address any issues.")
sys.exit(1)
except KeyboardInterrupt:
print("\n\n❌ Tests cancelled by user")
validator.cleanup()
sys.exit(130)
except Exception as e:
print(f"\n❌ Unexpected error: {e}")
import traceback
traceback.print_exc()
validator.cleanup()
sys.exit(1)
if __name__ == '__main__':
main()
@@ -0,0 +1,548 @@
#!/usr/bin/env python3
"""
AWS Session Utility for SSO-aware boto3 Session Creation
This module provides a centralized way to create and validate AWS sessions,
with special handling for AWS SSO authentication. It addresses common issues
with SSO token expiration and profile detection.
Features:
- Automatic fallback from expired .env credentials to AWS SSO profiles
- Clear error messages with remediation steps
- Support for container IAM roles (Lambda, ECS, EKS)
- Flexible credential priority: Container IAM > Env vars > SSO profiles
Usage:
from utils.aws_session_utils import get_aws_session, load_env_credentials
# Load credentials from .env file and get session
# If .env credentials are invalid/expired, automatically falls back to SSO
load_env_credentials()
session, region, account_id = get_aws_session()
# Auto-detect profile from environment
session, region, account_id = get_aws_session()
# Use specific profile
session, region, account_id = get_aws_session(profile_name='myprofile')
# Specify region
session, region, account_id = get_aws_session(region_name='us-west-2')
"""
import boto3
import os
import sys
from pathlib import Path
from typing import Tuple, Optional
from botocore.exceptions import (
NoCredentialsError,
ProfileNotFound,
ClientError,
TokenRetrievalError,
SSOTokenLoadError
)
def load_env_credentials(env_path: str = '.env', verbose: bool = True) -> bool:
"""
Load AWS credentials from .env file into environment variables.
This function loads environment variables from a .env file, making them
available to boto3 and other AWS tools. It's designed to work seamlessly
with the get_aws_session() function.
Args:
env_path: Path to the .env file. Default is '.env' in current directory.
verbose: If True, print status messages. Default True.
Returns:
bool: True if credentials were loaded successfully, False otherwise.
Example:
>>> from utils.aws_session_utils import load_env_credentials, get_aws_session
>>> load_env_credentials()
>>> session, region, account_id = get_aws_session()
"""
env_file = Path(env_path)
if not env_file.exists():
if verbose:
print(f"⚠️ .env file not found at {env_path}")
return False
loaded_vars = []
try:
with open(env_file, 'r') as f:
for line in f:
line = line.strip()
if line and not line.startswith('#') and '=' in line:
key, value = line.split('=', 1)
key = key.strip()
value = value.strip().strip('"').strip("'")
if value: # Only set non-empty values
os.environ[key] = value
loaded_vars.append(key)
except Exception as e:
if verbose:
print(f"❌ Error reading .env file: {e}")
return False
if loaded_vars:
if verbose:
print(f"✅ Loaded {len(loaded_vars)} variables from .env file:")
for var in loaded_vars:
if any(keyword in var.upper() for keyword in ['SECRET', 'TOKEN', 'PASSWORD']):
print(f" {var}: ****** (hidden)")
else:
print(f" {var}: {os.environ[var]}")
# Validate AWS credentials were loaded
aws_vars = ['AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY', 'AWS_SESSION_TOKEN', 'AWS_DEFAULT_REGION']
has_credentials = bool(
os.environ.get('AWS_ACCESS_KEY_ID') and
os.environ.get('AWS_SECRET_ACCESS_KEY')
)
if verbose:
print("\nCurrent AWS environment variables:")
for key in aws_vars:
value = os.environ.get(key)
if value:
if any(keyword in key for keyword in ['SECRET', 'TOKEN']):
print(f" {key}: {'*' * min(len(value), 20)} (hidden)")
else:
print(f" {key}: {value}")
else:
print(f" {key}: Not set")
if has_credentials:
if verbose:
print("\n✅ AWS credentials loaded successfully!")
return True
else:
if verbose:
print("\n⚠️ AWS credentials not found in .env file")
print(" Make sure your .env file contains:")
print(" AWS_ACCESS_KEY_ID=your-access-key")
print(" AWS_SECRET_ACCESS_KEY=your-secret-key")
print(" AWS_SESSION_TOKEN=your-session-token (if using STS)")
print(" AWS_DEFAULT_REGION=your-region")
return False
else:
if verbose:
print("⚠️ No valid environment variables found in .env file")
return False
def get_aws_session(
profile_name: Optional[str] = None,
region_name: Optional[str] = None,
verbose: bool = True
) -> Tuple[boto3.Session, str, str]:
"""
Create and validate AWS session with SSO support and automatic fallback.
This function:
1. Detects AWS profile from environment variables or parameters
2. Creates boto3 session with correct profile
3. Validates credentials are available and not expired
4. Automatically falls back to AWS SSO/profile if environment credentials fail
5. Provides clear error messages with remediation steps
Credential priority order:
1. Container IAM role (if running in Lambda/ECS/EKS)
2. Environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
- If invalid/expired, automatically clears them and falls back to SSO
3. AWS SSO profile (from AWS_PROFILE or AWS_DEFAULT_PROFILE)
4. Default AWS credentials
Args:
profile_name: Optional AWS profile name to use. If not provided,
will check AWS_PROFILE and AWS_DEFAULT_PROFILE env vars.
region_name: Optional AWS region. If not provided, will auto-detect
from session, environment, or use us-east-1 default.
verbose: If True, print status messages. Default True.
Returns:
Tuple of (boto3.Session, region_name: str, account_id: str)
Raises:
ValueError: If container credentials validation fails
SystemExit: On unrecoverable authentication errors when no fallback available
Examples:
>>> session, region, account = get_aws_session()
>>> s3_client = session.client('s3', region_name=region)
>>> session, region, account = get_aws_session(profile_name='prod')
"""
# Check if running in container/Lambda environment
# In these environments, IAM roles provide credentials automatically
is_container = any([
os.environ.get('AWS_EXECUTION_ENV'), # Lambda, ECS, etc.
os.environ.get('AWS_CONTAINER_CREDENTIALS_RELATIVE_URI'), # ECS
os.environ.get('AWS_CONTAINER_CREDENTIALS_FULL_URI'), # ECS
os.environ.get('ECS_CONTAINER_METADATA_URI'), # ECS
os.environ.get('K8S_AWS_ROLE_ARN'), # Kubernetes
])
if is_container and not profile_name:
# In container environment, use simple session with IAM role credentials
if verbose:
print("🔍 Container environment detected - using IAM role credentials")
region = _detect_region_simple(region_name)
session = boto3.Session(region_name=region)
try:
sts_client = session.client('sts', region_name=region)
account_id = sts_client.get_caller_identity()['Account']
if verbose:
print(f"✅ Container credentials validated")
print(f" Region: {region}")
print(f" Account ID: {account_id}")
return session, region, account_id
except Exception as e:
print(f"❌ Failed to validate container credentials: {e}")
# In container, let the error propagate rather than SystemExit
raise ValueError(f"Container credential validation failed: {e}") from e
# Check if we have environment variables (from .env or terminal)
# This takes precedence over SSO profiles
has_env_credentials = bool(
os.environ.get('AWS_ACCESS_KEY_ID') and
os.environ.get('AWS_SECRET_ACCESS_KEY')
)
if has_env_credentials:
# Use environment variables directly (bypasses SSO)
if verbose:
print("🔑 Using AWS credentials from environment variables")
# Detect region with the shared helper so every code path resolves it the same way:
# 1. Explicit parameter, 2. Environment vars, 3. AWS config, 4. Default (us-east-1)
region = _detect_region_simple(region_name)
# Create session with environment credentials
session = boto3.Session(
aws_access_key_id=os.environ.get('AWS_ACCESS_KEY_ID'),
aws_secret_access_key=os.environ.get('AWS_SECRET_ACCESS_KEY'),
aws_session_token=os.environ.get('AWS_SESSION_TOKEN'), # Optional for STS
region_name=region
)
# Validate credentials
try:
sts_client = session.client('sts', region_name=region)
identity = sts_client.get_caller_identity()
account_id = identity['Account']
if verbose:
print(f"✅ AWS credentials validated")
print(f" Account ID: {account_id}")
print(f" Region: {region}")
print(f" User ARN: {identity['Arn']}")
return session, region, account_id
except Exception as e:
if verbose:
print(f"⚠️ Environment credentials validation failed: {e}")
print(" Clearing invalid environment credentials...")
print(" Attempting fallback to AWS profile/SSO...\n")
# Clear invalid environment credentials to allow fallback
cleared_keys = []
for key in ['AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY', 'AWS_SESSION_TOKEN']:
if key in os.environ:
del os.environ[key]
cleared_keys.append(key)
if verbose and cleared_keys:
print(f" Cleared: {', '.join(cleared_keys)}")
# Don't raise - fall through to profile/SSO section below
# Determine which profile to use (for local development)
detected_profile = _detect_profile(profile_name, verbose)
# Create session with detected profile
try:
if detected_profile:
session = boto3.Session(profile_name=detected_profile)
else:
session = boto3.Session()
except ProfileNotFound as e:
_print_profile_not_found_error(detected_profile)
raise SystemExit(1) from e
# Detect region
region = _detect_region(session, region_name, verbose)
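# Recreate the session when the resolved region differs from the session's own
# (covers both a missing region and an explicit region_name override)
if region and session.region_name != region: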
if detected_profile:
session = boto3.Session(profile_name=detected_profile, region_name=region)
else:
session = boto3.Session(region_name=region)
# Validate credentials
try:
credentials = session.get_credentials()
if not credentials:
_print_no_credentials_error(detected_profile)
raise SystemExit(1)
# Test credentials by getting account ID
sts_client = session.client('sts', region_name=region)
account_id = sts_client.get_caller_identity()['Account']
except (TokenRetrievalError, SSOTokenLoadError) as e:
_print_sso_token_expired_error(detected_profile, region, str(e))
raise SystemExit(1) from e
except NoCredentialsError as e:
_print_no_credentials_error(detected_profile)
raise SystemExit(1) from e
except ClientError as e:
if 'ExpiredToken' in str(e) or 'InvalidToken' in str(e):
_print_sso_token_expired_error(detected_profile, region, str(e))
raise SystemExit(1) from e
else:
print(f"\n❌ AWS API Error: {e}")
raise SystemExit(1) from e
except Exception as e:
print(f"\n❌ Unexpected error validating AWS credentials: {e}")
print(f" Error type: {type(e).__name__}")
raise SystemExit(1) from e
# Print success message
if verbose:
_print_success_message(detected_profile, region, account_id, credentials)
return session, region, account_id
def _detect_profile(profile_name: Optional[str], verbose: bool) -> Optional[str]:
"""Detect which AWS profile to use."""
if profile_name:
if verbose:
print(f"🔍 Using specified profile: {profile_name}")
return profile_name
# Check environment variables in order of precedence
env_profile = os.environ.get('AWS_PROFILE') or os.environ.get('AWS_DEFAULT_PROFILE')
if env_profile:
if verbose:
env_var = 'AWS_PROFILE' if os.environ.get('AWS_PROFILE') else 'AWS_DEFAULT_PROFILE'
print(f"🔍 Using profile from {env_var}: {env_profile}")
return env_profile
if verbose:
print("🔍 Using default AWS credentials (no profile specified)")
return None
def _detect_region_simple(region_name: Optional[str]) -> str:
"""Detect AWS region without session (for container environments)."""
if region_name:
return region_name
# Try environment variables
region = os.environ.get('AWS_REGION') or os.environ.get('AWS_DEFAULT_REGION')
if region:
return region
# Try to get from AWS config
try:
temp_session = boto3.Session()
if temp_session.region_name:
return temp_session.region_name
except Exception:
# Reading the AWS config is best-effort; fall through to the default
pass
# Default to us-east-1 as last resort
return 'us-east-1'
def _detect_region(
session: boto3.Session,
region_name: Optional[str],
verbose: bool
) -> str:
"""Detect AWS region from multiple sources."""
if region_name:
if verbose:
print(f"🌍 Using specified region: {region_name}")
return region_name
# Try to get from session
region = session.region_name
if region:
if verbose:
print(f"🌍 Using region from AWS config: {region}")
return region
# Try environment variables
region = os.environ.get('AWS_REGION') or os.environ.get('AWS_DEFAULT_REGION')
if region:
env_var = 'AWS_REGION' if os.environ.get('AWS_REGION') else 'AWS_DEFAULT_REGION'
if verbose:
print(f"🌍 Using region from {env_var}: {region}")
return region
# No region found anywhere - use default
region = 'us-east-1'
if verbose:
print(f"⚠️ No AWS region configured, using default: {region}")
print(" To set your region:")
print(" - Environment variable: export AWS_DEFAULT_REGION=your-region")
print(" - AWS CLI: aws configure set region your-region")
return region
def _print_success_message(
profile: Optional[str],
region: str,
account_id: str,
credentials
) -> None:
"""Print success message with credential info."""
print("\n✅ AWS Credentials Validated")
print(f" Region: {region}")
print(f" Account ID: {account_id}")
print(f" Profile: {profile or 'default'}")
# Try to detect SSO
is_sso = False
try:
if hasattr(credentials, 'method'):
method_str = str(credentials.method).lower()
is_sso = 'sso' in method_str
except Exception:
# SSO detection is cosmetic; never let it break the success message
pass
if is_sso:
print(" Auth method: AWS SSO")
else:
print(" Auth method: AWS credentials (IAM/access keys)")
print()
def _print_no_credentials_error(profile: Optional[str]) -> None:
"""Print helpful error message when no credentials are found."""
print("\n" + "="*70)
print("❌ AWS CREDENTIALS NOT CONFIGURED")
print("="*70)
print("\nNo AWS credentials were found.")
if profile:
print(f"\nCurrent profile: {profile}")
print("\nThis profile may not be configured. To set it up:")
print(f" aws configure --profile {profile}")
print("\nTo configure AWS credentials, use one of these methods:")
print("\n1. AWS SSO (Recommended for organizations):")
print(" aws configure sso")
print(" aws sso login --profile your-profile-name")
print(" export AWS_PROFILE=your-profile-name")
print("\n2. AWS CLI with access keys:")
print(" aws configure")
print("\n3. Environment variables:")
print(" export AWS_ACCESS_KEY_ID=your-key")
print(" export AWS_SECRET_ACCESS_KEY=your-secret")
print("\n4. IAM Role (if running on EC2/ECS/Lambda):")
print(" Credentials are automatically provided")
print("\n" + "="*70)
def _print_sso_token_expired_error(
profile: Optional[str],
region: str,
error_detail: str
) -> None:
"""Print helpful error message for SSO token expiration."""
print("\n" + "="*70)
print("❌ AWS SSO TOKEN EXPIRED")
print("="*70)
print("\nYour AWS SSO session has expired and needs to be refreshed.")
if profile:
print(f"\nTo refresh your SSO credentials for profile '{profile}', run:")
print(f" aws sso login --profile {profile}")
print("\nThen ensure your environment uses this profile:")
print(f" export AWS_PROFILE={profile}")
else:
print("\nTo refresh your SSO credentials, run:")
print(" aws sso login")
print("\nIf you use a specific profile, specify it:")
print(" aws sso login --profile your-profile-name")
print(" export AWS_PROFILE=your-profile-name")
print("\nCurrent environment:")
print(f" AWS_PROFILE: {os.environ.get('AWS_PROFILE', 'not set')}")
print(f" AWS_DEFAULT_PROFILE: {os.environ.get('AWS_DEFAULT_PROFILE', 'not set')}")
print(f" AWS_REGION: {region}")
print("\n" + "="*70)
print(f"Error details: {error_detail}")
print("="*70)
def _print_profile_not_found_error(profile: str) -> None:
"""Print helpful error message when profile is not found."""
print("\n" + "="*70)
print(f"❌ AWS PROFILE NOT FOUND: {profile}")
print("="*70)
print(f"\nThe AWS profile '{profile}' was not found in your AWS configuration.")
print("\nTo list available profiles:")
print(" aws configure list-profiles")
print(f"\nTo create this profile:")
print(f" aws configure --profile {profile}")
print("\nOr for SSO:")
print(" aws configure sso")
print("\n" + "="*70)
if __name__ == '__main__':
# Smoke test for the session utility
print("Testing AWS Session Utility\n")
print("="*70)
try:
session, region, account_id = get_aws_session()
print("\n✅ Test successful!")
print(f"\nYou can now use this session to create AWS clients:")
print(f" s3_client = session.client('s3', region_name='{region}')")
print(f" dynamodb = session.resource('dynamodb', region_name='{region}')")
except SystemExit:
print("\n❌ Test failed - please fix the errors above and try again")
sys.exit(1)
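The per-line rules that `load_env_credentials` applies to a `.env` file (skip blanks and `#` comments, split on the first `=`, strip surrounding quotes, ignore empty values) can be tried without touching `os.environ`. The following is a minimal sketch under that reading of the code; `parse_env_text` and `mask` are hypothetical helpers introduced for illustration and are not part of this module, and the sample values are made up.

```python
from typing import Dict

SENSITIVE_KEYWORDS = ("SECRET", "TOKEN", "PASSWORD")


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse .env-style text with the same rules as load_env_credentials."""
    parsed: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)  # split on the first '=' only
        key = key.strip()
        value = value.strip().strip('"').strip("'")
        if value:  # empty values are ignored
            parsed[key] = value
    return parsed


def mask(key: str, value: str) -> str:
    """Hide values whose key name suggests a secret."""
    if any(word in key.upper() for word in SENSITIVE_KEYWORDS):
        return "****** (hidden)"
    return value


# Hypothetical file contents, for illustration only
sample = """
# comment lines are skipped
AWS_DEFAULT_REGION="us-west-2"
AWS_SESSION_TOKEN='abc=def'
EMPTY_VALUE=
not an assignment
"""

config = parse_env_text(sample)
for name, val in config.items():
    print(f"{name}: {mask(name, val)}")
```

Returning a dictionary instead of writing to `os.environ` keeps the parsing step free of side effects, which makes it easier to test; the module itself writes straight into the process environment.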
@@ -0,0 +1,29 @@
#!/usr/bin/env python3
"""Check runtime status"""
import boto3
import json
from dotenv import load_dotenv
import os
load_dotenv()
runtime_id = os.getenv('LAKEHOUSE_AGENT_RUNTIME_ID', 'lakehouse_agent-Hhb3lX6y7M')
region = os.getenv('AWS_REGION', 'us-east-1')
print(f"Checking runtime: {runtime_id}")
print(f"Region: {region}")
# Runtime metadata is served by the control-plane client, as in the other utilities
client = boto3.client('bedrock-agentcore-control', region_name=region)
try:
response = client.get_agent_runtime(agentRuntimeId=runtime_id)
print("\n✅ Runtime found:")
print(json.dumps(response, indent=2, default=str))
except Exception as e:
print(f"\n❌ Error: {e}")
print("\nTrying to list all runtimes...")
try:
response = client.list_agent_runtimes()
print(json.dumps(response, indent=2, default=str))
except Exception as e2:
print(f"❌ Error listing runtimes: {e2}")
@@ -0,0 +1,211 @@
#!/usr/bin/env python3
"""
Diagnose JWT Authentication Issue
This script checks the agent runtime and Cognito configuration to identify
why JWT authentication is failing.
"""
import boto3
import json
import base64
def decode_jwt_payload(token):
"""Decode JWT token payload without verification"""
try:
parts = token.split('.')
if len(parts) != 3:
return None
payload = parts[1]
padding = 4 - len(payload) % 4
if padding != 4:
payload += '=' * padding
decoded = base64.urlsafe_b64decode(payload)
return json.loads(decoded)
except Exception:
return None
def main():
print("=" * 80)
print("JWT Authentication Diagnostics")
print("=" * 80)
session = boto3.Session()
region = session.region_name
ssm = boto3.client('ssm', region_name=region)
print(f"\n📍 Region: {region}")
# Check SSM parameters
print("\n🔍 Checking SSM Parameter Store...")
params_to_check = [
'/app/lakehouse-agent/agent-runtime-arn',
'/app/lakehouse-agent/cognito-user-pool-id',
'/app/lakehouse-agent/cognito-app-client-id',
'/app/lakehouse-agent/cognito-region'
]
params = {}
missing_params = []
for param_name in params_to_check:
try:
value = ssm.get_parameter(Name=param_name)['Parameter']['Value']
params[param_name] = value
print(f"{param_name}: {value}")
except ssm.exceptions.ParameterNotFound:
missing_params.append(param_name)
print(f"{param_name}: NOT FOUND")
if missing_params:
print(f"\n❌ Missing SSM parameters!")
print(f"\n💡 Solution:")
print(f" Run the setup scripts in order:")
print(f" 1. python gateway-setup/setup_cognito.py")
print(f" 2. python lakehouse-agent/deploy_lakehouse_agent.py")
return
# Get agent runtime configuration
print(f"\n🔍 Checking Agent Runtime Configuration...")
try:
client = boto3.client('bedrock-agentcore-control', region_name=region)
runtime_arn = params['/app/lakehouse-agent/agent-runtime-arn']
response = client.get_agent_runtime(agentRuntimeArn=runtime_arn)
runtime_config = response['agentRuntime']
if 'authorizerConfiguration' not in runtime_config:
print(f" ❌ No authorizer configuration found!")
print(f" ℹ️ Agent is using IAM SigV4 authentication")
print(f"\n💡 Solution:")
print(f" Run: python lakehouse-agent/update_agent_authorizer.py")
return
auth_config = runtime_config['authorizerConfiguration']
if 'customJWTAuthorizer' not in auth_config:
print(f" ❌ No JWT authorizer configured!")
print(f"\n💡 Solution:")
print(f" Run: python lakehouse-agent/update_agent_authorizer.py")
return
jwt_config = auth_config['customJWTAuthorizer']
discovery_url = jwt_config.get('discoveryUrl', '')
allowed_clients = jwt_config.get('allowedClients', [])
print(f" ✅ JWT Authorizer configured")
print(f" Discovery URL: {discovery_url}")
print(f" Allowed Clients: {allowed_clients}")
# Extract issuer from discovery URL
configured_issuer = discovery_url.replace('/.well-known/openid-configuration', '')
# Build expected issuer from Cognito config
cognito_region = params['/app/lakehouse-agent/cognito-region']
cognito_pool_id = params['/app/lakehouse-agent/cognito-user-pool-id']
cognito_client_id = params['/app/lakehouse-agent/cognito-app-client-id']
expected_issuer = f"https://cognito-idp.{cognito_region}.amazonaws.com/{cognito_pool_id}"
print(f"\n🔍 Comparing Issuers...")
print(f" Configured issuer: {configured_issuer}")
print(f" Expected issuer: {expected_issuer}")
if configured_issuer != expected_issuer:
print(f" ❌ MISMATCH!")
print(f"\n💡 Solution:")
print(f" Run: python lakehouse-agent/update_agent_authorizer.py")
return
print(f" ✅ Issuers match!")
# Check client ID
print(f"\n🔍 Comparing Client IDs...")
print(f" Configured clients: {allowed_clients}")
print(f" Expected client: {cognito_client_id}")
if cognito_client_id not in allowed_clients:
print(f" ❌ Client ID not in allowed list!")
print(f"\n💡 Solution:")
print(f" Run: python lakehouse-agent/update_agent_authorizer.py")
return
print(f" ✅ Client ID matches!")
# Test authentication
print(f"\n🔍 Testing Authentication...")
try:
cognito = boto3.client('cognito-idp', region_name=cognito_region)
username = 'user001@example.com'
password = 'TempPass123!'
# Get client secret
client_secret = ssm.get_parameter(
Name='/app/lakehouse-agent/cognito-app-client-secret',
WithDecryption=True
)['Parameter']['Value']
# Calculate SECRET_HASH
import hmac
import hashlib
message = bytes(username + cognito_client_id, 'utf-8')
secret = bytes(client_secret, 'utf-8')
secret_hash = base64.b64encode(hmac.new(secret, message, digestmod=hashlib.sha256).digest()).decode()
response = cognito.admin_initiate_auth(
UserPoolId=cognito_pool_id,
ClientId=cognito_client_id,
AuthFlow='ADMIN_NO_SRP_AUTH',
AuthParameters={
'USERNAME': username,
'PASSWORD': password,
'SECRET_HASH': secret_hash
}
)
if 'AuthenticationResult' in response:
access_token = response['AuthenticationResult']['AccessToken']
print(f" ✅ Successfully authenticated as {username}")
# Decode and check token
claims = decode_jwt_payload(access_token)
if claims:
token_issuer = claims.get('iss')
token_client_id = claims.get('client_id')
print(f"\n📄 Token Claims:")
print(f" Issuer (iss): {token_issuer}")
print(f" Client ID: {token_client_id}")
print(f" Username: {claims.get('username')}")
print(f" Expires: {claims.get('exp')}")
if token_issuer == configured_issuer and token_client_id in allowed_clients:
print(f"\n✅ ALL CHECKS PASSED!")
print(f"\n Your configuration is correct.")
print(f" If you're still getting errors, check:")
print(f" 1. Token hasn't expired")
print(f" 2. Network connectivity to AWS")
print(f" 3. Agent runtime is in ACTIVE state")
else:
print(f"\n❌ Token claims don't match configuration!")
if token_issuer != configured_issuer:
print(f" Issuer mismatch!")
if token_client_id not in allowed_clients:
print(f" Client ID not allowed!")
else:
print(f" ❌ Authentication failed")
except Exception as e:
print(f" ❌ Error testing authentication: {e}")
except Exception as e:
print(f" ❌ Error getting agent runtime: {e}")
import traceback
traceback.print_exc()
if __name__ == '__main__':
main()
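The comparison this script performs (issuer derived from the discovery URL, `client_id` checked against the allowed clients) can be rehearsed offline with a hand-built token. The sketch below is illustrative only: `make_unsigned_token` and `check_claims` are hypothetical helpers, the pool and client identifiers are invented, and the payload is decoded without signature verification, so it is suitable for diagnostics and never for authorization decisions.

```python
import base64
import json
from typing import List, Optional

DISCOVERY_SUFFIX = "/.well-known/openid-configuration"


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWT segments are."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def make_unsigned_token(claims: dict) -> str:
    """Build a JWT-shaped string for offline testing; the signature is a dummy."""
    header = _b64url(json.dumps({"alg": "none", "typ": "JWT"}).encode())
    payload = _b64url(json.dumps(claims).encode())
    return f"{header}.{payload}.{_b64url(b'dummy-signature')}"


def decode_payload(token: str) -> Optional[dict]:
    """Decode the payload segment WITHOUT verifying the signature."""
    parts = token.split(".")
    if len(parts) != 3:
        return None
    segment = parts[1] + "=" * (-len(parts[1]) % 4)  # restore stripped padding
    try:
        return json.loads(base64.urlsafe_b64decode(segment))
    except ValueError:
        return None


def check_claims(token: str, discovery_url: str, allowed_clients: List[str]) -> List[str]:
    """Return a list of mismatches; an empty list means the claims line up."""
    claims = decode_payload(token)
    if claims is None:
        return ["token is not a decodable JWT"]
    issuer = discovery_url
    if issuer.endswith(DISCOVERY_SUFFIX):
        issuer = issuer[: -len(DISCOVERY_SUFFIX)]
    problems = []
    if claims.get("iss") != issuer:
        problems.append("issuer mismatch")
    if claims.get("client_id") not in allowed_clients:
        problems.append("client ID not allowed")
    return problems


# Hypothetical pool and client identifiers, for illustration only
issuer = "https://cognito-idp.us-east-1.amazonaws.com/us-east-1_EXAMPLE"
token = make_unsigned_token({"iss": issuer, "client_id": "example-client"})
discovery = issuer + DISCOVERY_SUFFIX

print(check_claims(token, discovery, ["example-client"]))
print(check_claims(token, discovery, ["another-client"]))
```

Collecting mismatches in a list, rather than returning at the first failure, reports an issuer problem and a client ID problem in a single run.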
@@ -0,0 +1,159 @@
#!/usr/bin/env python3
"""
Find All AgentCore Runtimes
This script lists all AgentCore runtimes in your account and their CloudWatch logs.
"""
import boto3
from datetime import datetime
def main():
print("=" * 80)
print("Find All AgentCore Runtimes")
print("=" * 80)
session = boto3.Session()
region = session.region_name
print(f"\n📍 Region: {region}")
# List all agent runtimes
print(f"\n🔍 Searching for AgentCore Runtimes...")
try:
client = boto3.client('bedrock-agentcore-control', region_name=region)
response = client.list_agent_runtimes()
runtimes = response.get('agentRuntimeSummaries', [])
if not runtimes:
print(f" ❌ No AgentCore runtimes found in {region}")
print(f"\n💡 To deploy the lakehouse agent:")
print(f" python lakehouse-agent/deploy_lakehouse_agent.py")
return
print(f" ✅ Found {len(runtimes)} runtime(s):")
for i, runtime_summary in enumerate(runtimes, 1):
name = runtime_summary.get('agentRuntimeName', 'unknown')
arn = runtime_summary.get('agentRuntimeArn', 'unknown')
status = runtime_summary.get('status', 'unknown')
updated = runtime_summary.get('updatedAt', 'unknown')
print(f"\n {i}. {name}")
print(f" ARN: {arn}")
print(f" Status: {status}")
print(f" Updated: {updated}")
# Get detailed runtime info
try:
detail_response = client.get_agent_runtime(agentRuntimeArn=arn)
runtime = detail_response['agentRuntime']
# Check auth configuration
if 'authorizerConfiguration' in runtime:
auth_config = runtime['authorizerConfiguration']
if 'customJWTAuthorizer' in auth_config:
jwt_config = auth_config['customJWTAuthorizer']
print(f" Auth: JWT")
print(f" Discovery URL: {jwt_config.get('discoveryUrl')}")
print(f" Allowed Clients: {jwt_config.get('allowedClients')}")
else:
print(f" Auth: IAM SigV4")
else:
print(f" Auth: IAM SigV4 (default)")
# Extract runtime ID for log search
runtime_id = arn.split('/')[-1]
# Search for CloudWatch logs
print(f"\n 🔍 Searching for CloudWatch logs...")
logs = boto3.client('logs', region_name=region)
# Try different log group patterns
log_patterns = [
f"/aws/bedrock-agentcore/runtime/{runtime_id}",
f"/aws/bedrock-agentcore/runtime/{name}",
f"/aws/bedrock-agentcore/{runtime_id}",
f"/aws/agentcore/runtime/{runtime_id}",
]
found_logs = False
for pattern in log_patterns:
try:
log_response = logs.describe_log_groups(
logGroupNamePrefix=pattern,
limit=1
)
if log_response.get('logGroups'):
log_group = log_response['logGroups'][0]
log_group_name = log_group['logGroupName']
size_mb = log_group.get('storedBytes', 0) / (1024 * 1024)
print(f" ✅ Log Group: {log_group_name}")
print(f" Size: {size_mb:.2f} MB")
# Check for recent log streams
streams_response = logs.describe_log_streams(
logGroupName=log_group_name,
orderBy='LastEventTime',
descending=True,
limit=3
)
streams = streams_response.get('logStreams', [])
if streams:
print(f" Recent streams:")
for stream in streams:
stream_name = stream['logStreamName']
last_event = stream.get('lastEventTimestamp')
if last_event:
last_event_date = datetime.fromtimestamp(last_event / 1000).strftime('%Y-%m-%d %H:%M:%S')
print(f" - {stream_name} (last: {last_event_date})")
else:
print(f" ⚠️ No log streams (not invoked yet)")
found_logs = True
break
except Exception:
continue
if not found_logs:
print(f" ⚠️ No CloudWatch logs found")
print(f" Logs are created on first invocation")
print(f" Expected log group: /aws/bedrock-agentcore/runtime/{runtime_id}")
except Exception as e:
print(f" ⚠️ Error getting runtime details: {e}")
# Provide next steps
print(f"\n" + "=" * 80)
print(f"Next Steps")
print(f"=" * 80)
print(f"\n📋 To view logs for a runtime:")
print(f" 1. Note the runtime ARN from above")
print(f" 2. Go to CloudWatch Console > Log groups")
print(f" 3. Search for the log group name")
print(f" 4. Or use AWS CLI:")
print(f" aws logs tail '/aws/bedrock-agentcore/runtime/<runtime-id>' --follow")
print(f"\n📋 To invoke a runtime and generate logs:")
print(f" 1. Use the Streamlit UI: streamlit run streamlit-ui/streamlit_app.py")
print(f" 2. Or use boto3/requests to invoke the runtime")
print(f"\n📋 To store runtime ARN in SSM (for scripts to use):")
print(f" aws ssm put-parameter \\")
print(f" --name '/app/lakehouse-agent/agent-runtime-arn' \\")
print(f" --value '<runtime-arn>' \\")
print(f" --type String \\")
print(f" --overwrite")
except Exception as e:
print(f" ❌ Error listing runtimes: {e}")
import traceback
traceback.print_exc()
if __name__ == '__main__':
main()
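The log lookup above relies on three small steps: taking the runtime ID from the last ARN segment, trying several candidate log group prefixes, and converting millisecond timestamps for display. A minimal sketch of those steps follows. The helper names and the sample ARN are hypothetical, the prefixes are the guesses used in the script rather than confirmed naming, and the timestamp is rendered in UTC here (the script uses local time) so the output does not depend on the machine's timezone.

```python
from datetime import datetime, timezone
from typing import List


def runtime_id_from_arn(arn: str) -> str:
    """Take the text after the last '/' as the runtime ID."""
    return arn.split("/")[-1]


def candidate_log_groups(runtime_id: str, name: str) -> List[str]:
    """Prefixes tried in order; these mirror the script's guesses."""
    return [
        f"/aws/bedrock-agentcore/runtime/{runtime_id}",
        f"/aws/bedrock-agentcore/runtime/{name}",
        f"/aws/bedrock-agentcore/{runtime_id}",
        f"/aws/agentcore/runtime/{runtime_id}",
    ]


def format_event_time(epoch_ms: int) -> str:
    """CloudWatch reports timestamps in milliseconds since the epoch."""
    moment = datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S UTC")


# Hypothetical ARN, for illustration only
sample_arn = "arn:aws:bedrock-agentcore:us-east-1:123456789012:runtime/example_agent-abc123"
runtime_id = runtime_id_from_arn(sample_arn)
candidates = candidate_log_groups(runtime_id, "example_agent")

print(runtime_id)
for prefix in candidates:
    print(prefix)
print(format_event_time(0))
```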
@@ -0,0 +1,65 @@
#!/usr/bin/env python3
"""
Notebook initialization utility for AWS session setup.
This module provides a simple init_aws() function for Jupyter notebooks
that loads credentials from .env and creates a validated AWS session.
Usage in notebooks:
from utils.notebook_init import init_aws
session, region, account_id = init_aws()
"""
from .aws_session_utils import load_env_credentials, get_aws_session
from typing import Optional, Tuple
import boto3
def init_aws(
env_path: str = '.env',
profile_name: Optional[str] = None,
region_name: Optional[str] = None,
verbose: bool = True
) -> Tuple[boto3.Session, str, str]:
"""
Initialize AWS session for notebook use with automatic SSO fallback.
This function:
1. Loads credentials from .env file (if it exists)
2. Creates and validates AWS session
3. Automatically falls back to AWS SSO if .env credentials are invalid/expired
4. Returns session, region, and account_id
Credential priority order:
- Container IAM role (if running in Lambda/ECS/EKS)
- Environment variables from .env file
* If invalid/expired, automatically clears them and falls back to SSO
- AWS SSO profile (from AWS_PROFILE or AWS_DEFAULT_PROFILE)
- Default AWS credentials
Args:
env_path: Path to .env file. Default is '.env' in current directory.
profile_name: Optional AWS profile name to use.
region_name: Optional AWS region to use.
verbose: If True, print status messages. Default True.
Returns:
Tuple of (boto3.Session, region_name: str, account_id: str)
Example:
>>> from utils.notebook_init import init_aws
>>> session, region, account_id = init_aws()
>>> s3_client = session.client('s3', region_name=region)
"""
# Try to load credentials from .env file
load_env_credentials(env_path=env_path, verbose=verbose)
# Create and validate AWS session
session, region, account_id = get_aws_session(
profile_name=profile_name,
region_name=region_name,
verbose=verbose
)
return session, region, account_id
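The credential priority order described in the `init_aws` docstring can be expressed as a pure function over an environment mapping, which is handy for checking what a given notebook environment would select before any AWS call is made. This is a sketch of the initial selection only: `pick_credential_source` is a hypothetical helper, the returned labels are invented, and the automatic fallback that happens when environment credentials fail validation is not modeled.

```python
from typing import Mapping, Optional

# Environment variables that get_aws_session treats as container markers
CONTAINER_MARKERS = (
    "AWS_EXECUTION_ENV",
    "AWS_CONTAINER_CREDENTIALS_RELATIVE_URI",
    "AWS_CONTAINER_CREDENTIALS_FULL_URI",
    "ECS_CONTAINER_METADATA_URI",
    "K8S_AWS_ROLE_ARN",
)


def pick_credential_source(
    env: Mapping[str, str],
    profile_name: Optional[str] = None,
) -> str:
    """Return which credential source would be tried first."""
    in_container = any(env.get(marker) for marker in CONTAINER_MARKERS)
    # An explicit profile_name bypasses the container shortcut
    if in_container and not profile_name:
        return "container-iam-role"
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        return "environment-variables"
    if profile_name or env.get("AWS_PROFILE") or env.get("AWS_DEFAULT_PROFILE"):
        return "profile"
    return "default-chain"


# Hypothetical environments, for illustration only
print(pick_credential_source({"AWS_EXECUTION_ENV": "AWS_Lambda_python3.12"}))
print(pick_credential_source({"AWS_ACCESS_KEY_ID": "AKIAEXAMPLE", "AWS_SECRET_ACCESS_KEY": "example"}))
print(pick_credential_source({"AWS_PROFILE": "dev"}))
print(pick_credential_source({}))
```

Passing the environment in as a mapping, instead of reading `os.environ` directly, lets the same function be exercised with fabricated inputs.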
+4 -1
@@ -80,5 +80,8 @@
- vargas-dann-0896
- razkenari
- Kostas Tzouvanas
- Sunita Koppar (skoppar)
- Gi Kim (giryoong)
- richatt
- Hideki Tane
- richatt
- Hideki Tane