AWS Deployment¶
Production deployment using AWS ECS Fargate, RDS, and Qdrant on EC2
Architecture Overview¶
┌─────────────────────────────────────────────────────────────┐
│                         CloudFront                          │
│                        (CDN + HTTPS)                        │
└──────────────────┬──────────────────────────────────────────┘
                   │
       ┌───────────┴────────────┐
       │                        │
       ▼                        ▼
┌─────────────┐          ┌─────────────┐
│  S3 Bucket  │          │   API GW    │
│ (Frontend)  │          │  HTTP API   │
└─────────────┘          └──────┬──────┘
                                │
                                ▼
                         ┌─────────────┐
                         │ ECS Fargate │
                         │  (Backend)  │
                         └──────┬──────┘
                                │
              ┌─────────────────┼─────────────────┐
              │                 │                 │
              ▼                 ▼                 ▼
       ┌─────────────┐    ┌──────────┐     ┌──────────────┐
       │ RDS Postgres│    │  Qdrant  │     │   DynamoDB   │
       │ (pgvector)  │    │ (EC2 t4g)│     │   (Cache)    │
       └─────────────┘    └──────────┘     └──────────────┘
Prerequisites¶
1. AWS Account Setup¶
- AWS account with billing enabled
- AWS CLI installed and configured:
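The command block for this step is missing from this copy of the page. As a minimal sketch, assuming AWS CLI v2 and that `aws configure` has already been run interactively, the following checks which identity the CLI is using. `check_aws_identity` is a hypothetical helper name, not part of the project.

```shell
# Hypothetical helper: print the account ID and ARN the AWS CLI currently resolves to.
# A failure here usually means credentials are missing or expired.
check_aws_identity() {
  aws sts get-caller-identity \
    --query '[Account, Arn]' \
    --output text
}
```

Calling `check_aws_identity` after `aws configure` should print the account you intend to deploy into.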
2. Install AWS CDK¶
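The install command for this step did not survive extraction. A typical approach, assuming Node.js and npm are already installed, is a global install of the `aws-cdk` package. `ensure_cdk` below is a hypothetical wrapper that skips the install when a `cdk` command is already available.

```shell
# Hypothetical helper: install the AWS CDK CLI globally unless `cdk` is already on the PATH.
ensure_cdk() {
  if command -v cdk >/dev/null 2>&1; then
    echo "cdk already installed"
  else
    npm install -g aws-cdk
  fi
}
```

Running `ensure_cdk && cdk --version` confirms the CLI is usable.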
3. Required IAM Permissions¶
Your AWS user/role needs permissions for:
- ECS, EC2, VPC, ELB
- RDS, S3, CloudFront, DynamoDB
- IAM, CloudFormation, Systems Manager
Alternatively, attach the AdministratorAccess managed policy (not recommended for production).
4. LLM API Key¶
You need an API key for one of:
- OpenAI API (sk-...)
- Azure OpenAI (endpoint + key + deployment name)
- AWS Bedrock (in a supported region)
Deployment Steps¶
1. Clone Repository¶
2. Install Dependencies¶
3. Configure CDK Context¶
Edit cdk.json or set environment variables:
{
  "context": {
    "stack_name": "greengovrag-prod",
    "environment": "production",
    "vpc_cidr": "10.0.0.0/16",
    "database_name": "greengovrag",
    "use_spot_instances": true,
    "enable_cdn": true
  }
}
4. Set Secrets in SSM Parameter Store¶
# OpenAI API Key
aws ssm put-parameter \
--name "/greengovrag/prod/openai-api-key" \
--value "sk-your-key-here" \
--type "SecureString" \
--description "OpenAI API key for LLM"
# Database Password (auto-generated, or set custom)
aws ssm put-parameter \
--name "/greengovrag/prod/db-password" \
--value "$(openssl rand -base64 32)" \
--type "SecureString"
# Optional: Azure OpenAI
aws ssm put-parameter \
--name "/greengovrag/prod/azure-openai-key" \
--value "your-azure-key" \
--type "SecureString"
5. Bootstrap CDK (First Time Only)¶
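The bootstrap command is missing from this copy. As a sketch, `cdk bootstrap` takes an `aws://ACCOUNT/REGION` target. The helper name and its two arguments are illustrative assumptions.

```shell
# Hypothetical helper: bootstrap the CDK toolkit stack in one account and region.
# Arguments: 12-digit AWS account ID, then region name (for example us-east-1).
bootstrap_cdk() {
  local account_id="$1"
  local region="$2"
  cdk bootstrap "aws://${account_id}/${region}"
}
```

For example, `bootstrap_cdk 123456789012 us-east-1` bootstraps a placeholder account in `us-east-1`. Substitute your own account ID and region.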
6. Review Stack¶
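The review commands are missing from this copy. One common way to review a CDK app before deploying is to synthesize it and then diff it against what is deployed. This sketch assumes it is run from the CDK app directory (`deploy/aws`, as used in the CI workflow). `review_stack` is a hypothetical name.

```shell
# Hypothetical helper: check that the app synthesizes, then show pending changes.
# Synth output is discarded; only the diff is printed.
review_stack() {
  cdk synth >/dev/null && cdk diff
}
```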
7. Deploy Stack¶
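The deploy command is missing from this copy. A minimal sketch, assuming the app defines more than one stack (hence `--all`) and that saving the outputs to a local file is useful for later steps. The file name `cdk-outputs.json` is an arbitrary choice.

```shell
# Hypothetical helper: deploy every stack in the app and save stack outputs to a JSON file.
deploy_stack() {
  cdk deploy --all --outputs-file cdk-outputs.json
}
```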
Deployment takes roughly 15-20 minutes. CDK reports progress as it creates:
- VPC and subnets
- RDS PostgreSQL instance
- ECS cluster and service
- EC2 instance for Qdrant
- S3 bucket and CloudFront distribution
- API Gateway
8. Get Outputs¶
# View stack outputs
aws cloudformation describe-stacks \
--stack-name greengovrag-prod \
--query 'Stacks[0].Outputs'
Important outputs:
- ApiUrl: Backend API endpoint
- FrontendUrl: CloudFront distribution URL
- QdrantUrl: Qdrant service endpoint (internal VPC only)
- DatabaseEndpoint: RDS endpoint
9. Verify Deployment¶
# Health check
API_URL=$(aws cloudformation describe-stacks \
  --stack-name greengovrag-prod \
  --query 'Stacks[0].Outputs[?OutputKey==`ApiUrl`].OutputValue' \
  --output text)
curl "$API_URL/api/health"
10. Run ETL Pipeline¶
ETL runs automatically via GitHub Actions (scheduled daily at 2 AM UTC).
Manual trigger:
# Via GitHub Actions UI: Actions → ETL Scheduled → Run workflow
# Or via GitHub CLI
gh workflow run etl-scheduled.yml
Configuration¶
Environment Variables¶
Set in deploy/aws/lib/backend-stack.ts:
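environment: {
  LLM_PROVIDER: 'openai',
  LLM_MODEL: 'gpt-5-mini',
  VECTOR_STORE_TYPE: 'qdrant',
  // Full URL, assuming Qdrant's HTTP API on port 6333
  QDRANT_URL: `http://${qdrant.instancePrivateIp}:6333`,
  // socketAddress is "host:port" only; the backend must add scheme, credentials, and database name
  DATABASE_URL: rds.instanceEndpoint.socketAddress,
  CLOUD_PROVIDER: 'aws',
  LOG_LEVEL: 'INFO',
}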
Secrets (SSM Parameter Store)¶
Access in backend via:
import boto3

ssm = boto3.client('ssm', region_name='us-east-1')
# WithDecryption=True is required to read SecureString values in plain text
api_key = ssm.get_parameter(
    Name='/greengovrag/prod/openai-api-key',
    WithDecryption=True,
)['Parameter']['Value']
CDK automatically injects these secrets into the backend container as environment variables.
Scaling Configuration¶
ECS Fargate Auto-Scaling:
Edit deploy/aws/lib/backend-stack.ts:
const scaling = service.autoScaleTaskCount({
  minCapacity: 1,
  maxCapacity: 10
});
scaling.scaleOnCpuUtilization('CpuScaling', {
  targetUtilizationPercent: 70,
  scaleInCooldown: cdk.Duration.seconds(60),
  scaleOutCooldown: cdk.Duration.seconds(60)
});
scaling.scaleOnMemoryUtilization('MemoryScaling', {
  targetUtilizationPercent: 80
});
Redeploy: cdk deploy
Database Scaling¶
Upgrade RDS instance:
Edit deploy/aws/lib/database-stack.ts:
instanceType: ec2.InstanceType.of(
  ec2.InstanceClass.T4G,
  ec2.InstanceSize.SMALL // Change from MICRO
)
Qdrant Scaling¶
Upgrade EC2 instance:
Edit deploy/aws/lib/qdrant-stack.ts:
instanceType: ec2.InstanceType.of(
  ec2.InstanceClass.T4G,
  ec2.InstanceSize.MEDIUM // Change from MICRO
)
CI/CD Pipeline¶
GitHub Actions Setup¶
1. Configure Secrets¶
In GitHub repository settings (Settings → Secrets and variables → Actions):
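The list of secrets is missing from this copy, but the deploy workflow on this page reads `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, and `AWS_REGION`. As a sketch, these can also be set from a terminal with the GitHub CLI instead of the web UI. This assumes `gh` is authenticated for the repository and the three values are present in your shell environment. `set_deploy_secrets` is a hypothetical name.

```shell
# Hypothetical helper: store the three secrets that deploy-aws.yml references.
# Values come from the caller's environment so they are not hard-coded here.
set_deploy_secrets() {
  gh secret set AWS_ACCESS_KEY_ID --body "$AWS_ACCESS_KEY_ID" &&
  gh secret set AWS_SECRET_ACCESS_KEY --body "$AWS_SECRET_ACCESS_KEY" &&
  gh secret set AWS_REGION --body "$AWS_REGION"
}
```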
2. Workflow Files¶
.github/workflows/deploy-aws.yml (Auto-deploy on push to main):
name: Deploy to AWS
on:
  push:
    branches: [main]
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v2
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: ${{ secrets.AWS_REGION }}
      - name: Deploy CDK stack
        run: |
          cd deploy/aws
          npm install
          # npx resolves the project-local CDK CLI; assumes aws-cdk is listed in package.json
          npx cdk deploy --require-approval never
.github/workflows/etl-scheduled.yml (Daily ETL run):
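name: ETL Scheduled Pipeline
on:
  schedule:
    - cron: '0 2 * * *' # Daily at 2 AM UTC
  workflow_dispatch: # Required for manual runs from the UI or `gh workflow run`
jobs:
  etl:
    runs-on: ubuntu-latest
    steps:
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v2
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: ${{ secrets.AWS_REGION }}
      - name: Trigger ECS task
        # Fargate tasks using awsvpc networking also need --network-configuration
        # with the stack's subnets and security groups.
        run: |
          aws ecs run-task \
            --cluster greengovrag-cluster \
            --task-definition greengovrag-etl \
            --launch-type FARGATE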
Monitoring¶
CloudWatch Logs¶
# View backend logs
aws logs tail /ecs/greengovrag-backend --follow
# View ETL logs
aws logs tail /ecs/greengovrag-etl --follow
# Search for errors
aws logs filter-log-events \
  --log-group-name /ecs/greengovrag-backend \
  --filter-pattern "ERROR"
CloudWatch Metrics¶
View in AWS Console: CloudWatch → Dashboards → greengovrag-dashboard
Key metrics:
- ECS CPU/Memory utilization
- RDS CPU/Connections
- API Gateway 4XX/5XX errors
- DynamoDB read/write capacity
CloudWatch Alarms¶
Auto-created alarms:
- HighCPUUtilization: ECS CPU > 80% for 5 minutes
- HighMemoryUtilization: ECS Memory > 90% for 5 minutes
- DatabaseHighConnections: RDS connections > 80
- APIHighErrorRate: API 5XX errors > 10/minute
View alarms:
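The command for this step is missing from this copy. A minimal sketch using the CloudWatch CLI lists every alarm currently in the `ALARM` state. It does not filter by name, since the deployed alarm names may carry a stack prefix. `list_firing_alarms` is a hypothetical name.

```shell
# Hypothetical helper: list alarms that are currently firing, with the reason for each.
list_firing_alarms() {
  aws cloudwatch describe-alarms \
    --state-value ALARM \
    --query 'MetricAlarms[].[AlarmName,StateReason]' \
    --output table
}
```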
X-Ray Tracing¶
Enable X-Ray in backend-stack.ts:
import * as xray from 'aws-cdk-lib/aws-xray';
// Add X-Ray daemon as sidecar
taskDefinition.addContainer('xray-daemon', {
  image: ecs.ContainerImage.fromRegistry('amazon/aws-xray-daemon'),
  // ... config
});
Backup and Recovery¶
RDS Automated Backups¶
Configured in database-stack.ts:
new rds.DatabaseInstance(this, 'Database', {
  backupRetention: cdk.Duration.days(7),
  deleteAutomatedBackups: false,
  preferredBackupWindow: '02:00-03:00', // 2-3 AM UTC
});
Manual snapshot:
aws rds create-db-snapshot \
--db-instance-identifier greengovrag-db \
--db-snapshot-identifier greengovrag-manual-$(date +%Y%m%d)
Restore from snapshot:
aws rds restore-db-instance-from-db-snapshot \
--db-instance-identifier greengovrag-db-restored \
--db-snapshot-identifier greengovrag-manual-20251115
Qdrant Backups¶
Snapshot creation (runs weekly via EventBridge):
# Manual snapshot
ssh ec2-user@qdrant-instance
curl -X POST 'http://localhost:6333/collections/greengovrag/snapshots'
# Download snapshot
scp ec2-user@qdrant-instance:/var/lib/qdrant/snapshots/*.snapshot ./
Restore:
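# Upload snapshot to the instance
scp <name>.snapshot ec2-user@qdrant-instance:/tmp/
# Restore (run on the Qdrant instance; the upload endpoint expects a multipart POST)
ssh ec2-user@qdrant-instance
curl -X POST 'http://localhost:6333/collections/greengovrag/snapshots/upload' \
  -H 'Content-Type: multipart/form-data' \
  -F 'snapshot=@/tmp/<name>.snapshot'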
DynamoDB Backups¶
Enable point-in-time recovery:
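The command for this step is missing from this copy. As a sketch, point-in-time recovery can be switched on per table with `update-continuous-backups`. The page does not name the cache table, so the helper takes the table name as an argument. `enable_pitr` is a hypothetical name.

```shell
# Hypothetical helper: enable point-in-time recovery on one DynamoDB table.
# Argument: the table name (not given on this page; look it up in the stack resources).
enable_pitr() {
  local table_name="$1"
  aws dynamodb update-continuous-backups \
    --table-name "$table_name" \
    --point-in-time-recovery-specification PointInTimeRecoveryEnabled=true
}
```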
Troubleshooting¶
Issue: CDK Deploy Fails¶
Error: Stack greengovrag-prod failed: CREATE_FAILED
Solution: Check CloudFormation events:
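The command for this step is missing from this copy. A minimal sketch that lists only the failed events and their reasons, defaulting to the `greengovrag-prod` stack name used elsewhere on this page. `show_failed_events` is a hypothetical name.

```shell
# Hypothetical helper: show events whose status contains FAILED, with the reason for each.
# Argument (optional): stack name; defaults to greengovrag-prod.
show_failed_events() {
  local stack_name="${1:-greengovrag-prod}"
  aws cloudformation describe-stack-events \
    --stack-name "$stack_name" \
    --query "StackEvents[?contains(ResourceStatus, 'FAILED')].[Timestamp,LogicalResourceId,ResourceStatusReason]" \
    --output table
}
```

The first failure in time order is usually the root cause. Later failures are often cascading rollbacks.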
Common causes:
- Insufficient IAM permissions
- Parameter Store secrets missing
- Resource limits exceeded (VPC, EIP, etc.)
Issue: ECS Task Fails to Start¶
Error: Task stopped: Essential container exited
Solution: Check ECS task logs:
# Get task ARN
aws ecs list-tasks --cluster greengovrag-cluster
# View stopped task reason
aws ecs describe-tasks \
--cluster greengovrag-cluster \
--tasks <task-arn>
Issue: RDS Connection Timeout¶
Error: could not connect to server: Connection timed out
Solution: Check security group rules:
# Ensure backend SG can access RDS SG on port 5432
aws ec2 describe-security-groups \
--filters "Name=tag:Name,Values=greengovrag-*"
Issue: High Costs¶
Solution: Optimize resources:
- Use Spot instances for Qdrant (already default)
- Keep RDS on the smallest burstable class (db.t4g.micro) while traffic is low; RDS offers no t4g.nano class
- Enable S3 Intelligent-Tiering
- Set DynamoDB to on-demand pricing
- Enable CloudFront compression
Updating Deployment¶
Update Backend Code¶
Push to main branch → GitHub Actions auto-deploys:
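The command block for this step is missing from this copy. As a sketch, assuming the remote is named `origin` and the GitHub CLI is authenticated, the push can be followed by a check on the most recent run of the deploy workflow. `push_and_check_deploy` is a hypothetical name.

```shell
# Hypothetical helper: push main, then show the latest run of the deploy workflow.
push_and_check_deploy() {
  git push origin main &&
  gh run list --workflow deploy-aws.yml --limit 1
}
```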
Update Infrastructure¶
Edit CDK code, then:
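The command block for this step is missing from this copy. A minimal sketch, assuming it is run from the CDK app directory, previews the change set before applying it. `update_infrastructure` is a hypothetical name.

```shell
# Hypothetical helper: show pending infrastructure changes, then deploy them.
# `cdk diff` exits 0 even when differences exist (unless --fail is passed),
# so the deploy step still runs after the preview.
update_infrastructure() {
  cdk diff && cdk deploy
}
```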
Update Environment Variables¶
# Update SSM parameter
aws ssm put-parameter \
--name "/greengovrag/prod/openai-api-key" \
--value "sk-new-key" \
--type "SecureString" \
--overwrite
# Restart ECS service to pick up new value
aws ecs update-service \
--cluster greengovrag-cluster \
--service greengovrag-service \
--force-new-deployment
Teardown¶
Delete Stack¶
# Delete all resources
cdk destroy
# Confirm deletion
# This will delete: ECS, RDS, EC2, S3, CloudFront, API Gateway, VPC
Note: Some resources can block deletion:
- RDS instance (if deletionProtection: true)
- S3 bucket (if not empty)
Force delete:
# Empty S3 bucket
aws s3 rm s3://greengovrag-frontend-bucket --recursive
# Disable RDS deletion protection
aws rds modify-db-instance \
--db-instance-identifier greengovrag-db \
--no-deletion-protection
# Then retry cdk destroy
cdk destroy
Next Steps¶
- Production Checklist - Pre-launch verification
- Monitoring Guide - Detailed monitoring setup
- Cloud Comparison - AWS vs Azure comparison
Last Updated: 2025-11-22