This Terraform project deploys a complete AWS RDS MySQL database with Multi-AZ failover, including an automated health monitoring and alerting system.
- VPC Module: Creates VPC with public/private subnets across multiple AZs
- RDS Module: Multi-AZ MySQL database with enhanced monitoring, security groups, and parameter groups
- EC2 Module: Optional bastion host for database connectivity testing
- Lambda Function: Python-based health checker that runs every 5 minutes
- SNS Topic: Email notifications for database failover alerts
- CloudWatch: Monitoring, alarms, and EventBridge scheduling
- Multi-AZ RDS MySQL with automatic failover
- Enhanced monitoring with CloudWatch metrics and alarms
- Automated health checks via Lambda function every 5 minutes
- Email alerts for database connectivity issues
- Security best practices with VPC, security groups, and encryption
- Modular design for reusability across environments
- Multi-environment support (dev/staging/prod)
- Terraform 1.6+ compatibility with proper backend configuration
- AWS CLI configured with appropriate permissions
- Terraform >= 1.6.0
- Python 3.11 (for Lambda function)
- Email address for receiving alerts
Your AWS credentials need the following permissions:
- VPC, Subnet, Route Table, Internet Gateway, NAT Gateway management
- RDS instance, subnet group, parameter group, option group management
- EC2 instance, security group, AMI management
- Lambda function, IAM role, CloudWatch Events management
- SNS topic and subscription management
- CloudWatch alarms and logs management
git clone https://github.com/Copubah/aws-rds-multi-az-terraform.git
cd aws-rds-multi-az-terraform
Edit terraform.tf and uncomment one of the backend configurations:
terraform {
  backend "s3" {
    bucket         = "your-terraform-state-bucket"
    key            = "rds-failover/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-state-locks"
    encrypt        = true
  }
}
Edit the appropriate .tfvars file:
For Development:
cp environments/dev.tfvars.example environments/dev.tfvars
# Edit environments/dev.tfvars with your values
For Production:
cp environments/prod.tfvars.example environments/prod.tfvars
# Edit environments/prod.tfvars with your values
# Initialize Terraform
terraform init
# Plan deployment (development)
terraform plan -var-file="environments/dev.tfvars"
# Apply deployment
terraform apply -var-file="environments/dev.tfvars"
After deployment, you'll receive outputs including:
- RDS endpoint
- EC2 public IP (if created)
- SNS topic ARN
| Variable | Description | Default | Required |
|---|---|---|---|
| environment | Environment name (dev/staging/prod) | dev | Yes |
| aws_region | AWS region for deployment | us-east-1 | No |
| alert_email | Email for RDS alerts | - | Yes |
| db_password | RDS master password | - | Yes |
| db_multi_az | Enable Multi-AZ deployment | true | No |
| create_ec2_instance | Create EC2 bastion host | true | No |
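
Before running `terraform plan`, it can help to confirm that a .tfvars file sets the required variables from the table above. The following is a minimal sketch in Python, not part of the repository: the helper names `parse_tfvars` and `validate` are hypothetical, and the parser only understands simple `name = value` lines (no maps, lists, heredocs, or inline comments).

```python
# Hypothetical pre-flight check for a .tfvars file; not part of the project.
REQUIRED = ("environment", "alert_email", "db_password")
ALLOWED_ENVIRONMENTS = {"dev", "staging", "prod"}


def parse_tfvars(text):
    """Parse simple `name = value` lines into a dict of strings."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comment lines, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, raw = line.partition("=")
        values[name.strip()] = raw.strip().strip('"')
    return values


def validate(values):
    """Return a list of human-readable problems; an empty list means OK."""
    errors = [
        f"missing required variable: {name}"
        for name in REQUIRED
        if not values.get(name)
    ]
    env = values.get("environment")
    if env and env not in ALLOWED_ENVIRONMENTS:
        errors.append(
            f"environment must be one of {sorted(ALLOWED_ENVIRONMENTS)}, got {env!r}"
        )
    return errors
```

Reading `environments/dev.tfvars`, passing its text to `parse_tfvars`, and printing the result of `validate` would surface a missing `alert_email` or `db_password` before Terraform prompts for it.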
The project implements security best practices:
- Network Security: Private subnets for RDS, security groups with minimal access
- Encryption: RDS encryption at rest, EBS encryption for EC2
- Access Control: IAM roles with least privilege principles
- Monitoring: CloudWatch alarms for CPU, connections, and custom metrics
VPC Module:
- Creates VPC with DNS support
- Public and private subnets across multiple AZs
- Internet Gateway and NAT Gateways
- Route tables and associations
RDS Module:
- Multi-AZ MySQL RDS instance
- DB subnet group and security groups
- Parameter and option groups
- Enhanced monitoring and CloudWatch alarms
- Lambda health check function
- SNS topic for alerts
EC2 Module:
- Optional bastion host in public subnet
- Security group allowing RDS access
- IAM role for CloudWatch and SSM
- User data script with MySQL client
Lambda Function:
- Runs every 5 minutes via EventBridge
- Tests database connectivity
- Checks RDS instance status
- Sends SNS alerts on failures
CloudWatch alarms:
- CPU utilization > 80%
- Database connections > 50
- Custom metrics from Lambda
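
The project defines its alarms in Terraform, but the CPU and connection thresholds above can also be written out as the keyword arguments that boto3's `cloudwatch.put_metric_alarm` accepts, which makes the moving parts explicit. This is an illustrative sketch only: the function name, alarm names, 5-minute period, and two evaluation periods are assumptions, not values taken from the repository.

```python
def rds_alarm_params(db_instance_id, topic_arn):
    """Build put_metric_alarm keyword arguments for the two RDS thresholds.

    The period, evaluation periods, and alarm names are assumed values.
    """
    common = {
        "Namespace": "AWS/RDS",
        "Dimensions": [{"Name": "DBInstanceIdentifier", "Value": db_instance_id}],
        "Statistic": "Average",
        "Period": 300,  # seconds (assumption)
        "EvaluationPeriods": 2,  # assumption
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [topic_arn],  # notify the SNS topic when the alarm fires
    }
    return [
        {
            **common,
            "AlarmName": f"{db_instance_id}-high-cpu",
            "MetricName": "CPUUtilization",
            "Threshold": 80.0,  # percent
        },
        {
            **common,
            "AlarmName": f"{db_instance_id}-high-connections",
            "MetricName": "DatabaseConnections",
            "Threshold": 50.0,  # connection count
        },
    ]
```

Each dictionary could be passed as `cloudwatch.put_metric_alarm(**params)` with a boto3 CloudWatch client, although in this project Terraform remains the source of truth for alarm definitions.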
Email alerts are sent for:
- Database connectivity failures
- High CPU utilization
- High connection count
- Lambda function errors
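
The health check that produces the connectivity alerts above (check the instance status, test connectivity, publish to SNS on failure) could be structured roughly as follows. This is a sketch under stated assumptions rather than the repository's actual Lambda code: the environment variable names `DB_INSTANCE_ID`, `DB_ENDPOINT`, and `SNS_TOPIC_ARN` are hypothetical, and connectivity is approximated with a plain TCP connection to port 3306 instead of a real MySQL login.

```python
import os
import socket


def can_connect(host, port=3306, timeout=3.0):
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def run_health_check(endpoint, get_status, publish, tcp_check=can_connect):
    """Check instance status and connectivity; publish one alert on failure."""
    problems = []
    status = get_status()
    if status != "available":
        problems.append(f"instance status is {status!r}")
    if not tcp_check(endpoint):
        problems.append(f"cannot open a TCP connection to {endpoint}:3306")
    if problems:
        publish("RDS health check failed: " + "; ".join(problems))
    return {"healthy": not problems, "problems": problems}


def lambda_handler(event, context):
    # Imported lazily so the logic above can be exercised without AWS access.
    import boto3

    rds = boto3.client("rds")
    sns = boto3.client("sns")
    instance_id = os.environ["DB_INSTANCE_ID"]  # hypothetical variable name

    def get_status():
        response = rds.describe_db_instances(DBInstanceIdentifier=instance_id)
        return response["DBInstances"][0]["DBInstanceStatus"]

    def publish(message):
        sns.publish(
            TopicArn=os.environ["SNS_TOPIC_ARN"],  # hypothetical variable name
            Subject="RDS health check alert",
            Message=message,
        )

    return run_health_check(os.environ["DB_ENDPOINT"], get_status, publish)
```

Passing the status lookup and the publisher in as callables keeps `run_health_check` free of AWS calls, so it can be unit tested with simple stand-ins.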
# SSH to EC2 instance
ssh -i your-key.pem ec2-user@<ec2-public-ip>
# Test RDS connectivity
./test-rds.sh <rds-endpoint> admin <password>
The Lambda function automatically tests connectivity every 5 minutes. Check CloudWatch Logs:
aws logs tail /aws/lambda/<environment>-rds-health-check --follow
terraform workspace new dev
terraform apply -var-file="environments/dev.tfvars"
terraform workspace new prod
terraform apply -var-file="environments/prod.tfvars"
- Lambda Function Timeout
  - Check VPC configuration and NAT Gateway
  - Verify security group rules
- RDS Connection Failures
  - Verify security group allows port 3306
  - Check subnet group configuration
  - Validate credentials
- SNS Email Not Received
  - Check email subscription confirmation
  - Verify SNS topic permissions
# Check RDS instance status
aws rds describe-db-instances --db-instance-identifier <environment>-mysql-db
# Test Lambda function manually
aws lambda invoke --function-name <environment>-rds-health-check output.json
# Check CloudWatch logs
aws logs describe-log-groups --log-group-name-prefix "/aws/lambda"
For Development:
- Single AZ deployment (db_multi_az = false)
- Smaller instance types (db.t3.micro)
- Reduced backup retention (1 day)
- No deletion protection
For Production:
- Multi-AZ deployment for high availability
- Appropriate instance sizing
- Extended backup retention (30 days)
- Deletion protection enabled
- Use AWS Secrets Manager for database passwords in production
- Enable VPC Flow Logs for network monitoring
- Implement least privilege IAM policies
- Enable AWS Config for compliance monitoring
- Use AWS Systems Manager for secure EC2 access instead of SSH keys
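
As an illustration of the Secrets Manager recommendation, the health check (or any client) could fetch database credentials at runtime instead of receiving them through `db_password`. The sketch below assumes the secret is stored as a JSON string with `username` and `password` keys; the function name and the secret ID used in the usage note are hypothetical.

```python
import json


def load_db_credentials(secrets_client, secret_id):
    """Fetch a JSON secret and return (username, password).

    Assumes the secret value is JSON with "username" and "password" keys.
    """
    response = secrets_client.get_secret_value(SecretId=secret_id)
    secret = json.loads(response["SecretString"])
    return secret["username"], secret["password"]
```

With boto3 this would be called as `load_db_credentials(boto3.client("secretsmanager"), "prod/rds/mysql")`, where the secret ID is a placeholder. The calling IAM role also needs `secretsmanager:GetSecretValue` on that secret.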
To destroy the infrastructure:
# Disable deletion protection first (if enabled)
# -var is placed after -var-file so it overrides any value set in the file
terraform apply -var-file="environments/dev.tfvars" -var="db_deletion_protection=false"
# Destroy infrastructure
terraform destroy -var-file="environments/dev.tfvars"
- Fork the repository
- Create a feature branch
- Make your changes
- Test thoroughly
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
For issues and questions:
- Check the troubleshooting section
- Review CloudWatch logs
- Open an issue in the repository