Build a cost-efficient, serverless data lake on AWS using S3, Glue, Athena, and CloudFormation, orchestrated with Kiro as an agentic IDE.
This project shows how to design, deploy, and operate a production-inspired data lake without clusters, without idle costs, and with a clean teardown.
- Project Overview
- Architecture
- Kiro Features in Use
- Deployment Steps
- Cost Awareness
- Teardown
- Sample Data
- References

Project Overview:

- Avoid overengineering AWS data lakes
- Prevent surprise cloud costs
- Use serverless, batch-oriented architecture
- Enable reproducible infrastructure with IaC

Architecture:

- Amazon S3 (raw + curated layers)
- AWS Glue (crawlers + jobs)
- Amazon Athena (SQL query engine)
- AWS CloudFormation (infrastructure as code)
- Kiro (agentic IDE: Specs, Steering, Hooks)

Design choices:

- Fully serverless (no EMR, no always-on compute)
- Raw / curated separation
- Parquet + partitioning for efficient queries
- CloudFormation for reproducibility and teardown

Kiro Features in Use:

| Feature | How it helps |
|---|---|
| Specs | Define a minimal, dev-focused data lake |
| Steering | Guide decisions toward serverless and cost efficiency |
| Hooks | Automate Glue crawlers and jobs |
| Agentic Chat | Conversational setup for infra and pipelines |
Kiro helps you think like a data engineer before you deploy.

Clone the repository:

    git clone https://github.com/YOUR_USERNAME/serverless-aws-data-lake-with-kiro.git
    cd serverless-aws-data-lake-with-kiro

Configure AWS credentials and region:

    aws configure
    export AWS_REGION=us-east-2
    export AWS_DEFAULT_REGION=us-east-2

Deploy the CloudFormation stack:

    aws cloudformation deploy \
      --template-file cloudformation/data-lake-infrastructure.yaml \
      --stack-name latam-data-lake-dev \
      --capabilities CAPABILITY_NAMED_IAM \
      --tags env=dev team=data project=latam-data-lake

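
The deploy step above can be wrapped in a small helper that validates the template first and prints the resulting stack status afterwards. This is only a sketch: the `run` and `deploy_data_lake` function names and the `DRY_RUN` switch are illustrative and not part of this repository. With `DRY_RUN` left at its default of `1`, the commands are printed instead of executed.

```shell
# Sketch only: helper names and the DRY_RUN switch are assumptions.

# Print the command when DRY_RUN=1 (the default), execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Validate the template, deploy the stack, then show its status.
deploy_data_lake() {
  local stack_name="$1"
  local template="$2"

  run aws cloudformation validate-template \
    --template-body "file://${template}"
  run aws cloudformation deploy \
    --template-file "${template}" \
    --stack-name "${stack_name}" \
    --capabilities CAPABILITY_NAMED_IAM \
    --tags env=dev team=data project=latam-data-lake
  run aws cloudformation describe-stacks \
    --stack-name "${stack_name}" \
    --query 'Stacks[0].StackStatus' --output text
}
```

To run it for real, call `DRY_RUN=0 deploy_data_lake latam-data-lake-dev cloudformation/data-lake-infrastructure.yaml`.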
Upload the sample data to the raw bucket:

    aws s3 cp data/raw/sample_raw.csv \
      s3://latam-data-lake-raw-dev-<ACCOUNT_ID>/orders/year=2024/month=01/

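
The upload path follows a Hive-style `key=value` layout (`year=2024/month=01`). A minimal sketch of building that destination for any account, table, and month is shown below. The `raw_partition_uri` function is hypothetical; the bucket naming pattern is copied from the command above.

```shell
# Sketch only: raw_partition_uri is an illustrative helper, not part of the repo.
# Builds s3://latam-data-lake-raw-dev-<ACCOUNT_ID>/<table>/year=YYYY/month=MM/
raw_partition_uri() {
  local account_id="$1" table="$2" year="$3"
  # Strip one leading zero so printf does not read "08" or "09" as octal.
  local month="${4#0}"
  printf 's3://latam-data-lake-raw-dev-%s/%s/year=%s/month=%02d/\n' \
    "${account_id}" "${table}" "${year}" "${month}"
}
```

The account ID can be looked up with `aws sts get-caller-identity --query Account --output text`, and the result passed to `aws s3 cp data/raw/sample_raw.csv "$(raw_partition_uri "$ACCOUNT_ID" orders 2024 1)"`.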
Run the Glue crawler and job: trigger them manually from the AWS Console, or let Kiro Hooks automate execution.
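
As a third option, the crawler and job can be started from the AWS CLI. The sketch below assumes a crawler named `orders-raw-crawler` and a job named `orders-to-parquet`. Both names are hypothetical, so substitute the ones defined in the CloudFormation template. With `DRY_RUN` at its default of `1`, commands are printed rather than executed.

```shell
# Sketch only: helper names, crawler name, and job name are assumptions.

# Print the command when DRY_RUN=1 (the default), execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Poll until the crawler is back in the READY state (skipped in dry-run mode).
wait_for_crawler() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    return 0
  fi
  while [ "$(aws glue get-crawler --name "$1" \
            --query 'Crawler.State' --output text)" != "READY" ]; do
    sleep 15
  done
}

# Start the crawler, wait for it to finish, then start the Glue job.
run_pipeline() {
  local crawler="$1" job="$2"
  run aws glue start-crawler --name "${crawler}"
  wait_for_crawler "${crawler}"
  run aws glue start-job-run --job-name "${job}"
}
```

Calling `run_pipeline orders-raw-crawler orders-to-parquet` prints the two commands. Prefix the call with `DRY_RUN=0` to execute them.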
Use Athena to query curated Parquet data via the Glue Data Catalog.
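
A query can also be submitted from the command line. The sketch below makes several assumptions: the curated table is called `orders`, the crawler registered `year` and `month` as string partition columns, and the database name and results location passed in are placeholders. Filtering on the partition columns lets Athena skip data outside the requested month.

```shell
# Sketch only: table, database, and results bucket names are assumptions.

# Print the command when DRY_RUN=1 (the default), execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Build a query that filters on the partition columns.
orders_query() {
  local year="$1" month="$2"
  printf "SELECT * FROM orders WHERE year = '%s' AND month = '%s' LIMIT 10\n" \
    "${year}" "${month}"
}

# Submit a SQL string to Athena against the given Glue database.
submit_query() {
  local database="$1" output_uri="$2" sql="$3"
  run aws athena start-query-execution \
    --query-string "${sql}" \
    --query-execution-context "Database=${database}" \
    --result-configuration "OutputLocation=${output_uri}"
}
```

For example: `submit_query latam_data_lake_dev s3://example-athena-results/ "$(orders_query 2024 01)"`.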
Where costs come from:
- S3: storage and requests (low, predictable)
- Glue: pay per second while jobs run
- Athena: pay per data scanned
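
Because Athena bills by data scanned, a rough per-query estimate is simple arithmetic. The sketch below rests on assumptions to verify against the Athena pricing page for your region: a price of USD 5 per TB scanned, 1 TB counted as 2^40 bytes, and a 10 MB minimum billed per query. The `athena_cost_usd` function is hypothetical.

```shell
# Sketch only: price, TB definition, and per-query minimum are assumptions.
# Usage: athena_cost_usd <bytes_scanned> [price_per_tb_usd]
athena_cost_usd() {
  local bytes="$1" price_per_tb="${2:-5}"
  awk -v b="${bytes}" -v p="${price_per_tb}" 'BEGIN {
    min = 10 * 1024 * 1024   # assumed minimum billed per query (10 MB)
    tb  = 1024 ^ 4           # assumed bytes per TB
    if (b < min) b = min     # small queries are rounded up to the minimum
    printf "%.6f\n", b / tb * p
  }'
}
```

Under these assumptions, `athena_cost_usd 1073741824` (a query scanning 1 GiB) prints `0.004883`. Scanning one monthly partition instead of a full year of similar-sized months reduces the scanned bytes, and so the cost, by roughly a factor of twelve.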
Cost controls applied:
- Parquet + partitioning
- Serverless-only services
- Automated governance checks
- Easy teardown
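
To stop compute and automation costs, delete the stack:

    aws cloudformation delete-stack --stack-name latam-data-lake-dev

This removes Glue, IAM roles, and automation.
S3 storage is billed until the buckets are deleted, so optionally delete the S3 buckets if they are no longer needed.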
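
The teardown can be scripted as in the sketch below. It assumes the buckets survive stack deletion (for example through a retain policy in the template), which is why they are removed in a separate step. The `teardown` function and the bucket names passed to it are hypothetical. With `DRY_RUN` at its default of `1`, commands are printed rather than executed.

```shell
# Sketch only: helper names and bucket names are assumptions.

# Print the command when DRY_RUN=1 (the default), execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Delete the stack, wait for deletion to finish, then remove any buckets given.
# Usage: teardown <stack-name> [bucket ...]
teardown() {
  local stack_name="$1" bucket
  shift
  run aws cloudformation delete-stack --stack-name "${stack_name}"
  run aws cloudformation wait stack-delete-complete --stack-name "${stack_name}"
  for bucket in "$@"; do
    # --force deletes the objects in the bucket before removing the bucket.
    run aws s3 rb "s3://${bucket}" --force
  done
}
```

Review the dry-run output first, then rerun with `DRY_RUN=0`. Note that `aws s3 rb --force` permanently deletes the data, and it does not remove old object versions in a versioned bucket.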

A sample CSV file is provided at data/raw/sample_raw.csv. It is used to test crawlers, jobs, and Athena queries.

- Kiro Documentation: https://kiro.dev/docs
- AWS Glue: https://aws.amazon.com/glue/
- Amazon Athena: https://aws.amazon.com/athena/
- AWS CloudFormation: https://aws.amazon.com/cloudformation/

⭐ If you find this repository useful, consider giving it a star. 📩 Contact: rociomnbaigorria@gmail.com