Serverless AWS Data Lake with Kiro

Build a cost-efficient, serverless data lake on AWS using S3, Glue, Athena, and CloudFormation, orchestrated with Kiro as an agentic IDE.

This project shows how to design, deploy, and operate a production-inspired data lake
without clusters, without idle costs, and with a clean teardown.

Project Overview

Problem this project solves

Avoid overengineering AWS data lakes
Prevent surprise cloud costs
Use serverless, batch-oriented architecture
Enable reproducible infrastructure with IaC

Tech stack

Amazon S3 (raw + curated layers)
AWS Glue (crawlers + jobs)
Amazon Athena (SQL query engine)
AWS CloudFormation (infrastructure as code)
Kiro (agentic IDE: Specs, Steering, Hooks)

Architecture

Key characteristics

Fully serverless (no EMR, no always-on compute)
Raw / curated separation
Parquet + partitioning for efficient queries
CloudFormation for reproducibility and teardown
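
The partitioning mentioned above follows the Hive-style `key=value` prefix convention, the same layout that appears in the upload path `orders/year=2024/month=01/`. As a minimal sketch (the helper name `partition_prefix` is an assumption for illustration and is not part of this repository), such a prefix could be built like this:

```shell
# Hypothetical helper: build a Hive-style partition prefix.
# Usage: partition_prefix <dataset> <year> <month>
partition_prefix() {
  # ${3#0} drops one leading zero so printf does not read "08" as octal
  printf '%s/year=%s/month=%02d/\n' "$1" "$2" "${3#0}"
}

partition_prefix orders 2024 1   # prints orders/year=2024/month=01/
```

Queries that filter on `year` and `month` let Athena skip prefixes that do not match, which reduces the amount of data scanned.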

Kiro Features in Use

Feature	How it helps
Specs	Define a minimal, dev-focused data lake
Steering	Guide decisions toward serverless and cost efficiency
Hooks	Automate Glue crawlers and jobs
Agentic Chat	Conversational setup for infra and pipelines

Kiro helps you think like a data engineer before you deploy.

Deployment Steps

1. Clone the repository

git clone https://github.com/YOUR_USERNAME/serverless-aws-data-lake-with-kiro.git
cd serverless-aws-data-lake-with-kiro

2. Configure AWS credentials and region

aws configure
export AWS_REGION=us-east-2
export AWS_DEFAULT_REGION=us-east-2

3. Deploy infrastructure

aws cloudformation deploy \
  --template-file cloudformation/data-lake-infrastructure.yaml \
  --stack-name latam-data-lake-dev \
  --capabilities CAPABILITY_NAMED_IAM \
  --tags env=dev team=data project=latam-data-lake
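
In scripts it can be useful to confirm the stack status explicitly after the deploy command returns. The following is a sketch only, assuming the stack name used above; the function names `stack_is_ready` and `check_stack` are illustrative and not part of this repository.

```shell
# Return success when a CloudFormation status string means the stack is usable.
stack_is_ready() {
  case "$1" in
    CREATE_COMPLETE|UPDATE_COMPLETE) return 0 ;;
    *) return 1 ;;
  esac
}

# Query the current stack status; requires configured AWS credentials.
check_stack() {
  status="$(aws cloudformation describe-stacks \
    --stack-name "${1:-latam-data-lake-dev}" \
    --query 'Stacks[0].StackStatus' --output text)" || return 1
  echo "Stack status: $status"
  stack_is_ready "$status"
}
```

Calling `check_stack` with no arguments checks `latam-data-lake-dev`; pass another stack name as the first argument if yours differs.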

4. Upload sample raw data to S3

aws s3 cp data/raw/sample_raw.csv \
  s3://latam-data-lake-raw-dev-<ACCOUNT_ID>/orders/year=2024/month=01/
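
If you prefer not to type the `<ACCOUNT_ID>` placeholder by hand, it can be looked up with `aws sts get-caller-identity`. This is a minimal sketch that assumes the bucket follows the naming pattern shown above; `raw_bucket_name` and `upload_sample` are hypothetical helper names.

```shell
# Build the raw bucket name from an account ID, following the pattern above.
raw_bucket_name() {
  printf 'latam-data-lake-raw-dev-%s\n' "$1"
}

# Upload the sample file; requires configured AWS credentials.
upload_sample() {
  account_id="$(aws sts get-caller-identity --query Account --output text)" || return 1
  aws s3 cp data/raw/sample_raw.csv \
    "s3://$(raw_bucket_name "$account_id")/orders/year=2024/month=01/"
}
```

Run `upload_sample` from the repository root so the relative path to the CSV resolves.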

5. Run Glue crawlers and jobs

Trigger manually from the AWS Console

Or let Kiro Hooks automate execution
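
The same step can also be driven from the AWS CLI. The sketch below is an illustration rather than the repository's own automation: the crawler and job names are passed in as arguments because this README does not state them, and the function names are hypothetical.

```shell
# Start a crawler and poll until it returns to the READY state.
# Usage: run_crawler <crawler-name>
run_crawler() {
  crawler="$1"
  aws glue start-crawler --name "$crawler" || return 1
  while :; do
    state="$(aws glue get-crawler --name "$crawler" \
      --query 'Crawler.State' --output text)" || return 1
    [ "$state" = "READY" ] && break
    sleep "${POLL_SECONDS:-15}"   # polling interval is an arbitrary default
  done
}

# Start a Glue job and print the run ID.
# Usage: run_job <job-name>
run_job() {
  aws glue start-job-run --job-name "$1" --query 'JobRunId' --output text
}
```

A typical sequence would be to crawl the raw layer, run the job that writes curated Parquet, then crawl the curated layer so the new table appears in the Glue Data Catalog.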

6. Query data with Athena

Use Athena to query curated Parquet data via the Glue Data Catalog.
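
Queries can be submitted from the Athena console or from the CLI. The following sketch assumes a database named `latam_data_lake_dev` and a table named `curated_orders`; both are placeholders, since the actual names depend on what the crawlers register. It also assumes the partition columns `year` and `month` were registered as strings, and that `ATHENA_OUTPUT` is set to an S3 location for query results.

```shell
# Build a partition-filtered query; the table name is a placeholder.
# Usage: orders_query <year> <month>
orders_query() {
  printf "SELECT * FROM %s WHERE year = '%s' AND month = '%s' LIMIT 10" \
    "${TABLE:-curated_orders}" "$1" "$2"
}

# Submit the query and print its execution ID.
# Requires ATHENA_OUTPUT, e.g. an s3:// prefix you own for query results.
run_athena_query() {
  aws athena start-query-execution \
    --query-string "$(orders_query "$1" "$2")" \
    --query-execution-context "Database=${DATABASE:-latam_data_lake_dev}" \
    --result-configuration "OutputLocation=$ATHENA_OUTPUT" \
    --query 'QueryExecutionId' --output text
}
```

The returned ID can be passed to `aws athena get-query-results --query-execution-id <ID>` once the query has finished. Filtering on the partition columns keeps the scan limited to the matching S3 prefixes.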

Cost Awareness

Where costs come from:

S3: storage and requests (low, predictable)
Glue: pay per second while jobs run
Athena: pay per data scanned
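
Because Athena bills by data scanned, a rough estimate is simple arithmetic. The sketch below assumes a rate of 5 USD per TB, which is an assumption to verify against current pricing for your region; the helper name `athena_cost_usd` is hypothetical.

```shell
# Estimate Athena cost in USD for a given number of GB scanned.
# Usage: athena_cost_usd <gb_scanned> [usd_per_tb]
athena_cost_usd() {
  # rate defaults to 5 USD per TB (assumed; check current regional pricing)
  awk -v gb="$1" -v rate="${2:-5}" 'BEGIN { printf "%.4f\n", gb / 1024 * rate }'
}

athena_cost_usd 10   # prints 0.0488
```

Under that assumed rate, scanning 10 GB costs about 5 cents, which is why columnar Parquet and partition filters matter more as data volume grows.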

Cost controls applied:

Parquet + partitioning
Serverless-only services
Automated governance checks
Easy teardown

Teardown

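To tear down the stack and stop Glue and automation costs:

aws cloudformation delete-stack --stack-name latam-data-lake-dev

This removes the Glue resources, IAM roles, and automation.

Optionally delete the S3 buckets if they are no longer needed. Objects left in S3 continue to incur storage charges until they are removed.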
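
For scripted teardown, the steps can be wrapped as below. This is a sketch under assumptions: the function names are illustrative, and whether the buckets are created by the stack depends on the template. Note that `aws s3 rb --force` deletes every object in the bucket and is irreversible; it does not remove old object versions in a versioned bucket.

```shell
# Delete the stack and block until CloudFormation reports completion.
# Usage: teardown [stack-name]
teardown() {
  stack="${1:-latam-data-lake-dev}"
  aws cloudformation delete-stack --stack-name "$stack" || return 1
  aws cloudformation wait stack-delete-complete --stack-name "$stack"
}

# Remove all objects, then the bucket itself. Irreversible.
# Usage: delete_bucket <bucket-name>
delete_bucket() {
  aws s3 rb "s3://$1" --force
}
```

For example, `delete_bucket latam-data-lake-raw-dev-<ACCOUNT_ID>` would remove the raw bucket used in the upload step.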

Sample Data

Sample CSV data is provided in data/raw/sample_raw.csv. It is used to test crawlers, jobs, and Athena queries.

References

Kiro Documentation: https://kiro.dev/docs

AWS Glue: https://aws.amazon.com/glue/

Amazon Athena: https://aws.amazon.com/athena/

AWS CloudFormation: https://aws.amazon.com/cloudformation/

⭐ If you find this repository useful, consider giving it a star.

📩 Contact: rociomnbaigorria@gmail.com
