| Folder/File | Description |
|---|---|
| config/dev.yaml | The config directory stores the environment profiles. dev.yaml is the configuration file for a dev instance of an AWS account |
| cdk.json | This file tells the CDK Toolkit how to execute your app |
| src | Holds the Glue ETL scripts, including the denormalisation logic for the Neherlab project |
| tests | Holds the Glue test scripts |
| cdkglueblog | This directory contains the CDK code that creates the whole Glue stack for the Covid19 application, as defined by the config |
| pipelineStack | This directory contains the CDK code that creates the CodePipeline, which in turn deploys the Glue stack as defined by the config |
| app.py | This is the entry point for the CDK application; it initiates the CDK stacks mentioned in this file |
| requirment.txt | Lists all the dependencies that need to be installed for the CDK application |
| setup.py | Defines how this Python package is built and what its dependencies are |

At the time of publishing this project, AWS CDK offers two Glue modules: @aws-cdk/aws-glue and @aws-cdk/aws-glue-alpha. The alpha module provides higher-level constructs and offers multiple advantages, but it is still in the experimental stage. We have therefore used the stable @aws-cdk/aws-glue module.

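Because the stable module is the lower-level of the two, a job defined with it stays close to the underlying AWS::Glue::Job CloudFormation resource. The sketch below is not taken from this project; it only illustrates, in plain Python, the shape of the properties such a resource carries. The function name, job name, role ARN, and script bucket are hypothetical, and treating the Python jobs as Python shell jobs is an assumption.

```python
import json

# Command names accepted here: "glueetl" is a Spark ETL job,
# "pythonshell" is a Python shell job.
SUPPORTED_JOB_TYPES = ("glueetl", "pythonshell")


def glue_job_properties(job_name, role_arn, script_location, job_type="glueetl"):
    """Return AWS::Glue::Job properties in CloudFormation shape (illustrative helper)."""
    if job_type not in SUPPORTED_JOB_TYPES:
        raise ValueError(f"unsupported job type: {job_type}")
    return {
        "Name": job_name,
        "Role": role_arn,
        "Command": {
            "Name": job_type,
            "PythonVersion": "3",
            "ScriptLocation": script_location,
        },
    }


# Hypothetical values, for illustration only.
example_props = glue_job_properties(
    job_name="j_neherlab_denorm",
    role_arn="arn:aws:iam::123456789012:role/example-glue-role",
    script_location="s3://example-bucket/src/j_neherlab_denorm.py",
)
print(json.dumps(example_props, indent=2))
```

A lower-level construct expects every one of these fields to be spelled out, which is the trade-off against the convenience of the alpha module.
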
| ETL File | Job Description |
|---|---|
| src/j_emit_start_event.py | Python job that starts the workflow and creates the start event |
| src/j_neherlab_denorm.py | Spark ETL job that transforms the data and creates a denormalised view by stitching all the base data together in Parquet format |
| src/j_emit_ended_event.py | Python job that ends the workflow and creates the corresponding end event |
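
To make "stitching all the base data together" concrete, here is a minimal plain-Python stand-in for the kind of join the Spark job performs. It is not the project's script: the column names (`country`, `country_code`, `cases`, `population`, `icu_beds`) and the sample rows are invented placeholders, since the real schemas are not shown here.

```python
def denormalise(case_counts, country_codes, population, icu_capacity):
    """Left-join the three lookup tables onto each case-count row by country."""
    # Index each lookup table by the (hypothetical) join key.
    codes = {row["country"]: row for row in country_codes}
    people = {row["country"]: row for row in population}
    icu = {row["country"]: row for row in icu_capacity}

    result = []
    for case in case_counts:
        country = case["country"]
        merged = dict(case)  # copy so the input rows are not mutated
        # Missing lookups yield None, as a left join would.
        merged["country_code"] = codes.get(country, {}).get("country_code")
        merged["population"] = people.get(country, {}).get("population")
        merged["icu_beds"] = icu.get(country, {}).get("icu_beds")
        result.append(merged)
    return result


# Placeholder rows, not real data.
rows = denormalise(
    case_counts=[{"country": "Examplia", "cases": 10}],
    country_codes=[{"country": "Examplia", "country_code": "EX"}],
    population=[{"country": "Examplia", "population": 1000}],
    icu_capacity=[{"country": "Examplia", "icu_beds": 5}],
)
print(rows)
```

In the Spark job the same idea would be expressed as DataFrame joins, with the result written out as Parquet.
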

| Table Name | Description | Dataset Location | Access | Hosted In |
|---|---|---|---|---|
| neherlab_case_counts | Total number of cases | s3://covid19-harmonized-dataset/covid19tos3/neherlab_case_counts/ | Read | Public |
| neherlab_country_codes | Country code | s3://covid19-harmonized-dataset/covid19tos3/neherlab_country_codes/ | Read | Public |
| neherlab_icu_capacity | Intensive Care Unit (ICU) capacity | s3://covid19-harmonized-dataset/covid19tos3/neherlab_icu_capacity/ | Read | Public |
| neherlab_population | Population | s3://covid19-harmonized-dataset/covid19tos3/neherlab_population/ | Read | Public |
| neherla_denormalized | Denormalized table | s3://your-S3-bucket-name/neherla_denormalized | Read/Write | Reader's AWS account |
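
When working with these locations in code, it can help to split each S3 URI into its bucket and key prefix. The helper below is a small sketch using only the standard library; `SOURCE_TABLES` and `split_s3_uri` are hypothetical names, and the mapping simply repeats the public dataset locations from the table.

```python
from urllib.parse import urlparse

# Public source tables and their locations, copied from the table above.
SOURCE_TABLES = {
    "neherlab_case_counts": "s3://covid19-harmonized-dataset/covid19tos3/neherlab_case_counts/",
    "neherlab_country_codes": "s3://covid19-harmonized-dataset/covid19tos3/neherlab_country_codes/",
    "neherlab_icu_capacity": "s3://covid19-harmonized-dataset/covid19tos3/neherlab_icu_capacity/",
    "neherlab_population": "s3://covid19-harmonized-dataset/covid19tos3/neherlab_population/",
}


def split_s3_uri(uri):
    """Split an s3:// URI into (bucket, key_prefix)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri}")
    # The path starts with "/", which is not part of the object key.
    return parsed.netloc, parsed.path.lstrip("/")


for table, location in SOURCE_TABLES.items():
    bucket, prefix = split_s3_uri(location)
    print(table, bucket, prefix)
```
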

The images below show the artifacts created by the cdk deploy step.

The CdkGlueStage stage of the CodePipeline deploys the Glue stack cdk-covid19-glue-stack.

The jobs created by the stack appear in the Glue Jobs console.

Please refer to "Developing and testing AWS Glue job scripts locally" for additional information.







