Fix Bug in Workflows #18
❌ 3 blocking issues (3 total)
concurrency:
  group: dev-build-deploy
  cancel-in-progress: false
  queue: single

concurrency:
  group: prod-deploy
  cancel-in-progress: false
  queue: single

concurrency:
  group: stage-build-deploy
  cancel-in-progress: false
  queue: single

Reply on each of the three blocks above:
Per the official GitHub documentation for concurrency, `queue` is an acceptable key:
To allow more than one pending job or workflow run to wait in the same concurrency group, use the optional queue property. The queue property accepts the following values:
- single (default): At most one job or workflow run can be pending in the concurrency group. When a new job or workflow run is queued, any existing pending job or workflow run in the same group is canceled and replaced.
- max: Up to 100 jobs or workflow runs can be pending in the concurrency group. When the queue is full, any additional jobs or workflow runs are canceled.
Why these changes are being introduced:
Recently, the stage-build workflow has failed more than once. After some investigation, the failures appear to be related to a quick sequence of commits to the `main` branch from Dependabot updates. By default, `on.*` triggered workflows run in parallel, but this is not the desired behavior for our deployment workflows. Instead, when more than one PR is merged to `main` at the same or nearly the same time, we want the workflows to run sequentially, not in parallel. See https://docs.github.com/en/actions/reference/workflows-and-actions/workflow-syntax#concurrency for more details on how we are attempting to solve this.
How this addresses that need:
* Use the `concurrency` option in the workflows to force the actions to run sequentially if they are triggered too close together
Side effects of this change:
None.
Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/TIMX-633
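The reviewer check for this change (launching the same workflow twice via `workflow_dispatch`) can also be scripted from a terminal. The sketch below is an illustration, not part of this PR. It assumes the GitHub CLI (`gh`) is installed and authenticated, that the workflow accepts `workflow_dispatch`, and that a run held back by the concurrency group reports a status such as `queued`, `pending`, `waiting`, or `requested` in `gh run list`. The function names, the `SETTLE_SECONDS` variable, and the file name `dev-build-deploy.yml` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: dispatch one workflow twice and check that the runs
# were serialized by its concurrency group rather than run in parallel.
# Assumes `gh` is installed and authenticated for the repository.
set -euo pipefail

# Trigger two back-to-back workflow_dispatch runs, then print the status of
# the two most recent runs of that workflow, one per line.
dispatch_twice_and_report() {
  local workflow="$1"     # workflow name or file name (hypothetical example: dev-build-deploy.yml)
  local ref="${2:-main}"  # branch to run against
  gh workflow run "$workflow" --ref "$ref"
  gh workflow run "$workflow" --ref "$ref"
  # Give GitHub a moment to register both runs before listing them.
  sleep "${SETTLE_SECONDS:-10}"
  gh run list --workflow "$workflow" --limit 2 --json status --jq '.[].status'
}

# Read run statuses (one per line) from stdin. Succeed only if exactly one
# run is in progress and exactly one has not started yet.
runs_are_serialized() {
  local in_progress=0 waiting=0 status
  while read -r status; do
    case "$status" in
      in_progress) in_progress=$((in_progress + 1)) ;;
      queued|pending|waiting|requested) waiting=$((waiting + 1)) ;;
    esac
  done
  (( in_progress == 1 && waiting == 1 ))
}
```

Usage: `dispatch_twice_and_report dev-build-deploy.yml | runs_are_serialized && echo "serialized"`. The check is only meaningful while the first run is still deploying; once it completes, the second run starts and both statuses change.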
Why these changes are being introduced:
It turns out that the problem was more than just workflow concurrency! There are latencies within AWS related to deploying, publishing, and assigning aliases for Lambda functions. Each of these commands returns quickly, but AWS continues to do work behind the scenes that takes some time to complete. If we do not wait for those backend processes to finish, some of the Lambda-related deployment commands may fail. So, in addition to ensuring that the workflows do not run concurrently, we add polling to avoid failed commands.
How this addresses that need:
* Add a while-loop poller to the Update Lambda Alias step in the dev workflow that keeps the step from completing until the alias has stabilized on the single new published version
Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/TIMX-633
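A minimal sketch of such a poller, written as a standalone bash function rather than the exact step used in the workflow. It assumes, as this PR describes, that the output of `aws lambda get-alias` includes a `RoutingConfig` key only while the alias is still moving to the new published version, and it relies on the AWS CLI printing `None` when a `--query` expression selects a missing key under `--output text`. The function name `wait_for_alias`, the attempt limit, and the polling interval are hypothetical; the limit is there so that a stuck alias fails the step instead of hanging it.

```shell
#!/usr/bin/env bash
# Hypothetical poller: block until a Lambda alias no longer reports a
# RoutingConfig, i.e. it points only at the newly published version.
set -euo pipefail

wait_for_alias() {
  local function_name="$1"
  local alias_name="$2"
  local max_attempts="${3:-30}"  # hypothetical cap so the step cannot hang forever
  local sleep_seconds="${4:-5}"  # hypothetical polling interval
  local attempt=1
  local routing

  while (( attempt <= max_attempts )); do
    # With --output text, a missing RoutingConfig key is printed as "None".
    routing="$(aws lambda get-alias \
      --function-name "$function_name" \
      --name "$alias_name" \
      --query 'RoutingConfig' \
      --output text)"
    if [[ "$routing" == "None" ]]; then
      echo "Alias ${alias_name} is stable after ${attempt} check(s)."
      return 0
    fi
    echo "Alias ${alias_name} still has a RoutingConfig; waiting ${sleep_seconds}s (attempt ${attempt}/${max_attempts})."
    sleep "$sleep_seconds"
    attempt=$((attempt + 1))
  done

  echo "Timed out waiting for alias ${alias_name} to stabilize." >&2
  return 1
}
```

In a workflow step, this would run immediately after `aws lambda update-alias`, for example `wait_for_alias "$FUNCTION_NAME" "$ALIAS_NAME"`, where both variables are hypothetical placeholders for the values the step already uses.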
Branch updated from ccf27e4 to 1c89b1b.
Why these changes are being introduced:
All the changes were tested in the dev workflow and passed, so we can extend them to the stage and prod workflows.
How this addresses that need:
* Implement the same changes that were made in the dev workflow in the stage and prod workflows.
Side effects of this change:
None.
Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/TIMX-633
Purpose and background context
There are two problems that need solutions:
Concurrency
By default, `on.*` triggered workflows run in parallel, but this is not the desired behavior for our deployment workflows. Instead, when more than one PR is merged to `main` at the same or nearly the same time, we want the workflows to run sequentially, not in parallel. See GitHub Workflow Syntax: concurrency for more details on how we are attempting to solve this.
For consistency, the `concurrency` block was added to all three workflows.
AWS Lambda Latencies
The workflows already had a mechanism to wait for Lambda deployment and Lambda publishing to complete (using the `aws lambda wait` command). However, this wasn't enough. There is also a latency related to assigning an alias to a newly published version that wasn't handled, and there is no `aws lambda wait` command for alias management.
So, we add some `bash` code to the workflow that runs a while-loop poller to monitor the output of the `aws lambda get-alias` command. The JSON output from that command contains a `RoutingConfig` key only while the alias is moving to the new published version. Once the new published version is the only version linked to the alias, the `RoutingConfig` key disappears from the output. At that point, the while loop exits cleanly and the workflow continues.
How can a reviewer manually see the effects of these changes?
The reviewer can quickly launch the same workflow twice via `workflow_dispatch` in the UI and see that one job gets queued while the other is still running. I did this for the Dev Build and Deploy workflow and saw this behavior in the UI.
Includes new or updated dependencies?
NO
Changes expectations for external applications?
NO
What are the relevant tickets?
* https://mitlibraries.atlassian.net/browse/TIMX-633
Code review