Save Money with AWS ECR Lifecycle Policies
You can save money on your monthly AWS ECR bill by adding a Lifecycle Policy to your repositories.
A policy is made up of `rules` that tell AWS which images in your repository are safe to delete; the AWS documentation refers to this as expiring an image.
AWS only charges you for images you are currently storing in ECR.
So, letting AWS automatically delete images you do not need saves money on your monthly bill.
At Good Dog we had been using ECR for over five years without any policies, which was costing us over $400 a month in storage.
After adding a policy to delete non-production images after 30 days, we dropped our bill from $400 to around $15 a month.
While the cost savings from Lifecycle Policies are straightforward, writing the policies unfortunately is not.
Parts of a Lifecycle Policy
Each ECR repository can have a single Lifecycle Policy, which is a JSON object with a single `rules` property.
These `rules` tell AWS when an image is safe to delete.
As the name `rules` suggests, a policy may have many different rules that target different sets of images.
For example, you may choose to delete images tagged with `beta` once they are 30 days old while keeping your three newest `prod` builds in case you need to roll back.
Whatever your case may be, I recommend reading the AWS documentation to learn how to write rules.
But after you are familiar with the syntax of a single rule, there are a few common pitfalls to watch out for when starting to write multi-rule policies.
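As a rough illustration of the shape described above, here is a minimal Python sketch that builds a policy with the two example rules (keep the three newest `prod` builds, expire `beta` images older than 30 days) and prints it as JSON. The helper names `count_rule` and `age_rule` are hypothetical and not part of any AWS SDK; the JSON field names follow the AWS lifecycle policy format.

```python
import json


def count_rule(priority, description, tag_prefix, keep_newest):
    """Hypothetical helper: expire matching images beyond the newest `keep_newest`."""
    return {
        "rulePriority": priority,
        "description": description,
        "selection": {
            "tagStatus": "tagged",
            "tagPrefixList": [tag_prefix],
            "countType": "imageCountMoreThan",
            "countNumber": keep_newest,
        },
        "action": {"type": "expire"},
    }


def age_rule(priority, description, tag_prefix, max_age_days):
    """Hypothetical helper: expire matching images pushed more than `max_age_days` ago."""
    return {
        "rulePriority": priority,
        "description": description,
        "selection": {
            "tagStatus": "tagged",
            "tagPrefixList": [tag_prefix],
            "countType": "sinceImagePushed",
            "countUnit": "days",
            "countNumber": max_age_days,
        },
        "action": {"type": "expire"},
    }


# A policy is a JSON object with a single "rules" property.
policy = {
    "rules": [
        count_rule(1, "Keep the three newest prod images", "prod", 3),
        age_rule(2, "Expire beta images older than 30 days", "beta", 30),
    ]
}

policy_text = json.dumps(policy, indent=2)
print(policy_text)
```

The printed text is what you would paste into the repository's Lifecycle Policy editor or pass to your tooling of choice.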
How are Multi-Rule Lifecycle Policies Applied?
When a Lifecycle Policy rule is applied, AWS will either (1) skip the image, (2) mark the image as `KEEP`, or (3) mark the image as `EXPIRED`.
When an image is marked as `EXPIRED`, AWS will delete that image (according to the AWS documentation, within 24 hours).
To decide whether to skip, `KEEP`, or `EXPIRE` an image, AWS performs three checks per image per rule, in the following order:
- Has this image already been marked as `KEEP` or `EXPIRED` by a higher priority rule? If yes, skip this image.
- Does this rule apply to this image (e.g. is it tagged with `prod`, `beta`, etc.)? If no, skip this image.
- Does this image meet the criteria to be `EXPIRED`? If yes, mark the image as `EXPIRED`; otherwise, mark the image as `KEEP`.
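The three checks above can be sketched as a small simulation. This is a simplified model rather than AWS's implementation: the `Image` and `Rule` types are made up for illustration, only count-based rules (`imageCountMoreThan`) are modeled, and it assumes a rule counts every image it matches (newest first), including images already marked by an earlier rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Image:
    label: str
    pushed_days_ago: int
    tags: tuple


@dataclass(frozen=True)
class Rule:
    priority: int      # lower number is applied first
    tag_prefix: str
    keep_newest: int   # models countType "imageCountMoreThan"


def apply_policy(images, rules):
    """Return {label: "KEEP" | "EXPIRE"}; images that no rule marks are absent."""
    marks = {}
    for rule in sorted(rules, key=lambda r: r.priority):
        # Images this rule applies to, newest first (assumption: already-marked
        # images still count toward the rule's image count).
        matching = sorted(
            (img for img in images
             if any(tag.startswith(rule.tag_prefix) for tag in img.tags)),
            key=lambda img: img.pushed_days_ago,
        )
        for img in images:
            # Check 1: already marked by a higher priority rule? Skip.
            if img.label in marks:
                continue
            # Check 2: does this rule apply to the image? If not, skip.
            if img not in matching:
                continue
            # Check 3: expire anything beyond the newest `keep_newest` matches.
            position = matching.index(img)
            marks[img.label] = "EXPIRE" if position >= rule.keep_newest else "KEEP"
    return marks


# Three hypothetical images and a two-rule policy.
images = [
    Image("old-beta", 10, ("beta-1",)),
    Image("new-beta", 8, ("beta-2",)),
    Image("release", 9, ("prod-1",)),
]
rules = [Rule(1, "prod", 1), Rule(2, "beta", 1)]

for label, mark in apply_policy(images, rules).items():
    print(label, mark)
```

Because an image is marked at most once, the order of the rules decides the outcome whenever an image matches more than one rule.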
Common Pitfalls of Multi-Rule Lifecycle Policies
If an image is marked as `EXPIRED` or `KEEP` by a higher priority rule, then all lower priority rules are skipped for that image.
A common pitfall is having a high priority rule that applies to `"any"` image, which causes every lower priority rule to be skipped because each image will already be marked as `EXPIRED` or `KEEP` by the `"any"` rule.
For this reason you may only ever have one `"any"` or `"untagged"` rule in your policy, and they must be your lowest priority rules.
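One way to catch this pitfall before uploading a policy is a small check like the sketch below. The function name `misplaced_any_rules` is hypothetical, it only looks at `"any"` rules, and it assumes the policy is already loaded as a Python dict where a lower `rulePriority` number means the rule is applied earlier.

```python
def misplaced_any_rules(policy):
    """Return the priorities of "any" rules that are applied before another rule."""
    rules = policy.get("rules", [])
    if not rules:
        return []
    # The rule applied last is the one with the largest rulePriority number.
    last_priority = max(rule["rulePriority"] for rule in rules)
    return [
        rule["rulePriority"]
        for rule in rules
        if rule["selection"]["tagStatus"] == "any"
        and rule["rulePriority"] != last_priority
    ]


# Hypothetical policy where the "any" rule runs first and shadows the prod rule.
policy = {
    "rules": [
        {
            "rulePriority": 1,
            "description": "Expire anything older than 30 days",
            "selection": {
                "tagStatus": "any",
                "countType": "sinceImagePushed",
                "countUnit": "days",
                "countNumber": 30,
            },
            "action": {"type": "expire"},
        },
        {
            "rulePriority": 2,
            "description": "Only keep the newest prod image",
            "selection": {
                "tagStatus": "tagged",
                "tagPrefixList": ["prod"],
                "countType": "imageCountMoreThan",
                "countNumber": 1,
            },
            "action": {"type": "expire"},
        },
    ]
}

print(misplaced_any_rules(policy))
```

For this policy the check reports priority 1, because the `"any"` rule is applied before the `prod` rule.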
Another common pitfall is deleting your production images.
Borrowing from the AWS ECR Lifecycle Policy examples, say you have the images
| Label | Pushed at   | Image Tags     |
|:-----:|:-----------:|:--------------:|
| C     | 8 days ago  | beta-3         |
| B     | 9 days ago  | beta-2, prod-2 |
| A     | 10 days ago | beta-1, prod-1 |
where image `A` is your oldest image and image `C` is your newest image.
If we wrote a policy intended to keep only the newest `beta` and the newest `prod` images, like
{
  "rules": [
    {
      "rulePriority": 1,
      "description": "Only keep the newest beta image",
      "selection": {
        "tagStatus": "tagged",
        "tagPrefixList": ["beta"],
        "countType": "imageCountMoreThan",
        "countNumber": 1
      },
      "action": {
        "type": "expire"
      }
    },
    {
      "rulePriority": 2,
      "description": "Only keep the newest prod image",
      "selection": {
        "tagStatus": "tagged",
        "tagPrefixList": ["prod"],
        "countType": "imageCountMoreThan",
        "countNumber": 1
      },
      "action": {
        "type": "expire"
      }
    }
  ]
}
we would instead accidentally be deleting all our `prod` images.
| Label | Pushed at   | Image Tags     | Rule 1 | Rule 2 | Result |
|:-----:|:-----------:|:--------------:|:------:|:------:|:------:|
| C     | 8 days ago  | beta-3         | Keep   | -      | Keep   |
| B     | 9 days ago  | beta-2, prod-2 | Expire | -      | Expire |
| A     | 10 days ago | beta-1, prod-1 | Expire | -      | Expire |
This is because the "Only keep the newest beta image" rule is applied first and marks image `C` as `KEEP` and both `B` and `A` as `EXPIRED`.
So, when the second rule, "Only keep the newest prod image", is applied, each image is already marked as `KEEP` or `EXPIRED`, and the rule is effectively skipped, which deletes all our `prod` images.
Learning from these types of accidents, a rule of thumb is to write your highest priority rules for your highest priority images and then your lowest priority rules for your lowest priority images.
Using this guideline and rewriting the policy with the `prod` rule first, we get a policy like
{
  "rules": [
    {
      "rulePriority": 1,
      "description": "Only keep the newest prod image",
      "selection": {
        "tagStatus": "tagged",
        "tagPrefixList": ["prod"],
        "countType": "imageCountMoreThan",
        "countNumber": 1
      },
      "action": {
        "type": "expire"
      }
    },
    {
      "rulePriority": 2,
      "description": "Only keep the newest beta image",
      "selection": {
        "tagStatus": "tagged",
        "tagPrefixList": ["beta"],
        "countType": "imageCountMoreThan",
        "countNumber": 1
      },
      "action": {
        "type": "expire"
      }
    }
  ]
}
which works as intended, keeping only the newest `beta` and `prod` images.
| Label | Pushed at   | Image Tags     | Rule 1 | Rule 2 | Result |
|:-----:|:-----------:|:--------------:|:------:|:------:|:------:|
| C     | 8 days ago  | beta-3         | -      | Keep   | Keep   |
| B     | 9 days ago  | beta-2, prod-2 | Keep   | -      | Keep   |
| A     | 10 days ago | beta-1, prod-1 | Expire | -      | Expire |
In the end, writing ECR Lifecycle Policies can save you money like we did at Good Dog, but be sure to write your policies carefully and verify they are being applied as you expect.
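One possible way to do that verification is ECR's lifecycle policy preview, which evaluates a policy against a repository without deleting anything. The sketch below is an assumption-laden outline, not a tested production script: `preview_policy` is a hypothetical helper, the client is expected to be a boto3 ECR client (or anything with the same two methods), and only the first page of preview results is read.

```python
import json
import time


def preview_policy(ecr_client, repository_name, policy, poll_seconds=5, max_polls=60):
    """Start a lifecycle policy preview and return (tags, action, rule priority) tuples."""
    ecr_client.start_lifecycle_policy_preview(
        repositoryName=repository_name,
        lifecyclePolicyText=json.dumps(policy),
    )
    for _ in range(max_polls):
        response = ecr_client.get_lifecycle_policy_preview(
            repositoryName=repository_name
        )
        status = response["status"]
        if status == "COMPLETE":
            # Only the first page is read here; larger repositories would need
            # to follow the pagination token in the response.
            return [
                (
                    result.get("imageTags", []),
                    result["action"]["type"],
                    result["appliedRulePriority"],
                )
                for result in response["previewResults"]
            ]
        if status in ("FAILED", "EXPIRED"):
            raise RuntimeError(f"Lifecycle policy preview ended with status {status}")
        time.sleep(poll_seconds)
    raise TimeoutError("Lifecycle policy preview did not complete in time")
```

With boto3 installed and credentials configured, a call might look like `preview_policy(boto3.client("ecr"), "my-repo", policy)`, where `my-repo` is a placeholder repository name. Comparing the returned tuples against a table like the ones above shows which rule would act on each image before the policy is saved.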