Automation: How can you automate AWS infrastructure and reduce your AWS bill by 60%?

In this article, we shall discuss automated strategies to reduce the AWS bill by optimizing compute and storage resources.

Kapil Jain
7 min read · Oct 24, 2020
Automate AWS infrastructure to reduce AWS Bill

In this article, we discuss ways to reduce AWS costs, assuming that you have already right-sized your EC2 instance types and EBS volumes. If you have not, I encourage you to read this article on optimizing AWS cost first.

We have been using the AWS platform mostly for compute (EC2) and storage (EBS) for our CRM products. We recently noticed that our AWS cost had shot up ~2.5 times since the last quarter, which raised concerns. We dug deeper to find the reason, and here is what we found:

  • We upgraded the developers' machines so that they could work more efficiently.
  • We added support for a new database environment in our product, so we added new machines for QA.
  • We started bi-weekly backups of running machines.

Most of us have probably seen this during the development and testing phase of a software product and assumed it was unavoidable. I was in the same boat but decided to dig deeper. I wanted to reduce the cost, so I did a cost analysis, and here are my observations:

  • We were not fully utilizing our development machines: they were used for fewer than 20 hours per week, because the development team was handling multiple products and worked on ours for only a few hours a day.
  • We were underutilizing our automated QA machines, which were needed only while the automated test cases ran, less than an hour per day.
  • Our manual QA testing machines were used only during the weekly release, so they were also underutilized.
  • A lot of redundant snapshots and EBS volumes were left behind when creating machines or AMIs.
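These observations can be put into rough numbers. The sketch below computes the fraction of weekly instance-hours a machine is billed for without being used when it runs around the clock. It assumes compute is billed only while the instance is running and ignores EBS storage, which is still billed while an instance is stopped. The 20 hours/week figure comes from the first observation; 7 hours/week for the automated QA machines is an assumption based on "less than an hour/day".

```python
HOURS_PER_WEEK = 24 * 7  # 168 hours in a week


def running_hours_saved(hours_used_per_week: float) -> float:
    """Fraction of weekly instance-hours avoided if a machine runs only while used.

    Assumes compute is billed only while the instance runs; EBS storage is
    still billed while the instance is stopped and is not included here.
    """
    if not 0 <= hours_used_per_week <= HOURS_PER_WEEK:
        raise ValueError("hours_used_per_week must be between 0 and 168")
    return 1 - hours_used_per_week / HOURS_PER_WEEK


# A development machine used ~20 h/week instead of running 24/7
print(f"{running_hours_saved(20):.0%}")  # 88%
# An automated QA machine used ~1 h/day (7 h/week, an assumption)
print(f"{running_hours_saved(7):.0%}")  # 96%
```

These figures are upper bounds on compute-hour savings, not a forecast of the bill.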

These were interesting observations, and I realized that we could reduce our AWS cost. I finally came up with the following strategies:

  1. Use spot instances, which can cost up to 90% less than on-demand instances. We were already using spot instances, but if you are not, they are an easy way to reduce your cost.
  2. AWS charges for a spot instance while it is running, so we can save cost by stopping an instance when the machine is idle. We need to automate this so that a machine is turned off when there is no activity on it.
  3. Some machines are used only during a specific period, so we can schedule them to start when they are required and stop after use.
  4. Automate the periodic removal of redundant snapshots and EBS volumes.

We shall now discuss these strategies in detail and walk through automating them to save both cost and effort.

Strategy 1: Start using spot instances instead of on-demand instances

All of these machines were used for development and testing, so we did not need to run them all the time. We were already using spot instances for all of them, so this strategy was already in place for us. Its disadvantage is that availability is not guaranteed: spot instances depend on spare EC2 capacity, and AWS can reclaim a spot instance with a two-minute interruption notice. You can try this strategy on one of your machines first and, if it works for you, gradually move your on-demand instances to spot instances. I also found that moving to the latest instance types saves cost: as the comparison of instance types below shows, you can save 10% to 25% and get better performance at a lower price. If you plan to migrate, you may need to upgrade the instance's drivers (for example, Nitro-based instance types require the ENA network driver and NVMe storage drivers).

Cost Saving Using the latest instance type
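For reference, here is a minimal Python sketch of requesting a spot instance through boto3's EC2 `run_instances` call. The EC2 client is passed in (for example `boto3.client("ec2")`), and the function names are hypothetical. According to the EC2 documentation, a spot instance can be stopped and started again only if it is EBS-backed and was launched from a persistent spot request. That is why the sketch sets `SpotInstanceType` to `persistent` and the interruption behavior to `stop`. This matters for the later strategies, which stop idle machines instead of terminating them.

```python
def spot_launch_params(image_id: str, instance_type: str) -> dict:
    """Keyword arguments for EC2 run_instances that launch one Spot Instance."""
    return {
        "ImageId": image_id,
        "InstanceType": instance_type,
        "MinCount": 1,
        "MaxCount": 1,
        "InstanceMarketOptions": {
            "MarketType": "spot",
            "SpotOptions": {
                # A persistent request with "stop" interruption behavior lets
                # the instance be stopped/started instead of only terminated.
                "SpotInstanceType": "persistent",
                "InstanceInterruptionBehavior": "stop",
            },
        },
    }


def launch_spot_instance(ec2, image_id: str, instance_type: str) -> str:
    """Launch the instance with an injected EC2 client and return its ID."""
    response = ec2.run_instances(**spot_launch_params(image_id, instance_type))
    return response["Instances"][0]["InstanceId"]
```

A call such as `launch_spot_instance(boto3.client("ec2"), "ami-0123456789abcdef0", "m5.large")` would launch one instance. The AMI ID and instance type here are placeholders.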

Strategy 2: Automate the process of detecting an idle machine and stop it

I found an interesting article on stopping a machine when it is idle (that is, when it has no Windows RDP connection) to save cost. I took inspiration from its script and enhanced it into a utility that others can reuse. The utility is written in C# and runs on Windows, but it can be adapted for Linux or other platforms.

How does this utility work?

  • Creates a Windows service that periodically (every 10 minutes) sends the number of active RDP connections to Amazon CloudWatch as a custom metric.
  • Creates a CloudWatch alarm that stops the instance when there is no RDP connection.

I packaged this utility as an installer for Windows machines. You can download the installer from the release directory and install it on the machine.

You need to create an IAM user with an access policy for CloudWatch and EC2. During installation, enter this user's AWS credentials and the region of the machine.

Once the installation is done, start the service or reboot the machine so that the service starts automatically.

Once the service has started, it sends the custom metric (active RDP connections) to AWS CloudWatch every 10 minutes.
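The original utility is a C# Windows service. Below is a minimal Python sketch of the same publishing loop using boto3's CloudWatch `put_metric_data`, assuming you pass in a client created with `boto3.client("cloudwatch")`. The namespace `Custom/IdleShutdown`, the metric name `ActiveRdpConnections`, and the `count_sessions` callable are hypothetical stand-ins. Counting RDP sessions is platform-specific and is left to that callable.

```python
import time
from typing import Callable

NAMESPACE = "Custom/IdleShutdown"     # hypothetical namespace
METRIC_NAME = "ActiveRdpConnections"  # hypothetical metric name
INTERVAL_SECONDS = 600                # every 10 minutes, as in the utility


def publish_rdp_count(cloudwatch, instance_id: str, active_connections: int) -> None:
    """Send one data point of the custom metric, keyed by instance ID."""
    cloudwatch.put_metric_data(
        Namespace=NAMESPACE,
        MetricData=[{
            "MetricName": METRIC_NAME,
            "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
            "Value": float(active_connections),
            "Unit": "Count",
        }],
    )


def run_forever(cloudwatch, instance_id: str, count_sessions: Callable[[], int]) -> None:
    """Publish the current session count every INTERVAL_SECONDS until stopped."""
    while True:
        publish_rdp_count(cloudwatch, instance_id, count_sessions())
        time.sleep(INTERVAL_SECONDS)
```

Keying the metric by the `InstanceId` dimension lets one alarm per machine watch only that machine's data.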

The utility also creates a CloudWatch alarm that watches this metric and stops the machine if it has had no RDP connections for an hour.
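The alarm side can be sketched with CloudWatch's `put_metric_alarm` in boto3. This is a hedged Python approximation of what the C# utility sets up. It assumes a custom metric named `ActiveRdpConnections` in a `Custom/IdleShutdown` namespace (both hypothetical names) with an `InstanceId` dimension. With a 10-minute period and six evaluation periods, the alarm fires after an hour in which the connection count never reached 1. Its action is the built-in EC2 stop action ARN `arn:aws:automate:<region>:ec2:stop`.

```python
def create_idle_alarm(cloudwatch, instance_id: str, region: str) -> None:
    """Create (or overwrite) an alarm that stops the instance after one hour
    without any active RDP connection."""
    cloudwatch.put_metric_alarm(
        AlarmName=f"stop-idle-{instance_id}",  # hypothetical naming scheme
        Namespace="Custom/IdleShutdown",       # hypothetical namespace
        MetricName="ActiveRdpConnections",     # hypothetical metric name
        Dimensions=[{"Name": "InstanceId", "Value": instance_id}],
        Statistic="Maximum",    # any connection within a period counts as activity
        Period=600,             # matches the 10-minute publishing interval
        EvaluationPeriods=6,    # 6 x 10 minutes = 1 hour
        Threshold=1,
        ComparisonOperator="LessThanThreshold",  # i.e. zero connections
        # Built-in CloudWatch alarm action that stops the EC2 instance.
        AlarmActions=[f"arn:aws:automate:{region}:ec2:stop"],
    )
```

The sketch keeps the default missing-data treatment. Once the instance is stopped and no longer publishes data, the alarm moves to INSUFFICIENT_DATA rather than firing again. The same default also means a crashed publisher will not stop the machine.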

We implemented this strategy on our development and manual QA machines. We shall look at its effect in the Final Results section.

Strategy 3: Schedule starting and stopping of EC2 instances based on periodic usage

We targeted this strategy at our automated QA machines, which run during a specific period of the day. We used the AWS Instance Scheduler for this purpose and set up rules to start and stop EC2 instances at specific hours, during office hours, or during hours in a given time zone.

This strategy ensures that the machines are turned on while the automated test cases run and turned off once testing is done.
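We used the AWS Instance Scheduler, which is configured rather than coded. To show the underlying idea, here is a hypothetical hand-rolled alternative in Python: a time-window check plus a function that starts or stops instances through an injected EC2 client (for example `boto3.client("ec2")`). The weekday-only window and the `local_now` helper are assumptions for illustration. In practice, such a function would be invoked periodically, for example from a scheduled Lambda function.

```python
from datetime import datetime, time
from typing import Iterable
from zoneinfo import ZoneInfo  # Python 3.9+


def should_run(now: datetime, start: time, stop: time, weekdays_only: bool = True) -> bool:
    """True if `now` (already in the team's local time zone) is in [start, stop)."""
    if weekdays_only and now.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        return False
    return start <= now.time() < stop


def apply_schedule(ec2, instance_ids: Iterable[str], now: datetime,
                   start: time, stop: time) -> str:
    """Start the instances inside the window and stop them outside it.

    Intended to be invoked periodically (hypothetical setup).
    """
    ids = list(instance_ids)
    if should_run(now, start, stop):
        ec2.start_instances(InstanceIds=ids)
        return "started"
    ec2.stop_instances(InstanceIds=ids)
    return "stopped"


def local_now(tz_name: str) -> datetime:
    """Current time in a named time zone, e.g. "Asia/Kolkata" (example only)."""
    return datetime.now(ZoneInfo(tz_name))
```

Passing the time zone explicitly keeps the window tied to the team's office hours rather than the server's clock.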

Strategy 4: Automate removal of unused AWS storage resources

The last strategy was to remove unused storage. I found a powerful script that can be run manually to remove unused storage resources, which reduces unnecessary storage costs.

I may modify it in the coming days so that it runs periodically as an AWS Lambda function.
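The script itself is not reproduced here. The following is a minimal Python sketch of the same kind of cleanup, written so it could later be wrapped in a Lambda function, using boto3 EC2 calls on an injected client. It treats unattached volumes (status `available`) as unused. It treats snapshots as redundant only if no AMI you own references them and they are older than a cutoff. The age cutoff and the dry-run default are my own safeguards, not part of the original script, so that regular backups such as bi-weekly snapshots are not deleted by accident.

```python
from datetime import datetime, timedelta
from typing import Callable, Dict, List


def _all_pages(call: Callable[..., dict], key: str, **kwargs) -> List[dict]:
    """Collect `key` from every page of an EC2 describe_* call (NextToken paging)."""
    items: List[dict] = []
    token = None
    while True:
        params = dict(kwargs)
        if token:
            params["NextToken"] = token
        page = call(**params)
        items.extend(page.get(key, []))
        token = page.get("NextToken")
        if not token:
            return items


def find_unused_storage(ec2, now: datetime, min_age_days: int = 30) -> Dict[str, List[str]]:
    """List unattached volumes and old snapshots that none of our AMIs use."""
    # Volumes not attached to any instance have the status "available".
    volumes = _all_pages(ec2.describe_volumes, "Volumes",
                         Filters=[{"Name": "status", "Values": ["available"]}])
    snapshots = _all_pages(ec2.describe_snapshots, "Snapshots", OwnerIds=["self"])
    # Snapshots that back one of our AMIs must be kept.
    images = _all_pages(ec2.describe_images, "Images", Owners=["self"])
    ami_snapshots = {
        mapping["Ebs"]["SnapshotId"]
        for image in images
        for mapping in image.get("BlockDeviceMappings", [])
        if "SnapshotId" in mapping.get("Ebs", {})
    }
    cutoff = now - timedelta(days=min_age_days)
    return {
        "volumes": [v["VolumeId"] for v in volumes],
        "snapshots": [s["SnapshotId"] for s in snapshots
                      if s["SnapshotId"] not in ami_snapshots and s["StartTime"] < cutoff],
    }


def delete_unused_storage(ec2, now: datetime, min_age_days: int = 30,
                          dry_run: bool = True) -> Dict[str, List[str]]:
    """Delete what find_unused_storage reports; with dry_run=True only report it."""
    unused = find_unused_storage(ec2, now, min_age_days)
    if not dry_run:
        for volume_id in unused["volumes"]:
            ec2.delete_volume(VolumeId=volume_id)
        for snapshot_id in unused["snapshots"]:
            ec2.delete_snapshot(SnapshotId=snapshot_id)
    return unused
```

boto3 returns `StartTime` as a timezone-aware datetime, so pass an aware `now`, such as `datetime.now(timezone.utc)`. Review the dry-run report before running with `dry_run=False`.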

Final Results

Now it is time to look at the results: we were able to reduce our cost by ~60%, as the daily cost of our machines shows.

Other Strategies to Reduce the AWS Bill:

  • We can use CPU utilization metrics instead of RDP connections to decide when to turn off a machine.
  • We can create an API (API Gateway + Lambda) to start an instance and call it from our CI/CD pipeline, so that the machines are started whenever we plan a deployment.
  • We can create a Lambda function and schedule it to back up machines on a regular basis.
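As an illustration of the second idea, here is a hypothetical sketch of a Lambda handler behind an API Gateway proxy integration. It reads an instance ID from the request path and starts that instance. The `instanceId` path parameter, the status codes, and the allow-list of instance IDs are assumptions, not an existing implementation. The allow-list keeps a pipeline from starting arbitrary instances. The EC2 client is injected, for example by creating the handler once at module level with `boto3.client("ec2")`.

```python
import json
from typing import Callable, Iterable


def make_start_handler(ec2, allowed_ids: Iterable[str]) -> Callable[[dict, object], dict]:
    """Return a Lambda handler (API Gateway proxy format) that starts one
    allow-listed instance whose ID is given in the request path."""
    allowed = set(allowed_ids)

    def handler(event: dict, context: object) -> dict:
        # pathParameters is None when the route has no path parameters.
        instance_id = (event.get("pathParameters") or {}).get("instanceId")
        if instance_id not in allowed:
            return {"statusCode": 403,
                    "body": json.dumps({"error": "instance not allowed"})}
        ec2.start_instances(InstanceIds=[instance_id])
        # 202: the start has been requested; the instance boots asynchronously.
        return {"statusCode": 202, "body": json.dumps({"starting": instance_id})}

    return handler
```

A CI/CD job would call this endpoint before deploying and could rely on the idle-stop alarm to turn the machine off again afterwards.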

These strategies worked for us and reduced our cost. I am sure there are more ways to reduce the cost further, and I would like to hear them from you. I would be happy to implement those strategies and share the results here.

If you find this article helpful, please clap and share it in your circle so that others can benefit from it. I would also like to hear how you implemented these strategies and reduced your AWS bill.
