Many companies have already started their cloud adoption journey; however, those newer to cloud often say that their biggest challenge is a lack of resources or skills. As companies become more advanced in their use of cloud, the challenge shifts towards managing cloud spend[i]. Many successfully move to the cloud but struggle to control cost sprawl, eventually losing the cost-reduction benefit that motivated the move in the first place. If you have adopted AWS as your cloud platform, there are steps you can take, and tools and features available, to help you optimise cloud cost management and reduce costs.

Monitoring and analysing costs

The first step you should take whenever you start thinking about reducing the cost of your cloud infrastructure is to review your current spending. AWS provides a number of tools to help you stay on top of it; here are the ones I recommend you use for cost control and management.

AWS Billing and Cost Management is the place to start when reviewing your costs. From here you can browse all your previous bills and drill down into costs, and it also gives you a high-level overview of your biggest spends. Your account bill lists all resources consumed, including reservations and, in the case of EC2, Spot Instances, helping you identify areas to look for optimisations. Reviewing your monthly cost allocation reports can also highlight key areas for optimisation. For large AWS accounts, I highly recommend creating your own cost allocation tags and monitoring costs across groups of resources/services.

Trusted Advisor Cost Optimisation is a useful tool that analyses the usage of your resources and prepares a list of cost-saving recommendations for you. By using this tool you may be surprised at how easy it is to find cost-saving options within your AWS account.

I would also recommend creating AWS CloudWatch billing alerts to stay notified about your cloud infrastructure spending. A good starting point is to calculate the average of your last three bills and set up CloudWatch alarms for S3, EC2, RDS, ElastiCache, and Data Transfer. Getting a billing alert towards the end of a month is completely normal; getting one on the fifth day of the month indicates a problem and should prompt an investigation into your cloud infrastructure.
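The "average of your last three bills" starting point can be sketched in a few lines. This is a minimal illustration only: the per-service figures and the 20% headroom multiplier are assumptions I have invented for the example, not AWS recommendations.

```python
# Illustrative sketch: derive billing-alarm thresholds from your last three
# monthly bills. The figures and the 1.2 headroom multiplier are assumptions
# for illustration only.

def alarm_threshold(last_three_bills, headroom=1.2):
    """Average the last three bills and add headroom for normal growth."""
    average = sum(last_three_bills) / len(last_three_bills)
    return round(average * headroom, 2)

# Hypothetical per-service spend (USD) for the last three months.
recent_spend = {
    "AmazonEC2": [4200.0, 4350.0, 4100.0],
    "AmazonRDS": [1800.0, 1750.0, 1900.0],
    "AmazonS3":  [300.0, 320.0, 310.0],
}

thresholds = {svc: alarm_threshold(bills) for svc, bills in recent_spend.items()}
for service, threshold in thresholds.items():
    print(f"{service}: alarm above ${threshold}")
```

Feeding thresholds like these into CloudWatch billing alarms (one per service) gives you an early-warning baseline that grows with your normal spend.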

AWS recently released a new service called AWS Budgets. It combines much of the functionality of the services above in one central place and adds some extra features. For a start, you can create and manage budgets for your cloud infrastructure. You can use dashboards, reports, and filters to review and analyse costs, and you will get alerts when costs exceed, or are forecast to exceed, your budget. Finally, a nice addition is that all your reservations (EC2, Redshift, RDS, ElastiCache, etc.) are visible in one place, so you won’t miss expiring reservations.

If you are a large organisation that manages multiple AWS accounts, using Consolidated Billing is a must. It gives insight into individual account spending but can also provide additional cost savings – we do this at a company level in Kainos, and thanks to this, unused EC2 reservations are shared with other accounts and you also get volume discounts for services like S3 and EC2. Finally, some services, like AWS Shield Advanced, once purchased are enabled out of the box for all consolidated accounts.

How to achieve cost savings 

Now that you have analysed your spending, it is time to introduce cost-saving actions in the areas identified for optimisation. From my experience, there are cost savings which you can safely apply in both production and non-production environments. You may also want to be a little more aggressive on costs in your development and test environments.

Let’s look at some actions that can be successfully implemented in both production and non-production environments.

Buy reservations

  • Make sure you buy reservations for RDS (one of the most expensive services we use), ElastiCache, and of course EC2.

  • When buying reservations for EC2, consider convertible reservations – they are slightly more expensive but can be exchanged for new machine types. Cloud platforms are not static and platform sizing exercises must be done regularly, so convertible reservations will come in useful.

  • Ensure that you buy EC2 reservations for all static servers. For dynamic capacity, such as Auto Scaling Groups (ASGs) running a fleet of background workers, use on-demand instances (AWS bills per second). Of course, if your ASG operates at a predictable/measurable minimum capacity 24/7, it makes perfect sense to purchase reservations to match that minimum capacity.

  • Consider purchasing convertible EC2 reservations for three years with no up-front payment. This can give a reduction of up to 50% compared to on-demand pricing. Three-year, no-upfront reservations also have the advantage of not sending ripples through your finance team, which could happen if they saw an invoice for 50 machines for three years paid fully up front.
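To make the "up to 50%" figure concrete, here is a back-of-the-envelope comparison. The $0.10/hour rate and the flat 50% discount are assumed round numbers for illustration; check the current EC2 pricing pages for real figures.

```python
# Illustrative on-demand vs reserved comparison for one instance running
# 24/7 for three years. Rates and discount are hypothetical.

HOURS_PER_YEAR = 8760

def three_year_cost(hourly_rate, discount=0.0):
    """Total cost of running one instance continuously for three years."""
    return hourly_rate * (1 - discount) * HOURS_PER_YEAR * 3

on_demand = three_year_cost(0.10)                  # hypothetical $0.10/hour
convertible = three_year_cost(0.10, discount=0.5)  # ~50% off, no upfront

print(f"On-demand:   ${on_demand:,.2f}")
print(f"Convertible: ${convertible:,.2f}")
print(f"Saving:      ${on_demand - convertible:,.2f}")
```

Multiply that per-instance saving across a fleet of 50 machines and the case for reservations makes itself.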

Utilise services and infrastructure

  • If you use other services from the AWS compute family, there is a newer offering called Savings Plans, which applies to compute services like EC2 (across any machine type or operating system), AWS Fargate, and AWS Lambda, regardless of AWS Region. By using Savings Plans you can reduce your spending by up to 72%.

  • If you have a stateless architecture, or you designed your background jobs so failed jobs can easily restart, you may want to use a mix of reserved and Spot Instances. Reserved instances should always be running, ensuring your end users always get the agreed minimum service level, while spot ones add operational flexibility to handle increased load. This setup is possible thanks to Auto Scaling launch templates, which support multiple purchase options.

  • Starting expensive clusters only when needed, rather than leaving them running, can significantly reduce costs. For example, you could start an AWS Redshift cluster in the morning, run your governance and compliance checks, and then terminate it once they complete – this is the very essence of pay-as-you-go in the cloud.
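The reserved-baseline-plus-spot-surge pattern above can be costed with simple arithmetic. All rates and discounts below are hypothetical; real spot discounts vary by instance type, Region, and time.

```python
# Sketch of a blended fleet: baseline instances on reserved pricing,
# surge capacity on spot. Rates and discounts are assumed for illustration.

def monthly_fleet_cost(baseline, peak_extra, od_rate,
                       reserved_discount=0.4, spot_discount=0.7, hours=730):
    """Monthly cost: baseline runs on reservations, surge runs on spot."""
    reserved = baseline * od_rate * (1 - reserved_discount) * hours
    spot = peak_extra * od_rate * (1 - spot_discount) * hours
    return reserved + spot

all_on_demand = (4 + 4) * 0.10 * 730  # naive: run everything on-demand
mixed = monthly_fleet_cost(baseline=4, peak_extra=4, od_rate=0.10)

print(f"All on-demand: ${all_on_demand:.2f}, mixed: ${mixed:.2f}")
```

Even with modest assumed discounts, the blended fleet costs less than half of the all-on-demand setup while keeping the reserved baseline always available.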

Platform sizing

  • Make sure you perform platform sizing exercises at least twice a year, and don’t just stick to default machine types. AWS has multiple machine families – to refresh your memory: a, c, d, f, g, h, i, m, p, r, t, u, x, and z. Families like c (compute-optimised), m (general purpose), and r (memory-optimised) come in several variations (types, sizes, and CPUs), and there are plenty of specialised machines to choose from too.

  • Another, often neglected, platform sizing exercise is right-sizing EBS and RDS storage. It can turn out that the cost of “live” GBs is only a fraction of what you pay for local and/or cross-region backups (the latter generate network transfer costs on top of storage costs). This is especially important if you are legally or contractually bound to keep daily copies of your data for a specific period. Also, make sure your EBS and RDS storage has a healthy utilisation factor – there is no point in allocating 500 GB of EBS when 100 GB is enough.

  • When performing platform sizing, always pay attention to new generations of machines. They are often cheaper than the older generations – this is how AWS encourages its customers to upgrade their hardware.
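The 500 GB vs 100 GB example above is worth putting numbers on. The $0.10/GB-month rate is an assumed illustrative figure, not a quote from the EBS pricing page.

```python
# Back-of-the-envelope check for EBS right-sizing. The per-GB rate is an
# assumed example figure for illustration.

def monthly_ebs_cost(gb, rate_per_gb=0.10):
    """Monthly cost of an EBS volume at a flat per-GB rate."""
    return gb * rate_per_gb

over_allocated = monthly_ebs_cost(500)  # 500 GB allocated "to be safe"
right_sized = monthly_ebs_cost(100)     # 100 GB is actually enough

waste = over_allocated - right_sized
print(f"Wasted per volume per month: ${waste:.2f}")
print(f"Across 50 volumes per year:  ${waste * 50 * 12:.2f}")
```

And that is before counting the snapshots of all that empty space, which compound the waste.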

Optimise your storage and network

  • In recent years AWS released four different S3 storage classes, with an especially interesting one called Intelligent-Tiering. The benefit of this storage class is that it detects usage and access patterns and moves data to the most cost-effective access tier, optimising overall storage costs.

  • Optimising your web traffic is just as important as optimising storage when trying to reduce and manage costs. If you run a particularly high-volume website, I recommend putting CloudFront in front of it to help manage and optimise your web traffic.

  • Make sure to optimise your network traffic as well – if you use Amazon S3 or DynamoDB heavily, VPC gateway endpoints are a must. S3 and DynamoDB have public endpoints, so requests made from within a VPC leave your VPC and then hit those public endpoints, generating additional network traffic. With S3 and DynamoDB gateway endpoints the traffic is routed internally, which for large systems can turn into thousands of pounds of savings very quickly.
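A rough illustration of the gateway-endpoint saving: S3 and DynamoDB gateway endpoints carry no data-processing charge, while routing the same traffic through a NAT gateway does. The $0.045/GB NAT data-processing rate below is an assumed example figure.

```python
# Rough illustration of S3/DynamoDB traffic cost with and without a
# gateway endpoint. The NAT per-GB rate is an assumed example figure.

def monthly_nat_processing_cost(gb_per_month, rate_per_gb=0.045):
    """Cost of pushing S3/DynamoDB traffic through a NAT gateway."""
    return gb_per_month * rate_per_gb

via_nat = monthly_nat_processing_cost(50_000)  # hypothetical 50 TB/month
via_gateway_endpoint = 0.0                     # routed internally, no charge

print(f"Monthly saving: ${via_nat - via_gateway_endpoint:,.2f}")
```

For data-heavy systems the endpoint is effectively free money: one routing change, recurring savings every month.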

In a non-production environment, if you can accept a slightly lower service level, you can implement the following cost-saving actions:

Serverless and reducing capacity resources

  • Evaluate what you truly need running in your test and development environments. They most likely don’t need AWS RDS running in Multi-AZ, nor a cluster of ElastiCache Redis nodes. Do you really need a 2xlarge RDS instance? Could you live with just one NAT gateway for all your VPC AZs? It is important to complete a proper non-production platform sizing; if you use AWS CloudFormation, use condition functions to provide alternative, reduced-capacity resources.

  • I would highly recommend investigating the serverless options of popular DB engines available through AWS Aurora Serverless (MySQL-compatible and PostgreSQL-compatible). Serverless offerings are cost-effective for infrequent, intermittent, or unpredictable workloads – perfect for development and test environments.

  • Evaluate whether tasks that run daily in non-production environments could run weekly instead. For example, in production we do a daily cross-region replication of all resources in both the US and the EU (DB snapshots, golden images, etc.). Inter-region transfer fees can total thousands of pounds, so it would not be cost-effective to run this at the same frequency in a non-production environment.
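The daily-versus-weekly trade-off is easy to quantify. The snapshot size and the $0.02/GB inter-region transfer rate below are assumptions chosen for illustration.

```python
# Sketch of daily vs weekly cross-region replication costs. Snapshot size
# and per-GB transfer rate are assumed example figures.

def monthly_transfer_cost(gb_per_copy, copies_per_month, rate_per_gb=0.02):
    """Inter-region transfer cost for repeated snapshot copies."""
    return gb_per_copy * copies_per_month * rate_per_gb

daily = monthly_transfer_cost(2_000, copies_per_month=30)  # every night
weekly = monthly_transfer_cost(2_000, copies_per_month=4)  # once a week

print(f"Daily: ${daily:.2f}, weekly: ${weekly:.2f}, "
      f"saved: ${daily - weekly:.2f}")
```

In production the daily cadence may be non-negotiable; in a test environment the weekly run usually carries the same value at a fraction of the cost.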

Spot Instances and build clusters

  • Ensure you are using Spot Instances where you can. When utilised correctly they can help provide cost savings of up to 90% – we are currently saving tens of thousands of pounds every month thanks to Spot Instances.

  • Build clusters are an ideal use case for Spot Instances. We have six different build clusters, and all their agents run on Spot Instances, providing significant savings; see the TeamCity and Jenkins documentation for further information.

  • Speaking of build clusters, network traffic costs come right after EC2 costs. Our build clusters fetch terabytes of data from our Git repository; both TeamCity and Jenkins support caching of Git repos, which saves us thousands of pounds on network traffic.

  • AWS makes it very easy to use Spot Instances in Auto Scaling Groups – just specify the spot price bid in Launch Configuration and AWS will take care of the rest.

  • Don’t be greedy – if you set your Spot Instance bid at the on-demand price, the worst-case scenario is that you pay the on-demand price.

  • Sometimes it’s even better to set the bid a little above the on-demand price; this way you become more resilient to short-term price fluctuations.

  • From the AWS web console you can view Spot Instance price history graphs. The same data is available through the AWS Command Line Interface describe-spot-price-history command and is useful for viewing and planning for price fluctuations over time.
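The bidding advice above can be shown with a toy simulation: with a bid slightly above the on-demand price you ride out short spikes, yet you still only ever pay the market price, never your bid. The hourly price samples are hypothetical.

```python
# Toy simulation of spot bidding. You run while the market price is at or
# below your bid, and you pay the market price each hour, not the bid.
# Prices are hypothetical hourly samples for illustration.

ON_DEMAND = 0.10
price_history = [0.03, 0.04, 0.11, 0.04, 0.03, 0.05]  # brief spike to $0.11

def spot_run(bid, prices):
    """Return (hours kept, total paid) for a given bid over price history."""
    kept = [p for p in prices if p <= bid]
    return len(kept), sum(kept)

hours_low, cost_low = spot_run(bid=ON_DEMAND, prices=price_history)
hours_high, cost_high = spot_run(bid=ON_DEMAND * 1.2, prices=price_history)

print(f"Bid at on-demand: ran {hours_low}/6 hours for ${cost_low:.2f}")
print(f"Bid 20% above:    ran {hours_high}/6 hours for ${cost_high:.2f}")
```

The higher bid keeps the instance alive through the spike for a few extra cents, which is exactly the resilience the bullet above describes.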

Shutdowns

  • I highly recommend shutting down environments at night and during weekends to reduce costs. Auto Scaling Groups have scheduled actions which can start/stop machines based on time events. The ultimate cloud cost reduction rule is: if you don’t need it, terminate it.

  • For those companies that have already adopted infrastructure-as-code, the next step after shutting environments down at night and during weekends is running environments that can be started on demand. This ensures you only pay for what you truly use, when you need it.
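The arithmetic behind the shutdown advice is worth spelling out: a non-production environment that runs only during working hours costs a fraction of one running 24/7. The 12-hours, 5-days schedule is an example assumption.

```python
# Quick arithmetic for night/weekend shutdowns: compare an always-on
# environment with one running an assumed office-hours schedule.

HOURS_PER_WEEK = 24 * 7  # 168 hours in a week

def always_on_vs_office_hours(hours_per_day=12, days_per_week=5):
    """Return (hours actually run, percentage of the bill saved)."""
    office_hours = hours_per_day * days_per_week
    saving = 1 - office_hours / HOURS_PER_WEEK
    return office_hours, round(saving * 100, 1)

office, saving_pct = always_on_vs_office_hours()
print(f"{office} of {HOURS_PER_WEEK} hours -> {saving_pct}% saved")
```

Roughly two-thirds of the compute bill for a dev or test environment can disappear with a single pair of Auto Scaling scheduled actions.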

Managing costs when changing existing infrastructure

Making changes to existing infrastructure can also cause cloud costs to increase, so it is important to manage costs from the outset of this process. When designing new components or functionality, you can easily get an estimate of the anticipated costs by doing the following:

  • Use the AWS Pricing Calculator – over the years this has been my go-to tool for creating infrastructure budgets. My budgets were always accurate, and with the Pricing Calculator it’s very easy to run multiple infrastructure simulations and compare them.

  • The AWS Pricing Calculator also supports reservations, so you can try out different setups and choose the one that best fits your budget.

  • If you do infrastructure-as-code with AWS CloudFormation, you can get an estimate of your monthly bill when making adjustments – just upload the CloudFormation template to the AWS web console and click the cost estimate link. This gives a good indication of how much new features/services/resources could affect your monthly bill; the feature is also available through the AWS Command Line Interface estimate-template-cost command.

  • Always evaluate new AWS services – you may want to replace AWS Redshift with AWS Athena, or perhaps you missed the announcement of AWS managed MongoDB-compatible or Cassandra-compatible services; these have the advantage of lower maintenance and management overheads. Remember that a DevOps engineer’s time is also a cost – if a new managed service is released that can do the job of a DevOps engineer (and perhaps even better), then go for it.

  • You could also re-architect your application to use serverless technologies as another cost-saving option. AWS has a dedicated portal about the benefits of serverless solutions and how to implement them, visit AWS Serverless to find out more.

Making the move to the cloud can help companies benefit from reduced costs from the outset. However, over time, poor cloud cost management and lack of on-going optimisation can lead to spiralling cloud costs. By utilising the tools and services offered by your cloud platform provider, like the ones mentioned above, you can effectively manage your cloud spend long term and help your company harness the power of cloud without the headache of being over budget.

To find out more about our cloud cost optimisation offering and how we can help you manage cloud cost sprawl click here.

[i] Flexera™ State of the Cloud Report 2020