RIFM: Right-Sizing – Provisioning Instances to Match Workloads

TL;DR

Save up to 70% of your monthly bill by matching your instance types and sizes to the performance and capacity your workloads actually need. Resources that scale dynamically, like S3 and Lambda, don't need this treatment.

Introduction

Historically, infrastructure has been provisioned to meet peak demand. Cloud infrastructure is far more elastic, so you can right-size and save up to 70% a month. This applies mostly to the static parts of your stack, like EC2, RDS, and ElastiCache, because they don't scale without intervention. See the bottom of this post for a big list of tools to help.

Right-Size Pre-Migration

The absolute best time to right-size is before and during your migration to the cloud. You can retrofit it afterward, and skipping it up front will likely make the migration faster, but you will pay more in the meantime and it takes much longer to pull costs back down later. In the fast/cheap/good triangle, “Good” is a given because you’ve already decided this is the best move for your company. Now you need to choose between fast and cheap.

Don’t Stop Right-Sizing

Right-sizing is not a one-size-fits-all, set-it-and-forget-it recipe. Performance and capacity requirements change often enough that resource usage (or the lack of it) can get away from you fast. Consider it at the start, then add it to your team’s backlog as a monthly maintenance item. There are a lot of tools to help with this (see below), but tagging is a must.
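Because tagging underpins every other right-sizing step, a monthly check for untagged instances is an easy first tool. The sketch below assumes instance records shaped like items from the EC2 DescribeInstances response (`InstanceId` plus an optional `Tags` list of `Key`/`Value` pairs); the required tag keys `Owner`, `Environment`, and `CostCenter` are a hypothetical policy, not an AWS requirement.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical tag policy: every instance should carry these keys.
REQUIRED_TAGS: Set[str] = {"Owner", "Environment", "CostCenter"}


def missing_tags(instances: Iterable[Dict], required: Set[str] = REQUIRED_TAGS) -> Dict[str, List[str]]:
    """Map instance ID -> sorted list of required tag keys it lacks.

    Each instance is a dict like a DescribeInstances item:
    {"InstanceId": "...", "Tags": [{"Key": "...", "Value": "..."}]}.
    DescribeInstances omits "Tags" for untagged instances, so a missing key means no tags.
    """
    report: Dict[str, List[str]] = {}
    for inst in instances:
        present = {tag["Key"] for tag in inst.get("Tags", [])}
        lacking = sorted(required - present)
        if lacking:
            report[inst["InstanceId"]] = lacking
    return report


if __name__ == "__main__":
    sample = [
        {"InstanceId": "i-aaa", "Tags": [{"Key": "Owner", "Value": "data-team"}]},
        {"InstanceId": "i-ccc"},
    ]
    print(missing_tags(sample))
```

Anything in the report is a resource nobody can confidently right-size or turn off, which is why it is worth fixing before the other checks.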

Amazon EC2 and RDS Overview

Choosing an EC2 or RDS instance for a workload requires knowing the full set of instance families available and the pros and cons of each. Their respective docs cover this best. EC2 has five general families: General Purpose, Compute Optimized, Memory Optimized, Storage Optimized, and Accelerated Computing. RDS has three: Standard Performance, Burstable Performance, and Memory Optimized.

Don’t feel bad if you need a mnemonic to remember them all.
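If a mnemonic isn't enough, a small lookup can do the remembering. This is a partial, illustrative mapping from the letters at the front of an EC2 instance type to the five families above; it covers only common prefixes, AWS adds new ones over time, and RDS class names (which start with `db.`) are deliberately not handled.

```python
import re

# Partial, illustrative mapping of EC2 type prefixes to the five families.
FAMILY_BY_PREFIX = {
    "t": "General Purpose", "m": "General Purpose", "a": "General Purpose",
    "c": "Compute Optimized",
    "r": "Memory Optimized", "x": "Memory Optimized", "z": "Memory Optimized",
    "i": "Storage Optimized", "d": "Storage Optimized", "h": "Storage Optimized",
    "p": "Accelerated Computing", "g": "Accelerated Computing",
    "f": "Accelerated Computing", "inf": "Accelerated Computing",
}


def family_of(instance_type: str) -> str:
    """Return the family for an EC2 type such as 'm5.large', or 'Unknown'."""
    # The prefix is the run of letters before the first digit: 'r5d' -> 'r', 'inf1' -> 'inf'.
    match = re.match(r"([a-z]+)\d", instance_type.lower())
    if not match:
        return "Unknown"
    return FAMILY_BY_PREFIX.get(match.group(1), "Unknown")
```

Matching the whole letter run, rather than just the first letter, keeps `inf1` from being misread as the storage-optimized `i` family.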

Identifying Opportunities to Right-Size

Monitor, monitor, monitor. Cloud computing and enhanced monitoring must go hand in hand because of the inherent trade-offs between fixed-cost purchasing and dynamic, cloud-scaling architectures. I’ve seen companies overspend by more than $30k a month on completely unused resources just because they weren’t watching. Track vCPU, memory, network, and ephemeral disk utilization over a period of two weeks to a month.
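Whatever tool collects the data, the core step is reducing raw datapoints over a window to a few numbers you can compare against a rule. The sketch below assumes each datapoint is a `(timestamp, percent)` pair, roughly what you get from CloudWatch metric statistics; note that EC2 does not publish memory or in-guest disk metrics by default, so those require the CloudWatch agent or a similar collector. It reports both max and mean because a flat average can hide short peaks.

```python
from datetime import datetime, timedelta, timezone
from statistics import mean
from typing import Dict, List, Tuple

Datapoint = Tuple[datetime, float]  # (timestamp, percent utilization)


def summarize(points: List[Datapoint], days: int, now: datetime) -> Dict[str, float]:
    """Return max, mean, and sample count for datapoints in the last `days` days."""
    cutoff = now - timedelta(days=days)
    window = [value for ts, value in points if ts >= cutoff]
    if not window:
        raise ValueError("no datapoints in the requested window")
    return {"max": max(window), "mean": mean(window), "count": len(window)}


if __name__ == "__main__":
    now = datetime(2024, 1, 31, tzinfo=timezone.utc)
    # Synthetic hourly CPU readings for 30 days, cycling 10..33 percent each day.
    points = [(now - timedelta(hours=h), 10.0 + (h % 24)) for h in range(24 * 30)]
    print(summarize(points, days=14, now=now))
```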

There are loads of tools for this, from Amazon and third parties, paid and free. See below for a list.

Developing Your Own Tools

The takeaway here is that you can develop custom tools specific to your use case and business. The white paper goes into a lot of detail about which metrics to look for and what to include in and exclude from searches. If you’re considering writing a tool, read the paper. It’ll be worth your while.

Tips for Right-Sizing

Under-utilization

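Look for non-burstable instances (that is, excluding burstable types like T2) whose maximum vCPU and memory usage stays below 40% over a four-week period. Those are candidates to cut in half, for example by moving down one size within the same family.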
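The 40% rule translates directly into a filter. This is a minimal sketch, assuming you already have four-week maximums for vCPU and memory per instance; the `Usage` record and the size ladder are illustrative. The ladder lists only sizes where one step down roughly halves vCPU and memory in common families such as m5 and r5, so irregular sizes like `12xlarge` or `metal` get no suggestion.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Sizes where one step down roughly halves vCPU and memory in families like m5/r5.
LADDER = ["large", "xlarge", "2xlarge", "4xlarge", "8xlarge", "16xlarge"]
THRESHOLD = 40.0  # percent, per the 40%-over-four-weeks rule of thumb


@dataclass
class Usage:
    instance_type: str  # e.g. "m5.2xlarge"
    max_cpu: float      # max vCPU utilization (%) over four weeks
    max_mem: float      # max memory utilization (%) over the same window


def is_burstable(instance_type: str) -> bool:
    # T-family types (t2, t3, t3a, t4g) are burstable; the digit check keeps 'trn1' out.
    return re.match(r"t\d", instance_type) is not None


def suggest_downsize(u: Usage) -> Optional[str]:
    """Return the next-smaller type if the instance qualifies, else None."""
    if is_burstable(u.instance_type):
        return None
    if u.max_cpu >= THRESHOLD or u.max_mem >= THRESHOLD:
        return None
    family, _, size = u.instance_type.partition(".")
    if size not in LADDER or LADDER.index(size) == 0:
        return None  # irregular size, or already the smallest on the ladder
    return f"{family}.{LADDER[LADDER.index(size) - 1]}"


if __name__ == "__main__":
    print(suggest_downsize(Usage("m5.2xlarge", max_cpu=35.0, max_mem=28.0)))
```

Requiring both CPU and memory to stay under the threshold is the conservative reading of the rule: an instance that is idle on CPU but busy on memory is better served by a family change than a smaller size.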

Usage Needs

Look at the load patterns of your applications and choose strategies accordingly.

Do you have a fairly constant load level? Consider Reserved Instances.

Does your load change predictably? Look at Auto Scaling.

Does your team have pre-prod environments like dev and UAT? Turn them off when they aren’t in use, such as on weekends and holidays.

Is your workload very short term, like monthly reporting? Bid for Spot Instances instead of using On-Demand.
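To see why turning off pre-prod environments pays off, count the hours. A week has 168 hours; an environment that only runs 12 hours a day, five days a week, runs 60 of them, so it avoids 108/168, or about 64%, of its instance-hours before holidays are even counted. The sketch below does that arithmetic; it covers instance-hours only, since attached EBS storage is still billed while an instance is stopped.

```python
HOURS_PER_WEEK = 24 * 7  # 168


def off_hours_savings(hours_per_day: float, days_per_week: int) -> float:
    """Fraction of weekly instance-hours avoided by running only on a schedule."""
    if not (0 <= hours_per_day <= 24 and 0 <= days_per_week <= 7):
        raise ValueError("schedule out of range")
    running = hours_per_day * days_per_week
    return 1 - running / HOURS_PER_WEEK


if __name__ == "__main__":
    print(f"{off_hours_savings(12, 5):.1%}")  # a 12h x 5-day schedule
```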

Turn off Idle Instances

Generally speaking, it should be safe to turn off (or even terminate) instances that have been idle for more than two weeks. Consult stakeholders and your whole team before terminating an instance, though. Stopping an instance preserves its EBS volumes (you still pay for the storage), but terminating it deletes every attached volume whose DeleteOnTermination flag is set, which is the default for the root volume, and re-provisioning can be laborious.
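The two-week idle rule is easy to automate as long as the tool only recommends stopping and leaves termination to people. In this sketch the input is one daily-maximum CPU value per day, oldest first, and the 2% idle threshold is a hypothetical choice; in practice you would likely also require low network traffic before calling an instance idle.

```python
from typing import List

IDLE_CPU_PCT = 2.0  # hypothetical: a daily max CPU below this counts as an idle day
IDLE_DAYS = 14      # "idle for more than two weeks"


def trailing_idle_days(daily_max_cpu: List[float], threshold: float = IDLE_CPU_PCT) -> int:
    """Count consecutive most-recent days (list is oldest -> newest) below threshold."""
    count = 0
    for value in reversed(daily_max_cpu):
        if value >= threshold:
            break
        count += 1
    return count


def recommend(daily_max_cpu: List[float]) -> str:
    """Return 'stop' for long-idle instances; termination stays a human decision."""
    return "stop" if trailing_idle_days(daily_max_cpu) > IDLE_DAYS else "keep"


if __name__ == "__main__":
    history = [40.0] * 5 + [0.5] * 15  # busy for 5 days, then idle for 15
    print(recommend(history))
```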

For pre-prod environments or daytime-only workloads, use tagging and tools like EC2 Scheduler and Lambda to stop instances outside business hours.
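A tag-driven Lambda stopper can be quite small. This is a minimal sketch, not the EC2 Scheduler solution itself: the `Schedule=office-hours` tag and the 08:00 to 19:00 UTC weekday window are hypothetical, and it only stops instances, so you would pair it with a matching start function on its own cron trigger. It uses the boto3 `describe_instances` paginator and `stop_instances`; the client can be injected so the logic can be tested without AWS.

```python
from datetime import datetime, timezone
from typing import List, Optional

SCHEDULE_TAG = "Schedule"        # hypothetical tag key
SCHEDULE_VALUE = "office-hours"  # hypothetical tag value
START_HOUR, STOP_HOUR = 8, 19    # hypothetical business hours, UTC, Monday-Friday


def in_business_hours(now: datetime) -> bool:
    """True on weekdays from START_HOUR (inclusive) to STOP_HOUR (exclusive)."""
    return now.weekday() < 5 and START_HOUR <= now.hour < STOP_HOUR


def running_scheduled_ids(ec2) -> List[str]:
    """IDs of running instances carrying the schedule tag."""
    ids: List[str] = []
    paginator = ec2.get_paginator("describe_instances")
    for page in paginator.paginate(Filters=[
        {"Name": f"tag:{SCHEDULE_TAG}", "Values": [SCHEDULE_VALUE]},
        {"Name": "instance-state-name", "Values": ["running"]},
    ]):
        for reservation in page["Reservations"]:
            ids.extend(inst["InstanceId"] for inst in reservation["Instances"])
    return ids


def lambda_handler(event, context, ec2=None, now: Optional[datetime] = None):
    """Stop tagged instances outside business hours; run it on a cron-style trigger."""
    if now is None:
        now = datetime.now(timezone.utc)
    if in_business_hours(now):
        return {"stopped": []}
    if ec2 is None:
        import boto3  # available in the AWS Lambda Python runtime
        ec2 = boto3.client("ec2")
    ids = running_scheduled_ids(ec2)
    if ids:
        ec2.stop_instances(InstanceIds=ids)
    return {"stopped": ids}
```

Filtering on the tag in the API call, rather than listing everything and filtering locally, keeps the function cheap to run every hour.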

Right-Sizing Your Database Instance

Remember that storage and RDS instances are decoupled: you can scale the instance class up or down and change storage independently of each other. One caveat: RDS allocated storage can be increased but not decreased, so grow it deliberately.
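Because compute and storage are separate parameters of the same RDS modification call, a helper can build each change independently and refuse a storage decrease before it ever reaches the API. This sketch only builds keyword arguments for boto3's `rds.modify_db_instance` (`DBInstanceIdentifier`, `DBInstanceClass`, `AllocatedStorage`, `ApplyImmediately`); the function name and its inputs are hypothetical, and you must supply the current allocated storage yourself.

```python
from typing import Dict, Optional


def plan_rds_change(db_id: str, current_storage_gib: int,
                    new_class: Optional[str] = None,
                    new_storage_gib: Optional[int] = None,
                    apply_immediately: bool = False) -> Dict:
    """Build kwargs for rds.modify_db_instance, changing compute and storage independently.

    RDS allocated storage can grow but not shrink, so a decrease is rejected here.
    """
    kwargs: Dict = {"DBInstanceIdentifier": db_id, "ApplyImmediately": apply_immediately}
    if new_class is not None:
        kwargs["DBInstanceClass"] = new_class  # up or down, e.g. "db.r5.large"
    if new_storage_gib is not None:
        if new_storage_gib < current_storage_gib:
            raise ValueError("RDS allocated storage cannot be decreased")
        if new_storage_gib > current_storage_gib:
            kwargs["AllocatedStorage"] = new_storage_gib
    return kwargs
```

With a client from `boto3.client("rds")`, a downsize that leaves storage alone would be `rds.modify_db_instance(**plan_rds_change("orders-db", 500, new_class="db.r5.large"))`, where `orders-db` is a placeholder identifier. Leaving `ApplyImmediately` false defers the change to the next maintenance window.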

Conclusion

This white paper is full of great, detailed tips on right-sizing your environments and specific metrics to look for when building tools to do so. It’s only about 10 pages of actual content. Give it a read on Amazon for all the details.