Manage CPU Credits for Small and Medium Nodes
Within MedStack Control, Small and Medium nodes are budget-friendly virtual machines suited to light workloads that still need to handle periodic spikes of activity. They do this through a "burstable" architecture: the CPU nominally runs at a low load (< 20% for these VMs), but the cloud host allows these machines to run above this threshold for short periods of time. Managing burstable nodes requires awareness of CPU credits and how they affect the behaviour of the node.
How Do CPU Credits Work?
To track how much a node may burst, cloud hosts use the concept of "CPU credits", which are saved up by running the CPU at low load and spent when the CPU spikes higher. Each node has its own tally of available CPU credits; credits are not shared or pooled between nodes. Once the credits run out, the node is throttled back to the nominal level until credits accumulate once more.
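The earn-and-spend behaviour described above can be illustrated with a small simulation. This is a simplified sketch and not the accounting any particular cloud host uses: the `BurstableNode` class, the per-minute tick, and the starting balance and cap are all assumptions made for illustration. Only the 20% nominal level comes from this article.

```python
from dataclasses import dataclass


@dataclass
class BurstableNode:
    """Toy model of a burstable node; one credit = one minute of a full CPU."""

    baseline: float = 0.20     # nominal CPU fraction (the "< 20%" figure)
    credits: float = 30.0      # hypothetical starting balance
    max_credits: float = 60.0  # hypothetical cap on banked credits

    def tick(self, demand: float) -> float:
        """Advance one minute and return the CPU fraction actually delivered."""
        if demand <= self.baseline:
            # Running below the nominal level banks the unused share.
            earned = self.baseline - demand
            self.credits = min(self.max_credits, self.credits + earned)
            return demand
        # Running above the nominal level spends credits. When the balance
        # cannot cover the burst, the node is throttled toward the baseline.
        extra = min(demand - self.baseline, self.credits)
        self.credits -= extra
        return self.baseline + extra


if __name__ == "__main__":
    node = BurstableNode(credits=1.0)
    for minute in range(4):
        delivered = node.tick(demand=1.0)  # sustained 100% demand
        print(f"minute {minute}: delivered={delivered:.2f} credits={node.credits:.2f}")
```

In this model a node under sustained full demand delivers full speed only while credits remain, then settles at the nominal level until demand drops and credits build up again.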
How Do CPU Credits Get Used?
Since CPU credits are based upon CPU load, any workload performed by the node can draw them down. The most common sources are:
- Your own services running in containers on the node
- Backups, which are required for compliance and which compress data before transmitting it
- MedStack-managed services, like the Load Balancer
- Operating system actions like updates, reboots, etc.
If your services consistently run near the nominal threshold, you run the risk of mandatory operations like backups and security patches drawing down your CPU credits.
What Happens When My Credits are Exhausted?
When a node runs out of CPU credits, the CPU is throttled down to the nominal level. This translates into longer response times for the services running on the node, which can result in response timeouts and triggered watchdogs if operations drag on.
A particularly painful failure pattern occurs when a node that has run out of CPU credits is manually rebooted: there is typically a spike in CPU usage while the node is initializing the OS, so the throttled node can appear to be refusing to come back online. This is made worse if the OS needs to update a package. Whenever possible, MedStack takes steps to ensure that updates and maintenance do not throttle your CPU, but OS updates can occur asynchronously.
What Should I Do if I am Running out of Credits?
Drawing down CPU credits is a normal part of operating burstable nodes: it is often impossible to accurately predict when your loads are going to spike. If you find your nodes consistently drawing down their credits, a few options are available to you:
Temporarily Add Nodes to Handle the Peak Load
If it's a particularly busy time, one of the advantages of having your application containerized in Docker is that you can spin up additional nodes to host additional services and then tear them down again when things settle down. Often only a single extra node is necessary to tip the scales and bring your system back to stability.
You can use our Infrastructure API in conjunction with a simple script or cron job to automatically scale nodes in anticipation of predictable demand (e.g. daytime vs. nighttime or weekdays vs. weekends).
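The scheduling half of such a script might look like the sketch below, which a cron job could run periodically. The node counts, peak hours, and function names are hypothetical values chosen for illustration. The call to the Infrastructure API is deliberately left out, because its endpoints are not described in this article; the script only decides what should happen.

```python
from datetime import datetime

# Hypothetical schedule: every number here is an illustrative assumption.
BASE_NODES = 2               # nodes kept running at all times
PEAK_EXTRA_NODES = 1         # extra nodes added during busy periods
PEAK_HOURS = range(8, 18)    # 08:00 to 17:59 local time
PEAK_WEEKDAYS = range(0, 5)  # Monday (0) through Friday (4)


def desired_node_count(now: datetime) -> int:
    """Return how many nodes the schedule calls for at the given time."""
    is_peak = now.weekday() in PEAK_WEEKDAYS and now.hour in PEAK_HOURS
    return BASE_NODES + (PEAK_EXTRA_NODES if is_peak else 0)


def scaling_action(current_nodes: int, now: datetime) -> str:
    """Describe the change needed to reach the scheduled node count."""
    target = desired_node_count(now)
    if target > current_nodes:
        return f"add {target - current_nodes} node(s)"
    if target < current_nodes:
        return f"remove {current_nodes - target} node(s)"
    return "no change"


if __name__ == "__main__":
    # A real script would read the current node count from the Infrastructure
    # API and then create or delete nodes; both steps are omitted here.
    print(scaling_action(current_nodes=BASE_NODES, now=datetime.now()))
```

Keeping the schedule in a pure function makes it easy to check against known dates before connecting it to anything that creates or deletes nodes.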
Add More Small/Medium Nodes
If you are seeing an increase in base load, you can add additional Small or Medium nodes and rebalance container workloads across them to share the burden. Together the nodes chip away at the base load, and each can still burst to absorb peak loads as needed.
Resize to a Larger Node
Resizing to a larger node immediately adds CPU capacity and may swiftly restore unresponsive nodes whose CPU is fully utilized and whose CPU credits are exhausted.
Large and X-Large nodes are not burstable; their virtual cores are more powerful and are intended to handle larger, general-purpose workloads.
Prune Unneeded Volumes & Data
An increase in CPU load can often be attributed to the compute capacity required to collect, compress and transmit backups. If your application has a lot of data, it may be worth identifying redundancies or vestigial data that is no longer needed and removing it so that there is less data to process. Refer to these instructions on how to remove a Docker volume.
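One way to find candidates for pruning is to rank the subdirectories of a data directory by size. The sketch below is a generic helper, not a MedStack tool: the function name is hypothetical, and which directory holds your application's data (and whether you can read it from where the script runs) is an assumption you would need to confirm for your own setup.

```python
import os


def size_by_subdirectory(root: str) -> list[tuple[str, int]]:
    """Return (name, total_bytes) for each subdirectory of root, largest first."""
    totals: dict[str, int] = {}
    for entry in os.scandir(root):
        if not entry.is_dir(follow_symlinks=False):
            continue  # only rank real subdirectories
        total = 0
        for dirpath, _dirnames, filenames in os.walk(entry.path):
            for name in filenames:
                path = os.path.join(dirpath, name)
                try:
                    if not os.path.islink(path):
                        total += os.path.getsize(path)
                except OSError:
                    pass  # file vanished or is unreadable; skip it
        totals[entry.name] = total
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # "." is a placeholder; point this at the directory you want to inspect.
    for name, size in size_by_subdirectory("."):
        print(f"{size / 1_000_000:10.1f} MB  {name}")
```

The largest entries are only a starting point: confirm that the data is genuinely no longer needed before removing anything.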
Optimize Your Application
Finally, you can profile your own application and identify areas that are compute-heavy. The Pareto principle can often be applied to minimize effort: a small handful of routines or queries frequently accounts for a disproportionate share of the CPU load.
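As a starting point for a Python service, the standard library's `cProfile` and `pstats` modules can rank functions by the time spent in them. The helper `profile_call` and the stand-in workload `slow_sum` below are hypothetical names used for illustration; other languages have their own equivalent profilers.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top: int = 5, **kwargs) -> str:
    """Run func under cProfile and return a report of the costliest calls."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func(*args, **kwargs)
    finally:
        profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time so the routines that dominate appear first.
    stats.sort_stats("cumulative").print_stats(top)
    return buffer.getvalue()


def slow_sum(n: int) -> int:
    """Hypothetical stand-in for a compute-heavy routine in your application."""
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    print(profile_call(slow_sum, 200_000))
```

The first few rows of the report usually show where optimization effort is best spent.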
Updated 7 months ago