Go to Production
Recommendations on creating production ready systems.
Important
This section is extremely important for running production-grade applications. Please read the recommendations below.
If you've already created a staging environment, you now have everything you need to build environments and to deploy and maintain your applications in production. There are, however, some adjustments you will want to make to the configuration of production environments to improve cluster and service availability and reliability.
- When building production environments, we recommend using (1) multiple nodes and (2) general purpose tier databases.
In practice, when using multi-node architectures, we find that:
- Small nodes are suitable for staging environments that don't run heavy workloads
- Medium nodes or larger make appropriate Manager nodes in production environments.
- At least 2 medium nodes make appropriate Worker nodes in production environments, and they can be scaled to large and x-large nodes as needed.
- You can always add more worker nodes of any size to meet the performance requirements of your services.
- When deploying services, it's important to leave the Manager node as free of tasks/processing as possible.
This is because the Manager node is responsible for orchestrating Docker. If the Manager node's resources are ever exhausted, Docker will have trouble orchestrating, and managing your cluster in MedStack Control can become very slow because MedStack Control interacts directly with Docker. Another symptom of an overworked Manager node is that web services become unavailable. There are some strategies you can use to prevent this from happening:
- Run a Manager node of considerable size: at least a Medium node.
- Use service placement constraints to devise conditions for containers to run on nodes other than the Manager node. For example, this could be node.role != manager. This service placement constraint could be coupled with others that might be used by stateful services, for example node.labels.database == true, provided that you've created a node label database = true for running a containerized database on a specific node.
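To make the effect of these constraints concrete, here is a minimal sketch that models how a scheduler might filter nodes. It is a simplified stand-in, not Docker's implementation: the `node_satisfies` and `eligible` helpers and the three-node cluster are hypothetical, and only the `==` and `!=` operators shown above are handled.

```python
import re

# Simplified model: each constraint is "<key> == <value>" or "<key> != <value>".
CONSTRAINT = re.compile(r"^\s*(\S+)\s*(==|!=)\s*(\S+)\s*$")


def node_satisfies(node: dict, constraints: list) -> bool:
    """Return True if the node's attributes meet every constraint."""
    for expr in constraints:
        match = CONSTRAINT.match(expr)
        if match is None:
            raise ValueError(f"unsupported constraint: {expr!r}")
        key, op, expected = match.groups()
        actual = node.get(key)  # None when the node lacks this attribute
        if (actual == expected) != (op == "=="):
            return False
    return True


# Hypothetical cluster; node names and labels are made up for illustration.
nodes = [
    {"name": "manager-1", "node.role": "manager"},
    {"name": "worker-1", "node.role": "worker", "node.labels.database": "true"},
    {"name": "worker-2", "node.role": "worker"},
]


def eligible(constraints: list) -> list:
    """Names of the nodes on which a service with these constraints could run."""
    return [n["name"] for n in nodes if node_satisfies(n, constraints)]


# A stateless web service is kept off the Manager node.
web_nodes = eligible(["node.role != manager"])
# A stateful database service is additionally pinned to the labelled node.
db_nodes = eligible(["node.role != manager", "node.labels.database == true"])
```

In this model the web service may land on either worker, while the database service can only land on the worker that carries the database label.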
Resources
We recommend the following cloud resources for production environments.
Nodes
See the complete list of nodes in the node specifications table.
- Large (2 vCPU, 8 GB memory)
- X-Large (4 vCPU, 16 GB memory)
Databases
See the complete list of databases in the database specification table.
- General purpose (2 vCPU, 10 GB memory)
- General purpose (4 vCPU, 20 GB memory)
Optimize Docker application
Docker's official documentation has the following recommendations for running your application on a production environment.
- Keep your images small. This speeds up image downloads, which are a key factor in deployment time. DigitalOcean has an extensive guide on optimizing Docker images for production.
- Switch from bind mounts to volumes. Bind mounts are convenient for development but are not available in production.
- Make use of secrets and configs.
For more information, see the official Docker development best practices guide.
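As a rough aid for the bind mount recommendation, the sketch below flags volume entries that look like bind mounts. It assumes entries written in Compose-style short syntax ("source:target"), where a source that looks like a path is a host bind mount and a bare name is a named volume. The `is_bind_mount` helper and the `services` mapping are hypothetical.

```python
def is_bind_mount(volume_spec: str) -> bool:
    """Classify a short-syntax volume entry of the form "source:target[:mode]"."""
    if ":" not in volume_spec:
        # Target path only: treated here as an anonymous volume, not a bind mount.
        return False
    source = volume_spec.split(":", 1)[0]
    # Assumption: path-like sources are host paths; bare names are named volumes.
    return source.startswith((".", "/", "~"))


# Hypothetical service definitions, for illustration only.
services = {
    "web": ["./src:/app", "uploads:/app/uploads"],
    "db": ["dbdata:/var/lib/postgresql/data"],
}

# Collect the entries that should be replaced with volumes before production.
bind_mounts = {}
for name, volumes in services.items():
    flagged = [v for v in volumes if is_bind_mount(v)]
    if flagged:
        bind_mounts[name] = flagged
```

Here only the `./src:/app` entry of the `web` service is flagged; the named volumes pass.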
Production switchover
If you're migrating to MedStack Control, before performing a production switchover, we recommend you have first done the following:
- Test a database dump from the current environment and restore it into the MedStack Control environment.
- Map an A record in your DNS settings to point to the manager node in the production Docker environment. The load balancer is pinned to the manager node, so the manager node's IP address is where traffic will ingress. We recommend mapping the DNS setting 48 hours in advance if possible to mitigate any delay in DNS propagation.
- Schedule a maintenance window of at least two (2) hours with your end users, during which the application may not be available.
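Before the switchover, it can help to confirm that the A record already points at the manager node. The sketch below is one way to do that with Python's standard library. The hostname and IP address are placeholders, the `resolves_to` helper is hypothetical, and the check only reflects what your local resolver sees, not global propagation.

```python
import socket


def resolves_to(hostname, expected_ip, resolver=socket.gethostbyname_ex):
    """Return True if the IPv4 addresses for hostname include expected_ip."""
    try:
        _name, _aliases, addresses = resolver(hostname)
    except OSError:
        # Covers lookup failures such as socket.gaierror.
        return False
    return expected_ip in addresses


# Stand-in resolver so the example runs offline; omit the argument to query real DNS.
def fake_resolver(hostname):
    return hostname, [], ["203.0.113.10"]


# Placeholder hostname and manager node IP (documentation address range).
ready = resolves_to("app.example.com", "203.0.113.10", resolver=fake_resolver)
```

Running this check from more than one network in the days before the switchover gives a better picture of propagation than a single lookup.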
Maintaining production environments
As part of optimizing for production, it is important to consider various disaster recovery, service disruption and degradation, and data continuity scenarios.
Planned reboots for system resiliency
Nodes require an occasional reboot to remain operationally robust and to keep the virtual hardware healthy. To keep uptime resilient, maintain patch stability, and protect nodes from recently discovered vulnerabilities, we recommend that you:
- Reboot nodes once a week.
- Perform node maintenance whenever it is available.
It is not uncommon for companies to set aside 15-90 minutes a week for system maintenance. Nodes can run for a long period of time without problems; however, it is important to introduce a reboot strategy so that potential problems can be caught early. Rebooting nodes can be considered a form of chaos engineering that builds confidence in your system's ability to withstand turbulent conditions in production. See also:
- Reducing Deployment Downtime - Strategies for reducing downtime.
- Load Balancer Healthchecks - Determine when a container is not healthy enough to receive traffic.
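A weekly reboot schedule is easier to keep when overdue nodes are easy to spot. The sketch below flags nodes whose uptime exceeds seven days. The node names and boot times are made up, and how you collect the last boot time for each node is left as an assumption.

```python
from datetime import datetime, timedelta, timezone

REBOOT_INTERVAL = timedelta(days=7)  # mirrors the "once a week" recommendation


def overdue_for_reboot(last_boot_times, now, interval=REBOOT_INTERVAL):
    """Return the names of nodes whose uptime exceeds interval, longest uptime first."""
    overdue = [
        (now - booted, name)
        for name, booted in last_boot_times.items()
        if now - booted > interval
    ]
    return [name for _uptime, name in sorted(overdue, reverse=True)]


# Hypothetical boot times, for illustration only.
now = datetime(2024, 5, 15, 12, 0, tzinfo=timezone.utc)
last_boot_times = {
    "manager-1": datetime(2024, 5, 13, 3, 0, tzinfo=timezone.utc),  # about 2 days up
    "worker-1": datetime(2024, 5, 1, 3, 0, tzinfo=timezone.utc),    # about 14 days up
    "worker-2": datetime(2024, 4, 10, 3, 0, tzinfo=timezone.utc),   # about 35 days up
}

needs_reboot = overdue_for_reboot(last_boot_times, now)
```

Sorting by uptime puts the node that has gone longest without a reboot first, which is the one to schedule first.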
Architect and plan effectively to avoid surprises
The longer a node has remained active since its last reboot, the greater the risk of something going wrong. If necessary, it may be safer to create a new node and migrate all required data and services instead.
If a single node is so critical to your operations that rebooting it or scheduling maintenance feels impossible, you may wish to reconsider your system design and disaster recovery scenarios.
Data integrity and disaster recovery
By privacy and security design, we are unable to validate what is inside your backups, so we recommend that you:
- Create a schedule for testing your backup restoration process.
- Validate your backups and disaster recovery plan once every 1-3 months.
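One simple validation is to restore a backup into a scratch database and compare it with the source. The sketch below illustrates the idea using SQLite purely as a stand-in so that it runs anywhere; the `records` table and the `table_row_counts` helper are hypothetical, and a real check would use your own database engine's dump and restore tooling.

```python
import sqlite3


def table_row_counts(conn):
    """Map each user table name to its row count."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
        )
    ]
    return {t: conn.execute(f'SELECT COUNT(*) FROM "{t}"').fetchone()[0] for t in tables}


# Source database with a hypothetical table and a few rows.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, note TEXT)")
source.executemany("INSERT INTO records (note) VALUES (?)", [("a",), ("b",), ("c",)])
source.commit()

# "Backup": a SQL dump of the source. "Restore": replay it into a fresh database.
dump_sql = "\n".join(source.iterdump())
restored = sqlite3.connect(":memory:")
restored.executescript(dump_sql)

# The restore passes this basic check when every table has the same row count.
restore_ok = table_row_counts(source) == table_row_counts(restored)
```

Row counts are only a first check. Comparing checksums of key tables or running application-level smoke tests against the restored data gives stronger assurance.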
Incident response
- Enable notifications in MedStack Control.
- Share your incident readiness and response protocols with us.
- Know how to contact our help desk in MedStack Control.