Top Ten Reasons Why You Need a Cloud Landing Zone
You have embraced the cloud. Probably through some sort of proof of concept (PoC). It works! Put it in production! Time to grow that PoC that was funded through someone’s corporate card. Add more people. Add some complexity. Add some more applications. Why did we get this email from our cloud provider that someone exposed a key in a public repository? Hey, that expense report is getting big. Why is everyone an Admin? What are these charges from a remote region that we haven’t used yet?
Welcome to the endless possibilities (and risks) of using “The Cloud.”
Do not fear: the Landing Zone (LZ) is here. A well-crafted LZ reduces costs by standardizing the creation and operation of cloud-based applications, which shortens time to market and reduces Mean Time to Repair (MTTR). Additionally, its enhanced security measures around identity, network security, and data protection help prevent data loss, whether accidental or otherwise.
Okay, but what is an LZ?
A cloud-based LZ is essentially a set of pre-configured guardrails for a cloud environment. These guardrails allow for deploying protected resources to the cloud in an automated, repeatable, and reliable fashion. Automation is key to ensuring that your LZ provides the protections necessary to keep your organization from financial and reputational risk. As you have already learned (or will very shortly), building cloud-based resources through the console is one of the quickest ways to create an unsupportable and costly mess.
A well-crafted LZ provides many benefits. Let’s look at the top 10.
1. Standardization
Standardization is one of the primary drivers for creating a cloud Landing Zone: making sure that policies are consistent and enforced. Through the use of Infrastructure as Code (IaC) templates, LZs provide you with a customized set of cookie cutters. These cookie cutters enforce a set of guardrails implemented via cloud policies. These policies ensure that all cloud resources running under your account apply the same set of guardrails. There are literally hundreds of different guardrail policies that can be configured to ensure protection through their consistent and automatic application.
You can do things like:
Set resource naming and tagging policies
Create a list of approved images
Restrict the set of (ever-expanding) resource types
Automatically create a standard set of environments for each application
Limit access to approved domains
All with appropriate (and separate) permission sets. The rest of this list provides more details about the types of policies, but the primary reason they exist is to make sure that people don't, oops, forget and put your company at risk.
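To make the cookie-cutter idea concrete, here is a minimal sketch of a guardrail check in Python. It is not any cloud provider's policy engine. The image names, resource types, and the `ResourceRequest` and `guardrail_violations` names are all hypothetical, chosen only to illustrate the approved-image and restricted-resource-type items above.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical allow-lists; a real LZ would source these from its policy definitions.
APPROVED_IMAGES = {"corp-linux-2024.1", "corp-windows-2022.3"}
ALLOWED_RESOURCE_TYPES = {"vm", "object_storage", "managed_database"}


@dataclass(frozen=True)
class ResourceRequest:
    """A simplified, made-up description of a resource someone wants to create."""
    name: str
    resource_type: str
    image: Optional[str] = None  # only meaningful for VMs


def guardrail_violations(request: ResourceRequest) -> List[str]:
    """Return human-readable violations; an empty list means the request is compliant."""
    violations = []
    if request.resource_type not in ALLOWED_RESOURCE_TYPES:
        violations.append(f"resource type '{request.resource_type}' is not allowed")
    if request.resource_type == "vm" and request.image not in APPROVED_IMAGES:
        violations.append(f"image '{request.image}' is not on the approved list")
    return violations
```

The point of the sketch is that the check runs automatically on every request, so nobody has to remember the rules.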
2. Security
On-prem or in the cloud, security in all its forms is of paramount importance when building, deploying, and operating systems. Security of your cloud platform is key to avoiding financial and reputational risk.
Certificates are the passports of the cloud world. Policy allows you to enforce the approved corporate security standards for creating and requiring those certificates. Additionally, policy allows you to configure their expiry dates to force systematic rotation and so minimize the impact of any leakage.
Security also plays an important role in the configuration of identity, access management, and networking, as we will see.
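As a rough illustration of an expiry policy, the sketch below checks a certificate's validity period against a maximum lifetime and flags certificates that are due for rotation. The 90-day limit, the 14-day rotation window, and the function names are assumptions made for this example, not values from any standard.

```python
from datetime import datetime, timedelta, timezone

MAX_VALIDITY = timedelta(days=90)     # assumed corporate maximum certificate lifetime
ROTATION_WINDOW = timedelta(days=14)  # assumed lead time before expiry to start rotation


def validity_is_compliant(not_before: datetime, not_after: datetime) -> bool:
    """True if the certificate lifetime is positive and no longer than MAX_VALIDITY."""
    return timedelta(0) < (not_after - not_before) <= MAX_VALIDITY


def needs_rotation(not_after: datetime, now: datetime) -> bool:
    """True if the certificate expires within ROTATION_WINDOW (or has already expired)."""
    return not_after - now <= ROTATION_WINDOW


issued = datetime(2024, 1, 1, tzinfo=timezone.utc)
expires = issued + timedelta(days=90)
compliant = validity_is_compliant(issued, expires)
```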
3. Identity and Access Management (IAM)
Identity management policy connects with your Authentication / Authorization provider to protect you from outside intrusion (or from former employees). MFA should be required in the fashion specified by your security team. Using policy ensures that new accounts are created in line with specified guidelines.
Often, role-based access is configured so that only authorized staff can access and manipulate specific resources. A crucial security measure is the Principle of Least Privilege: grant only the minimal privileges required, tailored to specific resources. This principle applies to system-to-system access as well as to data access (as we will discuss later).
Another important aspect of configuring IAM is to configure different roles for different environments of an application. This allows you to restrict access to your development, test, integration, production, etc. environments to different users. No one role should have access to all resources or environments. This will help prevent accidental manipulation (I thought I was in test!).
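A deny-by-default check is one simple way to picture least privilege with environment-scoped roles. The role names, environments, and actions below are hypothetical, and a real LZ would express this in the cloud provider's own IAM constructs rather than in application code.

```python
# Hypothetical role definitions: each role lists the environments and actions it may use.
# Note that no single role spans every environment.
ROLES = {
    "dev-engineer": {"environments": {"dev"}, "actions": {"read", "deploy"}},
    "qa-engineer": {"environments": {"test", "integration"}, "actions": {"read", "deploy"}},
    "prod-operator": {"environments": {"prod"}, "actions": {"read", "restart"}},
}


def is_allowed(role: str, environment: str, action: str) -> bool:
    """Deny by default; allow only what the role explicitly grants."""
    grant = ROLES.get(role)
    if grant is None:
        return False  # unknown roles get nothing
    return environment in grant["environments"] and action in grant["actions"]
```

With these definitions, a `dev-engineer` can deploy to `dev` but not to `prod`, which is exactly the "I thought I was in test!" accident the separate roles are meant to prevent.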
4. Tagging (resource identity)
Tagging resources allows the organization to keep track of important information about the resources created. Policy can require that specific tags, such as project name/ID, program name/ID, organization, and system name, be filled out (and even validated). This pattern allows for quick identification of who to contact when things go wrong.
Policy also enforces a corporate naming standard, helping to maintain a clear and self-descriptive structure that is essential for operations.
Tagging will also play a critical role in allocating the cloud costs to the right products and projects (more on that later).
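Here is a small sketch of what a required-tag and naming check might look like. The tag keys and the `<org>-<env>-<system>-<nn>` naming pattern are invented for illustration; your own standard will differ.

```python
import re
from typing import Dict, List

# Assumed required tags and naming convention (<org>-<env>-<system>-<nn>); both are illustrative.
REQUIRED_TAGS = ("project_id", "organization", "system_name")
NAME_PATTERN = re.compile(r"[a-z]{2,8}-(dev|test|prod)-[a-z0-9]+-\d{2}")


def tagging_violations(name: str, tags: Dict[str, str]) -> List[str]:
    """Return violations for missing or blank required tags and for non-conforming names."""
    violations = [
        f"missing required tag '{key}'"
        for key in REQUIRED_TAGS
        if not tags.get(key, "").strip()
    ]
    if not NAME_PATTERN.fullmatch(name):
        violations.append(f"name '{name}' does not follow the naming standard")
    return violations
```

For example, a resource named `acme-prod-billing-01` with all three tags filled in passes, while `MyServer` with only a `project_id` tag produces three violations.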
5. Networking
There are a wide variety of networking policies that can be configured. An in-depth discussion is beyond the scope of this conversation. That being said, there are some basic things that you will want to be sure are required.
All cloud providers support the creation of virtual networks. This permits secure connections from your internal network to the cloud resources through VPN tunnels. Policy can also enforce your network design by requiring that the subnets allocated to your resources fall within specific address ranges.
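The subnet-range requirement can be sketched with Python's standard `ipaddress` module. The approved range `10.20.0.0/16` and the function name are assumptions for this example, and the sketch only handles IPv4.

```python
import ipaddress
from typing import Iterable

# Assumed address block reserved for this landing zone by the network team.
APPROVED_RANGE = ipaddress.ip_network("10.20.0.0/16")


def subnet_is_approved(cidr: str, existing: Iterable[str] = ()) -> bool:
    """True if the subnet sits inside APPROVED_RANGE and overlaps no existing subnet."""
    subnet = ipaddress.ip_network(cidr)  # raises ValueError on malformed input
    if not subnet.subnet_of(APPROVED_RANGE):
        return False
    return not any(subnet.overlaps(ipaddress.ip_network(other)) for other in existing)
```

A request for `10.20.1.0/24` would pass, while `192.168.1.0/24` (outside the approved block) or a range overlapping an already-allocated subnet would be rejected.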
Speaking of on-prem networks, block access to your LZ from outside domains and limit access back into your network to approved resources. Make sure that only necessary ports are open, and even then, restrict their access to expected resources.
6. Data Protection and Compliance
As with network guardrails, data guardrails are a long topic. At the very least, you want to be sure that all data in the cloud is encrypted in transit and at rest. Using the role-based access described previously will allow you to restrict access to only those users/applications that are approved.
Beyond that, policy will help you configure automated backup procedures (say, to a different data center, just in case) and create appropriate data retention policies to ensure that unnecessary data isn’t hanging around. It can also help you comply with localized regulations and restrictions like the General Data Protection Regulation (GDPR).
7. Cost Control & Allocation
Tagging (previously mentioned) is a vital part of any Financial Operations (FinOps) effort to allocate cloud costs appropriately. Making cloud costs transparent and visible is key to operating an efficient cloud platform.
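To show why tags matter for allocation, the sketch below rolls up billing line items by a tag. The line-item shape (`cost_cents`, `tags`) and the `project_id` key are made up for this example; real billing exports have their own formats.

```python
from collections import defaultdict
from typing import Dict, Iterable


def allocate_costs(line_items: Iterable[dict], tag_key: str = "project_id") -> Dict[str, int]:
    """Sum costs (in cents) per tag value; untagged items land in 'UNALLOCATED'."""
    totals: Dict[str, int] = defaultdict(int)
    for item in line_items:
        owner = item.get("tags", {}).get(tag_key) or "UNALLOCATED"
        totals[owner] += item["cost_cents"]
    return dict(totals)


# Hypothetical line items for illustration only.
items = [
    {"cost_cents": 1000, "tags": {"project_id": "P-1"}},
    {"cost_cents": 500, "tags": {"project_id": "P-1"}},
    {"cost_cents": 250, "tags": {}},
]
report = allocate_costs(items)
```

Anything that ends up in the `UNALLOCATED` bucket is a cost nobody owns, which is the gap that required-tag policies are meant to close.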
Policy can be created to ensure that non-production environments are cycled off during non-working hours (saving even more money). Some organizations also use it to cap spending on certain systems/programs as a means of providing financial governance.
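The off-hours rule might be sketched as follows. The working hours (07:00 to 19:00, Monday to Friday), the environment names, and the function name are assumptions; a scheduler would call something like this and stop or start resources accordingly.

```python
from datetime import datetime, time

WORK_START = time(7, 0)   # assumed start of working hours (local time)
WORK_END = time(19, 0)    # assumed end of working hours (local time)


def should_be_running(environment: str, local_now: datetime) -> bool:
    """Production always runs; other environments run only on weekdays during working hours."""
    if environment == "prod":
        return True
    is_weekday = local_now.weekday() < 5  # Monday is 0, Sunday is 6
    return is_weekday and WORK_START <= local_now.time() < WORK_END
```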
Finally (and perhaps most importantly), policy can be used to make sure that people from your domain stop trying to use credit cards to get around all of your policies.
8. Observability (Logging, Monitoring, Alarming)
Monitoring is critical to maintaining a robust and efficient cloud infrastructure. Whether you use a commercial product or the tools from your cloud provider, a comprehensive, integrated logging solution is available on every cloud platform. Policy allows you to pre-configure how and where logs are captured and the format used. Consistency is critical to the smooth operation of cloud environments.
9. Resilience and Disaster Recovery
One of the things the cloud makes easier than it ever was on-prem is resiliency. All major providers build data centers with multiple discrete sources of power and networking to minimize possible service disruptions. They also have data centers spread throughout the globe so that your applications can be close to your customers. In many instances, you can deploy your application in several geographically dispersed data centers. This allows you to create services that are active-active so that if one data center goes offline, traffic is routed to the nearest active service.
If an application isn’t ready to behave in such a way, it is easy to create a hot/warm/cold stand-by in a different geographic region. Through the use of data replication, the application data (within reason) will be ready, and your application can fail over gracefully. Just make sure that you have selected a data center that provides all of the same services as your primary (because not all of them do).
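The service-parity warning above lends itself to a quick automated check. In this sketch the service names are placeholders, and in practice the two sets would come from your provider's regional service catalog.

```python
from typing import Iterable, List


def missing_in_standby(primary_services: Iterable[str], standby_services: Iterable[str]) -> List[str]:
    """Return, sorted, the services used in the primary region that the stand-by lacks."""
    return sorted(set(primary_services) - set(standby_services))


# Placeholder service names for illustration only.
gaps = missing_in_standby(
    {"vm", "queue", "managed_database"},
    {"vm", "queue"},
)
```

An empty result means the stand-by region can host everything the primary uses. Anything else is a gap to resolve before you rely on that region for failover.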
Planning for backups and disaster recovery is a fundamental aspect of ensuring integrity and business continuity. It requires a deep understanding of how each selected resource in your architecture stores and backs up data, as well as its specific configurations.
And since you have done this all through automation, you can point your scripts at another region and stand up an LZ in a fraction of the time.
10. Speed
Perhaps the best reason for a well-configured and automated LZ is the speed and efficiency with which new users, applications, and resources can be added in a standardized and secure fashion with minimal friction and risk. This will reduce your overall time to market and spur innovation focused on business value rather than spending time trying to configure the next application.
Leveraging the automation you have created opens up new opportunities. Self-service portals allow teams to create new systems without intervention, and integrating scripts into ticketing applications allows a zero-touch approach to fulfillment.
Conclusion
The Cloud gave us unprecedented flexibility and a dizzying array of options. The objective of implementing an LZ is not to slow things down or reduce flexibility. It is quite the opposite. It is an initial investment, a foundation. Like a well-built highway, a cloud LZ is there to make sure that you can deliver at maximum speed, but with maximum safety.
That being said, there is a phrase we like to use: “If you build it, they still won’t come!” A well-crafted LZ requires documentation, training, and support to make the most impact. Helping engineering organizations understand how to take maximum advantage of your LZ will go a long way.