Apr 09 2024


5 Mistakes in Cloud Infrastructure Management You Want to Avoid

Most major cloud providers today make creating cloud infrastructure relatively easy. Anyone with basic cloud knowledge can put together fairly complex cloud environments in a short time. At the same time, as the old saying goes, “with great power comes great responsibility”: it is also very easy to make mistakes that make it hard to grow your cloud infrastructure later. In this article, I want to share 5 mistakes in cloud infrastructure management you want to avoid in order to keep your cloud environment easy to scale.

1. Naming and tagging

When we jump to the cloud provider's console to create a new resource, we are mostly focused on getting it created and seeing it in action. We want to solve the problem at hand and complete the task. This comes with a sense of urgency, and in that urgency we rarely stop to think properly about how to name a resource; we often use whatever comes to mind first. At the beginning this does not seem like a problem, but as your cloud infrastructure grows, you notice the “elephant in the room”. You start facing problems like:

  • You are not sure which cloud resource belongs to which solution
  • The resources have an inconsistent naming structure
  • A resource needs to be changed, but it is not clear who owns it
  • Your finance department asks you to map resources to products for cost calculation
  • Some resources are very difficult to rename or recreate, so improper naming will cost you a lot of effort later
  • etc.

Unstructured naming of cloud resources can create big problems in the future if it is not addressed in time. Here are some good practices to follow:

  1. Define a naming standard to be followed for every resource. There is no single best practice for this; it depends on your organizational requirements. My preferred structure is as follows:

    Environment_SolutionName_AnyCustomIndicator

    e.g. for a solution named MyProjectMonitor, here is how I would name some resources:

    – D_MyProjectMonitor
    – D_MyProjectMonitor_VM1
    – D_MyProjectMonitor_VM2
    – D_MyProjectMonitor_DB_Master
    – D_MyProjectMonitor_DB_ReadReplica

    The first letter indicates the environment (e.g. D – Development, T – Test, S – Staging, P – Production, etc.) and the middle part names the solution. The last part is an optional convenience indicator, such as the resource type or a VM number (e.g. VM1, DB_Master).

    This way, as soon as you read the first letter, you know which environment you are touching (which helps you avoid changing production by mistake), and then you can clearly read which solution the resource belongs to. That said, any standard is better than no standard.
  2. Tag your resources. Tags make it easy to filter and identify resources later for various purposes. Which tags you need depends on your situation; here are some that I consider required or helpful:

    – Owner – the team or department that owns the solution
    – Creator – who created the resource, so that if you have questions about it later, you know whom to contact
    – Cost Center – this is mostly helpful when you need to calculate resource expenses per cost center (project, department, etc.)

    Remember: even if your resources currently have no tags, it is always worth going back and tagging them later.
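A naming standard and required tags are easier to keep when they are checked automatically instead of only being documented. Below is a minimal Python sketch, assuming the D/T/S/P environment codes and the three tags listed above; the regular expression, the function names `build_name` and `check_resource`, and the exact tag keys are illustrative assumptions, so adapt them to your own standard. In practice you might run a check like this against an export of your resources or in a CI pipeline.

```python
import re

# Environment codes from the naming standard (D, T, S, P).
ENVIRONMENTS = {"D": "Development", "T": "Test", "S": "Staging", "P": "Production"}

# Tags treated as mandatory in this sketch (an assumption; adapt to your organization).
REQUIRED_TAGS = {"Owner", "Creator", "Cost Center"}

# Environment_SolutionName_AnyCustomIndicator: the indicator is optional and
# may itself contain underscores (e.g. DB_Master).
NAME_PATTERN = re.compile(
    r"^(?P<env>[DTSP])_(?P<solution>[A-Za-z0-9]+)(?:_(?P<indicator>[A-Za-z0-9_]+))?$"
)


def build_name(env: str, solution: str, indicator: str = "") -> str:
    """Compose a resource name following the naming standard."""
    if env not in ENVIRONMENTS:
        raise ValueError(f"unknown environment code: {env!r}")
    parts = [env, solution] + ([indicator] if indicator else [])
    return "_".join(parts)


def check_resource(name: str, tags: dict[str, str]) -> list[str]:
    """Return the problems found in a resource's name and tags (empty list if compliant)."""
    problems = []
    if not NAME_PATTERN.match(name):
        problems.append(f"name {name!r} does not follow the naming standard")
    for tag in sorted(REQUIRED_TAGS - set(tags)):
        problems.append(f"missing tag {tag!r}")
    return problems


if __name__ == "__main__":
    print(build_name("D", "MyProjectMonitor", "VM1"))  # D_MyProjectMonitor_VM1
    print(check_resource(
        "D_MyProjectMonitor_DB_Master",
        {"Owner": "Monitoring Team", "Creator": "jane.doe", "Cost Center": "CC-1234"},
    ))  # []
    print(check_resource("monitor-vm-test", {"Owner": "Monitoring Team"}))
```

The last call reports both the non-compliant name and the missing Creator and Cost Center tags, which is the kind of list you would hand to the resource owner.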

2. Over-provisioning

This is a relatively easy problem to fix, but also easy to overlook. When creating resources, we are often not sure how much capacity we need, so we err on the side of over-provisioning. Then we deploy the solution, test it, see that it works, and leave it as it is. Over-provisioned resources usually do not get our attention until the cloud bill grows too large; only when we look for optimizations do we find out we have been paying for capacity we did not need.

How do you avoid over-provisioning? Start with the smallest size you can and adjust from there. If you under-provision, you will notice immediately because your solution will not work properly, but if you over-provision, it will silently eat up your budget. Also, periodically go over your cloud resources and check how well their capacity is utilized. If a resource is underutilized, downscale it to reduce costs.
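The periodic utilization review can be partly automated. Here is a minimal Python sketch, assuming you have already exported CPU utilization samples per resource from your provider's monitoring service; the size ladder, the thresholds, and the resource data are hypothetical. It assumes each size step roughly doubles capacity, so a resource whose peak stays under 40% should stay under about 80% after one step down.

```python
from dataclasses import dataclass
from statistics import mean

# Hypothetical size ladder, smallest to largest; assumes each step roughly
# doubles capacity. Real size names depend on the provider and resource type.
SIZES = ["small", "medium", "large", "xlarge"]


@dataclass
class Resource:
    name: str
    size: str
    cpu_samples: list[float]  # CPU utilization in percent, e.g. hourly averages


def recommend_size(res: Resource, scale_down_peak: float = 40.0, scale_up_avg: float = 80.0) -> str:
    """Suggest a size one step up or down, or keep the current size."""
    idx = SIZES.index(res.size)
    peak = max(res.cpu_samples)
    avg = mean(res.cpu_samples)
    if avg > scale_up_avg and idx < len(SIZES) - 1:
        return SIZES[idx + 1]  # consistently busy: one step up
    if peak < scale_down_peak and idx > 0:
        # Even the peak is low; with half the capacity it should stay
        # below roughly twice the current peak.
        return SIZES[idx - 1]
    return res.size


if __name__ == "__main__":
    fleet = [
        Resource("P_MyProjectMonitor_VM1", "large", [12.0, 18.5, 25.0, 9.0]),
        Resource("P_MyProjectMonitor_VM2", "medium", [85.0, 92.0, 88.0]),
        Resource("P_MyProjectMonitor_DB_Master", "medium", [35.0, 60.0, 48.0]),
    ]
    for res in fleet:
        suggestion = recommend_size(res)
        if suggestion != res.size:
            print(f"{res.name}: {res.size} -> {suggestion}")
```

This only looks at CPU; before resizing a real resource you would also want to check memory, disk, and network metrics over a representative period.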

3. Over-architecting

We often enjoy creating complex cloud architectures; I guess it comes from the desire to learn. But is this really what we need in our current situation? When deploying an MVP monolith, do we really need a Kubernetes cluster?

Chances are, you can start with a simple architecture and evolve it from there. You can deploy a monolith to a simple container runner before jumping to Kubernetes, and at the beginning you can probably do well with a simple queue service before setting up a Kafka cluster.

Keeping your cloud architecture simple will not only keep your cloud bills low but also let your team move faster. With fewer and simpler components to maintain, you keep the advantage of being small and can innovate swiftly.

4. Not defining security and access privileges properly

Security is not a problem until it is a big problem. It is easy to skip defining privileges properly, or to use a single account with broad privileges for one or many cloud solutions. Unfortunately, privileges are difficult and effort-intensive to fix later. Adjusting privileges on running systems without causing disruptions is very challenging, and the resulting errors are often obscure, which makes debugging extremely difficult. Also, just as with over-provisioning, you will never hear anyone complain that their account has too many privileges.

Proper access management is something you want to get right from the beginning; it will pay off. Here is my checklist on the topic:

  • Create one account per context of the solution (Azure: service principal, AWS: IAM role or IAM user)
  • Start with zero privileges, then add what is needed and nothing more
  • Keep accounts documented in a central place (Confluence, Notion, etc.)
  • Create access groups per solution (e.g. Solution-Developers, Solution-Admins, etc.)
  • Grant access to resources only through groups, not to individual accounts (otherwise access becomes too difficult to manage when people leave)
  • Review the accounts and their privileges every couple of months; ideally, automate this through policies
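To make "start with zero privileges" and "grant access only through groups" concrete, here is a minimal Python sketch. It shows a least-privilege AWS IAM policy document (read-only access to a single S3 bucket; the bucket name is hypothetical) and two simple review checks: one flags Allow statements with wildcard actions or resources, the other flags access granted directly to users. The assignment export format and all function names are assumptions for illustration; this is a simplified check, not a replacement for managed tools such as AWS IAM Access Analyzer or Azure Policy.

```python
import json

# A least-privilege AWS IAM policy: read-only access to one (hypothetical) S3 bucket,
# instead of something like "s3:*" on every resource.
READ_REPORTS_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [
                "arn:aws:s3:::myprojectmonitor-reports",    # ListBucket applies to the bucket
                "arn:aws:s3:::myprojectmonitor-reports/*",  # GetObject applies to objects
            ],
        }
    ],
}


def as_list(value):
    """IAM allows a single string or a list for Action/Resource; normalize to a list."""
    return [value] if isinstance(value, str) else list(value)


def review_policy(policy: dict) -> list[str]:
    """Flag Allow statements that use wildcard actions or apply to all resources."""
    findings = []
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be given as an object
        statements = [statements]
    for i, stmt in enumerate(statements):
        if stmt.get("Effect") != "Allow":
            continue
        for action in as_list(stmt.get("Action", [])):
            if action == "*" or action.endswith(":*"):
                findings.append(f"statement {i}: wildcard action {action!r}")
        if "*" in as_list(stmt.get("Resource", [])):
            findings.append(f"statement {i}: applies to all resources")
    return findings


def direct_user_assignments(assignments: list[tuple[str, str, str]]) -> list[tuple[str, str]]:
    """Return (principal, scope) pairs where access was granted to a user instead of a group."""
    return [(principal, scope) for principal, kind, scope in assignments if kind == "user"]


if __name__ == "__main__":
    print(json.dumps(READ_REPORTS_POLICY, indent=2))
    print(review_policy(READ_REPORTS_POLICY))  # []
    too_broad = {"Version": "2012-10-17", "Statement": {"Effect": "Allow", "Action": "s3:*", "Resource": "*"}}
    print(review_policy(too_broad))
    # Hypothetical export of role assignments: (principal, principal type, scope).
    assignments = [
        ("MyProjectMonitor-Developers", "group", "D_MyProjectMonitor"),
        ("MyProjectMonitor-Admins", "group", "P_MyProjectMonitor"),
        ("jane.doe", "user", "P_MyProjectMonitor"),
    ]
    print(direct_user_assignments(assignments))  # [('jane.doe', 'P_MyProjectMonitor')]
```

Partial wildcards such as `s3:Get*` are deliberately not flagged here; whether to treat them as findings is a policy decision for your organization.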

5. Not using Infrastructure as Code (IaC)

This one is very easy to overlook, right? You need a resource, you jump to the cloud provider's web console, a couple of clicks, and there you have it: the resource is up and running. Then you need one more, later another one, and soon enough you have a solution with 10+ resources in place. But what happens when, for example:

  • You need to create an identical infrastructure for another environment
  • Your company wants to obtain a certification that requires you to have a disaster recovery solution in place
  • You want to expand to a market on the other side of the world and need to place resources closer to it, or inside a specific country because of its laws
  • Your solution needs to scale and you need to provision more resources

These are only some of the situations that require you to re-provision the same resources or architecture somewhere else. Without automation, you will need to recreate all of the resources manually. This is not only slow but also very error-prone, especially if a long time has passed since you originally created the infrastructure and your documentation is outdated.

This nightmare scenario can be alleviated, or avoided completely, if you use IaC. With an Infrastructure as Code tool such as Terraform or Pulumi, we not only create resources automatically but also get living documentation of our cloud infrastructure. With such a setup in place, you can evolve your cloud infrastructure, or recreate it completely in another region or data center, with minimal effort and complexity.
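IaC tools such as Terraform and Pulumi work declaratively: you describe the desired state of your infrastructure, and the tool compares it with the current state and computes what to create, update, or delete. The toy sketch below illustrates only that idea in plain Python; it is not Terraform or Pulumi code, and the resource specs, sizes, and region names are illustrative assumptions. Note how a new environment or region is just a different parameter to the same declaration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ResourceSpec:
    name: str
    kind: str
    region: str
    size: str


def desired_state(env: str, region: str) -> dict[str, ResourceSpec]:
    """Declare the whole solution once, parameterized by environment and region."""
    prefix = f"{env}_MyProjectMonitor"
    specs = [
        ResourceSpec(f"{prefix}_VM1", "vm", region, "small"),
        ResourceSpec(f"{prefix}_VM2", "vm", region, "small"),
        ResourceSpec(f"{prefix}_DB_Master", "database", region, "medium"),
        ResourceSpec(f"{prefix}_DB_ReadReplica", "database", region, "medium"),
    ]
    return {spec.name: spec for spec in specs}


def plan(desired: dict[str, ResourceSpec], current: dict[str, ResourceSpec]) -> dict[str, list[str]]:
    """Compute what must be created, updated, or deleted to reach the desired state."""
    return {
        "create": sorted(desired.keys() - current.keys()),
        "update": sorted(n for n in desired.keys() & current.keys() if desired[n] != current[n]),
        "delete": sorted(current.keys() - desired.keys()),
    }


if __name__ == "__main__":
    # A new production region: nothing exists yet, so every resource is created.
    print(plan(desired_state("P", "australiaeast"), {}))

    # Drift in an existing environment: VM2 was resized by hand and the read replica was deleted.
    current = desired_state("D", "westeurope")
    current["D_MyProjectMonitor_VM2"] = replace(current["D_MyProjectMonitor_VM2"], size="large")
    del current["D_MyProjectMonitor_DB_ReadReplica"]
    print(plan(desired_state("D", "westeurope"), current))
    # {'create': ['D_MyProjectMonitor_DB_ReadReplica'], 'update': ['D_MyProjectMonitor_VM2'], 'delete': []}
```

Real tools add much more on top of this (state files, dependency ordering, provider APIs), but the core loop of "declare, compare, apply the difference" is what makes recreating an environment repeatable.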

Final thoughts

Creating a well-structured cloud infrastructure is not an easy task, especially if you are just starting out with your solution and speed is your top priority. It is OK to hack your way through at the beginning and build a solution fast. However, once you know the solution is stable and is what you need, it is very important to go back, do your homework, and establish those standards. Most of these common problems are relatively easy to fix while the solution is small. Ignoring them will cost you time and money later on, so the earlier you fix them, the better.
