1.1.1. Preface
It has been more than four years since I started writing the Terraform Tutorial. During these four years, many things have happened, and both HashiCorp and Terraform have undergone numerous changes. I have primarily worked on the Terraform Module ecosystem during this period, and have increasingly realized that modules are the most important part of the Terraform ecosystem. After much consideration, I decided to compile the experiences and lessons accumulated over these years into a book to help subsequent developers, especially Chinese developers, better use Terraform.
1.1.1.1. What is a Terraform Module
Terraform modules are independent Infrastructure as Code snippets that abstract the underlying complexity of infrastructure deployment. Terraform users can accelerate IaC adoption through pre-built configuration code and lower the barrier to entry. Therefore, module authors should strive to follow code best practices such as clear code structure and the DRY ("Don't Repeat Yourself") principle.
Essentially, a Terraform Module is a directory containing multiple Terraform configuration files, typically used to encapsulate a group of related resources and configurations for reuse and sharing. Modules help users better organize and manage Terraform code, improving code readability and maintainability.
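As a minimal illustration of that definition (all names here are hypothetical and only use the widely available hashicorp/random provider), a module is just a directory of `.tf` files with inputs, resources, and outputs:

```hcl
# variables.tf — the inputs the module exposes to callers
variable "prefix" {
  type        = string
  description = "Prefix applied to generated names"
}

# main.tf — the resources the module encapsulates
resource "random_pet" "this" {
  prefix = var.prefix
}

# outputs.tf — the values the module returns to its caller
output "pet_name" {
  description = "The generated name"
  value       = random_pet.this.id
}
```

A caller then reuses the directory with a `module` block, e.g. `module "example" { source = "./modules/pet" prefix = "demo" }`, which is the reuse-and-sharing mechanism described above.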
Important Note: This book assumes readers are already familiar with basic Terraform usage skills and will not elaborate on these foundational concepts.
1.1.1.2. What is Azure Verified Module
Azure Verified Modules (AVM) is an official open-source project launched by Microsoft, aimed at establishing unified standards for Infrastructure as Code (IaC) modules and improving the consistency, reliability, and maintainability of Azure resource deployment.
AVM provides predefined, reusable IaC modules developed and supported by Microsoft, applicable to both Bicep and Terraform. These modules follow Microsoft's best practices and architectural specifications (such as the Well-Architected Framework, WAF), helping users quickly and securely deploy Azure resources and architectural patterns.
Its core advantages include:
- Standardization and Consistency: All modules follow unified structure and coding standards, ensuring deployment consistency.
- Official Microsoft Support: Developed and maintained by Microsoft officially, providing long-term support and regular updates.
- Cross-language Support: Currently supports Bicep and Terraform, with potential expansion to more IaC tools in the future.
- Compliance and Best Practices: Modules align with WAF and security benchmarks, ensuring deployment security and compliance.
- Automated Documentation and Testing: Built-in automatic documentation generation and comprehensive testing, improving module usability and reliability.
I am one of the core developers of AVM and have been involved in its development and governance since the project's inception. The experiences, lessons, and standards summarized in this book come from AVM.

Important Note: All rules, policies and experiences introduced in this book are my personal opinions and DO NOT represent the official opinions of the AVM team or Microsoft. If readers find content that differs from AVM standards or disagree with certain viewpoints, please do not consider them as viewpoints of AVM or Microsoft. This book is a work with a strong personal style, and all viewpoints represent only my personal opinions.
Important Note: This book will introduce some tools we have used or are currently using. Given my limited abilities and the rapid evolution of the Infrastructure as Code community, if readers find that some tool-related content is inconsistent with the official documentation, or encounter errors while following along, it is most likely my mistake or a breaking change in AVM. Please open an issue to notify me so I can correct it. Thank you in advance.
1.1.1.3. Starting from the Big Mud Ball
Typically, tutorials for Terraform beginners often put all demonstration code in a single main.tf file, or slightly more formally, present a structure like this:
.
├── main.tf
├── outputs.tf
└── variables.tf
This approach of writing the entire infrastructure in a single main.tf file is fine for demonstrating functionality or teaching beginners, but in production environments it results in excessively long files that are hard to review. So a slightly more complex module structure could look like this:
.
├── Makefile
├── README.md
├── chefignore
├── db.tf
├── gateway.tf
├── images
│ └── diagram.png
├── kitchen.tf
├── main.tf
├── securitygroup.tf
├── subnet.tf
├── terraform.tfvars
├── variables.tf
└── vpc.tf
By storing resources in independent files categorized by type, individual code files don't become too long, and the cohesion of code files is improved.
However, even this approach still has certain problems:
- The stack is too large, managing too many resources, so each plan and apply execution takes a long time
- The blast radius is too large: if you make a mistake, such as accidentally executing destroy, or passing a wrong value to a variable assigned to a force-new argument, the entire infrastructure could be wiped out
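One common mitigation for that second problem (a sketch, not something the structure above implies; the resource type shown is only an example) is to guard stateful resources with a lifecycle block, so an accidental destroy or a force-new replacement fails at plan time:

```hcl
resource "azurerm_mysql_flexible_server" "db" {
  # ... required arguments omitted for brevity ...

  lifecycle {
    # With prevent_destroy set, Terraform refuses to plan any change
    # that would delete this resource, including replacements
    # triggered by force-new arguments.
    prevent_destroy = true
  }
}
```

This does not shrink the stack, but it converts the most destructive class of mistakes into plan-time errors.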
1.1.1.4. Multi-layer Monolithic Terraform
Let's assume a simple cloud application composed of three layers: a VPC, a MySQL database, and a stateless app. We design the project structure like this:
.
├── app
│ ├── main.tf
│ ├── outputs.tf
│ ├── terraform.tfstate
│ └── variables.tf
├── mysql
│ ├── main.tf
│ ├── outputs.tf
│ ├── terraform.tfstate
│ └── variables.tf
└── vpc
├── main.tf
├── outputs.tf
├── terraform.tfstate
└── variables.tf
Unlike the initial big mud ball, this time we split the infrastructure into 3 folders for separate management, with resources written to three independent state files. This way, each time we make changes, we only deal with resources in that layer, which can reduce the execution time of plan and apply operations and reduce the blast radius of our mistakes.
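Under a layered layout like this, one layer typically consumes another layer's outputs. Since the tree above shows local terraform.tfstate files, a hedged sketch would use the terraform_remote_state data source with the local backend (the output name db_subnet_id is hypothetical and assumed to be declared in vpc/outputs.tf):

```hcl
# mysql/main.tf — read the VPC layer's outputs from its state file
data "terraform_remote_state" "vpc" {
  backend = "local"

  config = {
    path = "../vpc/terraform.tfstate"
  }
}

locals {
  # Use the subnet exported by the VPC layer
  db_subnet_id = data.terraform_remote_state.vpc.outputs.db_subnet_id
}
```

In real deployments the same pattern works with a remote backend (e.g. s3 or azurerm) by changing the backend and config arguments.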
However, each layer still keeps all of its resources in a single state file. An operational error in the mysql layer, for example, could still delete every database resource and cause data loss.
1.1.1.5. Further Modular Decomposition
.
├── live
│   ├── prod
│   │   ├── app
│   │   │   └── main.tf
│   │   ├── mysql
│   │   │   └── main.tf
│   │   └── vpc
│   │       └── main.tf
│   ├── qa
│   │   ├── app
│   │   │   └── main.tf
│   │   ├── mysql
│   │   │   └── main.tf
│   │   └── vpc
│   │       └── main.tf
│   └── stage
│       ├── app
│       │   └── main.tf
│       ├── mysql
│       │   └── main.tf
│       └── vpc
│           └── main.tf
└── modules
    ├── app
    │   └── main.tf
    ├── mysql
    │   └── main.tf
    └── vpc
        └── main.tf
We further decomposed based on deployment environment dimensions, dividing into three environments: prod, qa, and stage. Each environment contains three modules: app, mysql, and vpc. The sub-modules under each environment all reference the same modules in the modules directory, allowing us to deploy similar infrastructure in different environments with identical deployment code but potentially different parameters, avoiding code duplication. We can first apply changes to the qa environment, and after verification, gradually promote them to the prod environment, achieving gradual infrastructure deployment.
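Concretely, each environment's root configuration is little more than a thin wrapper over the shared module, and only the parameters differ (the cidr_block variable and its values here are illustrative, not part of the original structure):

```hcl
# live/qa/vpc/main.tf — qa environment, smaller address space
module "vpc" {
  source     = "../../../modules/vpc"
  cidr_block = "10.1.0.0/16"
}
```

```hcl
# live/prod/vpc/main.tf — prod environment, same module, different input
module "vpc" {
  source     = "../../../modules/vpc"
  cidr_block = "10.0.0.0/16"
}
```

Because both environments point at the same source path, a change to modules/vpc can be applied and verified in qa before the identical code is promoted to prod.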
In actual production environments, the above structure can be further divided by region. We can divide by deployment regions such as us-east-1, us-west-2, or by different cloud service providers such as aws, azure, gcp, further reducing the blast radius.
All the infrastructure deployment code we describe here is for a specific online service, such as an online payment service for an e-commerce platform. We can put this code in a Git repository named payment-service. An enterprise might have dozens, hundreds, or even thousands of such repositories, corresponding to the infrastructure of various internal services. Based on such large-scale Terraform Module structures, GitOps can be implemented: all changes to infrastructure are completed through Pull (Merge) Requests. Repository administrators, who are also service owners, review changes by examining code modifications and Terraform Plans generated by CI pipelines. If they agree to the changes, they merge the change request, and the pipeline automatically executes the audited changes.
Such management might seem complex, but it truly ensures that all infrastructure changes, regardless of size, are reviewed and have version control records. If issues occur, Terraform's capabilities ensure we can roll back to the most recent known correct state (of course, to prevent data loss, we need to do a lot of additional design work when designing modules).
1.1.1.6. How to Obtain These Modules
In the above examples, we walked through two different Terraform project structure styles in an extremely simplified progression, from a chaotic big mud ball to a clear, hierarchical matrix of modules. So how can we obtain the modules that make up this matrix? That is the purpose of this book: from how to build a module, to how to maintain a module, to how to build and maintain many modules at scale, and finally how to promote modular Terraform code within enterprises. Terraform reusable modules differ somewhat from many common open-source projects and require some special governance methods.
1.1.1.7. Target Audience of This Book
This book is for readers who already have some understanding of Terraform and can proficiently use it to create and manage infrastructure. If you want to further understand how to use Terraform modules to optimize your infrastructure management framework, or want to learn how to maintain a large number of proprietary Terraform modules at scale, then this book is prepared for you.
1.1.1.8. Conclusion
Let me put it this way: the only correct way to manage infrastructure with Terraform is to break down massive infrastructure into such modular structures. We can put this modular code in Git repositories and manage it using GitOps methods. This way, we can deploy similar infrastructure in different environments, avoid code duplication, and achieve gradual infrastructure deployment. I have seen many teams treat Terraform as a mere execution engine, with other tools, such as a GUI, generating HCL or JSON code that Terraform then executes. This approach is an anti-pattern, because Terraform modules involve many considerations in usage, maintenance, and design, which is exactly what this book hopes to discuss.
1.1.1.9. 马驰 Exclusion Clause
This e-book is published under the CC-BY-SA-4.0 license. Readers may freely read, quote, or use the content of this e-book within the scope permitted by this agreement, except for the following circumstances:
- The specific individual entity named "马驰" is prohibited from reading, quoting, or copying the content of this e-book
- Opening or saving the content of this book or offline copies on any device owned by the specific individual entity named "马驰" is prohibited
- The specific individual entity named "马驰" is prohibited from printing, transcribing the content of this book, or possessing non-digital copies of this book's content (including but not limited to books, handwritten copies, photographs, etc.)
- Opening or saving the content of this e-book or offline copies on any electronic device owned by enterprises or organizations that have employment relationships, equity relationships, or are associated with direct relatives of the specific individual entity named "马驰" is prohibited
All of the above circumstances will be considered infringement. If a reader is named "马驰" but is unsure whether they are the specific "马驰" individual entity prohibited by this clause, they may submit an issue in the book's GitHub repository to confirm with the author.
Forking and re-creation of this e-book follow the relevant provisions of the CC-BY-SA-4.0 License, but removal of this clause content is not permitted.