Storing the Terraform State File in Remote Backend (S3 bucket)

In this article, let us build a simple Terraform configuration that creates an EC2 instance (you can create any resource of your choice) and stores its state file in an S3 bucket.
Storing Terraform state files in an S3 bucket is a recommended best practice because it provides a central location for storing and managing your infrastructure’s state files. Here’s a step-by-step guide on how to store a Terraform state file in an S3 bucket:
Prerequisites:
Install Terraform on the machine that will run it (this guide installs it on an EC2 instance in Step 7).
An AWS account with the IAM permissions needed to create IAM users and policies, S3 buckets, DynamoDB tables, and EC2 instances.

Step 1:
Create an IAM user (for example ‘SDM-TerraformStateInS3’).

Note:
While creating the IAM user do not attach any policy and do not add the user to any group. Just create the IAM user with a password of your choice.
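If you prefer the AWS CLI to the console, Step 1 can be sketched as the function below. It assumes AWS CLI v2 is installed and configured with administrator credentials; ‘create_tf_user’ is a hypothetical helper name, and the password is whatever you choose.

```shell
#!/usr/bin/env bash
# Sketch of Step 1 with the AWS CLI (assumes AWS CLI v2 and admin credentials).
# create_tf_user is a hypothetical helper name, not part of any tool.
create_tf_user() {
  local user_name="$1"
  local password="$2"

  # Create the user with no policies and no group membership, as Step 1 requires.
  aws iam create-user --user-name "$user_name"

  # Give the user a console password, matching the console workflow in Step 1.
  aws iam create-login-profile --user-name "$user_name" --password "$password"
}
```

Usage:
create_tf_user SDM-TerraformStateInS3 'choose-a-strong-password'

Keep in mind that a console password alone does not let Terraform authenticate. Terraform reads programmatic credentials, for example an access key created with ‘aws iam create-access-key --user-name SDM-TerraformStateInS3’ and configured with ‘aws configure’ on the machine that runs Terraform.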

Step 2:
Manually create an S3 bucket (for example ‘sdm-terraform-state-bucket-1’) in the region ‘ap-south-1’. You can choose any region you like. Bucket names are globally unique across all AWS accounts, so you may need to pick a different name.

Note:
In your backend configuration file ‘backend_config.tf’ (created in Step 8), set the region in the backend block to the region where you created the S3 bucket. Terraform uses this region to reach the bucket whenever it stores or retrieves the state file, so a mismatch can cause errors when Terraform tries to access the bucket.
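Using the AWS CLI instead of the console, the bucket can be created as sketched below. ‘create_state_bucket’ is a hypothetical helper that assumes AWS CLI v2 and administrator credentials; the branch exists because S3 treats ‘us-east-1’ differently from every other region.

```shell
#!/usr/bin/env bash
# Sketch of Step 2 with the AWS CLI (assumes AWS CLI v2 and admin credentials).
# create_state_bucket is a hypothetical helper name.
create_state_bucket() {
  local bucket="$1"
  local region="$2"

  if [ "$region" = "us-east-1" ]; then
    # us-east-1 is the default location and does not accept an explicit LocationConstraint.
    aws s3api create-bucket --bucket "$bucket" --region "$region"
  else
    # Every other region needs a LocationConstraint that matches the region.
    aws s3api create-bucket --bucket "$bucket" --region "$region" \
      --create-bucket-configuration "LocationConstraint=$region"
  fi
}
```

Usage:
create_state_bucket sdm-terraform-state-bucket-1 ap-south-1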

Step 3:
Create a DynamoDB table (for example ‘SDM-terraform-lock’) for state locking with the Partition key ‘LockID’ and its type ‘String’.

Note:
When creating a DynamoDB table for use as a Terraform state lock, it’s important to ensure that the table is created in the same region that you specify in your Terraform backend configuration (backend “s3” block in the file ‘backend_config.tf’ which will be created in Step 8) to maintain consistency. Terraform will interact with the DynamoDB table in the specified region to manage state locks.
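The equivalent AWS CLI call is sketched below. ‘create_lock_table’ is a hypothetical helper, and on-demand billing (PAY_PER_REQUEST) is an assumption; provisioned capacity works just as well for a lock table.

```shell
#!/usr/bin/env bash
# Sketch of Step 3 with the AWS CLI (assumes AWS CLI v2 and admin credentials).
# create_lock_table is a hypothetical helper name.
create_lock_table() {
  local table="$1"
  local region="$2"

  # The S3 backend expects a partition (HASH) key named exactly "LockID"
  # of type String ("S"). On-demand billing is an assumption here.
  aws dynamodb create-table \
    --region "$region" \
    --table-name "$table" \
    --attribute-definitions AttributeName=LockID,AttributeType=S \
    --key-schema AttributeName=LockID,KeyType=HASH \
    --billing-mode PAY_PER_REQUEST
}
```

Usage:
create_lock_table SDM-terraform-lock ap-south-1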

Step 4:
Using the JSON policy editor, create your own ‘Customer managed’ IAM policy (for example ‘SDM-Terraform-S3’) for S3. The JSON code for the policy is as follows:

The created policy then appears in the list of policies in the IAM console, like this:

Note:
Instead of creating our own ‘Customer managed’ policy, we could attach the ‘AWS managed’ policy ‘AmazonS3FullAccess’ to the IAM user. This policy provides full access to Amazon S3 resources, allowing users to perform a wide range of actions on S3 buckets and objects. If your primary goal is to store Terraform state files in an S3 bucket, attaching it would cover the S3 side of state storage, although it does not grant the DynamoDB permissions needed for state locking.
Here’s why it’s useful:
S3 Bucket Operations: The ‘AmazonS3FullAccess’ policy grants permissions for various S3 bucket operations, including creating, listing, deleting, and updating buckets. The Terraform backend itself only needs to list the state bucket; the other bucket operations matter only if the same user also creates and manages the bucket.
Object Operations: The policy allows users to perform actions on S3 objects (files) within a bucket, which includes the ability to upload, download, and delete objects. This is important for managing the Terraform state file in the bucket.

However, it’s essential to consider the principle of least privilege when granting permissions. While ‘AmazonS3FullAccess’ provides broad access to S3, it may grant more permissions than strictly necessary for your use case. For security best practices:
Use a More Specific Policy: If possible, create a custom IAM policy tailored to the specific actions and resources needed for your use case. This allows you to grant only the required permissions, reducing the potential attack surface.
Grant Permissions for State Locking: In a collaborative environment with multiple users, Terraform’s state locking prevents concurrent runs from corrupting the state. With the S3 backend, locking uses a DynamoDB table (the one created in Step 3), so your IAM policy must also grant the required DynamoDB permissions on that table; ‘AmazonS3FullAccess’ does not include them.
Regularly Review and Audit Policies: Periodically review and audit your IAM policies to ensure they align with your current infrastructure and security requirements. Remove unnecessary permissions and ensure that permissions are granted on a need-to-know basis.

Hence, ‘AmazonS3FullAccess’ can be useful for managing Terraform state files in an S3 bucket, but it’s essential to review and fine-tune your IAM policies to meet your specific security and infrastructure needs while adhering to security best practices.
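To make the least-privilege advice above concrete, here is a sketch of a policy document scoped to this guide’s bucket and lock table. It is not the author’s original JSON for ‘SDM-Terraform-S3’; the actions follow the S3 and DynamoDB permissions that the Terraform S3 backend documentation lists for state access and locking, and ‘123456789012’ is a placeholder account ID. It covers state access only: the credentials Terraform runs with also need EC2 permissions to create the instance in Step 10.

```shell
#!/usr/bin/env bash
# Sketch of a least-privilege policy for Step 4 (not the author's original JSON).
# 123456789012 is a placeholder account ID; set ACCOUNT_ID to your own.
ACCOUNT_ID="${ACCOUNT_ID:-123456789012}"
REGION="ap-south-1"

# Unquoted EOF so that ${REGION} and ${ACCOUNT_ID} are expanded into the ARN.
cat > policy.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ListStateBucket",
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::sdm-terraform-state-bucket-1"
    },
    {
      "Sid": "ReadWriteStateObjects",
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject"],
      "Resource": "arn:aws:s3:::sdm-terraform-state-bucket-1/*"
    },
    {
      "Sid": "StateLocking",
      "Effect": "Allow",
      "Action": [
        "dynamodb:DescribeTable",
        "dynamodb:GetItem",
        "dynamodb:PutItem",
        "dynamodb:DeleteItem"
      ],
      "Resource": "arn:aws:dynamodb:${REGION}:${ACCOUNT_ID}:table/SDM-terraform-lock"
    }
  ]
}
EOF
```

You can paste the generated ‘policy.json’ into the JSON editor when creating the policy in the IAM console.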

Step 5:
Attach this policy ‘SDM-Terraform-S3’ to IAM user ‘SDM-TerraformStateInS3’.
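With the AWS CLI, Steps 4 and 5 can be sketched together: create the customer managed policy from a local JSON file and attach it directly to the user. ‘attach_state_policy’ is a hypothetical helper, and the policy file is whatever JSON you prepared in Step 4.

```shell
#!/usr/bin/env bash
# Sketch of Steps 4 and 5 with the AWS CLI: create the customer managed policy
# from a local JSON file and attach it to the IAM user.
# attach_state_policy is a hypothetical helper name.
attach_state_policy() {
  local user_name="$1"
  local policy_name="$2"
  local policy_file="$3"
  local policy_arn

  # --query/--output text make create-policy print only the new policy's ARN.
  policy_arn="$(aws iam create-policy \
    --policy-name "$policy_name" \
    --policy-document "file://$policy_file" \
    --query 'Policy.Arn' --output text)"

  # Attach the policy directly to the user (no group involved).
  aws iam attach-user-policy --user-name "$user_name" --policy-arn "$policy_arn"
}
```

Usage:
attach_state_policy SDM-TerraformStateInS3 SDM-Terraform-S3 policy.json

If you already created the policy in the console, run only the ‘aws iam attach-user-policy’ command with the policy ARN shown on the policy’s summary page.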

Step 6:
Manually create an AWS EC2 Ubuntu instance (for example ‘SDM-Terraform’) with instance type ‘t2.micro’ in the region ‘ap-south-1’.

Then ssh into it.
Create a directory ‘S3’.
mkdir S3
Navigate into the directory ‘S3’.
cd S3
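If you prefer the AWS CLI, the instance can be launched as sketched below. ‘launch_tf_workstation’ is a hypothetical helper; you supply an Ubuntu AMI ID valid in ‘ap-south-1’ and the name of an existing key pair, and the default VPC and security group are assumed.

```shell
#!/usr/bin/env bash
# Sketch of Step 6 with the AWS CLI (assumes AWS CLI v2 and admin credentials).
# launch_tf_workstation is a hypothetical helper; the Ubuntu AMI ID and
# key pair name are inputs you look up for ap-south-1 yourself.
launch_tf_workstation() {
  local ami_id="$1"
  local key_name="$2"

  # One t2.micro instance, tagged with the example name from Step 6.
  aws ec2 run-instances \
    --region ap-south-1 \
    --image-id "$ami_id" \
    --instance-type t2.micro \
    --key-name "$key_name" \
    --count 1 \
    --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=SDM-Terraform}]'
}
```

Once the instance is running, connect with the key pair’s private key file. ‘ubuntu’ is the default login user on Ubuntu AMIs, and the instance’s security group must allow inbound SSH on port 22 (‘my-key.pem’ and ‘<public-ip>’ are placeholders):
ssh -i my-key.pem ubuntu@<public-ip>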

Step 7:
Install Terraform on the instance using the following commands.
Update the package index:
sudo apt-get update

Install unzip:
sudo apt-get install -y unzip

Confirm the latest version number on the Terraform website given below:
https://www.terraform.io/downloads.html

Download the Terraform release (substituting a newer version number if needed):
wget https://releases.hashicorp.com/terraform/1.5.7/terraform_1.5.7_linux_amd64.zip

Extract the downloaded archive:
unzip terraform_1.5.7_linux_amd64.zip

Move the executable into a directory on your PATH:
sudo mv terraform /usr/local/bin/

Confirm the installation:
terraform --version
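The same commands can be collected into a small script so that the version number lives in one place. This is a sketch assuming Ubuntu on an x86_64 instance (hence the linux_amd64 build) with sudo rights; both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: the Step 7 install commands with the version in one variable.
# Assumes Ubuntu on x86_64 (linux_amd64 build) and sudo rights.

# Build the HashiCorp release URL for a given Terraform version.
terraform_zip_url() {
  local version="$1"
  echo "https://releases.hashicorp.com/terraform/${version}/terraform_${version}_linux_amd64.zip"
}

install_terraform() {
  local version="$1"
  local zip="terraform_${version}_linux_amd64.zip"

  sudo apt-get update
  sudo apt-get install -y unzip
  wget "$(terraform_zip_url "$version")"
  unzip -o "$zip"              # -o: overwrite an existing extracted file without prompting
  sudo mv terraform /usr/local/bin/
  terraform --version
}
```

Usage:
install_terraform 1.5.7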

Step 8:
Create a backend configuration file ‘backend_config.tf’ with the following content inside the directory ‘S3’.
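The original ‘backend_config.tf’ is not reproduced here. A minimal sketch of such a file, written with a shell heredoc, could look like the following; it reuses the bucket, region, and lock table from Steps 2 and 3, while the state key ‘terraform.tfstate’ and the ‘encrypt’ setting are assumptions.

```shell
#!/usr/bin/env bash
# Sketch: write backend_config.tf for Step 8 (not the author's original file).
# The state key "terraform.tfstate" is an assumed object path in the bucket.
# Quoted 'EOF' keeps the shell from expanding anything inside the HCL.
cat > backend_config.tf <<'EOF'
terraform {
  backend "s3" {
    bucket         = "sdm-terraform-state-bucket-1" # bucket from Step 2
    key            = "terraform.tfstate"            # object path of the state file
    region         = "ap-south-1"                   # must match the bucket's region
    dynamodb_table = "SDM-terraform-lock"           # lock table from Step 3
    encrypt        = true                           # server-side encryption of the state object
  }
}
EOF
```

Backend arguments must be literal values; Terraform does not allow variables inside the backend block.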

Step 9:
Create an infra (EC2 instance) configuration file or resource definition file ‘ec2.tf’ with the following content inside the directory ‘S3’.
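The original ‘ec2.tf’ is not reproduced here either. As a sketch under stated assumptions, the script below writes an ‘ec2.tf’ that launches a t2.micro instance tagged with the name shown in Step 10. It assumes the latest Ubuntu 22.04 image from Canonical (AWS account ‘099720109477’) looked up through an ‘aws_ami’ data source; the labels ‘ubuntu’ and ‘test’ are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: write ec2.tf for Step 9 (not the author's original file).
# Assumptions: an Ubuntu 22.04 AMI from Canonical's AWS account, and the
# Name tag that Step 10 shows in the EC2 console.
cat > ec2.tf <<'EOF'
provider "aws" {
  region = "ap-south-1"
}

# Latest Ubuntu 22.04 (Jammy) amd64 image published by Canonical.
data "aws_ami" "ubuntu" {
  most_recent = true
  owners      = ["099720109477"] # Canonical

  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-*"]
  }
}

resource "aws_instance" "test" {
  ami           = data.aws_ami.ubuntu.id
  instance_type = "t2.micro"

  tags = {
    Name = "SDM-TestTfStateinS3"
  }
}
EOF
```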

Note:
Keep both the files ‘backend_config.tf’ and ‘ec2.tf’ in the same directory ‘S3’.
In this setup, ‘backend_config.tf’ contains the Terraform backend configuration for state storage, while ‘ec2.tf’ contains your EC2 instance resource definitions.

Note:
We can also combine the contents of both ‘backend_config.tf’ and ‘ec2.tf’ into a single Terraform configuration file. Here’s how you can structure it:
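As a sketch of the single-file layout, the script below writes a ‘main.tf’ containing the terraform block with the S3 backend and DynamoDB lock table, the provider block, and the EC2 resource. The AMI ID is a placeholder you must replace with an Ubuntu AMI available in ‘ap-south-1’, and the resource label ‘test’ and state key ‘terraform.tfstate’ are assumptions.

```shell
#!/usr/bin/env bash
# Sketch: a single main.tf holding both the backend and the EC2 resource.
# The AMI ID below is a placeholder, not a real image.
cat > main.tf <<'EOF'
terraform {
  backend "s3" {
    bucket         = "sdm-terraform-state-bucket-1"
    key            = "terraform.tfstate"
    region         = "ap-south-1"
    dynamodb_table = "SDM-terraform-lock"
  }
}

provider "aws" {
  region = "ap-south-1"
}

resource "aws_instance" "test" {
  ami           = "ami-xxxxxxxxxxxxxxxxx" # placeholder: an Ubuntu AMI ID from ap-south-1
  instance_type = "t2.micro"

  tags = {
    Name = "SDM-TestTfStateinS3"
  }
}
EOF
```

Use this file on its own: Terraform loads every ‘.tf’ file in a directory, so keeping ‘main.tf’ next to ‘backend_config.tf’ and ‘ec2.tf’ would produce duplicate backend, provider, and resource definitions.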

Note:
In this combined file:
The provider block configures the AWS provider.
The resource block defines an AWS EC2 instance.
The terraform block includes the backend configuration for S3 and DynamoDB.
This single file contains both the resource definition and the backend configuration, and you can use it to create the EC2 instance and manage the state in S3.
The two-file layout described earlier follows a common convention for organizing Terraform configuration. Terraform loads every ‘.tf’ file in the working directory, so the split is purely organizational, and it is often recommended for the following reasons:

  1. Modularity and Organization: Separating backend configuration from resource definitions allows you to keep your infrastructure code organized and modular. It’s easier to manage different aspects of your configuration in distinct files, making it more maintainable, especially in larger projects.
  2. Collaboration: In collaborative environments, different team members may be responsible for different parts of the configuration. By having separate files, team members can work on the backend configuration independently of resource definitions, reducing conflicts when merging changes in version control.
  3. Flexibility: Separating backend configuration enables you to reuse resource definitions across different environments or projects while changing only the backend configuration as needed.

However, using a single file can be a valid approach for smaller or less complex projects, as it simplifies the file structure. The choice of whether to use separate files or a single file depends on the specific requirements and complexity of your project, as well as your personal preference for organization.
Both approaches work, so choose the one that best suits your needs. If your project is likely to grow in complexity or involve several team members, separating the backend configuration and the resource definitions into two files is the better choice.

Step 10:
Run the following commands:
terraform init
terraform plan
terraform apply
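If you want to run Step 10 non-interactively, one approach is to save the plan to a file and apply exactly that file. This is a sketch: ‘tfplan’ is an arbitrary file name and ‘deploy’ a hypothetical function name.

```shell
#!/usr/bin/env bash
# Sketch of Step 10 as a function; run it inside the S3 directory.
# Saving the plan with -out and applying that file means apply executes exactly
# what plan showed, without a second interactive confirmation.
deploy() {
  terraform init &&              # configures the S3 backend and downloads the AWS provider
    terraform plan -out=tfplan &&
    terraform apply tfplan       # stops earlier if init or plan fails
}
```

Usage:
deploy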

Now open the EC2 service in the AWS console and go to ‘Instances’. The list of instances shows an instance named ‘SDM-TestTfStateinS3’ in the Running state.

Similarly, in the S3 service, open ‘Buckets’ and then the bucket ‘sdm-terraform-state-bucket-1’; the Terraform state file is stored in that bucket.
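To check the same thing from the command line, a sketch using ‘aws s3api head-object’ follows. ‘check_remote_state’ is a hypothetical helper, and the key must match the ‘key’ argument in your backend block (‘terraform.tfstate’ is assumed in the usage line).

```shell
#!/usr/bin/env bash
# Sketch: confirm from the CLI that the state object exists in the bucket.
# check_remote_state is a hypothetical helper; the key must match the "key"
# argument in your backend block.
check_remote_state() {
  local bucket="$1"
  local key="$2"

  # head-object succeeds only if the object exists and is readable.
  if aws s3api head-object --bucket "$bucket" --key "$key" > /dev/null; then
    echo "State found at s3://$bucket/$key"
  else
    echo "No state object at s3://$bucket/$key" >&2
    return 1
  fi
}
```

Usage:
check_remote_state sdm-terraform-state-bucket-1 terraform.tfstate

Running ‘terraform state list’ in the ‘S3’ directory also reads the state through the configured backend and lists the resources it tracks.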

Note:
We could also store the state file in a version control system (VCS) such as GitHub, but this is generally not recommended for several reasons:

  1. Concurrency and Locking: VCS systems like GitHub do not provide built-in mechanisms for handling concurrent access and locking of the state file. In a collaborative environment, multiple team members could attempt to modify the state file simultaneously, leading to conflicts and potential data corruption.
  2. Performance: Terraform state files can become large and contain sensitive information. Storing them in a VCS can impact the performance of the repository and could expose sensitive data if not properly protected.
  3. Versioning: VCS systems are designed for source code versioning, not infrastructure state. Managing state in a VCS can become unwieldy as your infrastructure grows and changes.
  4. Security: Storing sensitive information, such as secrets or access keys, in a VCS is generally discouraged due to security concerns. State files may contain sensitive information, and their exposure should be minimized.
  5. Ease of Collaboration: Remote backends like Amazon S3 and others (e.g., Azure Blob Storage, Google Cloud Storage) are specifically designed for storing Terraform state files. They provide features like state locking and access control, making it easier for teams to collaborate safely.

Using a remote backend like S3 or an equivalent is recommended because it addresses these concerns and is purpose-built for managing Terraform state. It provides a secure, centralized, and scalable solution for storing and managing state files, especially in team environments.
While you could store Terraform configurations in a VCS like GitHub, it’s generally better to use a remote backend for managing the state files. This separation of concerns allows you to benefit from the strengths of each tool: VCS for code collaboration and versioning, and a remote backend for state management and collaboration on infrastructure changes.
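Following that split, a common convention is to keep local state files out of the repository that holds your configuration. A minimal sketch that appends such patterns to ‘.gitignore’ (the pattern list is an assumption, not exhaustive):

```shell
#!/usr/bin/env bash
# Sketch: keep configuration in Git but local state out of it.
# These patterns are a common convention, not an exhaustive list.
cat >> .gitignore <<'EOF'
# Local Terraform working directory (provider plugins, cached backend settings)
.terraform/
# Local state files and their backups
*.tfstate
*.tfstate.*
EOF
```

The dependency lock file ‘.terraform.lock.hcl’ is deliberately not ignored, since Terraform intends it to be committed so that everyone uses the same provider versions.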
I hope this article helps you understand the purpose of storing Terraform state files in a remote backend such as an S3 bucket. Please comment if you have any suggestions or improvements.