How to Migrate AWS PostgreSQL RDS to Aurora Using Terraform

As part of the Aembit Workload IAM Platform, we make use of RDS PostgreSQL instances deployed in AWS for online transaction processing. Originally, we deployed a primary RDS instance in a single region, with one or more read-replica instances deployed in the same region for resilience. Additionally, we replicated our database snapshots to an alternate region to facilitate disaster recovery.

With this architecture, we observed two primary limitations:

  • No automatic failover between RDS instances. If the primary instance failed, our monitoring infrastructure needed to detect and report the failure, and then we needed to promote one of the read replicas to be the primary instance. We also needed to redirect our software to connect to the newly promoted instance. Automating this process required us to develop and maintain our own system to perform these operations in the event of an RDS failure.

  • No multi-region redundancy. If AWS experienced a regional outage, we needed to deploy a new RDS instance in an alternate region using a stored backup.

Given these concerns, we began the process of migrating our AWS RDS PostgreSQL instances to AWS Aurora. In this post, we’d like to share a bit about our thought process, and some of the challenges we encountered during this exercise.

With respect to the limitations described above, Aurora provides:

  • Automatic failover between instances. Within a single region, if a failure is detected on a primary Aurora DB instance, failover to a replica Aurora DB instance is automatic. The Aurora cluster within a region provides a single endpoint for our software to connect to, so software requires no modification to maintain connectivity with the Aurora cluster.

  • Multi-region redundancy. Aurora provides the option of a global cluster, whereby we can deploy Aurora in multiple regions. Failover to another region is not automatic, but switching to an active Aurora cluster in another region obviates the need to restore a DB instance from backup, which should streamline our disaster recovery scenarios.

We did have some concerns about Aurora compared with RDS PostgreSQL, including:

  • Cost. Aurora is not ideal for smaller DB deployments because the minimum DB instance type supported is ‘db.t3.medium.’ For global clusters, the minimum instance requirement is one of the ‘db.r5’ or ‘db.r6’ memory-optimized instance types.

  • Automated cross-region backups. AWS RDS PostgreSQL supports replication of backups to an alternate region. With Aurora, cross-region backups cannot be configured through Aurora directly; they must be implemented through Lambda, Systems Manager Automation, or AWS Backup.

  • Limited log publishing. Upgrade logs cannot be published to Amazon CloudWatch; only PostgreSQL logs can.

  • Delays. Occasionally, we see the Aurora PostgreSQL engine version lag behind AWS RDS PostgreSQL, but this has not been considered to be a severe limitation.

Terraform Migration Process

Our current AWS RDS infrastructure is fully managed through Terraform, and for the migration to Aurora, we had the following requirements:

1) Existing RDS data must be preserved.

2) The Aurora infrastructure must be wholly managed through Terraform templates.

For the second requirement, we understood that several manual steps might be required during the migration, but once the migration was complete, no further manual intervention should be required: the resulting infrastructure should be fully managed through Terraform.

Migration Steps

Make a Backup

Make a backup of the existing RDS database before migration (‘Take Snapshot’ from the ‘Actions’ menu).

Deploy an Aurora Cluster as a Read Replica

Deploy an Aurora cluster with a single non-redundant instance and attach it to the existing RDS instance as a ‘read-only’ replica.

Aurora will only allow a single-instance cluster to be attached as a replica; we cannot deploy a multi-instance Aurora cluster for this purpose. In the AWS console, this can be done by selecting the existing RDS instance and choosing ‘Create Aurora read replica’ from the ‘Actions’ menu. This creates both the Aurora cluster and a single read-only Aurora instance associated with the cluster.

When creating the Aurora replica, select the same VPC and subnet group as the original RDS instance.

We will eventually need to import this Aurora cluster into our Terraform configuration, so be sure to set the backup retention period for the Aurora cluster to the same number of days that you plan to configure in Terraform for this cluster – the import operation will fail if the backup retention period doesn’t match. For instance, a sample Terraform definition of an Aurora cluster would look like this:
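The listing from the original post is not reproduced here. As a stand-in, the following is a minimal sketch of such a definition. The resource name, identifiers, engine version, and variables are hypothetical; only the ‘backup_retention_period’ value of 5 comes from the text.

```hcl
# Hypothetical inputs; none of these names come from the original post.
variable "db_subnet_group_name" {
  type = string
}

variable "db_security_group_ids" {
  type = list(string)
}

resource "aws_rds_cluster" "aurora" {
  # Should match the identifier of the cluster created in the console,
  # since this resource will later be imported rather than created.
  cluster_identifier = "example-aurora-cluster"
  engine             = "aurora-postgresql"
  engine_version     = "15.4" # assumption: use the version of your own cluster

  # Must equal the retention period selected when creating the Aurora read replica.
  backup_retention_period = 5

  db_subnet_group_name   = var.db_subnet_group_name
  vpc_security_group_ids = var.db_security_group_ids
}
```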

In this example, since ‘backup_retention_period’ is 5, select 5 days when creating the Aurora read replica.

The existing RDS parameter groups cannot be used with Aurora – we must create Aurora-specific parameter groups – so just select the default Aurora parameter groups at this time. Later we will create the correct parameter groups in Terraform.

Make RDS Read-Only

Break existing connections to RDS, and put RDS in read-only mode.

RDS can be put into read-only mode by modifying the parameter group: default_transaction_read_only = true

Existing connections to RDS can be broken by rebooting the RDS instance.
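
Because the RDS instance is already managed through Terraform, the parameter change can be expressed there as well. This is a minimal sketch, assuming a hypothetical parameter group name and engine family. The text writes the value as true; RDS parameter groups commonly encode boolean parameters as 1, so check the allowed values listed for your engine version.

```hcl
resource "aws_db_parameter_group" "rds_read_only" {
  name   = "example-rds-read-only" # hypothetical name
  family = "postgres15"            # assumption: match your RDS engine major version

  parameter {
    name         = "default_transaction_read_only"
    value        = "1" # boolean "on"; verify the accepted encoding for your engine
    apply_method = "immediate"
  }
}
```

The RDS instance would then reference this group through its ‘parameter_group_name’ argument.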

Monitor RDS Activity

Monitor RDS activity to ensure that replication to the Aurora cluster replica is completed.

Watch the ‘ReplicaLag’ metric approach zero, which indicates that replication has caught up. Note, however, that because we have blocked write access to RDS, ReplicaLag will not stay flat: it rises and falls every five minutes due to the lack of activity on the database.

Promote the Aurora Cluster

‘Promote’ the Aurora cluster replica to be a standalone Aurora cluster. This breaks the replication with the original RDS instance and places the Aurora cluster in read-write mode.

We first attempted to configure the Aurora read replica and promote it using Terraform. However, the promotion operation is not possible through Terraform. Therefore all steps up to this point have been performed in the AWS Management Console and not executed through Terraform.

Import Aurora Resources into Terraform

Define new Aurora cluster and Aurora cluster instance resources in your Terraform configuration. At this point, you can also define a Terraform resource for an Aurora parameter group to attach to the Aurora cluster, replacing the default parameter group that was selected when the Aurora read replica was first created in the management console. Be sure that the family is of the ‘aurora-*’ variety and that its version matches your DB engine version.
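
As a rough sketch of these definitions, assuming a cluster resource named ‘aws_rds_cluster.aurora’ already exists in the configuration (the resource names, identifiers, family version, and instance class below are all hypothetical):

```hcl
# Aurora-specific cluster parameter group; the family must be an "aurora-*" family
# whose version matches the cluster's engine version.
resource "aws_rds_cluster_parameter_group" "aurora" {
  name        = "example-aurora-cluster-params"
  family      = "aurora-postgresql15" # assumption: engine major version 15
  description = "Cluster parameters for the migrated Aurora cluster"

  # Add "parameter" blocks here as needed. To attach the group, set
  #   db_cluster_parameter_group_name = aws_rds_cluster_parameter_group.aurora.name
  # on the aws_rds_cluster resource.
}

resource "aws_rds_cluster_instance" "aurora" {
  count = 1

  # Should match the identifier of the instance created in the console,
  # otherwise Terraform will plan a replacement after the import.
  identifier         = "example-aurora-instance-${count.index}"
  cluster_identifier = aws_rds_cluster.aurora.id
  engine             = aws_rds_cluster.aurora.engine
  engine_version     = aws_rds_cluster.aurora.engine_version
  instance_class     = "db.t3.medium"
}
```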

Import the newly promoted Aurora cluster into your Terraform configuration, using the ‘terraform import’ command.

If the ‘count’ meta-argument is specified in the definition for ‘aws_rds_cluster_instance,’ as it is in ours, the import command must include the array index in the resource address.
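
The original import commands are not reproduced here. As an illustration, with hypothetical resource names and identifiers, the imports could look like the following. The sketch uses Terraform's declarative ‘import’ blocks, which require Terraform 1.5 or later, and shows the equivalent ‘terraform import’ commands in comments.

```hcl
# Equivalent CLI commands (the quotes keep the shell from interpreting the brackets):
#   terraform import aws_rds_cluster.aurora example-aurora-cluster
#   terraform import 'aws_rds_cluster_instance.aurora[0]' example-aurora-instance-0

import {
  to = aws_rds_cluster.aurora
  id = "example-aurora-cluster" # the cluster identifier shown in the console
}

import {
  # The [0] index is required because the resource is defined with "count".
  to = aws_rds_cluster_instance.aurora[0]
  id = "example-aurora-instance-0" # the instance identifier shown in the console
}
```

Running ‘terraform plan’ after the import is a reasonable way to confirm that the configuration matches the imported resources before applying anything.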

Complete the Migration

At this point, we have successfully migrated our RDS instance to a single-region Aurora cluster, with all data intact. In our environment, we had to finalize the process with the following additional steps:

1) Update software connection strings to reference the Aurora cluster endpoint instead of the original RDS endpoint.

2) Update CloudWatch alerts to reference Aurora metrics instead of RDS metrics.

3) Configure AWS Backup in Terraform to replicate Aurora cluster snapshots to a different region.
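
For the third step, a minimal sketch of an AWS Backup plan that copies recovery points to a vault in another region might look like this. It is not the configuration we used verbatim: the vault, plan, schedule, retention values, and variables are hypothetical, and it assumes a cluster resource named ‘aws_rds_cluster.aurora’ plus an existing destination vault and IAM role.

```hcl
variable "dr_vault_arn" {
  type        = string
  description = "ARN of an existing backup vault in the alternate region (hypothetical input)"
}

variable "backup_role_arn" {
  type        = string
  description = "ARN of an IAM role that AWS Backup can assume (hypothetical input)"
}

resource "aws_backup_vault" "primary" {
  name = "example-aurora-vault"
}

resource "aws_backup_plan" "aurora" {
  name = "example-aurora-cross-region"

  rule {
    rule_name         = "daily"
    target_vault_name = aws_backup_vault.primary.name
    schedule          = "cron(0 5 * * ? *)" # assumption: daily at 05:00 UTC

    lifecycle {
      delete_after = 5 # days
    }

    # Copy each recovery point to the vault in the alternate region.
    copy_action {
      destination_vault_arn = var.dr_vault_arn

      lifecycle {
        delete_after = 5 # days
      }
    }
  }
}

resource "aws_backup_selection" "aurora" {
  name         = "example-aurora-cluster"
  plan_id      = aws_backup_plan.aurora.id
  iam_role_arn = var.backup_role_arn
  resources    = [aws_rds_cluster.aurora.arn]
}
```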

In addition, to add a replica Aurora instance in the same region, you can define another Terraform resource for an Aurora cluster instance and associate it with the Aurora cluster by specifying the same cluster identifier. The definition is similar to the cluster instance definition used for the import, but with a different resource name and identifier.
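
A sketch of such a replica definition, assuming the cluster resource is named ‘aws_rds_cluster.aurora’ (the resource name, identifier, and instance class here are hypothetical):

```hcl
resource "aws_rds_cluster_instance" "aurora_replica" {
  identifier = "example-aurora-replica"

  # Pointing at the same cluster makes this instance a reader in that cluster.
  cluster_identifier = aws_rds_cluster.aurora.id
  engine             = aws_rds_cluster.aurora.engine
  engine_version     = aws_rds_cluster.aurora.engine_version
  instance_class     = "db.t3.medium"
}
```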

Next Steps and Conclusion

To recap, we did the following:

  • Deployed an Aurora cluster as a read-replica of our existing AWS RDS instance.
  • Configured our existing AWS RDS instance as read-only and broke all existing connections to the instance.
  • Ensured that all existing RDS data was synchronized with the read-replica by monitoring the ‘ReplicaLag’ metric.
  • Promoted the Aurora read-replica cluster to be a standalone cluster.
  • Imported the Aurora cluster and Aurora instance into our Terraform configuration.

Now that we have successfully migrated our RDS instance to a single-region Aurora cluster, we plan to configure an Aurora global cluster based on our newly created Aurora cluster to provide cross-region redundancy. This activity will be covered in a future blog post.

Jason Kwon is a principal DevOps engineer at Aembit.
