AWS Consulting Sydney

Extending the CIDR range of a VIF-attached VPC (case study)

Setting up an AWS account and a VPC is straightforward and very flexible.

However, one thing to be aware of is that a VPC's or subnet's CIDR range can't be changed once it has been created.

We had a customer use case in which the production VPC sat on a small CIDR block (10.100.100.0/24). For a variety of reasons, this VPC not only had a small address range but also needed to contain 33 subnets. Since AWS reserves five addresses in every subnet, only 91 addresses in total were usable. The client's corporate network is linked to this VPC via a Virtual Interface (VIF) on an ISP-provided Direct Connect link. After the environment was launched, new services kept being deployed into this VPC, and the shortage of IP addresses became a major constraint. An additional requirement was that the same underlying private IP addresses had to be maintained.
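
As a sanity check on that number: AWS reserves five IP addresses in every subnet (network address, VPC router, DNS, reserved for future use, and broadcast), regardless of subnet size. A quick Python sketch of the arithmetic, assuming the 33 subnets tile the whole /24:

# A /24 holds 256 addresses; AWS reserves 5 in each subnet.
total_addresses = 2 ** (32 - 24)   # 256 addresses in a /24
subnets = 33
reserved = subnets * 5             # 165 addresses lost to AWS reservations
usable = total_addresses - reserved
print(usable)                      # 91, matching the environment above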

The plan for the CIDR range upgrade was:

  1. Create a new VPC (VPC-Lrg) in the same account, with a larger address range (10.100.100.0/22) that contains the original one;
  2. Restore all services in VPC-Lrg;
  3. Detach the VGW from the old VPC (VPC-Sml);
  4. Attach the VGW to VPC-Lrg (the VGW flip is sketched in code after the diagram).

[Diagram: extending the VPC IP range by flipping the VGW]
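
The VGW flip in steps 3 and 4 boils down to a detach call, a wait, and an attach call. A minimal boto3 sketch with placeholder IDs (in the actual work this was done via a CloudFormation template):

import time
import boto3

ec2 = boto3.client("ec2", region_name="ap-southeast-2")  # region is an assumption

VGW_ID = "vgw-xxxxxxxx"    # the existing virtual private gateway (placeholder)
VPC_SML = "vpc-aaaaaaaa"   # the old /24 VPC (placeholder)
VPC_LRG = "vpc-bbbbbbbb"   # the new /22 VPC (placeholder)

# Step 3: detach the VGW from the old VPC.
ec2.detach_vpn_gateway(VpnGatewayId=VGW_ID, VpcId=VPC_SML)

# Detachment is asynchronous; poll until no attachment remains active.
while True:
    gw = ec2.describe_vpn_gateways(VpnGatewayIds=[VGW_ID])["VpnGateways"][0]
    states = [a["State"] for a in gw.get("VpcAttachments", [])]
    if all(s == "detached" for s in states):
        break
    time.sleep(5)

# Step 4: attach the VGW to the new, larger VPC.
ec2.attach_vpn_gateway(VpnGatewayId=VGW_ID, VpcId=VPC_LRG)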

Before applying the change to the production environment, we carried out testing in the UAT account, which has the same setup as the production account. Deploying VPC-Lrg beside VPC-Sml and flipping the VGW over from VPC-Sml to VPC-Lrg were both done by CloudFormation template without any problem. However, after switching the VGW, we couldn't reach the test server sitting in VPC-Lrg from the corporate LAN.

All settings in the two VPCs were the same, the larger address range was in the office firewall whitelist, and the VIF status was OK, yet the instance in VPC-Lrg could not be reached. There was also no traffic from the on-premise test client to the test server recorded in the VPC-Lrg flow logs.
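
For reference, that flow-log check amounts to a CloudWatch Logs query. A minimal boto3 sketch, assuming the flow logs publish to a CloudWatch Logs group (the group name is a placeholder):

import boto3

logs = boto3.client("logs", region_name="ap-southeast-2")  # region is an assumption

# Search VPC-Lrg's flow logs for any record mentioning the test client.
resp = logs.filter_log_events(
    logGroupName="vpc-lrg-flow-logs",   # placeholder log group name
    filterPattern="10.50.50.50",        # the on-premise test client
)
for event in resp.get("events", []):
    print(event["message"])
# In our case this printed nothing: no packets from the client ever
# reached VPC-Lrg.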

We then ran a few traceroute tests to locate the problem. The environment:
  • 10.50.50.50 is the on-premise test client;
  • 10.100.100.5 is the test server's address in both VPC-Lrg and VPC-Sml;
  • 169.254.247.9 is the VIF peer IP on the user side.

First, the VGW was attached back to VPC-Sml. This test served as a baseline, since we knew that setup worked. As expected, the traceroute succeeded:

[10.50.50.50] traceroute 10.100.100.5
traceroute to 10.100.100.5 (10.100.100.5), 64 hops max, 72 byte packets
1  10.50.50.1 (10.50.50.1)  0.562 ms  0.313 ms  0.292 ms
2  10.186.0.1 (10.186.0.1)  0.389 ms  0.411 ms  0.406 ms
3  10.98.244.221 (10.98.244.221)  0.662 ms  0.693 ms  0.727 ms
4  10.186.6.249 (10.186.6.249)  15.250 ms  15.469 ms  17.010 ms
5  10.186.6.250 (10.186.6.250)  15.618 ms  15.510 ms  16.669 ms
6  10.186.6.1 (10.186.6.1)  15.615 ms  15.616 ms  16.351 ms
7  10.186.6.212 (10.186.6.212)  15.887 ms  15.923 ms  16.049 ms
8  10.186.6.238 (10.186.6.238)  16.049 ms  16.063 ms   16.236 ms
9  169.254.247.9 (169.254.247.9)  16.636 ms  16.619 ms  16.197 ms
10  *^C

From the results above, we can see the packets reached the user-side peer IP of the VIF connection; in other words, they arrived at VPC-Sml. This is what the route path should look like in this environment.

Then we attached the VGW to VPC-Lrg and ran the same traceroute again:

[10.50.50.50] traceroute 10.100.100.5
traceroute to 10.100.100.5 (10.100.100.5), 64 hops max, 72 byte packets
1  10.50.50.1 (10.50.50.1)  0.762 ms  0.315 ms  0.297 ms
2  10.186.0.1 (10.186.0.1)  0.400 ms  0.421 ms  0.683 ms
3  10.98.244.221 (10.98.244.221)  1.372 ms  13.981 ms  1.130 ms
4  100.68.1.2 (100.68.1.2)  15.520 ms  14.459 ms  14.010 ms
5  100.68.1.1 (100.68.1.1)  15.420 ms  27.510 ms  14.669 ms
6  10.186.62.250 (10.186.62.250)  14.615 ms  14.404 ms  15.251 ms
7  * * ^C

This time, the packets couldn't even reach the VIF. Since they never hit VPC-Lrg at all, it is no wonder we couldn't find any trace in the VPC-Lrg flow logs.

Thus, according to the tests above, switching the VGW did purge the old route from the ISP's main switch, but the switch didn't pick up the new route for some reason.

Troubleshooting with the ISP engineer revealed that the CIDR range in the prefix list on the main switch was still 10.100.100.0/24, and that the switch had not received the updated route from the AWS side.

At this stage, there were two possibilities:

  1. The VGW advertised its route after being attached, but the ISP switch ignored it because the advertised CIDR was larger than the one saved in the prefix list.
  2. Flipping the VGW didn't trigger the route advertisement at all.
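
Either way, the first thing worth confirming from the AWS side is that the VIF and its BGP session are actually up. A minimal boto3 sketch against the Direct Connect API, with a placeholder VIF ID:

import boto3

dx = boto3.client("directconnect", region_name="ap-southeast-2")  # region is an assumption

vif = dx.describe_virtual_interfaces(
    virtualInterfaceId="dxvif-xxxxxxxx"   # placeholder VIF ID
)["virtualInterfaces"][0]

# "available" means the VIF itself is fine; the BGP peer status shows
# whether the session with the ISP switch is actually up.
print(vif["virtualInterfaceState"])
for peer in vif.get("bgpPeers", []):
    print(peer["bgpStatus"])              # "up" or "down"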

Regardless of which possibility it was, the prefix list needed an update, and re-creating the VIF is commonly recommended. The process of extending the VPC IP range then becomes the one shown in the diagram below.

[Diagram: best-practice steps for extending the CIDR range of a VIF-attached VPC]

To wrap up, the recommended steps for extending the CIDR block of a VIF-attached VPC are (the AWS-side calls are sketched after the list):

  1. Create a new VPC in the same account, with a larger address range that contains the original one;
  2. Take a proper backup;
  3. Restore all services in the NEW VPC;
  4. Detach the VGW from the OLD VPC;
  5. (Optional, if the prefix list is set on the customer/ISP side) Update the prefix list;
  6. Re-create the VIF;
  7. Attach the VGW to the NEW VPC.
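
A minimal boto3 sketch of the AWS-side calls in steps 4, 6 and 7. All IDs, the VLAN and the ASN are placeholders; step 5, the prefix-list update, happens on the customer/ISP switch and has no AWS API:

import boto3

ec2 = boto3.client("ec2", region_name="ap-southeast-2")  # region is an assumption
dx = boto3.client("directconnect", region_name="ap-southeast-2")

VGW_ID = "vgw-xxxxxxxx"   # placeholder IDs throughout

# Step 4: detach the VGW from the old VPC.
ec2.detach_vpn_gateway(VpnGatewayId=VGW_ID, VpcId="vpc-old")

# Step 6: re-create the VIF. Delete the old one, then create a new
# private VIF bound to the same VGW.
dx.delete_virtual_interface(virtualInterfaceId="dxvif-old")
dx.create_private_virtual_interface(
    connectionId="dxcon-xxxxxxxx",
    newPrivateVirtualInterface={
        "virtualInterfaceName": "prod-vif",
        "vlan": 101,                      # placeholder VLAN
        "asn": 65000,                     # placeholder customer-side BGP ASN
        "virtualGatewayId": VGW_ID,
    },
)

# Step 7: attach the VGW to the new VPC; it then advertises the larger
# CIDR over the freshly established BGP session.
ec2.attach_vpn_gateway(VpnGatewayId=VGW_ID, VpcId="vpc-new")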

Alternatively, creating a new VPC on a different, non-overlapping CIDR block and peering it with the live VPC may be easier. The best part is that VPC peering requires no downtime in the live environment.
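
If you go that route, a minimal boto3 sketch, with placeholder IDs and a placeholder CIDR for the new VPC (it must not overlap the live one):

import boto3

ec2 = boto3.client("ec2", region_name="ap-southeast-2")  # region is an assumption

# Request and accept a peering connection between the live VPC and the
# new VPC (same-account peering can be accepted immediately).
pcx = ec2.create_vpc_peering_connection(
    VpcId="vpc-live", PeerVpcId="vpc-new"
)["VpcPeeringConnection"]["VpcPeeringConnectionId"]
ec2.accept_vpc_peering_connection(VpcPeeringConnectionId=pcx)

# Route each VPC's traffic for the other's CIDR over the peering link.
ec2.create_route(RouteTableId="rtb-live",
                 DestinationCidrBlock="10.200.0.0/22",   # placeholder new CIDR
                 VpcPeeringConnectionId=pcx)
ec2.create_route(RouteTableId="rtb-new",
                 DestinationCidrBlock="10.100.100.0/24",
                 VpcPeeringConnectionId=pcx)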

P.S. AWS's official resolution for changing an IP range is here:

https://aws.amazon.com/premiumsupport/knowledge-center/vpc-ip-address-range/