VMware Site Recovery Manager & Active Directory – Part 1 – Testing Recovery Plans with Active Directory

To include Active Directory or not to include Active Directory, that is the question.

I’ve been reading a lot about VMware’s Site Recovery Manager and the considerations surrounding Active Directory. Most of what you will read says ‘NEVER’ protect AD with SRM and only use native AD replication, especially since SRM & vCenter at your Recovery Site require AD to be running anyway.

But what if you have multiple domains for different uses? This is where the lines become blurred. Think about this for a second:

One AD environment (single forest/domain, no trusts) where vCenter & SRM live, call it infrastructure AD
A second AD environment (also single forest/domain, no trusts) for your application servers, call it application AD
You have infrastructure AD at both sites, SRM & vCenter authenticate accordingly
Protected site has application AD
Recovery site has nothing

Now here is where I say ‘why wouldn’t you protect AD with SRM?’ In a true disaster, the protected site is gone and no application AD exists anywhere, so using SRM to bring the Domain Controllers up at the recovery site makes sense. Is my logic flawed?

However, if I had my application AD living at both sites, using native replication, I agree 100% in not including your Domain Controllers in your SRM Recovery Plan. This leads to my concern…

Testing vs Planned vs Unplanned

This post will cover testing only. I’ll write a follow-up covering planned & unplanned failovers later.

To me, the only way to really test your DR plan (in this instance, your SRM Recovery Plan) is to make the test no different from a real failover.

Let’s look at it from the perspective of having nothing at the recovery site, basically if I decided to use a DR service provider as my target site. We each have our own vCenter servers, my SRM server is paired with their SRM server, I have AD at my site, and the DRaaS provider has AD at their site. Microsoft doesn’t officially support protecting DCs with SRM, although it’s really no different than losing power at a datacenter and bringing the DCs back up after power has been restored. There are now two main considerations: Active Directory integrated DNS, or standalone DNS.

Active Directory integrated DNS

The main risk here is a slight possibility that Active Directory services enter a race condition when DNS is AD-integrated. It’s kind of like the ‘chicken and the egg’ argument: each depends on the other. AD relies on DNS, and DNS won’t be up unless AD is running.

I’ve talked with Microsoft regarding this, and although their recommendation is not to use SRM for AD DR, they did say it’s a fairly easy fix if you happen to end up in this race condition. You would have to enter Directory Services Restore Mode and basically pull DNS out. I really don’t know how common this is. How many of you have AD labs running where the rug gets pulled out from underneath them? I’ve had multiple labs with AD-integrated DNS and have NEVER had this problem (I bet I will now since I dropped the ‘NEVER’ word, HA!).

When building your SRM Recovery Plan, you’ll want to make sure your PDCE boots up first (for good measure). You could accomplish this in multiple ways:

Place your PDCE in Priority Group 1, then the rest of the DCs in Priority Group 2, everything else in remaining Priority Groups
Place all of your DCs in Priority Group 1 and set all of the non-PDCE DCs to depend on the PDCE
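To make the first option concrete, here is a rough Python sketch of how priority groups turn into start-up waves. It is only a model for reasoning about the plan, not anything SRM exposes: the VM names, the `plan` dictionary, and the `boot_waves` helper are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical VM names mapped to their priority group (not read from SRM).
plan = {
    "pdce01": 1,   # PDCE alone in Priority Group 1
    "dc02": 2,     # remaining DCs in Priority Group 2
    "dc03": 2,
    "app01": 3,    # everything else in later groups
}


def boot_waves(plan):
    """Group VMs by priority group and return the groups in start order."""
    waves = defaultdict(list)
    for vm, group in plan.items():
        waves[group].append(vm)
    return [sorted(waves[group]) for group in sorted(waves)]


for number, vms in enumerate(boot_waves(plan), start=1):
    print(f"wave {number}: {', '.join(vms)}")
```

Writing the plan down this way makes it easy to check that the PDCE is alone in the first wave before you build the real Recovery Plan.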

Active Directory with Standalone DNS

Standalone DNS is desirable because it sidesteps the possible race condition, but whether it fits really depends on your environment. Your SRM Recovery Plan will be similar to the one for AD-integrated DNS, but you will need to add an additional step or dependency:

Place your DNS server & PDCE in Priority Group 1, set the PDCE to depend on the DNS server, then the rest of the DCs in Priority Group 2
Place all of your DCs in Priority Group 1 and set all of the non-PDCE DCs to depend on the PDCE, and the PDCE to depend on the DNS server
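The dependency chain in the second option (DNS server, then PDCE, then the other DCs) can be sanity-checked with a topological sort. This is a minimal sketch using Python's standard `graphlib` module (Python 3.9+). The VM names and the `start_order` helper are hypothetical, and this models the ordering only; it does not talk to SRM.

```python
from graphlib import TopologicalSorter

# Hypothetical VM names. Each key depends on the VMs listed in its set.
depends_on = {
    "dns01": set(),
    "pdce01": {"dns01"},   # PDCE waits for standalone DNS
    "dc02": {"pdce01"},    # non-PDCE DCs wait for the PDCE
    "dc03": {"pdce01"},
}


def start_order(depends_on):
    """Return VM names ordered so every VM follows what it depends on."""
    return list(TopologicalSorter(depends_on).static_order())


order = start_order(depends_on)
print(order)
```

If you accidentally describe a circular dependency, `static_order()` raises `graphlib.CycleError`, which is a cheap way to catch a mistake on paper before it ends up in a plan.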

Active Directory in both sites

Now this is truly the best way to handle DR with Active Directory: let AD do its native replication across your sites. If you lose your protected site, your recovery site already has AD running before you ever hit the Recovery button in SRM. I don’t need my Domain Controllers in my Recovery Plans now, right? Wrong, well, maybe, maybe not. It really depends on how you want to do it. The main issue is to be able to have your test VMs authenticate to AD, but NOT your Production Active Directory. There are basically two ways to test your recovery plan:

Including Active Directory Domain Controllers in Recovery Plans

Maintain two different Recovery Plans; one for testing, one for DR
Requires a separate Protection Group for the Active Directory Domain Controllers, so the DR Recovery Plan can run without the DCs
Testing recovery plan includes your DCs from the Protected site
Be certain you’re testing in a network that CANNOT talk to the production DCs
Your DR Recovery Plan should NOT include your DCs, and the VMs should land on your production network
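The rules in the list above are easy to get wrong when you maintain two plans by hand, so here is a small Python sketch that checks them against a written-down inventory. Everything in it is an assumption for illustration: the `RecoveryPlan` class, the `check_plans` function, and the VM and network names are hypothetical and not part of any SRM API. Note that a network name not being in the production list does not prove the test network is actually isolated; that still has to be verified on the network itself.

```python
from dataclasses import dataclass


@dataclass
class RecoveryPlan:
    name: str
    purpose: str   # "test" or "dr"
    vms: set
    network: str


def check_plans(plans, domain_controllers, production_networks):
    """Return a list of rule violations for the two-plan approach."""
    problems = []
    for plan in plans:
        has_dcs = bool(plan.vms & domain_controllers)
        on_production = plan.network in production_networks
        if plan.purpose == "test":
            if not has_dcs:
                problems.append(f"{plan.name}: test plan is missing the DCs")
            if on_production:
                problems.append(f"{plan.name}: test plan lands on a production network")
        elif plan.purpose == "dr":
            if has_dcs:
                problems.append(f"{plan.name}: DR plan should not include DCs")
            if not on_production:
                problems.append(f"{plan.name}: DR plan should land on production")
    return problems


# Hypothetical inventory, for illustration only.
dcs = {"dc01", "dc02"}
prod_nets = {"prod-vlan10"}
plans = [
    RecoveryPlan("app-test", "test", {"dc01", "dc02", "app01"}, "test-bubble"),
    RecoveryPlan("app-dr", "dr", {"dc01", "app01"}, "prod-vlan10"),
]

for problem in check_plans(plans, dcs, prod_nets):
    print(problem)
```

In this made-up inventory the DR plan accidentally includes `dc01`, so the check flags it while the test plan passes.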

Leaving Active Directory Domain Controllers out of Recovery Plans

Only need one Recovery Plan with your protected VMs
Running a test requires you to clone a Global Catalog DC into your test network, then testing your Recovery Plan in that same test network
Running a DR failover is no different from the test, except you don’t need to clone a DC

So why is testing different than a failover?

Microsoft actually recommends cloning Domain Controllers into the test environment, then destroying them when done. When you’re running a test, you have duplicate computer accounts, and possibly duplicate Domain Controllers, on the network. This poses a problem because if your test VM changes its computer account password in your production AD, that change replicates and your production VM now has a bad password. There’s also the risk of production AD trying to replicate with a test DC, which could cause a USN rollback.
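The duplicate computer account problem can be illustrated with a deliberately simplified toy model in Python. This is an assumption-heavy sketch, not how Active Directory is implemented: the `ToyDirectory` class, its methods, and the `APP01$` account are invented for illustration, and real machine account handling is more involved than a single stored secret.

```python
import secrets


class ToyDirectory:
    """Stores one shared secret per computer account (a deliberate simplification)."""

    def __init__(self):
        self._secrets = {}

    def join(self, account):
        """Create the account and hand its secret to the joining machine."""
        self._secrets[account] = secrets.token_hex(8)
        return self._secrets[account]

    def authenticate(self, account, secret):
        return self._secrets.get(account) == secret

    def rotate(self, account, old_secret):
        """Let a machine that knows the current secret replace it."""
        if not self.authenticate(account, old_secret):
            raise PermissionError("bad computer account password")
        self._secrets[account] = secrets.token_hex(8)
        return self._secrets[account]


production_ad = ToyDirectory()
prod_vm_secret = production_ad.join("APP01$")

# The test copy of the VM starts out with the same secret as production.
clone_secret = prod_vm_secret

# The test VM can reach production AD and rotates the password there.
clone_secret = production_ad.rotate("APP01$", clone_secret)

print("production VM still trusted:", production_ad.authenticate("APP01$", prod_vm_secret))
print("test clone trusted:", production_ad.authenticate("APP01$", clone_secret))
```

In this model, once the clone rotates the secret, only the clone can authenticate, which is exactly why the test network must not be able to reach the production DCs.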

This is why testing SRM with Active Directory is very touchy. You need to make sure you have everything in order, and test, TEST, TEST!

Feel free to comment or email me, this is a touchy subject, so I’m very open to discussion and interested in what others are doing here.

Watch for Part 2!

ThepHuck