On May 25th, I published this post covering some scenarios on how to use Site Recovery Manager & Active Directory. Michael White from VMware responded with some good info. He had an awesome suggestion of using a script to cold clone a DC daily to use for testing.
Let’s take a look at some ways we can get this done:
Cloning a Domain Controller for SRM Recovery Plan testing
In the instances described below, there are a few expectations:
- You have a dedicated test network that cannot communicate with production systems
- You have at least one DC running at the Recovery Site (two or more is preferred)
- Your Active Directory is running native replication to your Recovery Site
- AD Change Notification is enabled – Optional, but speeds up change replication
- Lastly, a PowerShell PowerCLI script to do the cloning for you – this wouldn’t be right without some PowerShell!
It is best to have more than one Domain Controller at the Recovery Site. Why? A cold clone requires the source VM to be powered off. If you only have one DC at the Recovery Site, your AD at that site will be 100% offline. True, it’s only for as long as the VM is offline for the clone, but I don’t like the idea of being vulnerable, even if only for a few minutes. Here’s my recommendation: build a mirror of your Protected Site and make sure all DCs are Global Catalogs. If this is not feasible, have at least two DCs at the Recovery Site, both being Global Catalogs.
Now that you’ve got your Recovery Site AD all set up, make sure the DC you pick for cloning is not your bridgehead. This way, while that DC is offline for cloning, the two AD sites are still replicating data, and when the DC comes back online, it replicates locally with its bridgehead. It’s just an additional step to keep our vulnerability as low as possible. If you have several DCs, all the others are still replicating because the bridgehead is still up.
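The selection rule above can be written down as a small pre-flight check. This is only a sketch, in Python rather than PowerCLI so it can be read without a vCenter at hand. The `DomainController` record and its fields are hypothetical, you would fill them in from your own inventory, and refusing to run with fewer than two DCs is my recommendation turned into a hard rule, not a requirement of SRM or AD.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainController:
    # Hypothetical record; populate it from your own AD/vCenter inventory.
    name: str
    is_bridgehead: bool
    is_global_catalog: bool


def pick_clone_source(recovery_site_dcs):
    """Return the DC to cold clone, or raise if cloning would be unsafe."""
    dcs = list(recovery_site_dcs)
    # With a single DC, the site's AD is fully offline during the clone.
    if len(dcs) < 2:
        raise ValueError("need at least two DCs at the Recovery Site")
    # Leave the bridgehead running so inter-site replication continues.
    candidates = [dc for dc in dcs if not dc.is_bridgehead]
    if not candidates:
        raise ValueError("every DC is a bridgehead; nothing safe to clone")
    # Assumption: prefer a Global Catalog, then sort by name for a stable pick.
    candidates.sort(key=lambda dc: (not dc.is_global_catalog, dc.name))
    return candidates[0]
```

For example, given a bridgehead `dc01` and a second Global Catalog `dc02`, the function returns `dc02`.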
What your script should do
Perhaps another blog post is in order with this script! In the meantime, here are some steps it should accomplish:
- Issue a graceful shutdown of the VM
- Check back in 5-10 minutes to see if it’s powered off
- If not, wait another 5-10 minutes and check again – You don’t want to power it off, since you want a clean file system
- Check whether a cloned DC already exists; if so, power it off & delete it from disk
- Initiate clone of VM
- Validate cloned DC is on test network
- Power on VM
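Until the real script gets its own post, here is a minimal sketch of the order of operations in the list above. It is written in Python, not PowerCLI, and the `api` object is entirely hypothetical: each method (`shutdown_guest`, `is_powered_off`, `exists`, `power_off`, `delete_from_disk`, `clone`, `network_of`, `power_on`) stands in for whichever vSphere call you actually use. Two assumptions of mine: the source DC is powered back on even if the clone fails, and the clone is left powered off if it is not on the test network.

```python
import time


def refresh_test_dc(api, source, clone, test_network,
                    poll_seconds=300, max_polls=6, sleep=time.sleep):
    """Cold clone `source` to `clone`, following the steps listed above.

    `api` is a hypothetical wrapper object; each method stands in for
    whatever PowerCLI (or other vSphere) call you actually use.
    """
    # Ask the guest OS to shut down cleanly; never hard power off the source.
    api.shutdown_guest(source)

    # Check back every `poll_seconds` until the VM reports powered off.
    for _ in range(max_polls):
        sleep(poll_seconds)
        if api.is_powered_off(source):
            break
    else:
        raise TimeoutError(f"{source} did not shut down; not forcing power off")

    try:
        # Remove the previous clone, if there is one.
        if api.exists(clone):
            if not api.is_powered_off(clone):
                api.power_off(clone)
            api.delete_from_disk(clone)
        api.clone(source, clone)
    finally:
        # Assumption: bring the source DC back whether or not the clone worked.
        api.power_on(source)

    # Only start the clone once it is confirmed to be on the isolated network.
    if api.network_of(clone) != test_network:
        raise RuntimeError(f"{clone} is not on {test_network}; leaving it off")
    api.power_on(clone)
```

The defaults poll every 5 minutes for up to 30 minutes. If the source never powers off, the function gives up with an error instead of forcing it, which keeps the file system clean.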
Now that we have a script, how and when should we run it?
Running the clone as part of your Recovery Plan
This seems logical because you want the most up-to-date Active Directory database. You could follow Mike Laverick’s guidelines in Chapter 11 – Custom Recovery Plans – Adding Additional Steps to a Recovery Plan from his Administering VMware Site Recovery Manager 5.0 book (found here); he has some good info. In short, he has a redirect.cmd script that is used to call the ps1 scripts. You’d want this to run before powering on your VMs, and you can place it wherever you want in the Recovery Plan, but keep the timeout in mind. Ideally, you’ll want to run the script outside of SRM to see how long it takes, then pad that figure slightly. If it takes only 15 minutes, you may want to set your timeout to maybe 20 minutes. If the script finishes earlier, that’s fine, but if it runs past the timeout, it can cause your test to fail.
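The padding arithmetic is simple enough to do in your head, but here it is as a sketch in Python. The 25% margin and the rounding up to the next 5 minutes are my own assumptions, chosen so that a 15 minute run lands on the 20 minutes mentioned above. SRM itself does not prescribe either number.

```python
def padded_timeout_minutes(measured_minutes, padding_percent=25, round_to=5):
    """Pad a measured run time and round up to a tidy timeout value.

    Integer arithmetic only, so there are no floating point surprises.
    Both defaults are illustrative assumptions, not SRM settings.
    """
    if measured_minutes <= 0:
        raise ValueError("measured_minutes must be positive")
    # measured * (1 + padding), kept as an integer scaled by 100.
    scaled = measured_minutes * (100 + padding_percent)
    # Ceiling division: round up to the next multiple of `round_to`.
    steps = -(-scaled // (100 * round_to))
    return steps * round_to
```

With the defaults, `padded_timeout_minutes(15)` gives 20: 15 minutes plus 25% is 18.75, rounded up to the next multiple of 5.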
Why wouldn’t you run it this way? Well, because of that timeout. Your test plan could take much longer to run. If you only have a few VMs in your test plan, it may only take 5 minutes to run. Throw in a DC clone and it may quadruple the test plan time. You may also need a script to power off & delete the DC VM when done, but that doesn’t add much time. You don’t want to sit around waiting for your test plan to spin up your DR test, which leads me to this…
Scheduling the clone script regularly
In my opinion, this is ideal. Chances of a computer account changing its password right before your test are slim, although possible. You can schedule the script to run nightly, or at whatever frequency you choose, and schedule your tests around the cloning time. Now, whenever you run a test plan, there’s a DC already waiting for you on your test network.
The principal drawback to this method is that your Recovery Site AD could be unavailable while a DC is powered off for cloning. Preferably, you’ll have more than one DC at the Recovery Site, so the AD state will be degraded rather than offline.
As always, feel free to comment or email me. This is a touchy subject, so I’m very open to discussion and interested in what others are doing here.