Okay, so now that we’ve tested different OS installations, it’s time to test the real purpose we acquired these blades for: virtualization.

A little info on the hardware: Cisco N20-B6620-1 blades, dual Xeon E5540s, 24GB of RAM, and two 73GB drives.

We’re using VMware ESXi 4.0u1 for our testing, and booting from the SAN. Yes, I know, boot from SAN is still experimental with vSphere; I don’t like it, but that’s the path I was led down by my superiors.

We’ve got vCenter running with 8 hosts (2 chassis with 4 blades each), and everything seemed fine, so I laid down Windows 2008 x64 in a guest with 2 vCPUs and 4GB of RAM. It loaded in the expected amount of time, nothing spectacular here.

After I had this lone guest running, I decided to try some load testing from within the guest. I simply googled for PowerShell load generation and stumbled upon this: Measure-Command {$result = 1; foreach ($number in 1..2147483647) {$result = $result * $number}} taken from Here

Since it isn’t multithreaded, I needed something more to work my second vCPU. Using similar logic, I created this: $calc = 1; foreach ($number in 1..2147483647) {$calc = Invoke-Item C:\Windows\System32\calc.exe}

This line basically tries to launch 2,147,483,647 instances of calc.exe, which I later discovered doesn’t work. After testing many times, the most I could get was 852 instances of any one process (whether it was Notepad, Paint, or calc).
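For what it’s worth, the calc.exe trick isn’t strictly necessary: PowerShell can occupy every vCPU on its own by starting one background job per logical processor. Here’s a minimal sketch, assuming PowerShell 2.0 or later (Start-Job doesn’t exist in 1.0); the duration and the multiplier are arbitrary illustrations, not anything from my actual test runs:

```powershell
# One CPU-burning background job per logical processor in the guest
$cpus = [Environment]::ProcessorCount
$seconds = 5   # short demo duration; the overnight runs above went for hours

$jobs = 1..$cpus | ForEach-Object {
    Start-Job -ArgumentList $seconds -ScriptBlock {
        param($seconds)
        $stop = (Get-Date).AddSeconds($seconds)
        $result = 1
        while ((Get-Date) -lt $stop) { $result = $result * 7 }  # pure CPU work
    }
}

# Block until every job finishes, then clean up
$jobs | Wait-Job | Remove-Job
```

Unlike the two-window approach, this scales to however many vCPUs the guest has.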

Okay, so back on topic: I opened two PowerShell windows and ran each of the above in its own window. This quickly pegged the CPU and ate up all the memory. I decided to leave it overnight to see how it did, but upon returning to work the next morning I discovered something a bit odd: two UCS blades were listed as Not Responding in vCenter.

One host was where my lone VM lived; the other seemed random. After some investigation, I discovered that right around the 8-hour mark, BOTH hosts dropped offline. Pinging the ESXi hosts showed strange latency numbers and dropped pings. Now that was an odd coincidence, so I power-cycled both blades, made sure they were online again, brought up my lone guest, and started my PowerShell load gens again.

Much to my surprise, 8 hours later the two hosts were offline again, the EXACT same two as before. I brought them back online, moved my lone guest to a different blade, and tried again, with the same results.

Now, why would a single lone guest impact the hardware this way? Cisco couldn’t tell me either. I went back and forth with Cisco on this, and they really didn’t have an answer right away. I mean, seriously, I wasn’t maxing out resources at the host level, only in my puny 2 vCPU / 4GB guest. There was no I/O, neither network nor storage, just simple CPU cycles and RAM utilization.

Now, you’ve got to ask yourself one question: Do I f…wait, wrong question, here we go: Is this something I really want running in my production datacenter? My answer: HELL NO!!!!!