Virtualization

Bug in VSAN 6.2: De-dupe scanning running on hybrid datastores

Written July 25th, 2016 by
Categories: SDDC, Virtualization, VSAN

UPDATE
VMware has since published a KB article about this issue, which I wasn’t aware of when I wrote this post: https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2146267

We’ve been testing out VSAN here at work and noticed that one of the clusters we rolled out had serious latency issues. We initially blamed the application running on the hosted VMs, but when it continued to get worse we finally opened a case with VMware. Here’s a chart of the kind of stats we were seeing (courtesy of SexiGraf):

VSAN Cluster 1 before

Read latency in particular was very high at the datastore level, IOPS weren’t great, and the read cache hit rate was low. Read and write latency were also high at the VM level. After we opened a ticket, VMware discovered an undocumented bug in VSAN 6.2 where deduplication scanning runs even though deduplication is turned off (and deduplication isn’t even supported on hybrid VSAN). They provided the following solution:

For each host in the VSAN cluster:
1. Enter maintenance mode
2. SSH to the host and run: "esxcfg-advcfg -s 0 /LSOM/lsomComponentDedupScanType"
3. Reboot the host
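If you have more than a couple of hosts, it helps to write the rolling procedure out before you start. Below is a minimal Python sketch, assuming you’ve already collected each host’s current value of /LSOM/lsomComponentDedupScanType yourself (for example with `esxcfg-advcfg -g /LSOM/lsomComponentDedupScanType` on each host). The host names, the values, and `build_remediation_plan` are all hypothetical, and the sketch only builds the list of steps; it doesn’t touch any host.

```python
# Minimal sketch: turn the per-host fix into a rolling, one-host-at-a-time plan.
# Host names and current values are hypothetical; collect the real values from
# each host yourself. Nothing here connects to a host; it only builds steps.

SETTING = "/LSOM/lsomComponentDedupScanType"
FIX_COMMAND = f"esxcfg-advcfg -s 0 {SETTING}"


def build_remediation_plan(current_values: dict[str, int]) -> list[tuple[str, str]]:
    """Return (host, step) pairs for every host whose setting is not already 0.

    Hosts are listed one after another in sorted order, so following the plan
    in order keeps only one host in maintenance mode at a time.
    """
    plan: list[tuple[str, str]] = []
    for host in sorted(current_values):
        if current_values[host] == 0:
            continue  # already set to 0, so skip the maintenance/reboot cycle
        plan.append((host, "Enter maintenance mode"))
        plan.append((host, f'SSH to the host and run: "{FIX_COMMAND}"'))
        plan.append((host, "Reboot the host"))
    return plan


if __name__ == "__main__":
    # Hypothetical cluster: esx02 has already been fixed.
    values = {"esx01": 1, "esx02": 0, "esx03": 1}
    for host, step in build_remediation_plan(values):
        print(f"{host}: {step}")
```

Skipping hosts that already report 0 saves a maintenance-mode cycle and a reboot on hosts that don’t need it.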

After we applied the fix, the cluster rebalanced for a little while and came back looking much, much better. In the graph below, you can see exactly when the fix was applied: read latency drops, IOPS increase, and the read cache hit rate jumps into the high 90s:

VSAN Cluster 1 after

And for good measure, this is how it’s looked since:

VSAN Cluster 1 after 2

So to summarize, if you are running hybrid VSAN 6.2, you should definitely check your latency and read cache hit rate. If you’re experiencing high latency and poor read cache hit rate, go through and change /LSOM/lsomComponentDedupScanType on all your hosts to 0. I can’t take credit for actually discovering this, so thank you to my coworker @per_thorn for tracking it down. And thank you @thephuck for letting me write it up on this blog!
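The check in the summary comes down to two metrics on hybrid 6.2 clusters. Here’s a minimal Python sketch of that triage. The `ClusterStats` fields and both thresholds are hypothetical placeholders, not VMware guidance, so tune them against your own baseline (such as the SexiGraf charts above).

```python
# Minimal sketch of the "check latency and read cache hit rate" triage.
# Field names and thresholds are hypothetical placeholders, not VMware guidance.
from dataclasses import dataclass


@dataclass
class ClusterStats:
    name: str
    is_hybrid: bool
    vsan_version: str
    read_latency_ms: float
    read_cache_hit_rate_pct: float


# Hypothetical thresholds; tune them to your own environment.
MAX_READ_LATENCY_MS = 20.0
MIN_READ_CACHE_HIT_RATE_PCT = 90.0


def matches_dedupe_scan_symptoms(stats: ClusterStats) -> bool:
    """True if a hybrid VSAN 6.2 cluster shows both high read latency and a poor read cache hit rate."""
    if not (stats.is_hybrid and stats.vsan_version.startswith("6.2")):
        return False  # the bug described here only affects hybrid VSAN 6.2
    return (
        stats.read_latency_ms > MAX_READ_LATENCY_MS
        and stats.read_cache_hit_rate_pct < MIN_READ_CACHE_HIT_RATE_PCT
    )
```

Requiring both symptoms together keeps a cluster that is merely busy (high latency, healthy cache hits) from being flagged.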

@vDudeJon

vSphere Fault Tolerance Role Privilege names have changed from vSphere 5.5 to 6.0

I was playing in my lab today and ran across something I thought was strange. I exported the privileges from a test role in one lab, which happened to be vSphere 5.5, then tried to create a new role in vCenter 6.0 with the privileges I just pulled. It worked fine for almost everything, except these two:

Could not find Privilege with name 'Enable Fault Tolerance'.
Could not find Privilege with name 'Disable Fault Tolerance'.

I thought that was kind of strange, so I ran a quick

Get-VIPrivilege | ? {$_.name -like "*fault*"} | select Name,Id

and looked for something similar. Below is the comparison of 5.5 & 6.0:

vSphere 5.5
Name - Id
------
Turn On Fault Tolerance - VirtualMachine.Interact.CreateSecondary
Turn Off Fault Tolerance - VirtualMachine.Interact.TurnOffFaultTolerance
Disable Fault Tolerance - VirtualMachine.Interact.DisableSecondary
Enable Fault Tolerance - VirtualMachine.Interact.EnableSecondary
Query Fault Tolerance compatibility - VirtualMachine.Config.QueryFTCompatibility

vSphere 6.0
Name - Id
------
Turn On Fault Tolerance - VirtualMachine.Interact.CreateSecondary
Turn Off Fault Tolerance - VirtualMachine.Interact.TurnOffFaultTolerance
Suspend Fault Tolerance - VirtualMachine.Interact.DisableSecondary
Resume Fault Tolerance - VirtualMachine.Interact.EnableSecondary
Query Fault Tolerance compatibility - VirtualMachine.Config.QueryFTCompatibility

The difference is not drastic, but a single word, or even a single character, out of place will cause your script to fail. “Turn On” and “Enable” sound like the same thing, so renaming “Enable” to “Resume” makes sense to me; the same goes for “Disable” and “Suspend”. These are just the two I know about. I really should write another article listing everything that has changed, but that’s for another day :)
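Because the Ids stayed the same while the display names changed, comparing the two lists by Id is an easy way to spot renames. Here’s a minimal Python sketch using the Name/Id pairs listed above. `renamed_privileges` is a hypothetical helper, not part of PowerCLI.

```python
# Minimal sketch: spot Fault Tolerance privileges renamed between vSphere 5.5
# and 6.0 by matching on the privilege Id instead of the display Name.
# Data copied from the Get-VIPrivilege output in this post.

PRIVS_55 = {
    "VirtualMachine.Interact.CreateSecondary": "Turn On Fault Tolerance",
    "VirtualMachine.Interact.TurnOffFaultTolerance": "Turn Off Fault Tolerance",
    "VirtualMachine.Interact.DisableSecondary": "Disable Fault Tolerance",
    "VirtualMachine.Interact.EnableSecondary": "Enable Fault Tolerance",
    "VirtualMachine.Config.QueryFTCompatibility": "Query Fault Tolerance compatibility",
}

PRIVS_60 = {
    "VirtualMachine.Interact.CreateSecondary": "Turn On Fault Tolerance",
    "VirtualMachine.Interact.TurnOffFaultTolerance": "Turn Off Fault Tolerance",
    "VirtualMachine.Interact.DisableSecondary": "Suspend Fault Tolerance",
    "VirtualMachine.Interact.EnableSecondary": "Resume Fault Tolerance",
    "VirtualMachine.Config.QueryFTCompatibility": "Query Fault Tolerance compatibility",
}


def renamed_privileges(old: dict[str, str], new: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Return {id: (old_name, new_name)} for Ids present in both sets whose Name changed."""
    return {
        priv_id: (old[priv_id], new[priv_id])
        for priv_id in sorted(old.keys() & new.keys())
        if old[priv_id] != new[priv_id]
    }


if __name__ == "__main__":
    for priv_id, (before, after) in renamed_privileges(PRIVS_55, PRIVS_60).items():
        print(f"{priv_id}: '{before}' -> '{after}'")
```

The same idea likely applies to a role-copy script: carrying the Ids across instead of the display names should sidestep errors like the two above, at least for privileges whose Ids didn’t change.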

Just something to watch out for that I wanted to share.

Happy scripting!

VMware Virtual SAN Health failed Cluster health test

Written December 2nd, 2015 by
Categories: Virtualization, VSAN

Here’s the error

While building a new environment for my lab, I ran across an interesting thing yesterday.

I looked at my cluster’s VSAN health and saw this error:

It’s complaining that my hosts don’t have matching Virtual SAN advanced configuration items.

If you click on that error, you’ll see a comparison of each host’s advanced configuration values at the bottom:

It shows VSAN.DomMaxLeafAssocsPerHost and VSAN.DomOwnerInflightOps as being different between a few of my hosts. Looking at the image above, you’ll see node 09 has values of 36000 and 1024, respectively, while the other nodes 10-12 show 12000 and 0.
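Spotting which hosts disagree is just a comparison across hosts. Here’s a minimal Python sketch using the values from the health check above. The host names (`node09` through `node12`) and `find_mismatched_settings` are hypothetical. It treats the most common value as the baseline, which is an assumption, since the majority isn’t necessarily the correct value.

```python
# Minimal sketch: find advanced settings whose values differ across hosts.
# Values are the ones reported by the VSAN health check (host names hypothetical).
from collections import Counter

HOST_SETTINGS = {
    "node09": {"VSAN.DomMaxLeafAssocsPerHost": 36000, "VSAN.DomOwnerInflightOps": 1024},
    "node10": {"VSAN.DomMaxLeafAssocsPerHost": 12000, "VSAN.DomOwnerInflightOps": 0},
    "node11": {"VSAN.DomMaxLeafAssocsPerHost": 12000, "VSAN.DomOwnerInflightOps": 0},
    "node12": {"VSAN.DomMaxLeafAssocsPerHost": 12000, "VSAN.DomOwnerInflightOps": 0},
}


def find_mismatched_settings(host_settings: dict[str, dict[str, int]]) -> dict[str, dict[str, object]]:
    """For each setting whose value differs across hosts, return the hosts that
    deviate from the most common value (assumed here to be the baseline)."""
    all_keys = set().union(*(s.keys() for s in host_settings.values()))
    mismatches: dict[str, dict[str, object]] = {}
    for key in sorted(all_keys):
        # A host missing the setting entirely shows up as None.
        values = {host: settings.get(key) for host, settings in host_settings.items()}
        if len(set(values.values())) <= 1:
            continue  # every host agrees
        baseline, _ = Counter(values.values()).most_common(1)[0]
        mismatches[key] = {h: v for h, v in values.items() if v != baseline}
    return mismatches


if __name__ == "__main__":
    for key, outliers in find_mismatched_settings(HOST_SETTINGS).items():
        print(key, outliers)
```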

I immediately went to the host configuration advanced settings in the web client, searched for VSAN, and didn’t see either of those. I even checked through PowerCLI and couldn’t find them there either: Read the rest of this entry »

VMware vSphere 5.5 Web Client authentication fails with ‘cannot connect to the vCenter Single Sign On server.’

Written August 28th, 2015 by
Categories: Virtualization

Earlier this week we were greeted with this awesome message:
webclient-fail-sso.

It’s so descriptive we knew exactly where to start! Okay, yeah, not really. Sarcasm aside, you’d think the culprit would be SSO. I began checking the two SSO servers we have in an HA configuration, and they appeared fine. Stranger still, the fat clients were all authenticating fine. I started checking the logs on the SSO servers and saw several entries similar to these:

2015-08-25 23:20:49,538 INFO [ActiveDirectoryProvider] Failed to find user snip@snipPrincipal id not found: {Name: snip, Domain: snip} via ldap search
and
2015-08-26 00:29:37.709:t@21945040:ERROR: ldap simple bind failed. Error(4294967295)
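When there are a lot of log entries to wade through, counting the two kinds of failures gives a quick feel for how often each one shows up. Here’s a minimal Python sketch. The patterns come from the two log lines above, `count_sso_auth_errors` is hypothetical, and the log file location isn’t covered here, so feed it lines from whichever SSO log you’re reading.

```python
# Minimal sketch: count SSO authentication failures by pattern.
# Patterns are taken from the two log lines quoted in this post.
import re
from collections import Counter
from typing import Iterable

PATTERNS = {
    "user_not_found": re.compile(r"\[ActiveDirectoryProvider\] Failed to find user"),
    "ldap_bind_failed": re.compile(r"ldap simple bind failed"),
}


def count_sso_auth_errors(lines: Iterable[str]) -> Counter:
    """Count how many lines match each known SSO authentication failure pattern."""
    counts: Counter = Counter()
    for line in lines:
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[label] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "2015-08-25 23:20:49,538 INFO [ActiveDirectoryProvider] Failed to find user "
        "snip@snipPrincipal id not found: {Name: snip, Domain: snip} via ldap search",
        "2015-08-26 00:29:37.709:t@21945040:ERROR: ldap simple bind failed. Error(4294967295)",
    ]
    print(count_sso_auth_errors(sample))
```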

So I assumed it was SSO again, maybe related to the domain we auth against.

Great! So now what?

Read the rest of this entry »

vSphere Replication 5.8 lets you violate VSAN Storage Policies

Written June 8th, 2015 by
Categories: Disaster Recovery, Virtualization

I’m sure many of you know that VSAN’s Failures To Tolerate, or FTT, adds overhead to both your cluster and your data. It’s no secret that an FTT of 1 doubles your data: think of it as N+1 copies of your data, where N is the FTT value. Depending on the FTT, you could have two, three, or four copies of your data. Redundancy is a good thing!

When you look at the cluster side of it, there’s another ‘gotcha’: the minimum host count becomes 2N+1. With an FTT of 1, you need 2(1)+1 = 3 hosts. Likewise, FTT=2 requires 5 hosts, and FTT=3 requires 7.
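The two rules above (N+1 copies, 2N+1 hosts) are easy to turn into a quick sizing check. Here’s a minimal Python sketch, assuming mirrored copies as described in this post. `vsan_ftt_requirements` is a hypothetical helper, and it only accepts FTT of 1 through 3 to match the two-to-four copies mentioned above.

```python
# Minimal sketch of the FTT math: N+1 copies of the data, 2N+1 hosts.
# Assumes mirrored copies; other overhead (witnesses, slack space) is ignored.


def vsan_ftt_requirements(ftt: int, usable_gb: float) -> dict[str, float]:
    """Return copies of data, minimum hosts, and raw capacity consumed for a given FTT."""
    if ftt not in (1, 2, 3):
        raise ValueError("this sketch only handles FTT of 1, 2, or 3")
    copies = ftt + 1          # N+1 copies
    min_hosts = 2 * ftt + 1   # 2N+1 hosts
    return {"copies": copies, "min_hosts": min_hosts, "raw_gb": usable_gb * copies}


if __name__ == "__main__":
    for ftt in (1, 2, 3):
        print(ftt, vsan_ftt_requirements(ftt, usable_gb=100))
```

For a 100 GB VM at FTT=1, that works out to 2 copies, 3 hosts, and 200 GB of raw capacity.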

What’s the problem?

Read the rest of this entry »

VMworld 2015 Public Voting is OPEN!

Written May 14th, 2015 by
Categories: Disaster Recovery, Virtualization

It’s that time of year again, and I know you’re as excited as I am: VMworld has opened up the sessions to public voting!

There are TONS of sessions to choose from, and I submitted a handful, listed here:

vmworld-2015

The last one is something special with GS Khalsa, so please vote for us!

If you like any of these and would like to see these presentations on the big stage, please vote for them!

Go to http://vmw.re/1Pj6Lcc and you can either search for my last name or session ID.

Site Recovery Manager Error: Placeholder VM creation error: No hosts with hardware version 10

Written December 5th, 2014 by
Categories: Disaster Recovery, Virtualization

If you attended my SRM session at VMworld 2014, or one of the VMUG User Conference sessions I’ve presented at, you’ve heard me talk about upgrading SRM and the entire infrastructure.

I stressed the importance of upgrading the Recovery/Target Site’s hosts before upgrading the Protected/Source Site’s hosts.

As you can imagine, upgrading out of order does happen, and I got to see exactly what happens when you’re in that situation. Well, not in a disaster, but in regular day-to-day tasks.

Have you seen this error when trying to protect a VM in SRM, specifically trying to create the placeholder?

Placeholder VM creation error: No hosts with hardware version ’10’ and datastore(s) [datastore01] which are powered on and not in maintenance mode are available
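The error spells out what SRM is looking for when it creates the placeholder: a host that supports the VM’s hardware version, sees the datastore, is powered on, and isn’t in maintenance mode. Here’s a minimal Python sketch of that filter, which also shows why upgrading the Recovery Site’s hosts first matters. The host records and `eligible_placeholder_hosts` are hypothetical, and this is not how SRM is implemented internally. The version table is the highest virtual hardware version each ESXi release supports.

```python
# Minimal sketch of the placement conditions named in the SRM error message.
# Host records are hypothetical; this is not SRM's actual implementation.
from dataclasses import dataclass

# Highest virtual hardware version supported by each ESXi release.
MAX_HW_VERSION = {"5.0": 8, "5.1": 9, "5.5": 10, "6.0": 11}


@dataclass
class RecoveryHost:
    name: str
    esxi_version: str
    powered_on: bool
    in_maintenance_mode: bool
    datastores: tuple[str, ...]


def eligible_placeholder_hosts(hosts: list[RecoveryHost], hw_version: int, datastore: str) -> list[str]:
    """Names of hosts that could hold a placeholder for a VM at the given hardware version."""
    return [
        h.name
        for h in hosts
        if MAX_HW_VERSION.get(h.esxi_version, 0) >= hw_version  # unknown versions never qualify
        and h.powered_on
        and not h.in_maintenance_mode
        and datastore in h.datastores
    ]


if __name__ == "__main__":
    # Protected Site already on 5.5 (hardware version 10), Recovery Site still on 5.1.
    dr_hosts = [
        RecoveryHost("esx-dr-01", "5.1", True, False, ("datastore01",)),
        RecoveryHost("esx-dr-02", "5.1", True, False, ("datastore01",)),
    ]
    print(eligible_placeholder_hosts(dr_hosts, 10, "datastore01"))  # empty list: no eligible hosts
```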

Read the rest of this entry »

Setting EMC’s RecoverPoint SRM SRA to Authenticate over SSL for 4.0 and 4.1

Written November 21st, 2014 by
Categories: Disaster Recovery, Virtualization

You may or may not know that EMC’s SRA defaults to authenticating over non-SSL communication. It basically talks to the RecoverPoint Appliance (RPA) on port 80 for everything. RPAs don’t support NAT, so chances are your devices are not publicly facing, at least I hope not!

It’s always a better idea to encrypt any traffic containing usernames and passwords, so why wouldn’t you? RPA versions up to 4.0 defaulted to non-SSL, although it’s referred to as non-https. RPA 4.1 no longer accepts port 80 and requires port 443 (https, SSL-encrypted, whatever you want to call it).

This is great, until you try to add RPA 4.1 to a standard install of RecoverPoint’s 2.2 SRA. Why? Well, because it defaults to non-https and doesn’t give you an intuitive way to change it.

Adding RPA 4.1 to SRA 2.2 will give you this error:

“SRA command ‘discoverArrays’ failed. Failed opening session for user to site mgmt IP.
Please see server logs for further details.”

Check the vmware-dr.log and you’ll likely see something like

Error code="1049"

and what’s funny is I even found “Ouch!” in the log, lol! I love when devs throw things like that out there.

Keep reading for the fix! Read the rest of this entry »

Storage vMotion Compatibility Failed – File is larger than the maximum size supported by datastore

Written October 6th, 2014 by
Categories: Virtualization

We ran into an interesting issue last week. While trying to migrate some VMs onto a replicated datastore for SRM, we got this error:

File [source-datastore]path/to.vmdk is larger than the maximum size supported by datastore ‘target-datastore’

file-larger-max-size-datastore

It’s strange because this is a 5.5 vCenter with 5.1 hosts, both datastores are VMFS5, and the VMDKs were only 60GB and 80GB. From what I could tell, everything was identical. Further testing revealed I could Storage vMotion from the SRM datastore to regular datastores, but I could not move anything onto the SRM datastore.

After googling, I found the standard VMware response of ‘select a different datastore’, and further research didn’t turn up much beyond the differences between VMFS versions and the differences in VMFS5 across 5.0, 5.1, and 5.5. That’s nice and all, but none of these applied to me.

I restarted the vCenter services on a whim, and guess what? It worked! LOL! Sometimes the strangest of issues can be fixed by the simplest of things.

Public Voting Now Open for VMworld 2014!

Written May 12th, 2014 by
Categories: Disaster Recovery, Virtualization

When you get a chance, stroll on over to vmworld.com and click the link “Vote for your favorite proposals today”:

VMworld 2014

I’m bringing this up because I submitted a session, 1152 “vCenter Site Recovery Manager: Architecting a DR Solution”.

It’s pretty simple to find: it’s one of only two sessions for Site Recovery Manager at the Advanced Technical Level, under Software Defined Data Center in the Storage and Business Continuity subtrack. An easy way to find it is to set the keyword filter to “site recovery manager”; mine’s the first one:

Submission 1152 for VMworld 2014

The abstract states:

VMware’s vCenter Site Recovery Manager is the market-leading disaster-recovery management product. It ensures the simplest and most reliable disaster protection for all virtualized applications. However, it is not a turn-key DR solution. Architecting your SRM solution requires deep thought and heavy planning. This presentation will help you with planning and architecting your SRM solution as well as addressing specific configuration and installation challenges. Our goal is to help you deploy and maintain a solid SRM solution to enable your DR Plan.

The agenda will flow like this:

  • Basic SRM Functions
  • Planning
  • Designing & Architecting SRM
  • Deployment Considerations & Road Blocks
  • Maintaining, Migrating, & Upgrading

Everyone knows what SRM is, and most understand what it does; however, very few actually understand how it works. This session will briefly cover how each function of SRM works, then go deeper into how to plan & architect your DR solution leveraging SRM. There are several design considerations to keep in mind when planning & building it out, and I’ll also touch on migrating & upgrading your SRM installation so you stay protected, even during an upgrade.

If this looks like something you’d like to see at VMworld 2014, please sign in and vote!

Thanks, and happy scripting!

Designed by ThepHuck