Getting Started with VMware vSAN: Hybrid or All Flash?

November 17th, 2016
Categories: VSAN
Everyone hears about VMware’s Virtual SAN and how awesome it is. It’s a very compelling offering, overshadowed only by VMware’s software-defined networking solution, NSX.

The biggest hurdle: how to get started.

The truth is it’s extremely simple to enable and start using, but that’s not the “getting started” I’m talking about. I want to cover some things to think about once you’ve decided to go down the VSAN path.

How many IOPS should you expect? How much storage will you have, or need? Should you go hybrid or all flash? What resiliency or protection options do you have, and what is the impact of each?

First things first: Hybrid or All Flash?

Bug in VSAN 6.2: De-dupe scanning running on hybrid datastores

July 25th, 2016
Categories: SDDC, Virtualization, VSAN
VMware has posted a KB about this, which I did not realize at the time of writing the blog.

We’ve been testing out VSAN here at work and noticed that one of the clusters we rolled out had serious latency issues. We initially blamed the application running on the hosted VMs, but when it continued to get worse we finally opened a case with VMware. Here’s a chart of the kind of stats we were seeing (courtesy of SexiGraf):

VSAN Cluster 1 before

Read latency in particular was very high on the datastore level, IOPS weren’t great, and Read Cache Hit Rate was low. We also saw that read and write latency was high on the VM level. After we opened a ticket with VMware, they discovered an undocumented bug in VSAN 6.2 where deduplication scanning is running even though deduplication is turned off (and actually unsupported in hybrid mode VSAN altogether). They provided the following solution:

For each host in the VSAN cluster:
1. Enter maintenance mode
2. SSH to the host and run: "esxcfg-advcfg -s 0 /LSOM/lsomComponentDedupScanType"
3. Reboot the host
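
The per-host procedure above can be laid out as a dry-run plan before touching anything. This is only a sketch: it builds the commands as strings and never connects to a host. The host names are hypothetical, and both the `esxcli system maintenanceMode set` call (standing in for step 1) and the `-g` read-back before the reboot are my assumptions about how one might script this, not part of what VMware provided.

```python
# Dry-run sketch: build the per-host command sequence for the workaround.
# Nothing here connects to a host; it only returns strings to review.

DEDUP_SCAN_OPTION = "/LSOM/lsomComponentDedupScanType"  # option named in step 2


def build_fix_plan(hosts):
    """Return a list of (host, command) tuples, finishing one host before the next."""
    plan = []
    for host in hosts:
        plan.extend([
            # Step 1: enter maintenance mode (assumed CLI equivalent of the UI action)
            (host, "esxcli system maintenanceMode set --enable true"),
            # Step 2: the command from the steps above
            (host, f"esxcfg-advcfg -s 0 {DEDUP_SCAN_OPTION}"),
            # Assumed extra check: read the value back before rebooting
            (host, f"esxcfg-advcfg -g {DEDUP_SCAN_OPTION}"),
            # Step 3: reboot the host
            (host, "reboot"),
        ])
    return plan


if __name__ == "__main__":
    # Hypothetical host names, for illustration only
    for host, command in build_fix_plan(["esxi-01.example.com", "esxi-02.example.com"]):
        print(f"ssh root@{host} '{command}'")
```

Printing the plan first makes it easy to confirm that each host is fully handled before the next one starts, which matters when the cluster has to stay available during the change.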

After we applied the fix, the cluster rebalanced for a little while and came back looking much, much better. In the graph below, you can see exactly when the fix was applied: read latency drops, IOPS increase, and read cache hit rate jumps into the high 90s (percent):

VSAN Cluster 1 after

And for good measure, this is how it’s looked since:

VSAN Cluster 1 after 2

So to summarize, if you are running hybrid VSAN 6.2, you should definitely check your latency and read cache hit rate. If you’re experiencing high latency and poor read cache hit rate, go through and change /LSOM/lsomComponentDedupScanType on all your hosts to 0. I can’t take credit for actually discovering this, so thank you to my coworker @per_thorn for tracking it down. And thank you @thephuck for letting me write it up on this blog!


How to create an NSX CLI user, API user & set up NSX Plugin for vROps

June 23rd, 2016
Categories: NSX
TL;DR: See below for details on these commands.

Create a local user in the NSX Manager’s CLI, then use the API to grant CLI privileges to that user.

Here’s how, using a Linux machine:
ssh admin@[nsxmanagerIP]
user vrops-readonly password plaintext notrealpassword
user vrops-readonly privilege web-interface

Log out of the NSX Manager (type exit) and stay logged into the Linux machine.
Create cli-auditor.xml that contains this (replace brackets with greater/less than):
[?xml version="1.0" encoding="ISO-8859-1" ?]

Add the user as an auditor in the NSX Manager as a CLI user:
curl -i -k -u 'admin:password' -H "Content-Type: application/xml" -X POST --data "@cli-auditor.xml" "https://nsxmanagerip/api/2.0/services/usermgmt/role/vrops-readonly?isCli=true"
Add your domain/vCenter user as an auditor in the NSX Manager (NOT as a CLI user):
curl -i -k -u 'admin:password' -H "Content-Type: application/xml" -X POST --data "@cli-auditor.xml" "https://nsxmanagerip/api/2.0/services/usermgmt/role/ReadOnly@THEPHUCK.COM?isCli=false"
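
If you would rather script the two role assignments than type the curl commands, the same request can be assembled in Python. This is a minimal sketch that only builds the request and does not send it. The function name and parameters are hypothetical, the URL path is the one from the curl commands, and the XML body is passed in as-is because the full contents of cli-auditor.xml are not reproduced here.

```python
import base64
import urllib.request


def build_role_request(manager, user_id, xml_body, is_cli, admin_user, admin_password):
    """Build (but do not send) the POST that assigns a role to user_id."""
    flag = "true" if is_cli else "false"
    url = f"https://{manager}/api/2.0/services/usermgmt/role/{user_id}?isCli={flag}"
    # Same effect as curl's -u option: HTTP Basic authentication
    token = base64.b64encode(f"{admin_user}:{admin_password}".encode()).decode()
    return urllib.request.Request(
        url,
        data=xml_body.encode("iso-8859-1"),  # encoding named in the XML declaration
        method="POST",
        headers={
            "Content-Type": "application/xml",
            "Authorization": f"Basic {token}",
        },
    )


if __name__ == "__main__":
    # Placeholder body; in practice, use the contents of cli-auditor.xml
    body = "<placeholder/>"
    for user, is_cli in [("vrops-readonly", True), ("ReadOnly@THEPHUCK.COM", False)]:
        req = build_role_request("nsxmanagerip", user, body, is_cli, "admin", "password")
        print(req.get_method(), req.full_url)
```

To actually send a request, pass it to `urllib.request.urlopen`. The curl commands use `-k` to skip certificate verification; the Python equivalent is an `ssl` context with verification disabled, which is only reasonable in a lab.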

Details for creating the NSX CLI user for vROps

VMware Virtual SAN Health failed Cluster health test

December 2nd, 2015
Categories: Virtualization, VSAN
Here’s the error

While building a new environment for my lab, I ran across an interesting thing yesterday.

I looked at my cluster’s VSAN health and saw this error:

It’s complaining that my hosts don’t have matching Virtual SAN advanced configuration items.

If you click on that error, you’ll see at the bottom where it shows comparisons of hosts and the advanced configurations:

It shows VSAN.DomMaxLeafAssocsPerHost and VSAN.DomOwnerInflightOps as being different between a few of my hosts. Looking at the image above, you’ll see node 09 has values of 36000 and 1024, respectively, while the other nodes 10-12 show 12000 and 0.
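
The comparison the health check is making can be sketched in a few lines: collect each host's advanced settings, find the most common value per setting, and report the hosts that differ. The host names below are hypothetical stand-ins for nodes 09-12, the values are the ones quoted above, and this is an illustration of the idea rather than how the health check is actually implemented.

```python
from collections import Counter


def find_mismatches(host_settings):
    """Map each setting to the hosts whose value differs from the majority value."""
    mismatches = {}
    all_keys = sorted({key for settings in host_settings.values() for key in settings})
    for key in all_keys:
        values = {host: settings.get(key) for host, settings in host_settings.items()}
        majority, _ = Counter(values.values()).most_common(1)[0]
        outliers = {host: value for host, value in values.items() if value != majority}
        if outliers:
            mismatches[key] = {"majority": majority, "outliers": outliers}
    return mismatches


# Hypothetical host names; values as reported by the health check above
hosts = {
    "node09": {"VSAN.DomMaxLeafAssocsPerHost": 36000, "VSAN.DomOwnerInflightOps": 1024},
    "node10": {"VSAN.DomMaxLeafAssocsPerHost": 12000, "VSAN.DomOwnerInflightOps": 0},
    "node11": {"VSAN.DomMaxLeafAssocsPerHost": 12000, "VSAN.DomOwnerInflightOps": 0},
    "node12": {"VSAN.DomMaxLeafAssocsPerHost": 12000, "VSAN.DomOwnerInflightOps": 0},
}

if __name__ == "__main__":
    for setting, detail in find_mismatches(hosts).items():
        print(setting, "majority:", detail["majority"], "outliers:", detail["outliers"])
```

With these values, node09 is flagged as the outlier for both settings, which matches what the health check screen shows.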

