Everyone hears about VMware’s Virtual SAN and how awesome it is. It’s a very compelling offering, overshadowed only by their software-defined networking solution, NSX.
The biggest hurdle: how to get started.
The truth is it’s extremely simple to enable and start using, but that’s not the “getting started” I’m talking about. I wanted to cover some things to think about once you’ve decided you’re going down the VSAN path.
How do you know how many IOPS to expect? How much storage will you have, or need? Should you go hybrid or all-flash? And what resiliency and protection options do you have, and what is their impact?
First things first: Hybrid or All Flash?
When investing capital in a new platform, it’s common to want to keep spend under control, and “All Flash” just sounds expensive. You have to understand how you’re gauging expense, though. If you’re measuring $/GB of storage, SSDs and other flash devices are expensive. The inverse is true for performance: measured in $/IOPS, flash devices are a better deal than spinning/magnetic disks.
One key thing to keep in mind is that Deduplication & Compression are ONLY available on All Flash, not Hybrid. It’s also imperative to understand how Deduplication & Compression work in All Flash. As pointed out here, when data is moved from the write buffer to the capacity tier, it is deduplicated in 4k blocks: if VSAN can find an existing 4k block containing the same 1s and 0s, it just points to that block instead of writing a new one. If your environment has standard OS images, there will be a lot of deduplication going on. How much exactly? You’ll have to try it to find out, but it should be significant. For example, if every other 4k write matches an existing block, you get a 2:1 dedup ratio.
Also, when destaging those 4k blocks, VSAN tries to compress each one down to 2k. If it can, it stores the compressed block; if it can’t, it writes the block as-is. That’s another potential space savings of 2:1. Some data doesn’t compress well, typically because it’s already compressed: audio and video files, or file archives.
It has been mentioned that you can expect somewhere in the area of a 3:1 ratio after dedup & compression. Those two numbers combined tell a compelling story: skip Hybrid and go straight to All Flash.
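To make the space math concrete, here’s a quick back-of-the-napkin sketch in Python. The 3:1 figure is just the rough ratio quoted above, used as an assumption; your actual ratio depends entirely on your data.

```python
# Napkin math: effective all-flash capacity after dedup & compression.
# The 3:1 default is the rough ratio mentioned above, not a guarantee.
def effective_capacity_tb(raw_tb, dedup_compress_ratio=3.0):
    """Logical capacity given raw flash capacity and an assumed ratio."""
    return raw_tb * dedup_compress_ratio

# Example: 10 TB of raw flash at an assumed 3:1 ratio
print(effective_capacity_tb(10))  # 30.0 TB of logical capacity
```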
What about performance?
Another question that comes up often is how to estimate the performance you’ll get out of this new solution. For hybrid, it’s fairly easy. If your cache-tier SSD is capable of 97k random 4k reads and 65k random 4k writes, that pretty much answers the question on a per-disk-group basis. This is why I advocate running at least two disk groups per node: you’re essentially doubling the available IOPS per node. A third disk group is still a 50% increase over two, but the relative gains shrink as you keep adding disk groups.

Using the disk referenced above, say you built a small 4-node cluster with two disk groups each. The theoretical maximum for the cluster is 776k random 4k read IOPS and 520k random 4k write IOPS. Will you achieve that? Probably not, since the manufacturer’s numbers come from a perfect, best-case scenario. Cut them in half and you’re still at 388k/260k read/write IOPS.

The key thing here is that 30% of the cache SSD is used as your write buffer and 70% as your read cache. If the data you’re accessing is not in that read cache, your read IOPS will suffer, since reads have to be served from the spinning/magnetic disks in the capacity tier. I’m unclear what happens if the write buffer fills up because it can’t destage data to the magnetic disks fast enough. My assumption is write IOPS will suffer, since VSAN can no longer acknowledge writes at the expected SSD speeds; latency would likely climb while IO waits behind the destaging. But again, that’s my speculation.
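If you want to play with the math yourself, here’s a rough Python sketch of the hybrid numbers above. The 97k/65k figures are the vendor’s spec-sheet numbers for that cache SSD, and the 50% haircut is my own derating, not anything official from VMware.

```python
# Napkin math for hybrid VSAN: cache-tier SSDs gate the IOPS per disk group.
CACHE_READ_IOPS = 97_000   # per cache SSD, random 4k reads (vendor spec)
CACHE_WRITE_IOPS = 65_000  # per cache SSD, random 4k writes (vendor spec)

def hybrid_cluster_iops(nodes, disk_groups_per_node, derate=0.5):
    """Theoretical cluster read/write IOPS, cut by a derating factor."""
    groups = nodes * disk_groups_per_node
    reads = groups * CACHE_READ_IOPS * derate
    writes = groups * CACHE_WRITE_IOPS * derate
    return reads, writes

# The 4-node, two-disk-group cluster from above, cut in half:
print(hybrid_cluster_iops(4, 2))  # (388000.0, 260000.0)
```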
Now, when looking at an All-Flash setup, your write IOPS are the same as above, since all writes still go through the cache tier, but the entire cache device is used as a write buffer. Instead of 30% of the device, you could have a 400GB or 800GB write buffer (though it’s worth noting VSAN caps the usable write buffer at 600GB per disk group, so it won’t use all of an 800GB device).
Now, when it comes to read IOPS? You can expect 1.21 giga-iops :P No, not really. Well, maybe. For All-Flash reads, you take the number of capacity disks in the disk group and add up their potential read IOPS. Using that same SSD referenced above, seven of them gives you a potential 679k read IOPS per disk group. Again, that’s theoretical and may not come close to what you actually get, since the disk objects may not be spread out evenly across devices. Even in a perfect world, with seven objects all being read symmetrically off the capacity tier and everything aligned, you probably still couldn’t get there. But I don’t think 50% of it is too far off, and how does 2.7M read IOPS sound? That’s seven SSDs in the capacity tier (679k IOPS), two disk groups per node (1,358k IOPS), a four-node cluster (5,432k IOPS), and 50% of that number (2,716k IOPS).
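And here’s the same napkin math for All-Flash reads, again with my own 50% derating baked in as an assumption rather than a measured result:

```python
# Napkin math for all-flash reads: capacity-tier SSDs serve reads directly,
# so add up their vendor read numbers and apply the same 50% derating.
SSD_READ_IOPS = 97_000  # per capacity SSD, random 4k reads (vendor spec)

def all_flash_read_iops(nodes, disk_groups_per_node,
                        capacity_ssds_per_group, derate=0.5):
    """Theoretical cluster read IOPS under perfect-distribution assumptions."""
    total_ssds = nodes * disk_groups_per_node * capacity_ssds_per_group
    return total_ssds * SSD_READ_IOPS * derate

# Seven capacity SSDs per group, two groups per node, four nodes:
print(all_flash_read_iops(4, 2, 7))  # 2716000.0 -> ~2.7M read IOPS
```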
I don’t know about you, but YES PLEASE!
What’s the verdict?
Either way you look at it, VMware’s Virtual SAN is a very compelling story, and the truth is, you’ll benefit from either Hybrid or All-Flash.
That being said, my recommendation to you is to skip hybrid and go All-Flash: build tomorrow’s cluster today!