This article was originally written as a guest blog for Intense School IT educational services. The topic I’d like to discuss throughout this article is VDI and some of the common issues we (might) run into when it comes to storage, IOPS and image management. At the same time I’d also like to point out some of the possibilities, or better said, technologies, we have at our disposal to address these issues, and talk a bit more about IOPS, block vs file level storage and image management. In part one I’ll primarily focus on VDI in general, describing its use and some of the common pitfalls we might encounter; part two will primarily focus on some real world solutions.
First I’ll try and explain what a Virtual Desktop Infrastructure is about. Think of it as a (virtual) infrastructure (see the different vendors below) from which VMs based on client operating systems, hosted and managed in the data center, get published out to your users. Your users could be working on thin, zero or fat clients, it doesn’t matter, all the ‘real’ processing takes place on the virtual machine in the data center. Compressed graphics and keystrokes get sent back and forth over the line, keeping network traffic small and fast. In a nutshell, that’s basically it. All major vendors offer their own VDI solution: Microsoft’s Remote Desktop Services, for example, Citrix XenDesktop or perhaps VMware View. They’re all slightly different, but VDI nonetheless. Have a look at the example below, it’s based on VMware View.
When VDI was introduced a few years ago, it started out as a very promising technology and to some extent it has lived up to its expectations. Still, it didn’t flourish the way many had hoped and perhaps expected. The reasons it didn’t grow as big are partly to blame on the complexity it brings when it comes to image management (especially with dedicated desktops), the storage requirements and the IOPS that go with them. As far as IOPS go, the same can be said for pooled desktops; we’ll go into these a bit more later on.
Pooled vs Dedicated desktops
Since it’s not ‘one size fits all’, virtual machines can be configured and offered in different ways, with pooled and dedicated (also referred to as persistent) probably being the two most popular ‘types’. No matter which type is used, VDI VMs are provisioned from, and based on, a so called master or golden image, so they all start out exactly the same. Different vendors use different techniques to make this happen. Without going into too much detail, I’ll explain the differences below. Regardless of the technique used, it works like this:
Pooled: With pooled desktops all changes made to the underlying operating system (master / golden image) are discarded on log-off or reboot. This means that installed applications, updates etc. will be gone once a user logs off or reboots; the VM will again be ‘clean’ and put back in its pool, waiting to be re-used. This also goes for all personalized settings, so a good user profile solution needs to be in place.
Dedicated / persistent: All changes made to the underlying operating system (master / golden image) are preserved when a user logs off or reboots. Again, different vendors use different techniques to accomplish this. However, this also means that a dedicated VM is bound to a particular user, as opposed to pooled desktops where you can choose what to do: tie it to a particular user or put it back into the pool for reuse. These options also differ slightly per vendor. Not a bad thing per se, but worth a mention.
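The difference between the two types can be sketched in a few lines of Python. This is only a toy model and not how any vendor implements it: the class name `Desktop`, the dict standing in for the master image and the `persistent` flag are all made up for illustration.

```python
class Desktop:
    """Toy VM: a read-only master image plus a per-VM layer that holds writes."""

    def __init__(self, master, persistent):
        self.master = master          # shared golden image, never modified
        self.persistent = persistent  # True = dedicated, False = pooled
        self.delta = {}               # per-VM changes (hypothetical write layer)

    def write(self, path, data):
        # Writes never touch the master; they land in the VM's own layer.
        self.delta[path] = data

    def read(self, path):
        # The VM's own changes win over the master image.
        if path in self.delta:
            return self.delta[path]
        return self.master.get(path)

    def reboot(self):
        # Pooled desktops discard their changes, dedicated desktops keep them.
        if not self.persistent:
            self.delta.clear()


master = {"C:/Windows/version.txt": "golden image v1"}
pooled = Desktop(master, persistent=False)
dedicated = Desktop(master, persistent=True)

for vm in (pooled, dedicated):
    vm.write("C:/Apps/tool.exe", "installed by user")
    vm.reboot()

print(pooled.read("C:/Apps/tool.exe"))     # None: the change is gone
print(dedicated.read("C:/Apps/tool.exe"))  # installed by user
```

Both desktops still read the untouched master image after the reboot; only the fate of the user’s own changes differs.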
Besides VDI there are a few other things I’d like to address throughout this article since they’re all closely related. Of course I’ll explain IOPS in some more detail as they relate to VDI and storage in general, but I’d also like to make a note on block vs file level based storage. Block level storage is widely used within VDI deployments, and in all sorts of other architectures as well; think of Storage Area Networks (SANs) for example. I just think that there’s still a lot of confusion around the differences between block and file level based storage. Let’s see if I can clear up some misconceptions.
Block level storage is based on raw storage volumes (blocks). Think of it this way: a bunch of physical hard drives, as part of your SAN solution for example, get split up (using specific software) into two or more raw storage volumes / blocks, although it can just as easily be one big volume. These are accessed remotely through either Fibre Channel or iSCSI and present themselves to a server operating system as if they were local disks. It doesn’t get more flexible than this.
Basically each raw volume / block that gets created can be controlled as an individual hard drive. Format it with a file system like NTFS or VMFS (VMware) and it will support almost all major applications out there, like VMware, different sorts of databases, Exchange and more. Although very flexible, block level solutions are also harder and more complicated to manage and implement. Not to mention more expensive than most file level storage solutions, which we’ll have a look at next.
File level based storage is all about simplicity and, in most cases, is implemented in the form of a Network Attached Storage (NAS) device. Think of file level storage as one big pile to store raw data files, nothing more: a central repository to store your company’s files and folders, accessed using a common file level protocol like SMB / CIFS (Windows) or NFS (Linux and/or VMware). Just keep in mind that file level based storage isn’t designed to perform at high levels, meaning that if your data gets a lot of read and write requests and the load is substantial, you’re better off using block based storage instead.
Easy to set up and a lot cheaper as well, but… A NAS appliance has its own file system (and runs its own, non-standard operating system) and handles all files and folders that are stored on the device as such. Something to keep in mind when thinking about user access control and assigning permissions. Although most NAS and/or other file level storage devices support the authentication and security mechanism already in place, Active Directory in most cases, it could happen that you run into one that doesn’t.
As you can see, both have pros and cons, and a lot depends on the use case you’re presented with. I mean, deploying tens or even hundreds of client OS based VMs as part of a VDI on file based storage just won’t work. That leaves us with the more expensive SAN solution, or am I wrong?! Let’s continue and find out.
IOPS in some more detail
I’ll start with a quote from Wikipedia: “IOPS (Input/Output Operations Per Second, pronounced as eye-ops) is a common performance measurement used to benchmark computer storage devices like hard disk drives (HDD), solid state drives (SSD), and storage area networks (SAN). As with any benchmark, IOPS numbers published by storage device manufacturers do not guarantee real-world application performance”.
Typically an I/O operation is either a read or a write, with a series of sub categories in between like re-reads and re-writes, which can be done either randomly or sequentially, the two most common access patterns. Depending on what needs to be done, a single I/O can vary from several bytes to multiple KBs. So you see, it’s basically useless for a vendor to state that their storage solution will deliver a certain amount of IOPS without stating the read vs write percentage used during the benchmark, the size of the I/Os being processed, and the latency that comes with it (how long it takes for a single I/O request to get processed). As stated on this Blog (make sure to check it out, take your time and read it thoroughly, it’s one of the best storage related resources around), it’s near impossible to claim a certain amount of IOPS without also mentioning these additional parameters, since there is no standard way to measure IOPS and there are a lot of factors that could influence the amount of IOPS generated besides the ones mentioned above. Makes perfect sense to me.
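To make those parameters a bit more concrete, here is a small Python sketch of two standard relationships: throughput is simply IOPS times I/O size, and (by Little’s law) the IOPS you can sustain equals the number of I/Os in flight divided by the average latency. The function names and the example figures are mine and purely illustrative, they don’t come from any vendor benchmark.

```python
def throughput_mb_s(iops, io_size_kb):
    """Throughput in MB/s for a given IOPS figure and I/O size (1 MB = 1024 KB)."""
    return iops * io_size_kb / 1024


def iops_from_latency(outstanding_ios, latency_ms):
    """Little's law: sustained IOPS = I/Os in flight / average latency (in seconds)."""
    return outstanding_ios * 1000 / latency_ms


# The same "10,000 IOPS" claim means very different things at 4 KB and 64 KB.
for size_kb in (4, 64):
    print(size_kb, "KB I/Os:", throughput_mb_s(10_000, size_kb), "MB/s")

# At 5 ms per I/O, one I/O in flight gives 200 IOPS, 32 in flight give 6400.
print(iops_from_latency(1, 5), iops_from_latency(32, 5))
```

The second function also shows why an IOPS number without a latency figure says very little: a deep enough queue can produce a big number while every single request still takes a long time.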
Of course all this doesn’t matter if you don’t know your sweet spot: how many IOPS do you need and, more importantly, what kind of IOPS do you need? There are various file system benchmark tools available that can assist you in determining the amount and type of IOPS needed for your specific workload. Also, the Internet is full of blogs and hundreds of other articles describing how to determine the IOPS needed by certain applications. Just remember, don’t go with general estimations, just ignore them, and make sure you test and evaluate your environment thoroughly! Tooling can get you a long way, so give this some thought.
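Once you have measured your own environment, it helps to look beyond the average. The Python sketch below summarizes a list of IOPS samples into an average, a 95th percentile (nearest-rank method) and a peak. The function name `summarize` and the sample readings are invented for the example; real numbers have to come from your own monitoring.

```python
def summarize(samples, percentile=95):
    """Return (average, nearest-rank percentile, peak) of measured IOPS samples."""
    ordered = sorted(samples)
    # Nearest-rank percentile, using integer maths to avoid rounding surprises.
    rank = (percentile * len(ordered) + 99) // 100
    return sum(ordered) / len(ordered), ordered[rank - 1], ordered[-1]


# Hypothetical per-minute IOPS readings for a single desktop (made-up numbers).
samples = [8, 9, 7, 10, 12, 9, 8, 55, 11, 9, 10, 8, 9, 60, 10, 9, 8, 11, 9, 10]

avg, p95, peak = summarize(samples)
print(f"average={avg:.1f} p95={p95} peak={peak}")
```

With these made-up readings the average comes out far below the bursts, which is exactly why sizing on a general estimate (or on an average alone) tends to go wrong.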
I already mentioned pooled and dedicated / persistent desktops as part of VDI deployments. If we look at VDI designs today, most are primarily based on the pooled desktop design. There are several reasons for this, let me try and explain why. I’ll use Citrix XenDesktop as the underlying VDI technology, the one I know best. Although other vendors might use slightly different technologies, the storage (IOPS included) and (image) management issues we encounter when using dedicated / persistent desktops unfortunately don’t just go away. When provisioning multiple desktops, pooled or dedicated, based on a master image, one of the technologies XenDesktop’s Machine Creation Services (MCS) will use is differencing disks, which store all writes made to the virtual machine. If it’s supported by your storage solution these disks can be thin provisioned, otherwise each one will be as big as the base (master) virtual machine mentioned before. Be aware that each VM provisioned by MCS will get its own ID (identity) disk and differencing disk.
Knowing this, you can probably imagine where potential storage ‘issues’ could come in. First imagine this: you’re managing a few hundred pooled desktops, all used at the same time, and for now I’ll just assume that your storage solution does support thin provisioning. Each differencing disk can potentially grow as big as your underlying base (master) image, and that’s a lot of storage. In practice this probably won’t happen that much, and even if it did, it isn’t something to worry about, because when a pooled desktop gets rebooted (a nightly scheduled reboot perhaps?) all changes made to the VM (stored on the underlying differencing disk) get deleted. This way you end up with a fresh and empty VM, again waiting to be (re)used.
Using the pooled model, and with the above in the back of your mind, you could consider under-committing your VMs as far as storage goes (reserving less physical storage than the theoretical worst case), since it’s highly unlikely that your machines will grow beyond several GBs during the day. If you do, make sure to implement some kind of daily reboot schedule. Monitoring your VMs for a few days will tell you how far you can go as far as under-committing is concerned. This way you won’t have to allocate all of your storage right away. If thin provisioning isn’t an option you might want to reconsider your storage assignment, but since we don’t live in the stone ages anymore we should be fine. Read on…
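The gap between the worst case and what you’re likely to need is easy to put a number on. Here’s a quick Python sketch; the function name and all figures (300 desktops, a 40 GB master image, 3 GB of writes per desktop per day) are assumptions chosen for the example, so plug in what your own monitoring tells you.

```python
def pooled_storage_gb(desktops, master_gb, expected_delta_gb):
    """Return (worst_case, expected) differencing disk storage in GB."""
    worst_case = desktops * master_gb        # every disk grows to master size
    expected = desktops * expected_delta_gb  # what monitoring suggests per day
    return worst_case, expected


worst, expected = pooled_storage_gb(desktops=300, master_gb=40, expected_delta_gb=3)
print(worst, expected)  # 12000 900
```

The expected figure only holds as long as the differencing disks really do get emptied, which is why the daily reboot schedule matters.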
Now picture the same scenario, but this time we use dedicated desktops. We start out the exact same way, only this time when the VM gets rebooted the writes to the underlying differencing disk won’t get deleted! The user logs off or shuts down and comes back into the office the next day, the same VM gets fired up and the user logs on. They work throughout the day, again making changes on top of the underlying base (master) image (written to the differencing disk), perhaps installing new applications or updates etc., but now nothing gets deleted. No matter how many times the VM gets rebooted, the differencing disk will keep expanding, taking up more free space, until it’s full of course. Looking at this example it’s obvious that these dedicated VMs will consume a lot more storage (and IOPS) than their pooled opposites. This also means no under-committing! Size accordingly when using this solution.
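To get a feel for how quickly a disk that never resets fills up, here is a rough Python estimate. It assumes steady, linear growth, which real desktops won’t follow exactly, and both the function name and the numbers (a 40 GB maximum, half a GB of new writes per day) are hypothetical.

```python
import math


def days_until_full(max_gb, daily_growth_gb, start_gb=0.0):
    """Days until a differencing disk that is never reset reaches its maximum size."""
    if daily_growth_gb <= 0:
        return None  # no growth, so the disk never fills up
    return math.ceil((max_gb - start_gb) / daily_growth_gb)


print(days_until_full(max_gb=40, daily_growth_gb=0.5))  # 80
```

Multiply that by a few hundred users and it becomes clear why dedicated desktops have to be sized for the full disk from day one.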
If storage isn’t an issue (and often it isn’t) then management might be. In the end you’ll also need to manage these dedicated desktops on an individual basis. This is partly because with dedicated desktops it isn’t possible to update the underlying base (master) image without destroying the accompanying differencing disk, just take my word for it. With pooled VMs we don’t have to worry about any persistent data: we can just update the base image, assign it to our pooled machines, reboot and we’re ready to go again! Next to that, it’s only a matter of time before each user starts installing their own applications, making each desktop just a bit different from the others. Of course there are all sorts of automation tools out there that can assist you with these kinds of tasks, but I’ll leave that up to you. Still, this solution offers some real big advantages over ‘normal’ hardware based desktops, it’s just that it might seem a bit more ‘romantic’ beforehand than it turns out to be once you implement it.
Fortunately there are several vendors out there, Citrix included, offering us some smart technologies to overcome most, if not all, of the issues discussed: simplified management, fewer (or hardly any) IOPS hitting your storage, storage saving solutions and more. In part 2 of this series I’ll discuss XenDesktop Personal vDisks, Citrix PVS, PernixData and Atlantis ILIO as potential life savers when designing, troubleshooting and/or upgrading your virtual desktop, or whatever, infrastructure troubled by one of the mentioned pain points. Some of the concepts discussed can be kind of hard to get your head around without going into too much technical detail. Nevertheless I hope this has been somewhat informative, giving you at least a basic understanding of storage, IOPS and VDI in general, including some of the challenges that may come with it. Stay tuned.
VDI, storage and the IOPS that come with it. Part 2 of 2
Welcome back. In part one we covered the (VDI) basics, including some of the most common issues that could interfere with building a solid, fast and reliable virtual desktop infrastructure. I must note that most of the technology talked about is applicable to other sorts of infrastructures as well. In part two I’ll discuss some interesting solutions that hand us several options to overcome, or eliminate, IOPS issues, complex image management and some of the other challenges we might face.
XenDesktop offers us Personal vDisks, or PvDs for short. Although they don’t directly solve our IOPS problem, they’re a step in the right direction; at least from a management perspective they will make life a bit easier. Again, I’m just focusing on XenDesktop for now, other vendors might, or will, have their own solutions. With Personal vDisks we still use a base (master) image just like before, differencing disk included, but we now get an extra virtual disk (the Personal vDisk) attached to our VM as well, which will hold all of our ‘personal’ changes. These include, but are not limited to, all file level and registry changes, such as applications installed or streamed by SCCM, App-V (cache) or XenApp for example, but also things like desktop wallpapers, start menu settings, favourites and other user ‘profile’ related settings. Small(er) companies might use PvDs in addition to, or instead of, a separate user profile solution; I’m not saying that it’s ideal, but it can be done.
When creating dedicated VMs with PvDs you’ll still see a differencing (and ID) disk attached to the VM as well but, just as with ‘normal’ pooled desktops, it’s stateless, meaning that it will be emptied on every reboot or log-off. The best thing is, it will hold all delta writes made to the underlying base (master) image OS, keeping the PvD, which only holds our personal changes as explained above, small(er) in size, the way we want it to be. Another (big) advantage is the ability to update the underlying base (master) image without destroying or losing any personal data / settings whatsoever; the update will just blend in after being applied, and the user won’t notice a thing.
What more can we do?
There must be something else we can do, right? A lot of organizations don’t consider storage capacity to be an issue, there’s plenty to go around. And with technologies advancing, think of data de-duplication (on the storage level) for example, some, or most, of the scenarios described above are getting easier and more realistic to implement by the day. The one thing we still have trouble with is IOPS, yes, there they are again. Even though block level SANs offer great performance, in many cases it still isn’t enough. If we really want to make a difference when it comes to IOPS we need to address (or relieve) the storage layer, since that’s where it all happens.
Citrix Provisioning Services (PVS)
PVS uses a software streaming technology where a base image (just like the one used with XenDesktop, as described in part one) gets streamed over the network to multiple VMs at the same time. Although this is fundamentally different from the differencing disk approach used for the pooled and dedicated desktops in part one, there are some similarities as well. Since the base image is read-only, there again needs to be some way to store our writes. PVS has something called write cache to handle these writes, comparable to the differencing disk technology explained earlier. Write cache can be either stateless or persistent, just as with the pooled and dedicated models. Note that PVS can also be used to provision XenApp servers using a standard image; in fact, this is probably one of the most common use cases out there. Now for the interesting part: we can choose where, and how, we want to create the write cache. We have the following options:
We can place the write cache on the device’s hard drive (stateless or persisted), cache in device RAM, cache on device RAM with overflow on hard disk (which is only available for Windows 7 and Windows Server 2012 and later), cache on server disk stateless and cache on server disk persisted.
The above methods offer us a lot of flexibility but again, which one works best for you depends on the use case you’re presented with. Since we are looking to eliminate IOPS as much as possible, you can probably guess which of the above methods we don’t want to use. Remember that we’re talking about (VDI) VMs here, so if we choose to cache on the device’s hard drive, either stateless or persistent, the write cache will be placed on the virtual hard disk of the VM, thus on the SAN (in most cases anyway) where the VM is provisioned. We could also choose to store the write cache on the PVS server’s hard disk, again either stateless or persistent. Although this would relieve the SAN of extra IOPS, it would also increase the I/O and network load on the PVS server.
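The options and where their write I/O ends up can be summarized in a small Python table. This is just my own summary of the text above, not a Citrix API: the class, the labels `san`, `pvs_server` and `host_ram`, and the helper function are invented for the illustration, and `san` assumes the VM’s virtual disk lives on the SAN.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WriteCacheOption:
    name: str
    lands_on: tuple    # components that absorb the write I/O
    persistent: bool   # does the cache survive a reboot?


OPTIONS = [
    WriteCacheOption("cache on device hard drive (stateless)", ("san",), False),
    WriteCacheOption("cache on device hard drive (persisted)", ("san",), True),
    WriteCacheOption("cache in device RAM", ("host_ram",), False),
    WriteCacheOption("cache in device RAM with overflow on hard disk",
                     ("host_ram", "san"), False),
    WriteCacheOption("cache on server disk (stateless)", ("pvs_server",), False),
    WriteCacheOption("cache on server disk (persisted)", ("pvs_server",), True),
]


def options_avoiding(*components):
    """Names of the options whose writes touch none of the given components."""
    return [o.name for o in OPTIONS if not set(o.lands_on) & set(components)]


print(options_avoiding("san", "pvs_server"))  # ['cache in device RAM']
```

Filtering on both the SAN and the PVS server leaves a single candidate, and it is not a persistent one.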
That leaves us with the ‘cache in device RAM’ option, and when using Windows 7 and Windows Server 2012 or later we could choose the ‘overflow on hard disk’ variant, which would make sense since you’ll probably see some blue screens if you run out of memory to store your writes. Using RAM for write cache will speed up operations immensely and will free your SAN and PVS server of IOPS, but… this technology will only be useful when using pooled desktops, since RAM isn’t designed to store information permanently. Also, when we say ‘cache in device RAM’ we’re talking about the memory of the target device, which in the case of a VM comes out of the RAM of the hypervisor host server the VM is running on, so you need to size accordingly.
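What ‘size accordingly’ boils down to can be shown with some back-of-the-envelope Python. The formula is a simplification of my own (memory the desktop needs, plus the RAM set aside for write cache, times the number of VMs, plus whatever the hypervisor itself uses), and every number in the example is hypothetical.

```python
def host_ram_needed_gb(vms_per_host, vm_ram_gb, write_cache_gb, hypervisor_overhead_gb=0.0):
    """Rough host RAM estimate when each VM keeps its write cache in RAM."""
    return vms_per_host * (vm_ram_gb + write_cache_gb) + hypervisor_overhead_gb


# 60 desktops per host, 2 GB working memory and 0.5 GB write cache each,
# plus 8 GB for the hypervisor (all assumed values).
print(host_ram_needed_gb(60, 2, 0.5, 8))  # 158.0
```

Half a GB per desktop doesn’t sound like much, but in this example it adds 30 GB of RAM to every host.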
Another thing to keep in mind is that when your hypervisor host crashes, a second, or third etc. host server will most likely take over, but your writes in RAM will be lost, meaning that your users might lose some work in the process. Something to consider. This also applies when you choose to store your write cache on the PVS server’s local hard disk(s): if the PVS server dies, you will lose your write cache along with it. With the write cache in RAM, all that is left to store is the base image (and user profile data). Here PVS is smart: when it reads the master image and starts streaming it out to your VMs (on request of the VM), it will cache all reads in memory, but this time using the RAM of the PVS server itself. So when it needs to read and stream out the exact same blocks of data to VM 2, 3, 4 etc. it will read from RAM: again, no extra IOPS and extremely fast. Of course it goes without saying that your network needs to be capable of handling the PVS stream, but as long as you keep it local, preferably on your private LAN, you should be fine in most cases.
This should give you a high level overview of the possibilities of PVS when it comes to eliminating IOPS and other storage related issues. Just as with the pooled and dedicated desktops that use differencing disks, or a similar technology, PVS also has some pros and cons when it comes to updating the master base image, especially if it’s used to provision dedicated desktops, as we saw earlier, but for now I’ll leave it at this.
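The effect of that server side read caching is easy to demonstrate with a toy model in Python. To be clear, this is not how PVS is implemented internally; the `StreamingServer` class and its dict-based cache are made up purely to show why the second, third and fiftieth VM cost no extra disk reads.

```python
class StreamingServer:
    """Toy image server that caches blocks of the master image in its own RAM."""

    def __init__(self, image_blocks):
        self.image_blocks = image_blocks  # stands in for the image on disk
        self.ram_cache = {}
        self.disk_reads = 0

    def read_block(self, block_no):
        if block_no not in self.ram_cache:
            self.disk_reads += 1  # only the first request costs a disk I/O
            self.ram_cache[block_no] = self.image_blocks[block_no]
        return self.ram_cache[block_no]


server = StreamingServer({n: f"block-{n}" for n in range(100)})

for vm in range(50):              # 50 VMs boot from the same image
    for block_no in range(100):   # and each one reads the same 100 blocks
        server.read_block(block_no)

print(server.disk_reads)  # 100 disk reads instead of 5000
```

The catch is the same as with any cache: the server needs enough RAM to hold the blocks your VMs actually request.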
PernixData offers us FVP (and that’s their whole portfolio); check out their Data Sheet here, it’s awesome! Their main focus is to reduce the IOPS bottleneck and improve overall storage performance where they can, basically by using one big server side caching mechanism built out of fast, SSD-like storage. If you go to Pernixdata.com they’ll tell you that administrators need a way to efficiently scale storage performance using virtualization, much in the same way they scale server compute and memory, and that PernixData FVP does just that. According to PernixData, their revolutionary hypervisor software aggregates server side flash (SSDs for example) across an entire enterprise to create a scale-out data tier for the acceleration of primary storage. By optimizing reads and writes at the host level, PernixData FVP reduces application latency from milliseconds to microseconds. And since a picture says more than a thousand words:
It’s easy to install and manage, it supports all major storage vendors and it installs at the hypervisor level (do check which hypervisors are supported for your environment). It accelerates both read and write operations (IOPS). FVP can be configured to first write changes to flash and later to persistent back-end storage, while in the meantime data loss is prevented by synchronizing the flash devices on peer servers. It’s fully compatible with (almost) all existing infrastructures, and believe me, I’ve seen it in action, it really works.
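The ‘write to flash first, to back-end storage later’ idea is generally known as write-back caching. Here is a toy Python model of it, including the copy to a peer that protects the not-yet-flushed data. It illustrates the general technique only and says nothing about how FVP is built internally; all class, attribute and method names are hypothetical.

```python
class WriteBackCache:
    """Toy write-back cache: acknowledge from flash, destage to the array later."""

    def __init__(self):
        self.local_flash = {}  # flash in the host running the VM
        self.peer_flash = {}   # replica on a peer host's flash
        self.backend = {}      # persistent back-end storage (the SAN)
        self.dirty = set()     # blocks not yet written to the back end

    def write(self, block_no, data):
        self.local_flash[block_no] = data
        self.peer_flash[block_no] = data  # copy to a peer before acknowledging
        self.dirty.add(block_no)
        return "ack"                      # the VM only waits for flash

    def destage(self):
        """Flush dirty blocks to back-end storage, e.g. in the background."""
        for block_no in sorted(self.dirty):
            self.backend[block_no] = self.local_flash[block_no]
        flushed = len(self.dirty)
        self.dirty.clear()
        return flushed

    def recover_from_peer(self):
        """If the local host fails, unflushed blocks still exist on the peer."""
        return {b: self.peer_flash[b] for b in self.dirty}


cache = WriteBackCache()
cache.write(7, b"new data")
print(cache.backend)                   # {} : nothing on the SAN yet
print(cache.recover_from_peer())       # {7: b'new data'}
print(cache.destage(), cache.backend)  # 1 {7: b'new data'}
```

The peer copy is what makes it safe to acknowledge a write before it has reached the SAN.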
Atlantis Computing’s portfolio is a little more advanced, so to speak. They offer Atlantis ILIO for persistent VDI, stateless VDI (XenDesktop and VMware View) and XenApp, plus Atlantis ILIO Center, their central management solution. Here’s some information from their website, atlantiscomputing.com: “Atlantis Computing’s unique In-Memory Storage technology forms the foundation for all Atlantis ILIO products. In virtual desktop environments, Atlantis ILIO delivers better-than-PC performance while enabling linearly scalable VDI and XenApp environments that are fast and easy to deploy and do not require any changes to existing desktop images.”
Although the infrastructure needed to support these kinds of deployments is a bit more complex when compared to the PernixData solution, it also offers some huge additional advantages. Besides eliminating IOPS almost completely it also reduces your storage needs by up to 95% by leveraging their In-Memory Storage technology, thereby eliminating the use of differencing disks, linked clones and/or PvDs. That leaves us with just user profile data, our master images and some persistent VDI data (when applicable), all managed by the so called Replication Host, a central VM that maintains a master copy of each user’s data blocks.
On top of that, in-line deduplication, wire-speed compression and real-time write coalescing are some of the technologies used to shrink and speed up the data. As far as the infrastructure goes, Brian Madden wrote an excellent article discussing their Persistent VDI solution, giving you a basic explanation of the technology used and the infrastructure needed. He also briefly discusses their diskless VDI solution. In short, if you want to know more (and yes, you do) make sure you read his article, you’ll find it here. There is only one drawback: the licenses needed don’t come cheap, but I guess this also depends on your reseller. Something to keep in mind before getting too enthusiastic. Nevertheless, it’s innovative and ahead of its competition by miles, excellent technology.
Wrapping up: in parts one and two we discussed quite a few concepts and technologies. Although one is perhaps more advanced than the other, fact is, we’re moving forward at warp speed. All of the products discussed offer free evaluation or demo licenses for you to give them a try, XenDesktop and VMware View included, so I suggest you do just that. I already highlighted some of the possible pitfalls and possibilities that each product brings, and to be honest there’s not that much to add. Below you’ll find some of the references I used when putting together the above, make sure to pay them a visit as well, there’s so much more to explore!
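For those wondering what in-line deduplication and compression look like in principle, here is a minimal Python sketch: identical blocks are recognized by their hash and stored (compressed) only once. It shows the general idea only and is in no way Atlantis’ implementation; the `DedupStore` class and the example block are invented for the illustration.

```python
import hashlib
import zlib


class DedupStore:
    """Toy block store with in-line deduplication and compression."""

    def __init__(self):
        self.blocks = {}        # SHA-256 digest -> compressed block
        self.logical_bytes = 0  # what the desktops think they have written

    def write(self, data):
        digest = hashlib.sha256(data).hexdigest()
        self.logical_bytes += len(data)
        if digest not in self.blocks:  # only blocks not seen before are stored
            self.blocks[digest] = zlib.compress(data)
        return digest

    def read(self, digest):
        return zlib.decompress(self.blocks[digest])

    def physical_bytes(self):
        return sum(len(b) for b in self.blocks.values())


store = DedupStore()
block = b"same operating system file contents " * 100

# 100 desktops all write the exact same block.
digests = [store.write(block) for _ in range(100)]

print(len(store.blocks))  # 1 unique block stored
print(store.logical_bytes, store.physical_bytes())
```

Since VDI desktops built from one master image write a lot of identical data, this kind of workload is where deduplication pays off most.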
Bas van Kaam ©
Reference materials used: VMware.com, Microsoft.com, Citrix.com, Blog.synology.com, Recoverymonkey.com, Atlantiscomputing.com, Pernixdata.com and Ngn.nl