CarahCast: Podcasts on Technology in the Public Sector

Nutanix HCI for State and Local Government and Education

Episode Summary

Hyperconverged infrastructure (HCI) can lower cost, improve performance, and increase efficiency, all with a smaller data center footprint. In this podcast, Luis Gomez, Sr. Systems Engineer at Nutanix, discusses the value Nutanix HCI brings to you and your organization.

Episode Transcription

Speaker 1: On behalf of Nutanix and Carahsoft, we would like to welcome you to today's podcast, focused on Nutanix HCI for state, local and education organizations, where Luis Gomez, senior systems engineer at Nutanix, will discuss the benefits of hyper-converged infrastructure.

Luis Gomez: I'm here at Nutanix to present hyper-converged architecture and a little bit about how our solution works in the hyper-converged realm. So I'm going to give you a 101-level explanation of what this is, what it's not, and a few features that are almost self-explanatory, in the sense that those features are the reason hyper-converged is so popular nowadays.

To begin with, cloud has really whetted our appetite. We all want public cloud-like simplicity, and we've heard of features, we've seen features in the cloud. And in some sense, our own data centers are losing out to the cloud. The old architecture that I'll highlight in just a bit is really losing out. And there are reasons why: cloud-like simplicity leads to rapid time-to-market. You're able to stand up VMs and applications quickly in the cloud. You're able to pay as you go and fractionally consume the services, so you don't have to build out an entire rack, something physical, to actually establish a foothold in the cloud. That's really nice. Everybody loves that. It's a lower barrier to entry.

And so moving on here to the one-click simplicity: one-click is kind of a metaphor for the simple management of the infrastructure, and actually in the cloud, there really is no management of the infrastructure. So what does that look like? Well, it's all software driven, so there's less headache on the infrastructure side. Really, the headache belongs to the hyperscalers, the cloud providers. But can you have that locally? That's the big question. Can you have a cloud-like experience on-prem and get continuous innovation, which is really one of the tenets of the software-defined realm? Steve Wozniak, the co-founder of Apple, always said being an engineer is having the ability to invent or innovate yourself out of any problem. And that's a beautiful concept, because certainly in IT there are a lot of challenges.

So this is a hardware-defined data center, circa 2005. As you can tell, these are obviously very old systems, but the architecture is conceptually still here today: the framework, storage area networks, Fibre Channel switches, your tape backup library here. You can pretty much point to any one of these devices and define what it does, because it's hardware defined. Literally, the hardware is provisioned for that specific task. And that's what your traditional data center looks like. Right around the corner, there's probably a big refrigerator-sized box with a bunch of drives in it, and so on. This is your three-tier, hardware-defined data center. And in the evolution of the data center, there were some challenges here: it was hard to troubleshoot, hard to integrate, there were interoperability issues, and it was expensive. It was also really hot. You can see the floor panels right there, they have a bunch of little holes in them. You could probably bake bread on the reverse side of this rack with that much heat.

Moving along, there was this concept of the converged data center, not to be confused with hyper-converged. The converged data center looks a lot cooler, really. It's a lot neater; it's a tighter integration of the components that make up a data center. The fundamental building blocks of a data center are compute, storage and networking, and those discrete components are actually still here in a converged data center. There were companies like VCE, Dell had the FX2 chassis, and all the major manufacturers came out with their converged data center offerings, but they really fundamentally still had these three components separated. In other words, you can point to the compute, the server, the networking and the storage. They're all still separate, just integrated in kind of a fancy shell. You still had multiple management consoles, multiple complexities, and interoperability issues with this architecture.

So let's take a look at what the hyperscalers were doing back in the day. This is 2010, 2012, even earlier than that at Amazon. This picture here is a data center; I'm pretty sure it's Google's, but it's one of the hyperscalers. And what they did is they actually tried out SAN technology back then, because that was the best thing going. They tried storage area networks, they tried to put their data and workloads on them, and it just didn't work. It didn't scale. And really, if you take a look at the architecture, what is the bottleneck in a three-tier legacy architecture? Your storage. Your storage tends to be the bottleneck. It's not scalable, and we're talking hyper-scalable here.

So here we see a software-defined data center, where you can point to any one of the boxes, and really, these are pizza boxes, and you don't know what they're doing. You don't know, because the software is actually defining their capabilities. So that's what Amazon and the big hyperscalers did. And back in 2015, Amazon separated out their numbers, the top-line revenue they were generating with AWS versus the rest of the mothership of Amazon. They were making about $7 billion in 2015. Three years later, AWS made $28 billion, which is more than McDonald's Corporation. And there's a reason why this is happening. This hyperscaling technology, this way of doing your workloads, actually works. It works, it scales, and it's fault-tolerant in the sense that you don't have a single point of failure. You're software-defining, so you're innovating as you go. And one of the tenets of what they call web-scale architecture is using commodity hardware, commodity servers, commodity Ethernet networking, commodity Linux, to put together a solution where the real innovation and creativity happens at the software level.

So continuing on here, one of the founders of Nutanix actually worked at Google and helped create the Google distributed file system, so that hyperscaling architecture could occur. And what Nutanix did, and there seem to be a lot of followers here, is introduce hyper-converged infrastructure into the corporate realm. We introduce a single compute brick that has networking, security, storage, and compute built into the server, software defined. The differentiator between us and the web-scale players is that we added virtualization. Virtualization obviously took off with VMware, and adding it allows us to have everything on one system and maintain those VMs on those compute bricks. On the left-hand side, you have the three-tier architecture with its various components, the Fibre Channel switching, and a lot of application-specific integrated circuits, ASICs. We don't have that here on the right. With Nutanix, we have commodity servers, again, commodity networking, and the software on top of it. And that's how we operate.

So we're going to take a look a little bit under the hood, and this is the way it works. We take a single node. You have your VMs there, your hypervisor, and we're going to add a controller. What we've done is we've actually virtualized the controller itself. The controller that traditionally is a physical device dedicated to the storage, we virtualized that as a controller VM, or CVM. And what these CVMs do is they kind of conspire with each other. They look at the local drives of each server. So if you look at this diagram, you have node one, node two, and node three; node is kind of synonymous, interchangeable, with server. The CVMs themselves are going to talk to each other, and they're going to create a single storage pool out of the local drives that are on those servers.
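To make that pooling idea concrete, here is a minimal Python sketch. It is illustrative only, not Nutanix code; the node names and drive sizes are made up.

```python
# Toy model (not Nutanix code): each node's CVM contributes its local
# drives to one cluster-wide storage pool presented to the hypervisor.
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    local_drive_gb: list          # capacities of the drives inside this server

@dataclass
class StoragePool:
    nodes: list = field(default_factory=list)

    def add_node(self, node: Node) -> None:
        self.nodes.append(node)

    @property
    def total_capacity_gb(self) -> int:
        # The pool is simply the sum of every node's local drives.
        return sum(sum(n.local_drive_gb) for n in self.nodes)

pool = StoragePool()
for node in (Node("node1", [1920, 1920]),
             Node("node2", [1920, 1920]),
             Node("node3", [1920, 1920])):
    pool.add_node(node)

print(f"Shared pool capacity: {pool.total_capacity_gb} GB")   # 11520 GB
```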

So this is a break from the traditional three-tier architecture, where you weren't even using the local drives. Under a storage area network architecture, the local drive bays on those machines mostly sat empty; in fact, you would boot off the storage area network if you could. In this case, it's a total reversal. So it's kind of a non-intuitive way of operating, and initially there was a lot of skepticism about this. But with the hyperscalers kind of paving the way, this architecture has proven out to be scalable. And I'll talk a little bit about the scalability in a minute.

But if we take a look at these CVMs, what they're doing is creating a storage pool that they turn around and present to the hypervisor, whether that's ESXi, our own hypervisor AHV, or Hyper-V. They present a single shared storage pool to those hypervisors. So the hypervisor itself doesn't really know anything about the storage other than that it's centralized and shared, and that's kind of the prerequisite for a hypervisor. What you get out of it is a distributed storage system that uses the local drives, but no physical SAN. You're still able to do your HA, your high availability. And really, that's one of the primary goals here: to provide that HA capability to your VMs. Of course, everybody gets their silicon from the same place, and the stuff's going to break, so it's not a matter of if, it's a matter of when. When a node goes down, your VMs will automatically get booted up elsewhere. That's just HA. In your HA configuration, the hypervisor doesn't know the difference; it just sees a storage pool, even though that pool is sourced from local drives.
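As a rough illustration of that HA behavior, here is a toy Python sketch of restarting a failed node's VMs on the surviving nodes. The placement logic and names are invented for the example; real HA in any hypervisor is far more involved.

```python
# Toy HA sketch (illustrative only): because every node sees the same shared
# storage pool, a VM from a failed node can simply be restarted on any
# surviving node -- no storage migration is needed.
def restart_vms_after_failure(vm_placement, failed_node, surviving_nodes):
    """Return a new placement with the failed node's VMs moved elsewhere."""
    displaced = [vm for vm, node in vm_placement.items() if node == failed_node]
    new_placement = {vm: (node if node != failed_node else None)
                     for vm, node in vm_placement.items()}
    # Round-robin the displaced VMs across the nodes that are still up.
    for i, vm in enumerate(displaced):
        new_placement[vm] = surviving_nodes[i % len(surviving_nodes)]
    return new_placement

placement = {"sql-01": "node1", "web-01": "node1", "app-01": "node2"}
print(restart_vms_after_failure(placement, "node1", ["node2", "node3"]))
# {'sql-01': 'node2', 'web-01': 'node3', 'app-01': 'node2'}
```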

Other additional benefits: again, this is software-defined shared storage, which is part of what the hyper-converged story is about. You get snapshots, data locality (I'll discuss that in a little bit), clones, tiering, compression, and DR. DR and replication to a DR site are standard in our product, and they should be standard in any hyper-converged product, and you get exceptional performance. There's a good reason for that as well that we'll get into. The workloads you want to have on this are tier one and tier two workloads. This is powerful enough to run SAP, Splunk, big data, SQL databases, Exchange servers. We started off in VDI, and VDI was one of the primary candidates for hyper-converged because the storage area network was the single most expensive component in a VDI project. So while this solution was introduced to a lot of our VDI customers, we've since expanded to multiple workloads, including tier one, mission-critical, big data, high-throughput workloads.

So achieving this without a SAN was an achievement, and there are other architectural considerations here, benefits of using hyper-converged in general. But we have a little bit of a twist: we have a scale-out architecture. Scale-out means that in our architecture, you can do two types of scaling, vertical or horizontal, and our architecture is really optimized for horizontal scaling. In other words, you pop in that so-called pizza box and you add another node, and then you pop in another one to scale. So it's a scale-out architecture. The addition of nodes and the addition of compute capacity is simplified in this environment.

We keep fine-grained metadata. We actually break up the data into four-megabyte chunks and spread it throughout the entire cluster. There's high resiliency to hardware failure when you do this, and I'll present that in just a minute. Data locality: we make sure, and this is key to the performance, that the VMs running on node one have their data on node one. Even though we replicate that data somewhere else in case of failure, we make sure the data is there on node one. So we're talking about an I/O path that does not include the network when it comes to reads. It does include the network when it comes to writes, because we do replicate that data, but we get much faster performance. And if you think about a high-end SQL server, most of its interactions with the I/O path are reads, so you get exceptional performance with that. And of course, we support NVMe and Optane drives, SSDs and even hard disk drives. We do not use RAID cards. Our own software controls the parity; it controls the number of replicas that we have. So we don't actually use RAID. You'll find that our CVMs have a direct path to the storage I/O.
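Here is a small, purely illustrative Python sketch of that idea: data carved into fixed-size extents, one copy kept local to the VM's node and a replica placed on another node. The placement scheme is invented for the example and is not the actual Nutanix algorithm.

```python
# Illustrative only: data split into fixed-size extents (the talk mentions
# 4 MB chunks); one copy stays on the VM's home node for local reads, and a
# replica goes to another node so the data survives a failure.
EXTENT_MB = 4

def place_extents(vm_size_mb, home_node, other_nodes, replication_factor=2):
    """Return (extent_id, [nodes holding a copy]) for every extent."""
    n_extents = -(-vm_size_mb // EXTENT_MB)          # ceiling division
    placement = []
    for i in range(n_extents):
        copies = [home_node]                          # data locality: reads stay local
        for r in range(replication_factor - 1):       # writes also touch a replica
            copies.append(other_nodes[(i + r) % len(other_nodes)])
        placement.append((i, copies))
    return placement

for extent_id, nodes in place_extents(20, "node1", ["node2", "node3"]):
    print(f"extent {extent_id}: copies on {nodes}")
# extent 0: copies on ['node1', 'node2'], extent 1: ['node1', 'node3'], ...
```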

So real quickly, this is what we call Nutanix AOS. I believe it stands for Acropolis Operating System. That's our storage layer, our operating system, basically, that runs the hyper-converged system. And we give our customers choice. Any hyper-converged product really should allow the customer to choose their own hardware, so we run on commodity hardware. The NX brand that you see there is actually Supermicro, but with Dell or any major manufacturer, you can use their hardware, use the virtualization of your choice, and then we manage everything with Prism. Prism is the single management console that we provide our customers to manage all of this.

And so our system is simple, secure, resilient, and flexible. It has to be simple. If we give you a bunch of software-defined capabilities but they're really hard to manage, it kind of defeats the purpose, because complexity is the enemy of a lot of things. It's the enemy of security. Complexity is the enemy of scalability and stability. Complexity leads to customers misconfiguring things, and so on. So in the realm of simplicity, when it comes to deploying any hyper-converged infrastructure, and in particular ours, we have a process that's kind of a single install, capable of installing your operating system, your AOS software, and the hypervisor remotely, for on-prem deployments or for a ROBO, remote office/branch office, site. So your deployment is in hours.

And this is so interesting to me, because I come from the old-school three-tier world where, if you're deploying a Fibre Channel-based storage area network, you're going to be there several days. The deployment of a hyper-converged system like Nutanix literally takes a couple of hours. It can be done in less than an hour, depending on how large your cluster is. You can take a long lunch and come back, and there you have it: you have a cluster that's ready for your on-prem tier one workloads. It is much simpler to deploy because you don't have to worry about the integration of all those disparate pieces.

So the scaling of this infrastructure, and this is a primary tenet of HCI, is that it must scale linearly, which means that if you add another node, you're adding that much more capacity. I'll give an example with VDI. Let's say one single compute node, one single server, supports 100 users, and there are three nodes here in this one cluster. Adding a fourth one would support another 100 users, so you now get a total computing capacity of 400 users. That's scaling linearly, obviously.
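The arithmetic behind that example is simple enough to write down; the 100-users-per-node figure is just the assumed sizing from the talk.

```python
# Linear scaling: capacity grows in lockstep with node count
# (100 users per node is the assumed sizing figure from the example).
USERS_PER_NODE = 100

def cluster_user_capacity(node_count: int) -> int:
    return node_count * USERS_PER_NODE

print(cluster_user_capacity(3))   # 300 users with three nodes
print(cluster_user_capacity(4))   # 400 users after adding a fourth node
```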

And so what we do is make it easy for customers to add those nodes. And when you add them, it adds to the little meter here that you see on the right, the compute capacity in IOPS. I liken this to adding a cylinder to your engine: if you had a software-defined car engine, you could add horsepower just by adding another cylinder, another piston, if you will. So you go from an inline four to a V8. And that's possible in this system here, because you can just add nodes. And you have the ability to add nodes that are different, that are maybe appropriate to what your needs are.

So in this particular example, you can add nodes that are different models, not different brands, so you stay within Dell or you stay within HP, but you can add different models within those lines. In this case, it's a storage-heavy node. It could have three times the storage of the rest of the cluster, and the cluster will automatically rebalance data to that node. You just have to add it, and minutes after adding it, you'll have access to the storage on that system, because the system automatically includes its local drives in the total capacity, in the storage pool. So that's the scaling part of it, and this is something hyper-converged infrastructure should have all the time.
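To picture that rebalancing, here is a deliberately simplified Python sketch that spreads used data in proportion to each node's capacity. It is an assumption-laden illustration, not the actual placement logic.

```python
# Illustrative rebalancing sketch (not the real algorithm): when a
# storage-heavy node joins, data migrates until every node holds roughly
# the same fraction of its own capacity.
def rebalance(used_gb, capacity_gb):
    """Spread the total used data across nodes in proportion to capacity."""
    total_used = sum(used_gb.values())
    total_capacity = sum(capacity_gb.values())
    return {node: round(total_used * capacity_gb[node] / total_capacity, 1)
            for node in capacity_gb}

capacity = {"node1": 10_000, "node2": 10_000, "node3": 10_000,
            "storage-heavy": 30_000}          # 3x the storage of its peers
used = {"node1": 6_000, "node2": 6_000, "node3": 6_000, "storage-heavy": 0}
print(rebalance(used, capacity))
# {'node1': 3000.0, 'node2': 3000.0, 'node3': 3000.0, 'storage-heavy': 9000.0}
```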

And then, finally, we add storage-only nodes and all-flash nodes. Lots of options here. In a typical scenario, you'll have homogeneous clusters with some outliers, like storage-heavy nodes, perhaps. Our clusters can scale; there's theoretically no limit to the number of nodes you can have in a cluster. But typically, you'll have clusters that are anywhere from the minimum of three up to 20 or 30 nodes in a single cluster. And again, the more nodes you add, the more power you're adding to the system.

We do have other capacity optimization techniques like deduplication, compression, and erasure coding, which is our own parity-based capacity optimization. It reduces the footprint of our data, so we get some space back. And because of time, I'm just going to go ahead and skip over some sections here. What this slide presents is basically our management plane, and it's a sharp contrast: a single management plane that you use to manage your hardware, your firmware, your hypervisor and its updates. You deploy your VMs from here. You manage your networking from here. Those are the operational efficiencies you get with hyper-converged: one single console to troubleshoot, to update, to maintain the operations of your data center. So we're providing an operating system for the data center with Prism.
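As a concept-only illustration of parity-based protection, here is a tiny Python example using XOR parity. Real erasure coding schemes are more sophisticated; this just shows why storing parity can protect data while using less space than keeping full extra replicas.

```python
# Concept-only sketch: parity-based protection (here a simple XOR) lets a
# lost chunk be rebuilt from the survivors while using less raw capacity
# than full replication of every chunk.
def xor_parity(chunks):
    """Byte-wise XOR of equally sized chunks."""
    parity = bytearray(len(chunks[0]))
    for chunk in chunks:
        for i, byte in enumerate(chunk):
            parity[i] ^= byte
    return bytes(parity)

a, b, c = b"AAAA", b"BBBB", b"CCCC"
p = xor_parity([a, b, c])                 # keep a, c, and p; pretend b is lost
rebuilt_b = xor_parity([a, c, p])         # XOR of the survivors recovers b
assert rebuilt_b == b
print("rebuilt:", rebuilt_b)              # b'BBBB'
```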

Security. We do support encryption of the drives through software-based encryption. We actually include a built-in software encryption key manager, or you can use your own third party. We support FIPS 140-2. From a ransomware perspective, our operating system creates snapshots, recovery points in time, so that if you are hit with ransomware, you can simply go back to a recovery point that does not include that ransomware. We have anomaly detection and micro-segmentation as features built into AOS, so you get firewalling between one VM and another that will stop or prevent the spread of the malware. And then the recovery points are something you can use to recover from malware.
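A toy Python sketch of that recovery-point idea: pick the newest snapshot taken before the infection. The snapshot names and timestamps are made up for illustration.

```python
# Made-up snapshot names and timestamps: choose the newest recovery point
# taken before the ransomware infection and restore from that.
from datetime import datetime

snapshots = [
    ("daily-2024-05-01", datetime(2024, 5, 1, 2, 0)),
    ("daily-2024-05-02", datetime(2024, 5, 2, 2, 0)),
    ("daily-2024-05-03", datetime(2024, 5, 3, 2, 0)),
]

def last_clean_snapshot(snaps, infection_time):
    """Return the most recent snapshot taken strictly before the infection."""
    clean = [s for s in snaps if s[1] < infection_time]
    return max(clean, key=lambda s: s[1]) if clean else None

infected_at = datetime(2024, 5, 2, 14, 30)
print(last_clean_snapshot(snapshots, infected_at))
# ('daily-2024-05-02', datetime.datetime(2024, 5, 2, 2, 0))
```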

So lastly, I'll discuss resiliency. One of the frequent objections is: well, without a SAN, you don't really have a single storage fabric that manages all your data, so what happens if a server goes down? Do I lose my data? Servers have their own drives, obviously. What we do is we actually replicate the data, and that's what this little diagram shows: a replication of storage blocks A, B and C. As you can tell, they're spread out throughout the cluster. So if you do have a system that goes down, in this case a VM that consists of blocks A, B and C, HA will kick in and boot up the machine on the next node, and the data, A, B and C, is still spread out throughout the entire cluster. The machine will start up automatically, and it will source its data: some of it from the local drives, the rest over the network.

So at this point, your performance is like if you were using a SAN. That's the worst-case scenario. All the controller VMs are actually going to contribute to rebuilding, or rehydrating, the replica data, so self-healing occurs. You have a VM that now has its data locally again, and its replica data is found throughout the cluster. If you have another failure, and this has happened from time to time, a secondary failure the next day, then because you self-healed, you can still boot up that machine elsewhere, and the data will be there for the machine to take advantage of. Then self-healing occurs again, so you have a replica. As you can imagine, if you lose two nodes, you would think, wow, there's a huge possibility that we're going to lose data, but not with self-healing. Very resilient.
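Here is an illustrative Python sketch of that self-healing step: any block left with a single copy after a node failure gets a new replica on a surviving node. The node names, the two-copy policy, and the placement choice are all simplifications for the example.

```python
# Illustrative self-healing sketch: every block keeps two copies on different
# nodes; if a node dies, blocks left with a single copy get a new replica on
# a surviving node, so the cluster can tolerate a second failure later.
def self_heal(block_copies, failed_node, surviving_nodes):
    """Drop the failed node's copies and re-replicate under-protected blocks."""
    healed = {}
    for block, nodes in block_copies.items():
        remaining = [n for n in nodes if n != failed_node]
        while len(remaining) < 2:
            # Pick any surviving node that doesn't already hold this block.
            target = next(n for n in surviving_nodes if n not in remaining)
            remaining.append(target)
        healed[block] = remaining
    return healed

copies = {"A": ["node1", "node2"], "B": ["node2", "node3"], "C": ["node1", "node3"]}
print(self_heal(copies, failed_node="node1", surviving_nodes=["node2", "node3"]))
# {'A': ['node2', 'node3'], 'B': ['node2', 'node3'], 'C': ['node3', 'node2']}
```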

So with that, I'm going to go ahead and stop because we're out of time. But this concludes the hyper-converged infrastructure presentation. Thanks.

Speaker 1: Thanks for listening. If you'd like more information on how Carahsoft or Nutanix can assist your state, local or education organization, please visit www.carahsoft.com or email us at nutanix@carahsoft.com. Thanks again for listening and have a great day.