CarahCast: Podcasts on Technology in the Public Sector

Optimize Your Nutanix Deployment with Prism Pro

Episode Summary

Listen to this podcast to hear Jason Malnar, Senior System Engineer at Nutanix, discuss the enhanced features and capabilities of Nutanix Prism Pro and how it can dramatically improve your virtualization and infrastructure management with predictive analytics and task automation.

Episode Transcription

Speaker 1: On behalf of Nutanix and Carahsoft, we would like to welcome you to today's podcast focused around optimizing your Nutanix deployment with Prism Pro, where Jason Malnar, senior systems engineer at Nutanix will discuss how you can optimize your Nutanix deployment with Prism Pro with enhanced features and capabilities.

Jason Malnar: Today, we are going to be covering Prism Pro and how to make the most out of our Nutanix deployments, right? Just a one-minute recap on HCI and Nutanix in general: you can see HCI is basically the convergence of compute, virtualization, and storage, all put together in one package, right? And so that provides a lot of advantages in the data center as far as consolidation, ease of management, and breaking down traditional IT silos. We're really going to see how that plays out when it comes to advanced IT operations and automation with Prism Pro, right? So the general flow for what we'll be talking about today is this: you have your infrastructure, which generates data about the infrastructure, which can then be translated into signals, whether those be alerts or KPIs, which then translate into an action that's taken based off of that signal, right?

So we get an alert, we conduct an action. And so there's this IT operations efficiency and automation that is conducted on top of the HCI platform. What are some of the challenges that customers are facing that this kind of automation and efficiency would help with? More moving parts, right? So, as I see it in organizations, IT is not getting simpler, right? It's getting more complex. Applications are becoming larger. They are much more entangled with each other. There are far more moving parts, and Nutanix aims to simplify IT both from an HCI perspective and from the automation and management perspective. But there's no question that in general, as we look across the IT landscape and look across technologies, not only are we getting more complex in our data center technologies, we're getting more complex when it comes to multi-cloud or hybrid cloud and cloud-native applications, right?

So, definitely more moving parts. Dynamic demand: applications are expected to be able to grow and contract pretty rapidly on demand. So when we get into our busy season, say a financial season of some kind, we expect these applications to be able to grow in resource size and then contract when the resources are no longer needed, or we have the expectation to be able to spin up an application in multiple locations or create one very quickly, right? With DevOps cycles and security cycles, we are expected to move code very fast. So there's dynamic demand. Gone are the days of building an application over six months, putting it in production, and then letting it sit there for three years, right? It's a much more dynamic world, and demand for these applications goes up and down, and we need to be able to adapt to that. Noise and silos, right? Because there are a lot more moving parts, all those moving parts have their own data streams.

They all have their own logging, they all have their own alerts, right? And so IT teams are getting faced with just a ton of alerts, a ton of noise, a lot of red lights. What does all that mean? What's actionable? And how do we act on it? And then additionally, those IT teams are often siloed, right? So the storage team is getting storage alerts. The web ops team is getting web alerts, the database team is getting database alerts, right, and the virtualization team is getting their own alerts. So a lot of alerts coming in from a lot of places, creating a lot of noise. And then finally, on the bottom right, it's overly complex, and there are a lot of manual operations. Again, going back to the more moving parts, a lot of noise, dynamic demand, all of this adds up to an overly complex situation that IT teams are trying to deal with, and they're doing a lot of this stuff manually that they don't have to do.

So, instead of spending their day making the world a better place, instead of making the infrastructure more efficient, they're spending their time just doing manual stuff that, at this point in the evolution of data center IT and cloud IT, we're able to handle with automation. All right. So how does Prism approach this, and what are some of the ways that we're really going to help with these particular challenges that customers are facing? So, first of all is broad observability, right? Because it's an HCI platform and we combine so many different components of the data center together, we have broader observability than other infrastructure platforms do, right? Because we have storage, because we have compute, because we have networking components and virtualization, all in the same place, we're able to get a lot more data in one central place. Right? So back when I was a virtualization infrastructure lead at a federal agency, we had performance issues or VDI issues.

We would get many, many people into a room: the storage team, the VDI team, the Active Directory team, right, and the virtualization team and the server team would all be there, and they all have their own screens, right? And so there's a lot of singular observability, but not broad observability. And having that broad observability really helps when it comes to automation, as well as managing your infrastructure more efficiently. Actionable signals, right? Going back to that noise: what's actually important? What do we actually need to know? Having signals that we know are important and actionable is critical when we start doing operational efficiency. What do we actually need to respond to? What alerts actually matter? What are the KPIs for our systems that actually matter? Everything else is just noise, or maybe we just need to keep it for retention. But these are the things that we really need to respond to. Automation with ease.

Historically, data center automation has been very code driven. You have to know scripting, you have to know code. We're going to see some examples today that show that that's just not the case anymore. We can do automation with ease with Nutanix Prism. And finally, a seamless operations experience: going back to that hyper-converged solution, where we can see compute, we can see storage, we can see virtualization, both on-prem and in the cloud, all from one place, a seamless operations experience. What does this look like from a use case perspective? What are some examples of this actually being applied in front of a customer, and the benefits they would get? Efficient use of resources, right? Detecting waste with machine learning, knowing what machines are over-provisioned or under-provisioned, and preventing machines that are over-provisioned from soaking up resources that they don't need, being as efficient as we can. Automating optimization with proper approval.

So letting the system take care of a lot of mundane tasks for your IT teams, obviously with approval, but being able to automate those things, right? Being efficient with our resources. And then admins can rapidly build and tailor the automation to IT policies, which we're going to see later in this presentation. I have a walkthrough of what this automation actually looks like in a Nutanix system, and so we'll walk through that. Cost optimization: in this case, Prism allows us to define costs for particular systems, which allows organizations to conduct chargeback. So if you're a central IT provider or you're a very large IT shop, and you want to be able to identify costs for certain applications and determine what is costing your IT systems the most or the least, we can do that through Prism Pro. Report cost visibility based off of TCO, deliver chargeback reports by cost centers, right?

So, we can define, hey, this is one group, they're costing the IT infrastructure this much, they're incurring this much cost. And then control spending and budget with policies, defining what organizations can spend and how much, with cost optimization in Prism Pro. Proactive performance remediation: we're going to see an example of this, again, further along in the presentation. I think this is one that is kind of near and dear to my heart. Detect anomalies based on performance behavior, triage with full-stack visibility, and prevent problems from causing a bigger impact. Again, we'll dive into this more later, but this is the ability to provide real-time performance protection for your applications. So if a database server is running really hot or a set of web application servers is running really low on CPU, we can automatically fix that with automation: automatically take care of that, automatically scale those VMs, and add whatever they need.

Right? And we're going to see examples of that later, but it's being able to allow the automation to fix a problem before it becomes an IT ticket, right? Before a human has to intervene, and that ticket is created, and SLAs are engaged, right? We can automatically resolve a lot of things and protect performance for our organizations. And then cloud-like capacity expansion. So, Prism Pro has a lot of functionality around being able to determine how much capacity you have and how much you will need, and the ability to create scenarios. So, for example, VDI obviously has become really important over the last year with COVID. And so a lot of our customers have run scenarios inside of their cluster saying, if I were to add, for example, a thousand VDI users, what would that do to my cluster? And what would I need? We can determine that.
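The "what would a thousand VDI users do to my cluster" question can be illustrated with a small what-if calculation. The sketch below is only a rough illustration in plain Python: it is not how Prism Pro models capacity, and every name and number in it (Cluster, Workload, nodes_to_add, the 20% headroom, the per-user sizing) is an assumption made up for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class Cluster:
    # Hypothetical cluster description; all figures are illustrative.
    nodes: int
    cores_per_node: int
    mem_gb_per_node: int
    used_cores: float
    used_mem_gb: float


@dataclass
class Workload:
    # Hypothetical scenario, e.g. "add 1000 VDI users".
    name: str
    count: int
    cores_each: float
    mem_gb_each: float


def nodes_to_add(cluster: Cluster, workload: Workload, headroom: float = 0.2) -> int:
    """Return how many extra nodes the scenario needs while keeping
    `headroom` (a fraction of each node) free."""
    need_cores = cluster.used_cores + workload.count * workload.cores_each
    need_mem = cluster.used_mem_gb + workload.count * workload.mem_gb_each

    usable_cores = cluster.cores_per_node * (1 - headroom)
    usable_mem = cluster.mem_gb_per_node * (1 - headroom)

    # The tighter of the two resources decides the node count.
    required = max(math.ceil(need_cores / usable_cores),
                   math.ceil(need_mem / usable_mem))
    return max(0, required - cluster.nodes)


cluster = Cluster(nodes=8, cores_per_node=64, mem_gb_per_node=512,
                  used_cores=200, used_mem_gb=2000)
vdi = Workload(name="vdi-users", count=1000, cores_each=0.5, mem_gb_each=4)
extra_nodes = nodes_to_add(cluster, vdi)
print(f"Extra nodes needed for {vdi.name}: {extra_nodes}")
```

With these made-up figures memory is the limiting resource, which is the kind of answer a scenario run is meant to surface before anything is procured.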

And then that way you're only procuring the infrastructure you need based off of the modeling you've done. And that's a cloud-like capacity expansion, right? We're not going to overbuy, we're not going to underbuy. We're going to buy what we need and expand rapidly when we need to. So just revisiting all that and kind of putting it into one slide. So, observability gives us the full-stack view, being able to see down to the storage, the compute, the virtualization, all from one place. Operations automation gives us higher productivity, right? My admins are no longer adding vCPUs or increasing memory or adding storage space. They're worried about improving the environment. They're not having to take care of things that our automation engine in Prism Pro can take care of. Cost management leads to reducing waste. We're able to see what VMs cost, we're able to see what VMs are over-provisioned and are costing us more than they're worth, more than they're using, so we can reduce waste.

Capacity optimization, right? Only procuring what we need, using what we need, and being able to accurately forecast and model out future projects gives us higher return on investment. And proactive remediation gives us healthy services. If we're auto-fixing things before they become an issue, before customers and users notice, then we're providing everybody healthy services and protecting the performance of our core and important applications. So, now that we've kind of covered the basics of what Prism Pro can do, I want to cover some of the customers that are using it and how they're using it. And I'm just going to touch on a couple here. So, Home Depot: they've had a 50% reduction in internal chargeback. They've had zero unplanned capacity, right?

So, I mean, you've got to imagine they have busy seasons and seasons that are not so busy, right? In the summer, in spring, people are coming out to do a lot of work. They know that their capacity needs are going to increase; their websites and their systems are going to be hammered hard. They've been able to accurately forecast what they're going to need using Prism Pro for their IT infrastructure. Moving on, we can go to the far right one, Western Washington University, right? 20 hours eliminated from infrastructure management, just by being able to take 20 hours a week off the table, because they don't have to add vCPUs, because they don't have to do a lot of these low-level tickets that virtualization administrators or storage administrators are typically doing. We can do that through automation or just sort of one click of a button. Increased visibility with a single pane of glass, and gaining insight into over-provisioned VMs, right?

For small organizations, ensuring that you're getting the most out of your VMs and not over-provisioning VMs is a big deal. So if you're with a smaller IT team, or even a larger one, if you have lots of VMs or lots of infrastructure taking up resources that they don't need, and you don't have insight into that, you could be wasting a lot of infrastructure and wasting a lot of money. But having good insight lets you restrict each VM down to what it needs, and it gives you more ROI on your current infrastructure. Okay. So now we're going to jump into what this actually looks like. So I've talked about how some customers are using it. I've talked about what it can do and the core concepts, but I want to show you just one example, and I'm going to walk through three or four slides here that kind of show how easy it is to build out this automation.

And this is a simpler example. We can get as complicated with this as we want, but I think this is kind of a simple example to show, going back to my point about protecting performance, how we can do that for a core application here. So, in Prism Pro, this is the example of triggers, right? So we have triggers that kick off automation, right? Having the infrastructure provide data, which provides a signal, which provides an action, right? We're going to follow that flow here. So, in this case, our trigger can be an alert, which is a signal, or it can be manual, or an event that happens, a specific time, a webhook, an API call, right? So, there are a lot of things that can kick off our automation. But in this case, we use an alert, right?

That's data that the infrastructure has given us. In this case, we'll be using high CPU utilization, right? So, high CPU utilization or memory utilization, it's an alert where the infrastructure is providing the data, and we're going to take that as an actionable signal. Our low-code/no-code automation provides a lot of the things that you would script or code, out of the box, through a GUI, without having to know a bunch of code. So, whether it's things like powering on a VM, taking a snapshot, or, as you can see at the bottom, adding a CPU, these are just normal things that we would do to our infrastructure, and you don't have to code. You get an alert that a VM doesn't have enough memory, it's at a hundred percent memory, and we'll just click the button here to add an action to a workflow to add more memory, and we'll see what that looks like.

A couple that I do want to highlight are PowerShell and SSH. With the ability to remotely execute PowerShell or jump into a box with SSH, there's a ton that we can automate, right? So if you do have scripting experience or you do have coding experience, you can take this to the next level. Let's say there is an event inside of Prism, it's given you some data, it's given you a signal, and you want to kick off a PowerShell script: you can do that. And there are even more actions that we can conduct, and we can do webhooks, so we can talk to ServiceNow. So let's say, again, going back to our example of the VM running hot, we can issue a ticket and then add the CPU or add the memory and then close the ticket, all through automation, all through the interaction of Prism Pro and ServiceNow, which I know is a very popular IT management system across the federal government.

So, again, lots of actions, right? So now we have our signal, and we're going to pick our action here, which is going to be to add memory to a VM. So, here's an example of a playbook, right? This is what we call the automation: playbooks. That's when we have all the different steps laid out, the trigger and the actions, right? So in this case, there's going to be a trigger and it's going to be an alert, and you can see on the right the alert policy is that the memory is constrained; this VM is constrained on memory. It doesn't have enough memory. It's probably affecting its performance. And so end users might be getting a slower application, longer login times, whatever it is. This particular VM needs memory, it's starved for memory, it's using all of its available memory. So we can just follow the simple workflow of our automation.

Prism's going to take a VM snapshot for us, just in case adding the memory causes issues. We have a snapshot to roll back to, just some protection there. We'll go ahead and add the memory to the VM, and with modern OSes, modern versions of Linux and modern versions of Windows, you can hot-add memory and CPU. So you can do this on the fly, you can do this during the day; Windows and Linux can handle adding additional resources on the fly. And then we'll resolve the alert. We'll go into Prism. Prism will check the alert for itself. And then we'll send off an email saying, "Hey, I've added resources for you." Just so the IT team knows that this is being done on their behalf, right? And we can add all kinds of actions in here against ServiceNow, or send a message to Slack, things like that.
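The playbook walked through above (alert trigger, snapshot, add memory, resolve the alert, send an email) can be pictured as a trigger plus an ordered list of actions. The following is a minimal in-memory sketch in Python, written only to show that shape. It does not use any Nutanix API, and all of the names in it (VM, Alert, Playbook, the "memory_constrained" label, the 4 GB increment) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class VM:
    name: str
    memory_gb: int
    snapshots: List[str] = field(default_factory=list)


@dataclass
class Alert:
    kind: str            # hypothetical label, e.g. "memory_constrained"
    vm: VM
    resolved: bool = False


# An action receives the alert and a log; it only changes in-memory state.
Action = Callable[[Alert, List[str]], None]


def take_snapshot(alert: Alert, log: List[str]) -> None:
    alert.vm.snapshots.append(f"pre-change-{len(alert.vm.snapshots) + 1}")
    log.append(f"snapshot taken for {alert.vm.name}")


def add_memory(gb: int) -> Action:
    def action(alert: Alert, log: List[str]) -> None:
        alert.vm.memory_gb += gb
        log.append(f"added {gb} GB to {alert.vm.name}")
    return action


def resolve_alert(alert: Alert, log: List[str]) -> None:
    alert.resolved = True
    log.append("alert resolved")


def send_email(alert: Alert, log: List[str]) -> None:
    # Stand-in for a real notification; here it is just a log line.
    log.append(f"email: added resources to {alert.vm.name}")


@dataclass
class Playbook:
    trigger_kind: str
    actions: List[Action]

    def handle(self, alert: Alert) -> List[str]:
        """Run every action in order if the alert matches the trigger."""
        log: List[str] = []
        if alert.kind != self.trigger_kind:
            return log
        for action in self.actions:
            action(alert, log)
        return log


vm = VM(name="db01", memory_gb=8)
playbook = Playbook("memory_constrained",
                    [take_snapshot, add_memory(4), resolve_alert, send_email])
alert = Alert("memory_constrained", vm)
for line in playbook.handle(alert):
    print(line)
```

Keeping the snapshot first in the list mirrors the point made in the walkthrough: the rollback point exists before the change is applied.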

So there are all kinds of different things we can do here, but that's an example of a playbook. And then one of the things we just added to Prism Pro was the ability to meet KPIs with what we call autopilot. So, similar to alerts, we can just monitor the performance of certain VMs, and when they fall outside of the KPIs that we defined, we'll then take an action. So in this example, this is actually a snapshot that I took out of my lab yesterday. I've got two VMs running, and I've defined a KPI on memory usage so that if they go over 70%, go ahead and add memory. You can see the two VMs at the bottom there; the two lines are well within the KPI. So Prism's going to take no action, but if they were to rise outside of the KPI, outside of the target metric that we want for them, Prism will automatically add memory.
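The KPI idea described here comes down to a threshold check followed by an action. The sketch below shows that check in plain Python, using the 70% memory figure from the lab example. It is an illustration only, not Prism's autopilot logic, and the function name, the VM names, and the 2 GB step size are all assumptions.

```python
from typing import Dict, List


def autopilot_step(memory_usage_pct: Dict[str, float],
                   memory_gb: Dict[str, int],
                   threshold_pct: float = 70.0,
                   step_gb: int = 2) -> List[str]:
    """Add `step_gb` of memory to every VM whose usage is above the
    threshold and return the names of the VMs that were changed."""
    changed: List[str] = []
    for name, usage in memory_usage_pct.items():
        if usage > threshold_pct:
            memory_gb[name] += step_gb
            changed.append(name)
    return changed


memory_gb = {"vm-a": 8, "vm-b": 8}

# Both VMs are within the KPI, so nothing is changed.
within_kpi = autopilot_step({"vm-a": 45.0, "vm-b": 62.5}, memory_gb)

# vm-b rises above 70%, so it gets more memory.
outside_kpi = autopilot_step({"vm-a": 45.0, "vm-b": 85.0}, memory_gb)
print(within_kpi, outside_kpi, memory_gb)
```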

So we can just define, "Hey, this is what healthy looks like to me. This is what healthy looks like for this application or that application. I'm going to define those KPIs." And anytime you get outside of those, just auto-add resources. So we're protecting the performance of our VMs. Our admins are not having to monitor CPU usage or any of that. Again, as a former virtualization lead, I spent a lot of time looking at charts, making sure VMs were performing properly. Prism Pro can do that for us. We don't need to do that. It's hands-off, and we can set it where we want it. We can approve those actions, but Prism will take care of the automation for us. So, just kind of on licensing, just to touch on the different versions. Every Nutanix deployment comes with Prism Starter. That's where you get the one-click upgrades.

We haven't quite talked about that today, but that's the ability to very easily, compared to other IT platforms, upgrade your infrastructure and do all of the large-scale management and API management with Prism Starter, which comes with all Nutanix deployments. Prism Pro is where you get the machine learning forecasting, anomaly detection, the low-code/no-code automation that I just showed you, and capacity optimization, being able to determine what VMs are over-provisioned or under-provisioned. And then Prism Ultimate includes application visibility, application discovery, and the cost metering and chargeback that we've talked about. Application visibility, application discovery, we haven't quite talked about that in today's presentation, but that's the ability for Prism to actually determine what applications exist in your environment and see how they're mapped to each other from a networking perspective. That's all in Prism Ultimate.

So just know there's Prism Starter; all Nutanix deployments have that. Prism Pro is where you get the bulk of the additional features, including the stuff that we talked about today, the capacity optimization, and really that low-code/no-code automation; that's the big one here in Prism Pro. And then moving into Prism Ultimate, the big one there is the cost monitoring and chargeback for those who need it. Those are our different licensing levels. The journey to improve IT operations efficiency: so where were we? How far have we come? Where has Prism Pro brought us, right? So, back in 2016, we started using machine learning to forecast and detect things and do proactive monitoring, right? Back then it was really about using the data efficiently, getting good information from the data, and then humans acting on that. And then we moved into smart automation, where we have actionable signals. So we take that good data that we got in the earlier stages, and we're able to actually conduct some automation, and rapid automation, based off of that. And that was kind of the 2019 time frame, right?

And then into 2020, with service availability: multi-dimensional machine learning and full-stack visibility, right? Being able to see storage, being able to see application-layer stuff like SQL Server, being able to see compute metrics like VM and CPU, right? And then having service impact protection, really being able to protect the performance of our core and most important applications and databases. And then finally, where we're at now, going back to that autopilot and those KPIs: self-tuning performance under human control, right? Where Prism is going to take care of tuning performance for us under the guidelines that we set for it, and a quality and productivity boost, right? We're going to get massive productivity boosts out of our IT teams because they're not having to do so much of this handholding of applications and databases. Prism will do that. They set the KPIs: this is how I want it to perform. And then they're hands-free.

So, Prism compared to other virtualization platforms, other infrastructure platforms out there: we've got the simplest and easiest-to-use user experience. Again, I come from managing a very large infrastructure for a federal agency with many different consoles. I lived in consoles, and Prism, by far, is the most elegant and usable and efficient interface and experience that I've seen from an infrastructure provider. It's got that true cloud-like experience that we're coming to expect out of our IT solutions. It's got an elegant approach to full operations, and it's easy to achieve ROI with subscriptions, right? So just huge advantages with Prism.

Speaker 1: Thanks for listening. If you'd like more information on how Carahsoft or Nutanix networks can assist your organization, please visit www.carahsoft.com or email us at nutanix@carahsoft.com. Thanks again for listening and have a great day.