10. All About SAM: System Alerting & Monitoring (Luca Ravazzolo)
In this episode, Luca Ravazzolo — product manager for cloud and container technology — joins the podcast for his second appearance. He's telling us all about SAM (System Alerting & Monitoring), a new component of InterSystems IRIS that users will want to hear about.
To try out SAM, visit this GitHub repository: https://github.com/intersystems-community/sam
For more information about Data Points, visit https://datapoints.intersystems.com.
Derek Robinson 00:00:02 Welcome to Data Points, a podcast by InterSystems Learning Services.
Make sure to subscribe to the podcast on your favorite podcast app. Links can be found at datapoints.intersystems.com. I'm Derek Robinson. And on today's episode, I'll welcome back to the podcast Luca Ravazzolo, Product Manager for Cloud and Containers here at InterSystems, to talk about System Alerting and Monitoring in InterSystems IRIS.
Derek Robinson 00:00:39 Welcome to Episode Ten of Data Points by InterSystems Learning Services. Before we get started, I wanted to give a quick reminder to subscribe to the podcast using the links at datapoints.intersystems.com. You can find links for Apple, Spotify, Google, and Stitcher there. I also wanted to mention that the Developer Community, community.intersystems.com, is the place to interact with us about future topics for the podcast. I typically post every episode as a Developer Community post, and you can feel free to leave comments on those posts. I might also make a discussion post at some point asking for more ideas, to give our listeners a chance to submit topics they would like to see covered. So feel free to join the conversation. Back to the episode: I'm joined today by Luca Ravazzolo. You might remember Luca from Episode Two of Data Points, where we covered Kubernetes. Today, Luca is telling us all about System Alerting and Monitoring, SAM for short, which is a new component of InterSystems IRIS that you can try out.
Derek Robinson 00:01:39 All right. And welcome back to the podcast Luca Ravazzolo, joining us for the first time since way back on Episode Two, I think it was, of Data Points. Luca, how's it going?
Luca Ravazzolo 00:01:47 It's going very well, Derek, thank you very much. Yeah, those were the early days of the podcast, right?
Derek Robinson 00:01:53 Yeah, exactly. You were one of the first three guests we had in that initial launch, which generated a lot of excitement, and I think people are still enjoying the podcast episodes, so we're happy to have you back. So Luca, I know that one of your areas of expertise here at InterSystems is cloud and container technology. What we're going to talk about today is related in some ways, but it also reaches a bit into the system administration area, and that is SAM, or System Alerting and Monitoring. So Luca, first, can you tell the audience what SAM is? As we've mentioned offline, it's coming out very soon for InterSystems users.
Luca Ravazzolo 00:02:29 Yes, absolutely. We are counting the hours, so hopefully everybody can start playing with SAM with the preview that will be launching soon. System Alerting and Monitoring, or SAM for short, as we call it in a friendlier way, is simply an easy way of monitoring your InterSystems IRIS clusters. We wanted to make it very simple; that really is the main objective. The reason is that we are painfully aware that there are many sites and installations around the world where people don't have time. They don't have the resources either to provision and work with a commercial solution, or to monitor their systems with an open-source solution that they need to cobble together and make sure it stands up, works, is fault-tolerant, and all that. So we said, you know, we need to end that era. Everybody needs to be able to monitor their systems, rather than having to call the WRC, InterSystems support, because they had a moment of downtime after running out of storage space without noticing. People should really be looked after, so they can have alerting and monitoring. And this is what SAM is for. It's very, very simple to set up and run.
Derek Robinson 00:03:53 Nice. I think we'll get to a little bit of what makes SAM so easy in comparison to some of the complex setups that people could have for their monitoring and alerting. But to start the conversation with existing InterSystems users, maybe people who have been using InterSystems Caché, Ensemble, and now InterSystems IRIS and IRIS for Health for a long time: how does SAM differ from what was already available in InterSystems IRIS, both in functionality and then, transitioning into what may be the bigger part, in how easy it is?
Luca Ravazzolo 00:04:22 Yeah. Well, it's very different, and the reason is that you're going to have a web-based user interface that graphically shows you the metrics and what they're doing. This doesn't mean that any of the tools people are already using now, or any of the libraries we have within InterSystems IRIS, are being taken out. No, everything stays exactly the same. We just added this capability, so if you've built something on those tools, carry on; that's fine. The main difference is this: we have all those metrics inside our engine, and they're very, very powerful. We have all kinds of metrics, but the problem has always been how do I externalize those metrics, and how do I visualize them as a time series across the screen to understand a trend, for example? You have to externalize them, then maybe you point Excel or some other tool at them, you do some massaging of the data, and all of that takes time and effort. We wanted it to be a lot more intuitive and easy, and that's what SAM is. It's very different because it offers you a UI that very easily connects to InterSystems IRIS on the back end.
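Spotting a trend in a metric time series, such as storage slowly filling up, is the kind of analysis Luca says used to require exporting data into Excel. As a minimal sketch of the idea (not how SAM or Grafana computes anything), the Python below fits a least-squares line to hypothetical (hour, GB used) samples and estimates how long until a volume reaches its capacity. The sample values, units, and capacity are invented for illustration.

```python
from typing import Optional, Sequence, Tuple


def linear_trend(samples: Sequence[Tuple[float, float]]) -> Tuple[float, float]:
    """Return (slope, intercept) of the least-squares line through (t, value) samples."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    cov = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    if var == 0:
        raise ValueError("samples must span more than one timestamp")
    slope = cov / var
    return slope, mean_v - slope * mean_t


def time_until_full(samples: Sequence[Tuple[float, float]], capacity: float) -> Optional[float]:
    """Estimate time after the last sample until the fitted line reaches `capacity`.

    Returns None when usage is flat or shrinking, and 0.0 if the fitted line
    has already crossed the capacity.
    """
    slope, intercept = linear_trend(samples)
    if slope <= 0:
        return None
    t_full = (capacity - intercept) / slope
    last_t = max(t for t, _ in samples)
    return max(0.0, t_full - last_t)


# Hypothetical hourly samples of disk usage in GB on a 200 GB volume.
samples = [(0, 100.0), (1, 110.0), (2, 120.0), (3, 130.0)]
print(time_until_full(samples, capacity=200.0))  # 7.0 (hours)
```

A straight-line fit is the simplest possible trend model; real usage is often bursty, so a dashboard view over a longer window is still the better guide.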
Derek Robinson 00:05:40 Consider someone who is using a bunch of third-party tools, or who has a setup right now for monitoring and reporting that, the way I think of it, is complex enough that the next person would need to be trained pretty heavily to figure out how to use everything they have connected. How does SAM make things easier for people with these existing setups, and what are some of the details and reasons behind why it's simpler?
Luca Ravazzolo 00:06:04 Yes. So we've built SAM, System Alerting and Monitoring, as an appliance. You just take it the way it's configured and run it. As you run it, it puts together a series of components and makes sure they are well configured. Your job, as somebody who wants to monitor a cluster of IRIS instances, whether it has many nodes or many shard nodes, is just to tell SAM where those endpoints are on the back end. All you need is an IP address and a port, which together define where we're going to pick up those metrics, and that's it. So from an external point of view, it's really that simple. You run it up, we do all the configuration of the different components (I'll go into that in a second), and you just tell SAM where your IRIS instances are. Then you can start enjoying yourself looking at what the metrics say, and drilling down from the cluster-level view into a single instance to see how it's doing on the back end, and it's graphical and intuitive. Very, very easy. Now, going into the details of this appliance that you just run up and that does everything: first of all, it's based on container technology, and we leverage some open-source technologies that are out there, probably best of breed in their individual fields. We offer this as a Docker Compose definition. Docker Compose is a small engine that reads a YAML definition. Among the components in that YAML definition there is, first of all, what we call the SAM Manager. It's an IRIS instance that deals with a lot of the complexity, managing configuration files, for example, and talking to the other components. Then we have Prometheus. Prometheus is a leading Cloud Native Computing Foundation project that you can find in basically any cloud and any Kubernetes engine out there, monitoring the platform.
So we leverage that as well, because it does a very good job with metrics. Then, for the more graphical part, for visualizing the individual metrics, we leverage Grafana, again probably one of the best metric visualizers out there, and again very, very well known. Now, if you are an expert, or if you're already using Prometheus, Grafana, and other tools, you can of course go in and tune it to your heart's desire and do exactly what you want. But even without knowing them, you can appreciate that all of a sudden you have a very powerful solution in your hands that lets you really see, graphically, what's happening in your IRIS instances. However, I would like to underline another fact: if you read only metrics from an instance, you only have a partial picture of what's happening. That's why the product is called System Alerting and Monitoring. We monitor metrics and can show you the metrics, but we also show you the alerts that come from the back ends. Why is that? Well, if I ask you what it means that a particular system is running at 85% CPU, is that a problem, or is that okay? Maybe it's Black Friday, and you are very happy because you're becoming rich from all the orders coming through your website, right? The point here is that if that 85% is not accompanied by any alerts, then it's probably good. You can maybe go and have a look at the global references, and you can also piggyback your application monitoring on the SAM infrastructure, which is another nice thing about SAM. Then you'll be able to see: oh, these are actual transactions happening on my system, so that is a good CPU peak.
If, on the other hand, I see an 85 or 90% CPU peak, and I also see, for example, some IO latency on the disk, plus an alert coming through saying that maybe there are some write problems on one of my storage systems, then I know that something is looping or stuck, and that system needs attention ASAP. So by combining alert messages and metrics, we're able to offer you a picture, still a partial picture, but a more comprehensive picture of what's happening, for you to understand and maybe act on. I think that adds a lot of value. And as I said, you can also piggyback your application-specific metrics, whatever you might want to count, on the same infrastructure.
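Luca's 85% CPU example can be expressed as a toy decision rule: a CPU peak on its own is ambiguous, but a peak together with disk latency and a storage alert points to a real problem. The Python below is a minimal sketch of that reasoning only. The `Snapshot` fields, thresholds, and category labels are hypothetical and are not SAM's alerting logic or configuration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Snapshot:
    """Hypothetical point-in-time view of one instance: a few metrics plus active alerts."""
    cpu_percent: float
    disk_latency_ms: float
    alerts: List[str] = field(default_factory=list)


# Illustrative thresholds only; sensible values depend on workload and hardware.
HIGH_CPU_PERCENT = 85.0
HIGH_LATENCY_MS = 20.0


def assess(s: Snapshot) -> str:
    busy = s.cpu_percent >= HIGH_CPU_PERCENT
    slow_io = s.disk_latency_ms >= HIGH_LATENCY_MS
    if busy and slow_io and s.alerts:
        # CPU peak + IO latency + an alert: something may be looping or stuck.
        return "needs attention"
    if s.alerts:
        # Alerts deserve a look even without a CPU peak.
        return "investigate"
    if busy:
        # A peak with no alerts is probably real load, e.g. Black Friday orders.
        return "busy but healthy"
    return "normal"


print(assess(Snapshot(85.0, 3.0)))
print(assess(Snapshot(90.0, 45.0, ["write errors on storage volume"])))
```

The first call prints "busy but healthy" and the second "needs attention", mirroring the two scenarios in the conversation.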
Derek Robinson 00:11:24 Yeah, it sounds like there's a lot going on there. One of the things that jumped out to me earlier in that answer, and I think I might've seen this in some other materials, is that one way you could describe SAM is that it's native, but it's open. It sounds really good for people who, like you said, may or may not be experts in these other connected technologies, and who really just need a simpler way to see the performance of their systems and the essential things about how their applications and systems are performing. It sounds like a really useful tool for them.
Luca Ravazzolo 00:12:00 Yeah, right. And I'd like to underline what you just said about the open and native nature of the SAM solution. It's native in the sense that it's an InterSystems-native solution: you take that appliance, you run it, and you're off monitoring your systems. But it's open in terms of the open-source technology we're leveraging. So, for example, a user, even one who's not an expert, might say: okay, I appreciate the default template that InterSystems has given me for some of these metrics, but I would like to see a display of these other two metrics, and I would like them displayed with a different type of chart. Well then, you don't even have to call us; you can go on the internet, search for the Grafana chart type you like and how to set it up, and just do it yourself. It's all there! There's so much support and so much available. Grafana just came out with a new version two days ago, version 7, I believe. The community is really working hard, and I think these are fantastic tools. By leveraging them, we offer the best of both worlds.
Derek Robinson 00:13:06 Yeah, absolutely. That sounds great. So, one more question before we get to the last part about how people can try this out: thinking about the people who interact with these systems at a typical organization, what types of roles will SAM be most beneficial for and provide the most new advantages for, and what are some examples of how those people will benefit from using SAM?
Luca Ravazzolo 00:13:31 Yeah. Good question, thank you. Originally, the idea was that when you monitor a system, you have this image of a dark room, the NOC (network operations center) room, with massive screens up on the wall, a few rows of chairs, and an expert who keeps watching those big screens. That's the NOC operator. Sometimes you can't even walk into those rooms; they're very segregated. Only those particular users can go in there, because you can also drill down into data, et cetera. So that's one type of user, and those users usually have very sophisticated products, usually commercial ones. This brings me to another important fact: every InterSystems IRIS instance from version 2019.4 onward has a built-in Prometheus exporter. What does that mean? It means you can just point your tool at it, whether it's a commercial tool, an open-source tool, or a cloud-monitoring tool. Most of them support the Prometheus format because it's very simple, so they can monitor IRIS systems. This is very, very important. Also consider that you don't have to go and find the exporter, download it, install it, and configure it; it's built into each IRIS instance. So this is very powerful. NOC operators can use what they have, or they can install SAM and tune it to their liking, because Prometheus and Grafana are in there. But having said all that, remember that in the new world of DevOps, with accelerated development pressure, sprints driven by agile methodology, and CI/CD provisioning pipelines, everybody wants to know how things are working.
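Because the built-in exporter speaks the Prometheus text exposition format, any tool that can read that format can consume IRIS metrics. The Python below is a deliberately minimal parser for the simple case (comment lines, a metric name, optional labels, a value, an optional timestamp) to show how plain the format is. The metric names in `SAMPLE` are illustrative assumptions, not a verified list of what IRIS exports, and the parser does not handle escaped quotes or braces inside label values; for real work, a full parser such as `text_string_to_metric_families` in the `prometheus_client` package is the safer choice.

```python
import re
from typing import Dict, List, Tuple

# A sample line: metric name, optional {labels}, value, optional timestamp (ignored).
_SAMPLE_LINE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{[^}]*\})?\s+(\S+)(?:\s+\S+)?\s*$')
_LABEL_PAIR = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')

Sample = Tuple[str, Dict[str, str], float]


def parse_exposition(text: str) -> List[Sample]:
    """Parse simple Prometheus text-format lines into (name, labels, value) tuples.

    Sketch only: skips blank lines and # HELP / # TYPE comments, and does not
    support escaped characters inside label values.
    """
    samples: List[Sample] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        match = _SAMPLE_LINE.match(line)
        if match is None:
            raise ValueError(f"unparseable line: {line!r}")
        name, label_block, value = match.groups()
        labels = dict(_LABEL_PAIR.findall(label_block or ""))
        samples.append((name, labels, float(value)))  # float() accepts NaN and +Inf
    return samples


# Illustrative payload; real metric names depend on the IRIS version.
SAMPLE = """\
# HELP iris_cpu_usage CPU usage percentage (illustrative)
# TYPE iris_cpu_usage gauge
iris_cpu_usage 12
iris_db_free_space{id="USER"} 512.5
"""

for name, labels, value in parse_exposition(SAMPLE):
    print(name, labels, value)
```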
So if I'm a DevOps manager with a couple of teams building a couple of microservices, or a couple of new add-ons to an old monolithic solution, I want to make sure that from the moment they check in code, to the moment it gets built and enters a quality assurance testing phase, through pre-production and UAT, all the way to production, I can monitor every single environment. With one single installation of SAM, I can define multiple clusters, so I can view multiple clusters and see how they are doing. This is very powerful, too. And to underline again the types of users who can leverage SAM: the SAM appliance also includes an alert manager, so system administrators can be paged, or messages and alerts can be sent to Slack channels, et cetera. So all kinds of users can benefit from an installation of SAM.
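Luca says that one SAM installation can define multiple clusters, one per environment, and that all SAM needs for each instance is an IP address and a port. The Python below sketches that data shape under stated assumptions: the `Instance`/`Cluster` model is hypothetical (not SAM's internal model), the addresses are made up, 52773 is assumed to be the instances' web server port, and `/api/monitor/metrics` is assumed to be the path of the IRIS metrics endpoint, so check the documentation for your IRIS version.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import Dict, List

# Assumption: path of the IRIS built-in metrics endpoint; verify for your version.
METRICS_PATH = "/api/monitor/metrics"


@dataclass(frozen=True)
class Instance:
    """Hypothetical model of one monitored IRIS instance: an IP address and a port."""
    ip: str
    port: int

    def __post_init__(self) -> None:
        ipaddress.ip_address(self.ip)  # raises ValueError for a malformed address
        if not 0 < self.port < 65536:
            raise ValueError(f"port out of range: {self.port}")

    def scrape_url(self) -> str:
        # IPv6 literals must be wrapped in brackets inside a URL.
        host = f"[{self.ip}]" if ":" in self.ip else self.ip
        return f"http://{host}:{self.port}{METRICS_PATH}"


@dataclass
class Cluster:
    name: str
    instances: List[Instance] = field(default_factory=list)

    def targets(self) -> List[str]:
        return [inst.scrape_url() for inst in self.instances]


def all_targets(clusters: List[Cluster]) -> Dict[str, List[str]]:
    """Map each cluster name to the metric URLs that would be scraped for it."""
    return {c.name: c.targets() for c in clusters}


# One hypothetical cluster per pipeline environment.
clusters = [
    Cluster("qa", [Instance("10.0.1.10", 52773)]),
    Cluster("uat", [Instance("10.0.2.10", 52773)]),
    Cluster("production", [Instance("10.0.3.10", 52773), Instance("10.0.3.11", 52773)]),
]
print(all_targets(clusters)["production"])
# ['http://10.0.3.10:52773/api/monitor/metrics', 'http://10.0.3.11:52773/api/monitor/metrics']
```

Keeping each instance down to an address and a port mirrors how little SAM asks for; everything else (scraping, storage, dashboards) is handled by the appliance.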
Derek Robinson 00:16:36 Nice. That's a good overview of those key areas and where people can draw the most value from this product. I think a lot of people listening might fall into one of the groups you just described. So how can people try out and learn more about SAM, which, as we mentioned, may already be available to try by the time you're listening to this?
Luca Ravazzolo 00:16:53 Yes. I think that by the time you wrap this up and clean up my English here (laughs), everybody should be able to pull this. The SAM Manager, which is built on IRIS, will be available as a container from Docker Hub, with a built-in Community Edition license. Along with that, you'll also need a few configuration files: the Docker Compose YAML definition, plus configuration files for Prometheus, Grafana, the alert manager, Nginx, et cetera. You'll be able to find those at github.com/intersystems-community/sam, where you can just download these few files. We'll also be providing a small tarball. With that, you just run docker-compose up. Everything you have to do will be described there, and all the containers will be pulled automatically from Docker Hub anyway, so you don't have to worry about anything. It should all be automated.
Derek Robinson 00:17:51 Nice. Well, that seems like a very easy way forward and a great way for people to try it. We'll make sure to include links to those resources in the podcast episode description as well, to make it easier for everyone listening to go try it out and get in touch if you'd like to explore more. So Luca Ravazzolo, thank you so much for joining us, and we'll see you next time.
Luca Ravazzolo Thank you, Derek. It has been a pleasure. Take care now.
Derek Robinson 00:18:14 Thanks again to Luca for the insight about SAM. As we mentioned, SAM is being released right around the time of this podcast episode, so if there aren't any links yet in the podcast description, keep checking back for those. As soon as it's released and you can try it out for yourself, we'll be sure to update the links in the podcast description. That'll do it for Episode Ten, and we'll see you next time on Data Points.
Data Points is brought to you by InterSystems Learning Services.