Episode Transcript
Derek Robinson 00:00:01 Welcome to Data Points, a podcast by InterSystems Learning Services. Make sure to subscribe to the podcast on your favorite podcast app. Links can be found at datapoints.intersystems.com. I'm Derek Robinson, and on today's episode, I'll chat with Thomas Dyar about AI Link and IntegratedML.
Derek Robinson 00:00:34 Welcome to the Data Points podcast by InterSystems Learning Services. On today's episode, I'm joined by Tom Dyar, who is a Product Manager at InterSystems for IntegratedML, the SQL-based machine learning tool within InterSystems IRIS. Now, Tom has joined us for multiple episodes in the past, and so if you're interested in exploring those previous conversations about IntegratedML, about AI and analytics, machine learning in general, you can check back through our episode list and find the ones that he was on. Today we're talking about a few different topics still all around that AI and machine learning category. Notably, Tom will share with us a new feature called AI Link, which is currently in an early access stage. And among other things, it bridges the gap between data scientists and business analysts. So we'll talk a bit about that, and this really ties in heavily with the conversation we had last episode with Carmen and Mary Ann about the semantic layer in your data. We'll also get a bit of an update from Tom on the state of IntegratedML, and hear about the latest developments there after we've talked about that in past episodes. So without further ado, let's hear from Tom.
Derek Robinson 00:01:38 All right, Tom Dyar, welcome back to the podcast. I think your third time joining us. How's it going?
Thomas Dyar 00:01:42 Very good. Thanks very much, Derek.
Derek Robinson 00:01:44 So today we're gonna kind of follow up actually with a similar topic to what we had in our last episode with two of your colleagues, Carmen and Mary Ann, about Adaptive Analytics. And today our focus is shifting from the analytics side to the kind of the machine learning side and some of the bridging technologies that we have, right? So, just to give a little roadmap of the conversation, we're gonna talk about both this new feature AI Link that has an Early Access Program for it, as well as kind of circle back to IntegratedML. Sound good?
Thomas Dyar 00:02:12 Great. Thanks.
Derek Robinson 00:02:12 So just to get it started, you know, just for the people that maybe haven't heard our earlier episodes or are brand new to all of this, can you just give us the kind of 10,000-foot view on what IntegratedML is? That's really kind of your heart and soul at InterSystems, right? You're a Product Manager for that product. So give us a little bit of a quick refresher on what IntegratedML is.
Thomas Dyar 00:02:34 Yeah. So IntegratedML is a toolkit within our (InterSystems) IRIS data platform that's really just all SQL, and very simple syntax to build and train and then use machine learning models right on your data that's in IRIS already. So, if you have a table or a view that's got a bunch of columns that relate to a single column that you want to kind of predict, you want to understand the relationship within that table or view, you can use IntegratedML to quickly build a model. And once that model is built, it's going to be able to predict new rows in that table. Say there are orders or something like that, and you have some machine learning model that's going to predict some aspect about those orders, and you get a new row in, and you can just use a very simple SQL function to get that prediction—super general, and super easy to use.
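To make the "build, train, and then use" flow Tom describes concrete, here is a minimal sketch of the three SQL statements involved, assembled as strings in Python. The statement shapes follow the CREATE MODEL, TRAIN MODEL, and PREDICT forms of IntegratedML as best understood here; exact options can differ by IRIS version. The model, column, and table names (OrderModel, IsLate, Sales.Orders, Sales.NewOrders) and the helper function are hypothetical, and the sketch only prints the statements rather than running them against a database.

```python
def integratedml_statements(model, target, train_table, new_rows_table):
    """Return the SQL for defining, training, and applying a model.

    All names passed in are illustrative; nothing is executed here.
    """
    return [
        # 1. Define the model: which column to predict, from which table or view.
        f"CREATE MODEL {model} PREDICTING ({target}) FROM {train_table}",
        # 2. Train the model on the data referenced in its definition.
        f"TRAIN MODEL {model}",
        # 3. Use the trained model as a SQL function on new rows.
        f"SELECT PREDICT({model}) AS Predicted{target} FROM {new_rows_table}",
    ]


# Hypothetical example: predict whether an order will be late.
for statement in integratedml_statements(
    "OrderModel", "IsLate", "Sales.Orders", "Sales.NewOrders"
):
    print(statement)
```

In practice, each statement would be run through whatever SQL client or driver is already connected to IRIS.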
Derek Robinson 00:03:30 Right. Nice. Nice. And so I know from the content that we helped to build in that as well, a key part of that audience is really new people to machine learning, right? People that have SQL experience but are not data scientists. They want to have low barriers to entry with machine learning. And so kind of just having all that right from a SQL interface has been, you know, very easy in my experience with it, and I think is intriguing when it comes to the use cases you mentioned there. So kind of shifting that conversation, and we will circle back to IntegratedML in a bit, but you know, you have that IntegratedML machine learning side, and then you have that analytics side with the kind of like understanding and being able to see your data and everything with Adaptive Analytics and all those other applications of it. So tell us a little bit about AI Link and what that is, currently in an early access stage, and how that might bridge the gaps between, you know, what you mentioned there as machine learning, running models, making predictions, and that holistic view of your data and being able to make business decisions and everything like that.
Thomas Dyar 00:04:29 Yeah, so AI Link is a new feature within our kind of Adaptive Analytics suite. And what it is, is it's a Python-centric, or a Python package, mostly, that has a bunch of functionality to do that bridging. So for example, you can, from the Python side, if you are working in a notebook environment, or you're trying to develop some data engineering kind of pipelines, you have access to that entire BI model, the semantic layer, as well as all the data that's in IRIS. And you get holistic functions that then allow you to make features, make new columns within your data set, which are just calculated measures in the BI world. And you can then pass those back and forth between whatever other tools are in your stack. If you're working with some other machine learning platform such as DataRobot, DataRobot can do some of this data engineering—it does data feature engineering, automated. And when it comes back with features that really drive that prediction, or drive that insight that you're trying to find, you can then take that feature that DataRobot found and publish that in your AtScale model back into the semantic layer and use that as an insight in any dashboard you want. So it's very powerful to bridge between the machine learning side on the one hand and that semantic layer and the business view of your data on the other.
Derek Robinson 00:06:09 Right. Nice. Nice. And so the way I think I just heard you describe that, basically, DataRobot is kind of an equivalent example to IntegratedML in this case, right? Like, you can really use any other endpoint for machine learning there. You know, with AI Link, it's not just a direct partnership with IntegratedML, like a specific integration. You could use other machine learning tools as well.
Thomas Dyar 00:06:32 Yes, very general. And what the key functionality within Adaptive Analytics and AtScale is, is really translating the BI view of the data. Say you have a time series or a temporal kind of data set. So for example, within our Early Access Program, we've got an environment set up with Adaptive Analytics that has a huge data set that was published by Walmart and has been used in Kaggle competitions to benchmark machine learning algorithms on predicting the sales of various items within their catalog, essentially, and predicting the inventory levels that you might need to satisfy that demand. So you can use any number of additional bits of data that match, say it's weather that affects that demand. And you're gonna put all that data within Adaptive Analytics, and once you define those relationships, like where is the time axis, what are the units that you want to keep track of, like stock units or inventory, and define those dimensions within your semantic layer, Adaptive Analytics automates a lot of the time management. So you wanna look at, say, the rolling average of demand over a certain period of time, say 30 days before the time when you need to make the prediction. That can be a little bit daunting to program up in SQL and then incorporate into some pipeline. Typically people pull out the data, do some manipulation in Python, and then put the data back in. You get a bunch of, you know, data management issues right there. So Adaptive Analytics actually has all that logic built in and makes it a much higher-level exercise to manipulate that data and be able to get insights from it.
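The rolling-average feature Tom mentions can be sketched by hand to show what the semantic layer is automating. This is a minimal illustration in plain Python, not how Adaptive Analytics computes it: the function name, the dictionary layout, and the sample sales figures are all assumptions made for the example. The key detail is that the window covers only the days strictly before the prediction date, so the feature never sees the value it is meant to help predict.

```python
from datetime import date, timedelta


def trailing_mean(daily_sales, as_of, window_days=30):
    """Mean of the values in the window_days days strictly before as_of.

    daily_sales maps a date to units sold on that date (hypothetical layout).
    Returns None when the window contains no data.
    """
    start = as_of - timedelta(days=window_days)
    # Include start, exclude as_of: only information available beforehand.
    values = [qty for day, qty in daily_sales.items() if start <= day < as_of]
    return sum(values) / len(values) if values else None


# Hypothetical data: 60 days of unit sales for a single item.
first_day = date(2022, 1, 1)
daily_sales = {first_day + timedelta(days=i): 10 + (i % 7) for i in range(60)}

# Feature value for a prediction that has to be made on 15 February 2022.
feature = trailing_mean(daily_sales, as_of=date(2022, 2, 15))
print(feature)
```

Doing this per item, per store, and per prediction date is where the hand-written version gets daunting, which is the point Tom is making about letting the semantic layer handle it.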
Derek Robinson 00:08:45 Right, right. That's really cool. I think, you know, from our past conversations, there's a few key things that stand out to me there. One is that, you know, like first of all, as you're sitting there talking about time series, I think it's evident that we could, you know, you could easily have a much longer follow-up conversation about the details, and really kind of fleshing out the use case you just described. And what that says to me is that it's actually increasing the audience that this pairing of technologies can apply to, right? From what we talked about originally with IntegratedML being really targeted at the low-barrier, like very easy-to-use machine learning, right? And then unlocking more of these capabilities to be able to use in conjunction with Adaptive Analytics, and really having an option to do something with your predictions more than just kind of a SQL statement or the things that we have in our learning exercises and stuff like that, right?
Thomas Dyar 00:09:35 Yeah. So, one thing that is really cool about Adaptive Analytics is that since it's so SQL-centric, it allows you to just define calculated measures as any kind of SQL, as well as MDX expressions that you wanna put together that are more geared for those kinds of rolling averages, the means and sums, and aggregated functions that are more, you know, more verbose in SQL.
Thomas Dyar 00:10:01 So another measure that you can make is just a prediction. So you can actually use the SQL function, the IntegratedML surfaces for your predictions, and you can just put that, just drop it right into a calculated measure. And now you can line up your predictions against the data in the past and build a dashboard on all sorts of plots in Tableau or whatever your BI tool is. And so you are, as you mentioned, just kind of expanding the kind of the utility of the base SQL as well as interesting, additional things like IntegratedML. It's very, it's very cool.
Derek Robinson 00:10:41 Yeah, absolutely. And I think to look at that a year after or two years after we kind of very first talked about IntegratedML, it's cool to see, right? Because that was one of the questions people might have had in the beginning: this is cool, what should I do now with this prediction? And I think this is a really well-developed example of what, you know, the other technologies can help enable you to do. So coming back a bit to IntegratedML, you just talked a bit about time series there, kind of like the complexity of that. We're gonna try to keep this at a higher level than going into the details, but tell us a little bit about if that's something to look for in IntegratedML coming in 2023, potentially. Kind of basically the latest updates, you know, for the people that might have listened to our earlier conversations about IntegratedML, what's kind of the current state and the new things people should know about IntegratedML outside of the entire AI Link conversation?
Thomas Dyar 00:11:32 Yeah, so, we do have an upcoming feature. It's gonna be time series support. So what that means is that there are these problems similar to the Walmart stock prediction problem that are very heavily time-dependent. And when you translate those kinds of problems into machine learning tools and toolkits and methods, there are specialized methods and specialized techniques to deal with that temporal component. For example, when you need to do validation of a model and testing on different subsets of data, you have to take into account that the data should be temporally valid. You cannot use information that is only available in the future to make your predictions, if you're thinking about the past. It's called information leakage. So you have to account for that, and you need special tools. So we've developed a pipeline to kind of handle that. So the way we designed IntegratedML originally is in this very simple kind of tabular AutoML format. You have one table or view, like I mentioned at the beginning, and you've got one column that you're gonna predict on, and you're expecting that you get new rows, or you know, that this table is growing row by row and you're gonna predict one of those columns. So that's the shape of the data in a normal AutoML problem. Now, with time series, however, you really wanna get new rows, because you want to predict forward in time: if you had that same table, you'd want to think about how multiple rows are going to evolve, and how the data evolves over time. So we had to make new SQL to kind of capture that. And so instead of just being able to predict one particular point in time, which is kind of how we set up the Walmart problem to use IntegratedML in our EAP with Adaptive Analytics and AI Link, you'll actually be able to predict multiple points forward in time.
And that's the typical way that time series forecasting methods traditionally work, like statistically, using ARIMA methods, any of these auto-regressive methods that use one prediction and build on that for the next prediction and build on that for subsequent predictions.
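Two ideas from Tom's answer can be sketched in a few lines: splitting data by time so that validation never uses future information, and forecasting several steps ahead by feeding each prediction back in as the input for the next one. This is a minimal sketch under stated assumptions, not the IntegratedML time series pipeline. A first-order autoregressive model, AR(1), stands in here for the ARIMA family he mentions, and all function names and the sample series are hypothetical.

```python
def time_ordered_split(rows, cutoff):
    """Split (timestamp, value) rows so training data strictly precedes cutoff."""
    train = [r for r in rows if r[0] < cutoff]
    test = [r for r in rows if r[0] >= cutoff]
    return train, test


def fit_ar1(series):
    """Least-squares fit of y[t] = a + b * y[t-1]; returns (a, b)."""
    x, y = series[:-1], series[1:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    if var_x == 0:
        # Constant history: fall back to predicting the mean.
        return mean_y, 0.0
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = cov_xy / var_x
    return mean_y - b * mean_x, b


def recursive_forecast(last_value, a, b, steps):
    """Predict several steps ahead, each step building on the previous prediction."""
    forecasts, current = [], last_value
    for _ in range(steps):
        current = a + b * current
        forecasts.append(current)
    return forecasts


# Hypothetical series indexed by integer time step.
rows = [(t, float(2 ** t)) for t in range(8)]
train, test = time_ordered_split(rows, cutoff=5)

history = [value for _, value in train]
a, b = fit_ar1(history)
print(recursive_forecast(history[-1], a, b, steps=len(test)))
```

A random train/test split on the same rows would let later values inform the fit for earlier ones, which is the leakage problem described above; the time-ordered split avoids it by construction.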
Derek Robinson 00:14:05 Right, right. Nice, nice. So that's exciting. And I think, I'm sure people who are newer to this hear some of those, you know, buzzwords, and might need to go Google some of them and kind of learn more about it. I think, like we said, you know, we don't have the time to dive into all the details, but it's interesting and exciting to hear that those things are being built in, right? And you'll have the ability to kind of increase the scope of usership of that product versus just kind of the, like you mentioned, very, very straightforward SQL table that you kind of started from.
Thomas Dyar 00:14:34 Right, exactly.
Derek Robinson 00:14:35 So kind of wrapping up here, looking forward, I know you just talked about one major thing to look for in 2023, but another thing that's been a theme across many of our products has actually been this move to the cloud. So kind of my ending catchall question is for you to tell us a little bit about what IntegratedML in the cloud is gonna mean in the future in 2023, and then just in general, if you want to tie into that same answer, what people can do to learn more as far as, you know, what to look for in the Developer Community, if there's any opportunities to use the cloud service, things like that.
Thomas Dyar 00:15:08 Yeah. So, very exciting times around here because we are going to be offering more and more of our functionality through the cloud. And one of those additions to our product portfolio will be a fully managed IntegratedML-as-a-Service, essentially. And I got a chance to look at a preview, and we actually used a preview in a hackathon here, in Cambridge. And it was really well received. Kids…kids, I say <laugh> because anybody that I see at a college, looks like <laugh>, very young. So anyways, these hackathon participants were able to quickly just get set up, import their data, and use IntegratedML—create, train, validate models, and incorporate them into a unique solution that they developed over 24 hours. It was really interesting to see. So we're going to be rolling that out. There should be some Early Access Programs coming up that you would be able to see in the Developer Community, as well. We're going to have some programming contests as we always do. And it should be exciting next year.
Derek Robinson 00:16:31 Nice. Awesome. So, we will as usual put the links that are relevant for that into the podcast description here. So you'll certainly be able to go check out the early access opportunities that you have with different technologies, including these couple that we've talked about. So Tom, thank you so much for joining us, and we'll have to catch up with you again when we see more exciting things develop.
Thomas Dyar 00:16:50 Absolutely. Thanks very much.
Derek Robinson 00:16:55 So thanks again to Tom for joining us for another great conversation about machine learning, about IntegratedML, some more detailed conversations in this one, talking about time series functionality, things like that. I think there's a lot to go over from that, and I think it was a good follow-up conversation from the one we had last episode with Carmen and Mary Ann about that semantic layer in Adaptive Analytics. I think the AI Link features that Tom talked about really show the way that you can open up some of that functionality to the various other users in an organization that might have more usage for Adaptive Analytics than the, you know, SQL-based interface that they'd be using for IntegratedML, right? I think it's showing a growth from the first time that we talked about IntegratedML, when it was very simply running a SQL query and getting results.
Derek Robinson 00:17:41 I think seeing a place that you can put those results and actually utilize them and make business decisions based on them and forecasting and things like that, was very interesting to hear Tom talk about. The other thing that we'll kind of close with here is the ending conversation with Tom about cloud services and IntegratedML-as-a-Service. You're gonna see this more and more, and I've mentioned it a few different times now on podcasts. The cloud service presence of InterSystems technologies is growing, and InterSystems is investing in getting these product functionalities available as cloud services, discrete cloud services, that you can launch and use on demand, right? So if you're interested in more of that, certainly check out the Early Access Program page that is linked in the podcast episode, and explore the different ways that you can get involved early, and just reach out if you have any additional questions about that. That'll do it for our episode on AI Link and IntegratedML. We'll see you next time on Data Points.