A Deep Dive on Mirroring with InterSystems Products (S2, E10)

[00:00:01] Derek Robinson: Welcome to Data Points, a podcast by InterSystems Learning Services. Make sure to subscribe to the podcast on your favorite podcast app. Links can be found at datapoints.intersystems.com. I'm Derek Robinson, and on this episode, we're interviewing InterSystems Service Executive Chad Severtson and J2 Senior System Engineer Greg King about mirroring in InterSystems products. [00:00:35] Derek Robinson: Welcome to the Data Points podcast by InterSystems Learning Services. I'm Derek Robinson. Today's episode is a lengthier one, but it's an episode that I think fills an important gap for many of our learners. The conversation today is all about mirroring, and it features two conversations with experts. The first interview you'll hear is with Chad Severtson, one of our service executives here at InterSystems. Then you'll hear my interview with Greg King, a senior system engineer at J2, one of InterSystems partners. I'll preview the combo with Greg a little bit later, but first, let's hear Chad give us a high-level overview of mirroring and then dive into some of the important considerations learned from pain points that he's seen with customers. [00:01:22] Derek Robinson: All right, Chad Severtson, Service Executive here at InterSystems, joins us on the podcast. Chad, thanks so much for taking the time. [00:01:28] Chad Severtson: It's great to be here. [00:01:29] Derek Robinson: So one of the topics we were just talking offline before we started recording that is most commonly submitted to support, that is a part of the conversation with InterSystems users that needs to be addressed, is mirroring. So that's kind of the topic we're going into today. So kind of starting, for people that may not be innately familiar with this, give us the 10,000-foot view from the beginning of what mirroring is and why people, specifically InterSystems users, should care. [00:01:55] Chad Severtson: So mirroring is kind of classic for us. 
It's the great power, great responsibility paradigm that we know and love so much. But mirroring is really just the replication of journal data between two or more instances. That's it. That's all it really does. And there's some background mechanics for high availability failovers and disaster recovery, et cetera. But really we have this economic system where we can quantify exactly how much we value something. And to me, data availability is no different. It's a business decision at the end of the day. So it's incredibly expensive to not have access to your most important data. I don't just mean dollars. There's human costs as well. I've been in hospitals late at night when their systems are down for one reason or another, and it comes to a grinding halt. So there's plenty of situations where waiting to restore a backup isn't a viable solution. And you have specific recovery point objectives or recovery time objectives that you have to meet for business reasons, especially when that time will grow as a function of data volume. So mirroring kind of fills a gap there. So a lot of the shiny modern paradigms like Kubernetes Pod auto-restarts or storage replication don't actually insulate against some of the more common failure modes. Mirroring basically solves physical data integrity issues, and it's a great way to protect against ransomware with a little extra work. [00:03:22] Derek Robinson: Taking a little bit of a closer look at what we might see in a typical scenario, what are we working with? I know people are probably familiar with normally having their databases, whatever servers their systems are running on, maybe they're running in the cloud. Just from a general standpoint, tell us a bit about the architectural setup of what a mirrored kind of deployment looks like. [00:03:45] Chad Severtson: To me, the key to successfully using mirroring is identical twins…or triplets or quadruplets, if they are really out to get you, or you just really value your data. 
Make every aspect of the systems as identical as is feasible. So I use the "everything is code" philosophy here quite extensively. You need to have a single source of truth and then deploy out of that so that there's no drift between your instances. That's often where folks get into trouble and end up calling support. So the other piece with that is that you have to make sure your twins aren't conjoined in any way. Mirrored systems shouldn't have any shared dependencies. The real-world example I remember from the early days is you need to make sure a janitor can't unplug one thing that takes both mirror members offline to plug in their floor buffer. That did in fact happen to some customers. So mirroring works by sharing journal files, like I said, just the transactional logs that are really the heartbeat of our technology, and that's why mirroring is immune to degrades. It's logically transporting rather than physically transporting the blocks. [00:04:55] Derek Robinson: Right, nice. So kind of going back to a little bit of what you mentioned in your first answer, but kind of returning to the why, what are probably maybe two or three of the biggest benefits that you can see that come with mirroring? Because I assume that it's not something people might think of on day one. And it's like, what is the incentive? Why should people, beyond just good practice, what are some of the benefits that come with it when you're using InterSystems products? [00:05:23] Chad Severtson: Yeah, that's a great question. So as it says on the tin, mirroring can provide a high availability solution with automatic failover. But to me one of the biggest benefits is that using mirroring encourages some healthy practices. So to use mirroring successfully, you need to have coherent change control processes. I mean, this isn't really controversial in 2023, but it was a pretty big deal back in 2010 when we first launched the feature. 
And as I mentioned earlier, there's many different failure modes where mirroring provides much better insulation than other technologies. I mentioned the dreaded database degrades, knocking on wood. There's also things like the Linux out-of-memory killer where it's going to take a whole lot longer to restart the instance and possibly the operating system than it does to fail over. So for the most critical applications, this is cutting that pause time down to seconds rather than having an outage that lasts minutes. [00:06:19] Derek Robinson: Right. Yeah, that makes a lot of sense. You mentioned a few times in that answer databases, and obviously databases make up the real heart of most people's applications and what they're doing with their business and the things that they're solving in the world. But there is another piece of this, which is a lot of times there's stuff that is sitting outside of those databases, right? What's the answer, and how does that factor into mirroring, when you think so database-centric when you're thinking of the concept of mirroring? [00:06:47] Chad Severtson: Yeah, this is absolutely where I've seen the most challenges for customers. As you said, nothing outside the databases themselves will be mirrored, and nothing is transmitted other than the journal files…the data within the journal files, to quibble. We have some great new features coming in this area, but currently things like instance configuration, security configuration, even users within IRIS itself, system tasks and external files, are not eligible for mirroring. I think I said earlier that everything should be code, and this includes that type of configuration. Regardless of what your deployment strategy is, you need to have a coherent strategy where you can deploy, to use the buzzword, "idempotently," to make sure that it's consistent, not just between deployments, but between mirror members as well. 
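Editor's note: the idea of deploying "idempotently" to every mirror member can be sketched in a few lines of Python. This is only an illustration under stated assumptions: plain dictionaries stand in for mirror members, and the names `DESIRED`, `apply_desired_state`, `node_a`, and `node_b` are hypothetical, not part of any InterSystems API. The point is that applying the same desired state twice changes nothing the second time, and every member ends up identical.

```python
# Hypothetical illustration: dicts stand in for mirror members; no IRIS API is used.
DESIRED = {
    "users": {"app_reader", "app_writer"},
    "tasks": {"nightly_purge"},
}


def apply_desired_state(member: dict, desired: dict) -> list[str]:
    """Bring one member in line with the desired state and return the changes made."""
    changes = []
    for kind, wanted in desired.items():
        current = member.setdefault(kind, set())
        # Add whatever is missing on this member.
        for name in sorted(wanted - current):
            current.add(name)
            changes.append(f"add {kind}:{name}")
        # Remove whatever has drifted in and is not in the source of truth.
        for name in sorted(current - wanted):
            current.discard(name)
            changes.append(f"remove {kind}:{name}")
    return changes


members = {
    "node_a": {"users": {"app_reader"}, "tasks": set()},
    "node_b": {"users": {"app_reader", "old_admin"}, "tasks": {"nightly_purge"}},
}

# First pass fixes the drift; second pass should find nothing left to do.
first_pass = {name: apply_desired_state(m, DESIRED) for name, m in members.items()}
second_pass = {name: apply_desired_state(m, DESIRED) for name, m in members.items()}

print(first_pass)
print(second_pass)
```

Note that the members are named neutrally rather than "primary" and "backup", since the same desired state is applied to each regardless of its current role.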
For things like external files, though, there's plenty of different options that can be adapted to whatever your strategy is, and wherever you're running it. I've seen customers use shared storage, I've seen file replication technologies, I've seen customers use third-party solutions, more like a Dropbox-type situation. But a lot of customers will actually bring those files into the .DAT files as well. That's fully supported with a technology we call streams, where you can take a blob and bring it inside the .DAT file itself. [00:08:15] Derek Robinson: Right, nice. It sounds to me like a lot of the elements of that answer tie in very closely with one of your benefits you mentioned earlier, which is encouraging good practices. A lot of those things that fall outside of your database go hand in hand with the things that maybe even if you weren't using that configuration would be good practice from the standpoint of maintaining your system and building your applications. [00:08:36] Chad Severtson: Right, yeah, exactly. It's not something that we talk about too often, but to me, the healthiest customers are the ones that are using mirroring because of those fringe benefits. [00:08:46] Derek Robinson: Right, exactly. So moving to a few topics that I think a lot of people, maybe like myself even, who haven't necessarily been hands-on with mirroring but have learned about it and are familiar with conversations around it, two concepts that tie very closely to mirroring are high availability and disaster recovery. However, I get the impression that they're not exactly the same as mirroring. So tell us how those two concepts relate to mirroring in this conversation. [00:09:12] Chad Severtson: So to me, high availability and disaster recovery are the specific business objectives, and mirroring is the tool that you're using to achieve those. And I often find that the line between the two is blurred. 
And depending on how you have things set up, one can meet some of the needs of the others, depending on your specific needs, of course. But with our mirroring product, our async mirror DR members can be promoted to full synchronous members. This can be incredibly seamless during periods of degraded services. I always think of a squirrel nibbling on the network cables, but if there's a total outage, like hedge clippers were involved and really caused some network isolation, you need to more carefully decide if you want to promote the DR members. Depending on the nature of the outage, you may choose to accept some data loss for greater availability. It's ultimately a business decision at the end of the day, and not one you should be making at 2:00 a.m. [00:10:14] Chad Severtson: So high availability to me is guaranteeing access to the data. Disaster recovery is just a more extreme version of that, where you're saying that all of plans A, B, and C have failed. Now what? And mirroring does make that process quite a bit more seamless. So I mentioned earlier there's one area where I diverge from the identical twins paradigm. Ransomware is an all-too-common disaster these days, and mirroring can be a way to insulate against that particular form of evil. One strategy I've been suggesting to customers that are worried about that, is setting up an asynchronous DR member, but with completely separate security controls, so that if your main systems are compromised, this one can still be merrily running somewhere else. The only thing that should flow into that black box is updates from the mirror. That way you'll always have a fresh copy of your data even if the rest of your infrastructure has been taken down. [00:11:12] Derek Robinson: Right. Yeah, that makes a lot of sense. So kind of an innovative way to be making sure that you're guarded against different points of failure that may exist. 
All of that sounds like a great way to prevent data loss, to keep yourself from a disaster. But one of the…I don't know if I want to call it a downside, but one thing I'm hearing here is a lot of doubling up of systems, right? So tell us a little bit about the impact that mirroring might have on my cost of operation if I, let's say I implement every single thing you've described where I've got the four identical twins, and you've got all the different systems that really match each other. Tell us about the impact that it does have on your cost of operation. [00:11:49] Chad Severtson: No, you're absolutely right. Mirroring increases your cost. It's a business decision at the end of the day, and it's up to each organization to decide how much access to their data is worth, and how much that downtime costs. I mean, common factors are do they have SLAs? Will they need to pay penalties if customers or partners lose access to that data? What are the opportunity costs? I've seen a wide range of studies about how much the downtime actually can cost a business. They're older numbers, but $50,000 to $75,000 per hour is common. But beyond that, mirroring can actually end up saving money by the end of the day by reducing the need for outages for scheduled maintenance. It's not just that future hypothetical cost; it's real world ongoing costs that it helps reduce as well. I know site reliability engineering and error budgets are all the rage, and I'm on board with that. And that's fine for consumer-facing services, but a lot of our customers are in critical industries, and they have higher standards for availability of the system that extends beyond: does my smartphone currently have signal? One other thing that's worth mentioning about costs. One of the major sources of cost in cloud is related to storage. So it's immediately unappealing to have to have multiple copies of your data. 
There are ever-increasing numbers of offers out there to allow some sort of deduplication technology, and I would urge some amount of caution around that because part of the value of mirroring is that they are two completely separate and logical copies of the data rather than physical copies of the data. So I have some skepticism on that front, but at the end of the day, it really depends on what your business objectives are. [00:13:38] Derek Robinson: Yeah, that makes sense. So different angles to think about there with the cost. Kind of turning it a little bit on its head from the idea of this financial cost, what, if any, performance cost you end up paying? Is there an impact on your performance of your application, your system, anything like that? Positive or negative, I guess, when it comes to mirroring, on top of all of the kind of safety and prevention kind of mindset that you have here. [00:14:01] Chad Severtson: So, I keep a few props next to my desk. One's a copy of the NIST 1984 green book that describes minimum password lengths based on the number of guesses per second for a 300-baud modem in the 80s. The other is a speed limit sign for the speed of light. The major limitation for performance is network latency. To ensure data durability, the transfer of the journal data really needs to be synchronous. That really requires a round trip. It's the old networking joke. Want to hear a UDP joke? [00:14:35] Derek Robinson: Yeah, sure. Let's hear it. [00:14:38] Chad Severtson: I don't care if you don't get it [laughs]. But beyond that, I've seen customers demote their synchronous mirror members when performing a bulk load of data to lessen the impact. So there absolutely is a network latency cost you need to factor in, but it's not for every operation, only particular ones. On the other hand, that means it's generally a poor practice to put synchronous mirror members in separate regions, in cloud speak. Anything really more than separate zip codes. 
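Editor's note: the "speed limit sign for the speed of light" can be turned into a quick back-of-the-envelope calculation. The sketch below is an illustration only: it assumes signals in optical fiber travel at roughly two-thirds of the vacuum speed of light (`FIBER_FACTOR` is that assumption), and it ignores routing, switching, and disk time, so real round trips are slower than this floor. The function name and the example distances are made up for illustration.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458  # speed of light in vacuum, km/s
FIBER_FACTOR = 0.67  # assumption: light in fiber travels at roughly 2/3 of c


def min_round_trip_ms(distance_km: float) -> float:
    """Lower bound on network round-trip time over fiber for a given one-way distance."""
    one_way_seconds = distance_km / (SPEED_OF_LIGHT_KM_S * FIBER_FACTOR)
    return 2 * one_way_seconds * 1000


# Illustrative distances: a nearby data center versus a cross-continent link.
for label, km in [("same metro area", 10), ("cross-continent", 4000)]:
    print(f"{label}: at least {min_round_trip_ms(km):.2f} ms per round trip")
```

Under these assumptions a nearby member adds on the order of a tenth of a millisecond per synchronous round trip, while a member thousands of kilometers away adds tens of milliseconds, which is why distance matters for synchronous members and much less for asynchronous ones.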
There's basically no impact for asynchronous DR members. Those can easily be in a different cloud region or even a different continent. [00:15:14] Derek Robinson: Got you. So, yeah, good things to consider there, but overall, nothing that people necessarily need to consider as like, oh man, mirroring is going to slow down my application. It's really all about that latency and being aware of where the pieces of your system are residing. So kind of wrapping it up here with the final question for you, what other considerations or best practices come to mind, especially coming back to the real, specifically InterSystems users, what other considerations should they leave with from this conversation when it comes to mirroring in their products? [00:15:47] Chad Severtson: Test, test, test, test, test. There's an expression among my people, "it's always DNS." Networking issues are a common cause behind all sorts of weird symptoms, and mirroring depends on your network and specifically the ability of all of your ancillary applications to be repointed to the correct mirror member at the right time. We provide a couple of different technical solutions for this and a couple of different reference architectures for that. But making sure that you've exhaustively tested both high availability failovers as well as disaster recovery asynchronous mirror member promotions well in advance of when you need them, is absolutely the most important practice when it comes to mirroring. One more is to avoid drift at all costs. I mentioned before, treat everything like code, have that single source of truth. Differences in resources or configuration can be a really big problem at 2:00 a.m. And this happens most frequently when you think of one particular mirror member as the one that should be always the primary. So whatever you do, don't name your instances things like "primary" or "backup." It gets really confusing when their roles shift over time. 
I think it's best to consider a transient role rather than an identity. If you have any other questions, we have great sales engineers, some helpful learning resources, and a world-class support organization to help you out. [00:17:24] Derek Robinson: Right, absolutely. And I can vouch for those things. You mentioned everyone in my experience and most customers' experiences, I think everyone's very helpful whenever you reach out for help on these topics. Chad, thank you so much for joining us and sharing all your expertise on mirroring. And we'll have to circle back soon to have some more in-depth conversations on it. So thanks again. [00:17:43] Chad Severtson: Great. Thanks, Derek, this was a blast. Appreciate it. [00:17:48] Derek Robinson: So there's the interview with Chad, and I want to thank him again for taking the time. Next we'll hear from Greg King. Much like Chad, Greg has a wealth of hands-on experience with mirroring. This conversation starts a bit beyond the basics and dives into what Greg has seen in the field with his implementation projects. And if you stick around till the final question, I think you'll find that Greg's key takeaway has some striking similarity to Chad's. [00:18:17] Derek Robinson: All right. And we have Greg King joining us from J2. Greg is a Systems Engineer for J2. Greg, how's it going? [00:18:23] Greg King: Good to be here. Thank you, Derek. [00:18:25] Derek Robinson: Yeah, thanks. So glad to have you on the podcast. We have been lately trying to feature more of our partners and non-InterSystems folks on the podcast to get more perspectives on some of this content. So it's great to hear your perspective here, and I guess we'll start there by having you tell us a little bit about your background and kind of what your role is at J2 and your history with InterSystems technologies. [00:18:46] Greg King: Well, starting with the history, I've been using InterSystems technology since…I think this is my 26th year. 
So I've seen some changes along that time. And in the last five years, I've been working with J2 as a system engineer. So I help our mutual clients, right? I help them set up the system side of their installation, and it's all IRIS-based. [00:19:19] Derek Robinson: Nice, nice. So in today's conversation, we're kind of honing in a bit on mirroring, a topic that is really frequently used but oftentimes lacks some of that in-depth content as far as our domain goes, to be able to really give people a deep dive into it. So first we wanted to talk a little bit about how mirroring actually plays into your role. So when you are working on implementations and using IRIS or other InterSystems products, tell us a bit how mirroring at a high level plays into what you do as far as your implementation projects go. [00:19:51] Greg King: Oh, sure. It's a vital part of all of our projects, and for the most part, the projects I work on are HealthShare, so the HealthShare stack, but there's quite a few that are in the IRIS for Health, but it's all at that level. So there's always this need in the live environment for high availability. Right? That's the biggest system-level question that comes up. And that's where we can use mirroring. There are other ways, but we can use mirroring to ensure the highest possible uptime for their live environments. It's always in there, no matter...also, whether we're VMware, virtual machine kind of infrastructure or we're up in the cloud, even had a couple of container-based installations, mirroring always plays a role. [00:20:59] Derek Robinson: Right. Obviously an important part to consider whenever you're deploying systems like that. One of the key concepts you just mentioned is high availability. And so when it comes to the concept, like you mentioned, there are a few different ways that you could achieve high availability. But when it comes to the concept of failing over to a backup mirror member. 
Tell us some of the ways that you look at it when you're implementing to try to increase the likelihood of that failover going smoothly, to make sure that no problems arise during the process of failing over from the primary to the backup. [00:21:29] Greg King: Sure. First, I think I would start off by saying a shout out to InterSystems Learning Services, the Documentation team, the folks that wrote the documentation, because look at the High Availability guide. That's key, right? The piece in there that we start with to ensure high availability is topology, like how are we hooking up the networks, what size are our machines? All of those pieces are the first line. So first, getting everything sort of installed and where it needs to go. And again, the High Availability guide, really every release is a good thing to reference. So get that right. So start there, and start with having a robust high availability isn't just about mirroring, but having a robust set of policies, procedures, and setup, just so that the primary side of your mirror is going to be working well. But ultimately, if you think about mirroring, and we're just talking about failover mirroring here, at least for high availability purposes. Disaster recovery would be a different concept. But for high availability, we want that failover scenario where we lose our primary side of the mirror to have our backup member automatically take over, right? I mean, without human intervention, everything just works going forward. So after you've got everything set up and the basic mirror stuff set up, keeping an eye on that idea that some things are not mirrored. So IRIS mirroring is a technology that mirrors the databases, your IRIS.DAT, the changes to a mirrored IRIS.DAT get sent over to the other side. But what about all the other stuff that's not in a mirrored IRIS.DAT? And working in the HealthShare stack, right? So that's our application, HealthShare. 
That application has lots of stuff that it needs to properly run that is not stored in a mirrored database file. So how do we deal with that? And that's part of that setup process, and then it's part of an ongoing process. Every time something changes, needs to change configuration-wise, code-wise, other things within HealthShare, we need to make sure that that gets to both sides of the mirror. [00:24:37] Derek Robinson: Right, cool. Yeah, that makes sense. I think going off of that, it sounds like sort of the key things, keeping that in mind, making sure that you're monitoring what is going to be covered by the mirrored database, all the things that are in your database, but also making sure you're keeping tabs on those things that fall outside of that purview. When it comes to things that are stored in other places. [00:25:00] Greg King: To build on that last bit, I wanted to make sure that there are ways of automating or setting up your mirror systems so that it can take care of as much of that stuff that isn't quote "mirrored" automatically. So trying to deal with things that are not mirrored because they happen to be in a mirrored database. We're talking about, in this case, items that are in your %SYS or IRISSYS database, right, we can't mirror that. Then the other piece would be files. So files at the file system level. And depending on how your application is structured, but HealthShare does use a fair number of CSP or the web files that end up at the operating system level. So we need some way of keeping those in sync when they need to change. Now, a web page might not change all that often, but it changes with a code update. There are lots of reasons it might change, so we need a mechanism to keep those files. Another piece is certificates that are used for your TLS and security. That's oftentimes a file. We need to make sure all of that is synced across all the different members of the mirror set. And there are operating system tools, right? 
There are some operating system tools that can do that. Linux would have rsync, maybe you set something like that up. Or another option is that you have a shared disk system underneath, and you map all of your CSP or all your files that need to be the same on both sides of your mirror. They go to a shared disk underneath. So there's only one copy of it under there. But you're guaranteed that both sides of the mirror can see the exact same copy. That has to be taken into account. Now, another thing that folks do, depending on complexity of the setup, I suppose, is automating, or during deployment and progression of code up from lower environments, like from a non-mirrored development environment to a mirrored stage or UAT environment. They just are sure to deploy code to both sides of the mirror, depending on your deployment techniques. But that requires everyone to remember that and have a policy. So having an automated way of keeping that stuff in sync. And then the third thing about keeping those non-mirrored items up to date. In the HealthShare and IRIS for Health realm of the InterSystems products, we've got a nice, fancy toolkit for us that's come out the last few years. I don't exactly remember when it started, but there's a mirror monitor agent that comes included. If you have an HSLIB database in your installation, you're going to have this. Its job is…reading the description of it in the docs, the job is to keep non-mirrored things that are in IRIS synced over onto the other side. Understanding that that's there, and if you're just running IRIS for Health, it's something you have to start yourself. It's not going to just be there. If you're running the HealthShare stack, it'll be there for you. 
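Editor's note: whichever sync mechanism is used for files outside the mirrored databases, it helps to be able to check that both sides really hold the same files. The Python sketch below is an illustration under stated assumptions, not a J2 or InterSystems tool: it hashes every file under two directory trees and reports what differs. The function names `manifest` and `drift`, the directory names, and the sample files are all hypothetical; in practice each manifest would be built on its own mirror member and the two compared.

```python
import hashlib
import tempfile
from pathlib import Path


def manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to the SHA-256 digest of its contents."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def drift(a: dict[str, str], b: dict[str, str]) -> dict[str, list[str]]:
    """Report files present on only one side and files whose contents differ."""
    return {
        "only_in_a": sorted(a.keys() - b.keys()),
        "only_in_b": sorted(b.keys() - a.keys()),
        "different": sorted(k for k in a.keys() & b.keys() if a[k] != b[k]),
    }


# Demo with two throwaway directories standing in for the two mirror members.
with tempfile.TemporaryDirectory() as tmp:
    side_a, side_b = Path(tmp, "a"), Path(tmp, "b")
    side_a.mkdir()
    side_b.mkdir()
    (side_a / "index.csp").write_text("version 2")
    (side_b / "index.csp").write_text("version 1")  # stale copy on side b
    (side_a / "style.css").write_text("same on both sides")
    (side_b / "style.css").write_text("same on both sides")
    (side_a / "cert.pem").write_text("placeholder")  # never copied to side b
    report = drift(manifest(side_a), manifest(side_b))

print(report)
```

A report like this can either trigger an automatic fix or simply alert someone to go look, depending on how much automation an organization is comfortable with.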
Relies on a special mirrored database that's already there, and it gives us a tool, a class, the HS.HC.SystemConfig API class that allows you to make a change on the primary side of the mirror, and then the mirror monitor agent makes sure that change flows over to the other side. Really cool. It gets better every time. There's more added to it with every release, last few releases for sure. Very stable. And if you add another mirror member, a DR member, it automatically just syncs that stuff right across over to the third one. It's well thought out. So we use that in our setup. So setting up the mirror to begin with and then we need to create CSP web apps, we need to create user accounts and roles and resources and all these things that are stored in that IRISSYS area database that's not mirrored. If we use the API, they just automatically go across. And one last thing about that, in that same vein, we also end up using…we've created some tools at J2 that we can use that check both sides, right? Moving forward, we now have the…if you're using our tools, we can keep tabs on the differences between the two sides, and either depending on your tolerance, either make the change if we see a difference, or just alert someone and say, hey, this side has 40 roles and this side only has 39. This is the one that's different. Why? Go look. I think that covers that. [00:30:55] Derek Robinson: Great. Yeah. So it sounds like there's more and more tools coming out from both InterSystems, from J2, like ways to be able to keep track of those things that you need to monitor after your initial setup of your primary and your backup, with all the things that need to be there that are not automatically mirrored out of the box. [00:31:13] Derek Robinson: So one more topic to ask about there is the concept of arbiters. Can you tell us a little bit about what those are and where they fit into the equation? [00:31:20] Greg King: Yeah. 
So small insignificant piece of software, it seems, but so important. Again, well highlighted in the doc. So the arbiter is the piece of software that allows the two sides of the mirror to decide is the other side up or down? This is my simplistic thought about this. And therefore when a primary side is failing, the arbiter can help the other side decide: am I alone, or do I know the primary is down? What do I know? Some failovers can automatically occur if you do not have an arbiter. But most of the failure situations would work better. It would be more seamless if you do have an arbiter properly placed such that in your network, such that if you lose one of the primary side, you're not also losing your arbiter, right? If some failure causes your primary to go away, it's not also causing your arbiter to go away. So the network topology again comes into play there. [00:32:44] Derek Robinson: So kind of stepping back for a minute from this failover scenario that we've been talking about and coming back into a normal operation, a scenario where you're not failing over to the other member of the mirror. What are some things, if any, that you need to consider when you're accessing a primary mirror member versus just if you're not in a mirrored setup and you're just accessing IRIS or HealthShare or whatever it is without a mirrored setup. What are considerations when you're accessing that primary? [00:33:13] Greg King: So the key concept, I think to grasp, to answer that question, is that only the primary side of your mirror is both read/write. So we think of it in the HealthShare realm. Only the primary side does anything, right? While you could technically use the backup member as a read-only kind of a database, someone might have a use case for that, so go for it. But if we think about the primary side being the real side, that's where data exchange, that's where we store everything, where we read everything. 
Whatever access your users, so not a system engineer administrator like me, but your user base or where your interfaces are coming into or something, they need to always hit the primary, even if the primary is the other side. By default, or the typical way, maybe the way to say it, is that IRIS and mirroring present a virtual IP, right? And they do that by assigning a known IP to the server. It's the primary side the mirror is starting on. So then that IP follows the primary as it bounces back and forth, potentially in failovers, and then your user base can connect. And that's a really general term, thing I just said. But your user base can connect to the VIP, the mirror virtual IP, and be guaranteed to be connecting to the server that has the primary side of the mirror. Now, it doesn't always work. And where it doesn't work by default is in the cloud. So Amazon and Google and Azure and Microsoft don't allow us to assign a VIP in the same way that we do in our own VMware or bare-metal type environment. So we have to do something else. And there are a couple of different something elses out there. And here's where I want to throw a shout out to the Developer Community or the InterSystems community, where there are some really nice articles about how people have gotten around that in cloud environments. So there's some clever use of Embedded Python is one of them. I looked at it the other day and it's great, but if you can't go down that route, what we do in the HealthShare realm these days, and HealthShare, again, if you think about HealthShare, it's actually a bunch of mirrored pairs, right? So HealthShare is a federated installation, so a bunch of IRISes all mirrored to another bunch of IRISes to create that. So what we do is…and then access to HealthShare comes in two major forms from that user community, and one is https. So coming over a secure web connection. 
And that's for hitting the system Management Portal or the portal specific to HealthShare pieces, or it's our APIs; FHIR APIs are exposed over https. So what we have in that case is we have the IRIS Web Gateway. So the IRIS Web Gateway sits out there running on a web server, right? So it's paired with our web server, and we get to put that out in sort of a DMZ -- sort of a little zone where you're not directly hitting the servers where the mirror is running our primary side; you're coming in and hitting this via load balancer and via firewalls. Further out, you come in through there. And the IRIS Web Gateway can be mirror-aware. So as a connection comes into one of our CSP pages, right, system Management Portal, HealthShare, Clinical Viewer, anything as it comes in, the IRIS Web Gateway takes over, and it knows about both sides of our mirror. And in HealthShare we have a bunch of them, so it knows about all of them. And depending on the URL coming in, it can direct that user connection off to the primary side of the mirror. That's our key way to do it. And that's in the cloud, by the way. So many things are in the cloud these days; that's kind of normal. The last one though, the other way that folks come into HealthShare or any interoperability production out there, so interfaces kind of thing, will be coming in through a TCP port, so not https. So they're either connecting to an HL7 interface or JDBC. Let's say they want to query our Health Insight installation. Well, they have to connect to the primary too. We can't have a VIP, and we're not up for some of the fancy stuff that's in the community; maybe we will be someday. But for them to connect, we can utilize what Amazon, for example, calls a network load balancer. And so what it knows is that something is coming in on port 1972, trying to hit a virtual IP that the load balancer presents out. 
The load balancer says, oh, for 1972, I'm supposed to hit mirror member A or mirror member B, but I don't know which one. Well, how does it know which one? InterSystems supplies us with a health check over https, so it hits both sides, and the IRIS Web Gateway that's running on the database servers themselves will return either success or failure. And then the load balancer will only transmit that data, that connection, to the side that returns success. And it returns success if it is the primary. Does that make sense? So it's called a load balancer, but we're not really balancing the load. We're always hitting the one that says success. And then if you looked at the load balancer, it would say one of your targets is down, but you come in that way. So now you've done that. So you've got the IRIS Web Gateway for your https, you have a network load balancer for your other, non-https TCP, but you as a system administrator may need to hit both sides. Right? Because you as a system administrator might have to go in and connect to your backup member. But these two methods I just described always take you to your primary side. So you have to have a backdoor. In the IRIS Web Gateway, you can either use the same one we're talking about or have a different set. But we just let them know, we tell them not to be mirror-aware. If I want to connect to B, I can connect to B, but it's a limited number of people that would do that. [00:40:55] Derek Robinson: Right, makes sense. So really, for those end users, you have to make sure that that's set up correctly so that they always end up at that primary. But you need to have those other options for your administrators and the people that actually need to get that access. [00:41:07] Greg King: Yes, so that's right. So you have to have that access. And I want to just point out or remind folks that, rightly so I think, InterSystems is removing the built-in private web server. 
So that's something we've been taking into account at J2 for a couple of years now because you shared that with us, right? You've put out that this was coming, and now it's close -- that that will be removed. So we've had to think about where do we put our web server, our proper web server, our properly secured web server? And that's part of this whole discussion. So that's been removed. So you're not going to be able to rely on that as a system administrator. But you'll have to think about where in your topology you'll have… [00:41:58] Derek Robinson: Yeah, totally. And that is a very important point. One of my colleagues is working on multiple videos right now for that learning point about removing the private web server. So a good thing to call out there. So one more key question before we kind of wrap up here. You mentioned documentation earlier; this was something I had stumbled across in documentation as something that was also highlighted there, but you see ^zmirror entry points referenced in a few different spots. Tell us a little bit about what those are, why they're important when you're initializing a mirror member. [00:42:28] Greg King: Yeah. So let's start off with saying that you might not need to, right? It's always been a powerful part of using the InterSystems technology, is that it allows you to do things yourself. It allows you to figure things out, to be creative, maybe. And I think ^zmirror, that routine code and all of those entry points, fall into that. The other piece is that it's your application, right? I'm not sure who the listeners are, but whatever their application is, maybe they need to use some of those. For HealthShare, so there's an entry point. So ^zmirror's routine, if it exists, it doesn't have to, if it exists and you use one of the defined entry points…okay, there's one entry point called "notify become primary." 
And this is the one that we have utilized most in the HealthShare realm, and where that gets called, so if there's any code in that routine in that entry point, it will be called when that mirror member becomes primary. So all the negotiation and figuring out with the arbiter, all of that's over, and it's going to be able to serve up data. So that's when that entry gets called. And where we use that is, at that point, we can put a little bit of code in there to send an email or a text, or write to a log file, a very specific log file, like something we want. That code can go in there, and someone knows that a failover occurred in a very timely way. Now that information is in a lot of other log files, so it's not a necessary thing, maybe, in your case, in anybody's case, but it's a nice thing. We have also used it to actually start our interoperability productions. So that would be a nice time to do that, right? I become primary, I can do some checks maybe, and then I can start my productions. These days, the interoperability pieces of HealthShare now have an auto start feature. In the HealthShare realm, though, we have to make sure that they start in the proper order. So for the most part, that works just fine to just not use ^zmirror and just let HealthShare do the starting by saying "auto start this one." So that's the biggest thing I've seen with ^zmirror. We have had some installations where if you're using InterSystems IRIS Business Intelligence, there might need to be a little bit of just a quick call to redo some tables. This is documented, but it's those kinds of activities that you might put into ^zmirror. [00:45:37] Derek Robinson: Yeah, that makes sense. So some good stuff to know there about kind of those potentially edge cases, but also things that might be relevant for certain audiences with their startup or their failover, things like that. 
So kind of wrapping up here, just to kind of conclude, any other final thoughts or kind of generic lessons learned that you'd want to share with someone who's listened to all this and maybe has a project coming up, they're going to be starting to use mirroring, and any kind of general lessons that you'd say that stick with you over time. [00:46:05] Greg King: Yes, there's one big one, and that's testing. So you can do all this planning and you set it up and you have your arbiter and you think that you're copying the code across, actually having in the plan, at least when you are first installing, and at any major upgrade, make sure to do a failover test. And what we'll do, and with InterSystems blessing, because it's where we got this idea, but there'll be a stage environment or an environment just before your live environment where it looks just the same in a topology, your infrastructure. So the same number of mirrors, you have a web server sitting out there, at least one, but a web server sitting out that needs to find the primary side. You have a load balancer that allows your JDBC connection to come in or whatever, and you test it. So they deploy the code, the initial round of code, all your web pages, all your new productions, your set of 50 roles, and you fail over. You literally shut it, hard failover. Maybe if you get fancy, do something to break the network so that you can test that the other side comes up, at least once. Sometimes it gets lost, right? So once you have the infrastructure set up, everyone forgets that there was a lot of work that went into that, but it's not really tested until the actual application. [00:48:02] Derek Robinson: Yeah. There you go. So that's number one takeaway, probably. Make sure don't overlook that step. Make sure you test, test, test. So Greg, you have tons of expertise on this topic. Thank you so much for sharing it with us. 
And we'll have to talk to you again at some point about the new features and the new developments in this space. So thanks again. [00:48:24] Derek Robinson: Thanks again to both Greg and Chad for spending the time to tell us all about mirroring. It's an important topic, and hopefully you learned something from their perspective and experience. That'll do it for this episode. We'll see you next time on Data Points.
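
As a rough illustration of the load balancer behavior Greg describes (probe both mirror members, then send connections only to the one whose health check returns success), here is a minimal Python sketch. The member names, the `probe` callables, and the `pick_primary` function are all hypothetical stand-ins for a real load balancer's target health checks; this is not InterSystems or cloud-provider code.

```python
from typing import Callable, Dict, Optional


def pick_primary(probes: Dict[str, Callable[[], bool]]) -> Optional[str]:
    """Return the name of the single member whose health check succeeds.

    `probes` maps a hypothetical member name to a zero-argument callable
    that returns True when that member reports itself as primary.
    Returns None when no member, or more than one, reports success,
    because routing would be ambiguous in either case.
    """
    healthy = []
    for name, probe in probes.items():
        try:
            if probe():
                healthy.append(name)
        except Exception:
            # A probe that raises (timeout, refused connection) counts as a failure.
            continue
    return healthy[0] if len(healthy) == 1 else None


if __name__ == "__main__":
    # Simulated health checks: member B currently holds the primary role.
    simulated = {"mirror-a": lambda: False, "mirror-b": lambda: True}
    print(pick_primary(simulated))
```

A real load balancer repeats this check on an interval, so after a failover the member that starts returning success becomes the only target receiving connections, which matches the "one of your targets is down" view Greg mentions.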
