I'm starting with an explanation of some terms, so if you are already familiar with them you might want to skip a few headlines.

High Availability, what's that?

When talking about High Availability (HA), we talk about the capability to keep a system up and running under all circumstances. Well, that's the goal, and it obviously gets more complex the more seriously you take what I just said. Usually HA means avoiding interruption when hardware or software goes down. In the extended case of losing whole buildings, where it is not enough to have two computers located side by side, we talk about disaster recovery and no longer about HA.

Also, at least in the SAP context, we always divide HA into two groups of cases: HA for planned downtime and HA for unplanned downtime. Unplanned downtime is the actual protection case, when something goes wrong with software or hardware. Planned downtime is the downtime that occurs when you perform specific actions in the system (the best known being all kinds of updates, patches etc.). Today the latter is usually the more painful one, as many of our customers have so-called "closing windows", meaning environments that need to run around the clock.

How does it work?

Keeping a system up and running is fairly easy as long as there is no state anywhere in that system. For example, if we want to protect a simple website that contains only static web pages, we can just run two web servers. If one of them goes offline, the traffic is simply redirected to the other one. This already involves the use of load balancers, because we usually do not want to bother the user with those two servers and only want to provide a single address with one port to access the system.

It gets more complex once our server carries state or, let's say, the sessions handled by the server do.
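Before going into the stateful case, the stateless setup described above can be sketched in a few lines. This is only an illustration of the idea, not a real load balancer: the `Backend` and `LoadBalancer` names are made up for this example, and the health flag is set by hand where a real product would probe the servers itself.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    """One stateless web server behind the balancer (hypothetical model)."""
    name: str
    healthy: bool = True


@dataclass
class LoadBalancer:
    """Single entry point that hides the individual servers from the user."""
    backends: list
    next_index: int = 0

    def route(self) -> str:
        """Return the name of the next healthy backend, in round-robin order."""
        total = len(self.backends)
        for offset in range(total):
            candidate = self.backends[(self.next_index + offset) % total]
            if candidate.healthy:
                # Remember where to continue so traffic is spread evenly.
                self.next_index = (self.next_index + offset + 1) % total
                return candidate.name
        raise RuntimeError("no healthy backend available")
```

With two backends, requests alternate between them. As soon as one is marked unhealthy, every request goes to the remaining one, and the user never notices because no session context has to move with the request.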
With a stateful server we cannot just let it fail and move on to another one, as that other server might need some context information about the ongoing applications. For this we usually use a so-called switchover mechanism that is provided at the operating system level. This is basically a polling program that checks whether the leading computer is still running. If it finds that this is not the case, it starts the protected program on the reserve computer. In addition, it does some magic to take over the IP address of the failed system, plus some additional work, for example for network drives that have to be reassigned to the new location.

In the smallest scenario only the database is protected by such a switchover system, so that the database comes up again once the switchover has happened. Obviously this means that any transaction that was not yet closed (committed) in the failing system will be rolled back and the corresponding work is lost. There are several ways databases handle this, depending on the database used and the level of effort put into it.

And what has my App Server to do with it?

However, there may also be state in the application server, specifically in those parts that handle server locking. For those who know their SAP system quite well, that is the Enqueue server. The Enqueue server is quite small but holds state, as it does database-independent locking. Losing such locks during a switchover could result in fatal system problems. For this reason a replication system for the Enqueue server was developed, and today it is mandatory for all HA SAP systems. This means that the Enqueue server replicates all locks to the second computer, which simply takes over in case of a failure. Don't think this does not concern you because you are not using such locks: on both ABAP and Java servers these locks are used without your explicit knowledge.
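To make the replication idea more tangible, here is a minimal sketch of a lock table that mirrors every change to a second copy, so that the locks survive when the first copy disappears. It is a toy model under simplifying assumptions (both copies live in one process, replication is synchronous) and it does not show how the SAP Enqueue replication is actually implemented. All class and method names are invented for the example.

```python
class LockTable:
    """In-memory lock table; stands in for the state a lock service holds."""

    def __init__(self):
        self.locks = {}  # lock key -> owner

    def acquire(self, key, owner):
        """Grant the lock unless a different owner already holds it."""
        if key in self.locks and self.locks[key] != owner:
            return False
        self.locks[key] = owner
        return True

    def release(self, key, owner):
        """Drop the lock, but only if the caller is the current owner."""
        if self.locks.get(key) == owner:
            del self.locks[key]


class ReplicatedLockServer:
    """Primary lock table that mirrors each change to a replica."""

    def __init__(self):
        self.primary = LockTable()
        self.replica = LockTable()
        self.primary_alive = True

    def _active(self):
        # After a failure the replica answers all requests.
        return self.primary if self.primary_alive else self.replica

    def acquire(self, key, owner):
        granted = self._active().acquire(key, owner)
        if granted and self.primary_alive:
            self.replica.acquire(key, owner)  # replicate the new lock
        return granted

    def release(self, key, owner):
        self._active().release(key, owner)
        if self.primary_alive:
            self.replica.release(key, owner)  # replicate the removal

    def fail_primary(self):
        """Simulate losing the computer that runs the primary."""
        self.primary_alive = False
```

If session A holds a lock and `fail_primary()` is called, a competing request from session B is still refused, because the replica already knows about the lock. Without replication the new instance would start with an empty table and hand the same lock to B.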
Switchover in less than a minute?

As the two components of the SCS instance were initially part of the main process of an ABAP server, customers were used to doing the switchover for complete systems. This involves at least starting the server on the switchover system. As this takes a couple of minutes, it usually was not a big issue.

Nowadays the situation has changed. Many systems also run Java servers. As the Java servers of the 7.0 generation do a lot of preload work, startup takes much longer for them. But even for an ABAP server, many customers today would like to push downtimes below the scale of minutes. And this is possible. Since version 7 the architecture of the system has been changed to isolate the SCS (SAP Central Services) instance. Once the system is installed this way it's easy: switchover is now only needed for the SCS instance. And as this involves only two quite small services, that instance restarts well within one minute.

Now, usually this does not help much if the database takes more time than this. That's true, but fortunately there are databases available today that support even continuous availability. With those, your SAP system can guarantee downtimes of less than a minute, at least for the unplanned downtime case!

Planned downtime unfortunately still has issues that go beyond this. As an upgrade or update is running in that case, we cannot rule out that it includes table changes, which need significant time to process. Such issues can be handled, but only by people, not yet by computers. Ask your support for help in that case, as SAP Global Support has meanwhile developed specific consulting solutions for such cases.

Benny Schaich-Lebek