
SAP services on AWS cost optimization with NovisCloud platform

Last updated : May 19, 2021

In this article we review architecture solution alternatives that combine Amazon Web Services and Novis automated processes for cost optimization of our clients’ Public Cloud SAP® services.

Introduction

The COVID-19 global pandemic has brought economic consequences with serious impact on markets and, consequently, the business revenues of many companies. Today more than ever it is important to reduce costs wherever possible. In this context, IT areas are under pressure to retrench and adapt to the new economic conditions.

The elasticity of Amazon Web Services (AWS) infrastructure offers new availability and protection options that meet the requirements of many organizations without resorting to complex and costly high-availability and disaster recovery configurations. This document discusses architecture alternatives that combine AWS and Novis automated processes to help our clients optimize their SAP service costs in the Public Cloud.

NovisCloud platform

Harnessing the Cloud’s benefits requires radically different approaches from traditional on-premises operations. For this reason, we decided to create NovisCloud, a platform for automated SAP services on the Cloud, conceived with the vision of delivering a Cloud experience for SAP applications on AWS, and based on the following pillars:

NCloud F1

NovisCloud plays an essential role in improving SAP availability on AWS by integrating SAP monitoring metrics with CloudWatch (the AWS monitoring platform) and by implementing automated service recovery mechanisms for different failure scenarios (auto-healing), thus extending AWS' native recovery capabilities to SAP applications. With this platform it is possible to orchestrate recovery processes and perform automated validations of the recovered systems.

SAP architecture

Systems based on the SAP NetWeaver Application Server ABAP have a monolithic, stateful architecture: the session or status information for each process is stored in the application server where it is processed.

When a user connects to SAP via SAP GUI or a browser, the application runs a load-balancing process and redirects the user to an application server, where the user remains connected until the session ends (some users work all day in the same session, connected to the same application server).

As a result, an application server failure has a direct impact on the work of the users and processes connected to it. The affected user (or process) has to log in again (or relaunch the process) and resume work from the last incomplete transaction.

High availability, Fault Tolerance, and Disaster Recovery

  • High availability is the capability of recovering IT services when a failure occurs, in a limited time, by means of a proven and automated procedure that includes the detection of the failure event and service recovery. Service loss may occur in the event of a failure, but it should be kept within the RTO (Recovery Time Objective) agreed upon with the organization. In general, in a High Availability scenario there should be no data loss (RPO = 0).
  • Fault tolerance is the capability to endure some failure scenarios without interruptions in the IT service.
  • Disaster Recovery (DR) is the capability to recover from a disaster, that is, an event that causes permanent or prolonged damage to the IT infrastructure.
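
To make the RPO and RTO terms concrete, the following is a minimal sketch of how the values achieved in an incident could be computed and compared with the agreed objectives. The `Incident` class and the function names are hypothetical and used only for illustration; they are not part of NovisCloud or any AWS API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Incident:
    failure_at: datetime            # moment the failure occurred
    last_recoverable_at: datetime   # newest data point that survived the failure
    restored_at: datetime           # moment the service was usable again


def achieved_rpo(incident: Incident) -> timedelta:
    """Data loss window: work done after this point has to be redone."""
    return incident.failure_at - incident.last_recoverable_at


def achieved_rto(incident: Incident) -> timedelta:
    """Outage duration as seen by the users."""
    return incident.restored_at - incident.failure_at


def meets_objectives(incident: Incident, rpo: timedelta, rto: timedelta) -> bool:
    """True when both the data loss and the outage stay within the objectives."""
    return achieved_rpo(incident) <= rpo and achieved_rto(incident) <= rto
```

With synchronous replication the last recoverable point coincides with the failure itself, so the achieved RPO is zero and only the RTO has to be checked against the agreement.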

Amazon Web Services Global Infrastructure

Amazon Web Services (AWS) provides high performance compute and block storage services from a series of data centers distributed throughout the planet (https://aws.amazon.com/es/about-aws/global-infrastructure/regions_az/).

NCloud F2

AWS has the concept of a Region as a physical location where data centers are clustered. Each group of logical data centers is identified as an Availability Zone (AZ). Each AWS Region consists of several isolated and physically separate AZs within a geographical area. Unlike other cloud providers, who often define a region as a single data center, the multiple AZ design of every AWS Region is an advantage for clients. Each AZ has independent power, cooling, and physical security and is connected via redundant, ultra-low latency networks.

AWS’ varied services may have global, regional, and zone redundancy. This enables the construction of architectures and services with different levels of protection.

Reference architectures

The following reference architectures are simplified designs for SAP NetWeaver systems with SAP HANA databases (e.g. S/4HANA, Suite on HANA) that demonstrate protection levels for different failure scenarios.

The compute resources are sized as an initial reference for production environments. For more information on the instance types recommended for SAP HANA on AWS, see SAP HANA on the AWS Cloud: Quick Start Reference Deployment. For more information on the instance types recommended for SAP NetWeaver applications, see SAP Note 1656099 – SAP Applications on AWS: Supported DB/OS and AWS EC2 products.

For mission-critical systems, SAP recommends a high-availability architecture that separates the NetWeaver services (Database, Central Services, and Application Servers) onto different servers or virtual machines, protects the single points of failure (Database and Central Services) with cluster mechanisms, and protects the other components with redundancy. In addition, a disaster recovery solution must be implemented by replicating the database and backup servers to a contingency data center. This scenario, shown below as Architecture 1, provides the highest service protection levels, but is highly complex and costly.

Novis proposes two alternative architectures that, while not matching the SLAs of Architecture 1, deliver high service levels compared to on-premises solutions at a significantly lower cost. In Architecture 2 the database is replicated to a smaller instance and the application servers are replicated with CloudEndure. Architecture 3 uses AWS' native recovery capabilities to restore the service when an instance fails, and redundant backup storage (Amazon S3 and EBS snapshots) to recover the service when an Availability Zone fails.

The following are the AWS infrastructure diagrams for each of these architectures.

Architecture 1: ADVANCED

NCloud F3

Architecture 2: STANDARD

NCloud F4

Architecture 3: BASIC

NCloud F5

Failure scenarios

Many different events can affect the service that a SAP application delivers to the end user, ranging from a functional change in the application to a problem on the user's workstation.

In this article we review three of the most representative failure scenarios when evaluating High-Availability, Fault Tolerance, and Disaster Recovery concepts:

  • A. An Amazon EC2 instance failure
  • B. An AWS Availability Zone (AZ) failure
  • C. An AWS Region failure
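
Architecture                          | A. EC2 Instance     | B. AWS Availability Zone                                         | C. Region
--------------------------------------+---------------------+------------------------------------------------------------------+----------------------------------------------------------------------
Architecture 1: Advanced (Cost 300%)  | RPO: 0, RTO: 3 min  | RPO: 0, RTO: 3 min                                               | RPO: 5 min, RTO: 1 hour
Architecture 2: Standard (Cost 150%)  | RPO: 0, RTO: 10 min | RPO: 0, RTO: 20 min                                              | RPO: 30 min, RTO: 2 to 5 hours (depending on the database size)
Architecture 3: Basic (Cost 100%)     | RPO: 0, RTO: 10 min | RPO: 20 min, RTO: 1 to 4 hours (depending on the database size)  | RPO: N/A, RTO: N/A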
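
As an illustration of how these figures can guide a decision, the sketch below encodes the RPO, RTO and relative cost values listed above and picks the cheapest architecture that satisfies a given requirement for one failure scenario. The data structure and the function name are hypothetical; where the source gives an RTO range, the upper bound is used as a conservative assumption.

```python
from typing import NamedTuple, Optional


class Objective(NamedTuple):
    rpo_min: Optional[float]  # minutes of data loss; None means N/A (not recoverable)
    rto_min: Optional[float]  # worst-case minutes to recover; None means N/A


# Values copied from the comparison above; RTO ranges use their upper bound.
ARCHITECTURES = {
    "Advanced": {"cost": 300, "instance": Objective(0, 3),
                 "az": Objective(0, 3), "region": Objective(5, 60)},
    "Standard": {"cost": 150, "instance": Objective(0, 10),
                 "az": Objective(0, 20), "region": Objective(30, 300)},
    "Basic": {"cost": 100, "instance": Objective(0, 10),
              "az": Objective(20, 240), "region": Objective(None, None)},
}


def cheapest_architecture(scenario: str, max_rpo_min: float,
                          max_rto_min: float) -> Optional[str]:
    """Return the lowest-cost architecture meeting both limits, or None."""
    candidates = []
    for name, arch in ARCHITECTURES.items():
        objective = arch[scenario]
        if objective.rpo_min is None or objective.rto_min is None:
            continue  # this scenario cannot be recovered with this architecture
        if objective.rpo_min <= max_rpo_min and objective.rto_min <= max_rto_min:
            candidates.append((arch["cost"], name))
    return min(candidates)[1] if candidates else None
```

For example, an organization that tolerates no data loss and up to 30 minutes of outage when an Availability Zone fails would be served by the Standard architecture at half the cost of the Advanced one.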

 

A. EC2 instance failure

Architecture 1

If the Database EC2 instance fails, synchronous data replication prevents data loss when the HANA primary node changes, and users only experience service degradation for the duration of the failover process.

If the SAP Central Services EC2 instance fails, the lock table replica ensures that the lock information persists on the secondary host, and all traffic is redirected to it. Users only experience service degradation for the duration of the failover process.

If an Application Server EC2 instance fails, the users and running processes on that server are disconnected. Users and processes connected to other application servers are not affected. When the disconnected users reconnect, they are redirected to the available Application Servers.

Architecture 2

If the Database EC2 instance fails, synchronous data replication prevents data loss when the HANA primary node changes, and users only experience service degradation for the duration of the failover process. The failover is executed at the SAP HANA level and is driven by NovisCloud functionality, which also allows the EC2 instance to be resized.

If the SAP Central Services EC2 instance or an Application Server EC2 instance fails, the users and running processes are suspended until the virtual machines are restarted (automated with Auto-Recovery) and the SAP processes are available again. NovisCloud orchestrates the recovery of the SAP application and validates the dependencies between the different services.
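
The orchestration and dependency validation described above can be pictured as starting services in dependency order and checking each one before continuing. The following is a minimal sketch under stated assumptions: the service names, the dependency map and the `start` and `is_healthy` callbacks are hypothetical, and this is not a description of how NovisCloud is actually implemented.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical dependency map: each service lists what must be running first.
DEPENDS_ON = {
    "hana_database": set(),
    "central_services": set(),
    "app_server_1": {"hana_database", "central_services"},
    "app_server_2": {"hana_database", "central_services"},
}


def recovery_order(depends_on):
    """Return the services ordered so that dependencies always come first."""
    return list(TopologicalSorter(depends_on).static_order())


def recover(depends_on, start, is_healthy):
    """Start services in dependency order, validating each before moving on."""
    started = []
    for service in recovery_order(depends_on):
        start(service)
        if not is_healthy(service):
            raise RuntimeError(f"{service} failed validation; stopping recovery")
        started.append(service)
    return started
```

Stopping at the first failed validation keeps dependent services from being started against a component that is not yet usable.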

Architecture 3

If the Database EC2 instance fails, the users and running processes experience service degradation until the virtual machines are restarted (automated with Auto-Recovery) and the Database service is available again.

If the SAP Central Services EC2 instance or an Application Server EC2 instance fails, the users and running processes are suspended until the virtual machines are restarted (automated with Auto-Recovery) and the SAP processes are available again. NovisCloud orchestrates the recovery of the SAP application and validates the dependencies between the different services.
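
Assuming that "Auto-Recovery" here refers to the standard EC2 mechanism in which a CloudWatch alarm on the system status check triggers the `recover` action, the sketch below builds the alarm parameters and passes them to the boto3 `put_metric_alarm` call. The alarm name, period and evaluation settings are illustrative choices, not values taken from the article, and the instance ID used with it would be a placeholder.

```python
def recovery_alarm_params(instance_id: str, region: str) -> dict:
    """Build CloudWatch alarm parameters that trigger EC2 instance recovery."""
    return {
        "AlarmName": f"auto-recover-{instance_id}",  # hypothetical naming scheme
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed_System",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": 60,              # seconds per evaluation period (assumed)
        "EvaluationPeriods": 2,    # two failed minutes in a row (assumed)
        "Threshold": 1.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": [f"arn:aws:automate:{region}:ec2:recover"],
    }


def create_recovery_alarm(instance_id: str, region: str) -> None:
    """Create the alarm in AWS; requires boto3 and valid credentials."""
    import boto3  # imported lazily so the parameter builder works offline

    cloudwatch = boto3.client("cloudwatch", region_name=region)
    cloudwatch.put_metric_alarm(**recovery_alarm_params(instance_id, region))
```

The recover action only brings the virtual machine back; restarting and validating the SAP processes on top of it remains a separate step.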

B. AWS Availability Zone (AZ) failure

Architecture 1

In case of an availability zone failure, the Database and Central Services cluster mechanisms ensure service continuity as described previously.

Application Server capacity is provisioned at 150% of what normal operation requires, spread evenly across three availability zones. If one zone fails, the two remaining zones still provide 100% of the resources needed to serve users' and processes' requests.
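
The 150% figure follows from simple arithmetic, assuming the capacity is spread evenly across the zones. The helper functions below are an illustrative sketch, not part of any tool mentioned in this article.

```python
def surviving_capacity(total_pct: float, zones: int, failed_zones: int = 1) -> float:
    """Capacity left, as a percentage of normal load, after zones fail."""
    if zones <= failed_zones:
        raise ValueError("at least one zone must survive")
    return total_pct * (zones - failed_zones) / zones


def required_provisioning(zones: int, failed_zones: int = 1,
                          target_pct: float = 100.0) -> float:
    """Total capacity to provision so target_pct survives the failure."""
    if zones <= failed_zones:
        raise ValueError("at least one zone must survive")
    return target_pct * zones / (zones - failed_zones)
```

With three zones each one holds 50% of the normal load, so losing one leaves 100%. With only two zones the same guarantee would require provisioning 200%.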

Architecture 2

In case of an availability zone failure, synchronous HANA data replication prevents data loss when the HANA primary node changes. For the Central Services and SAP Application Server instances, CloudEndure takes snapshots of the existing volumes. These snapshots are replicated at a regional level, so they are available in every availability zone to recreate the affected servers from the last snapshot. NovisCloud manages the infrastructure and its configuration so that documented and parameterized processes can be executed during the incident.

Architecture 3

In case of an availability zone failure, the database backups are stored in S3 and replicated at a regional level. After the EC2 instances are rebuilt from the AMI images and the disks from the snapshots, the database is restored up to the last log available in S3. NovisCloud manages the infrastructure and its configuration so that documented and parameterized processes can be executed during the incident.

C. AWS Region failure

Architecture 1

In case of a region failure, asynchronous HANA data replication keeps the loss of transactional data to a minimum when the HANA primary node changes. The images and snapshots already available in the destination region allow the infrastructure to be recreated automatically from the last replicated information.

NovisCloud manages the infrastructure and its configuration so that documented and parameterized processes can be executed during the incident.

Architecture 2

In case of a region failure, the Database backups replicated to the secondary region allow the infrastructure to be rebuilt with its original characteristics, but without the information that is not stored in the database (deployments in the case of the Java stack, and control files in the case of the ABAP stack).

NovisCloud manages the infrastructure and its configuration so that documented and parameterized processes can be executed during the incident.

Architecture 3

In case of a region failure, only the information about the infrastructure characteristics is available; it is not possible to recover the application's information.

Conclusion

As replication points are reduced, single points of failure increase and the recovery time and data loss objectives that can be guaranteed become less demanding. Even so, the service objectives of the most basic scenario are comparable to those of on-premises architectures, while reducing costs for the IT departments of clients who wish to leverage cloud services by migrating their SAP systems to AWS.

Please contact us for more information about our services.

Article by Cristian Marín, Technology Assistant Manager, and Patricio Renner, Chief Technology Officer.