Architecture of SAP HANA
The architecture of SAP HANA database system has the following as main components: four management services, three database engine components, Three In-Memory Storage Engines and Persistency Layer.
1.1Four management Services
The four management services in the architecture of SAP HANA database system are :
a) The Connection and Session Management component manages the sessions/connections for database clients. Clients can use a variety of languages to communicate with the HANA database.
b) The Transaction Manager component helps with ACID (atomicity, consistency, isolation, durability) compliance by coordinating transactions, controlling transactional isolation and tracking running and closed transactions.
c) The Authorization Manager component handles all security and credentialing
d) The Metadata Manager component manages all metadata such as table definitions, views, indexes and the definition of SQL Script functions. All metadata, even of different types, is stored in a common catalog.
1.2 Three database engine components
The three database engine components in the architecture of SAP HANA database system are :
a) Calculation Engine component executes on calculation models received from SQL Script and other compilers. It create logical execution plan for calculation models and execute user defined functions. The Calculation Engine will break up a model, for example some SQL Script, into operations that can be processed in parallel rule based model optimizer. Then these operations will be passed to the database optimizer which will determine the best plan for accessing row or column stores, algebraic transformations and cost based optimizations based on database statistics.
b) Optimizer and Plan Generator component parses and optimizes client requests.
c) Execution Engine component invokes the various In-Memory Processing Engines and routes intermediate results between consecutive execution steps based on the optimized execution plan.
1.3 Three In-memory Storage engines
The three in memory storage engine components in the architecture of SAP HANA database system are :
a) Relational Engine which supports both row and column oriented physical representations of relational tables. The DB optimizer produces physical executing plan and access to row and column store.
b) The Graph Engine which supports the efficient representation and processing of data graphs with a flexible typing system.
c) Text Engine which provides text indexing and search abilities, such as exact search for words and phrases, fuzzy search which tolerates typing errors, and linguistic search which finds variations of words based on linguistic rules.
HANA has the following client libraries provided such as JDBC driver for Java clients, ODBC driver for Windows clients, especially for MS Office integration, Specific DBSL for running the ABAP application server, Client module for Python, implementing Python DB API, ODBO (OLE DB for OLAP) provider for MDX, SQLDBC library which is the basis for MDX provider and for Python integration.
1.4 Columnar and Row based storage
Relational databases organize data in tables, which contain the data records. Row based storage stores a table in a sequence of rows. Column based storage stores a table in a sequence of columns. Since Computer memory, is organized as a linear sequence, in SAP HANA it is possible to create a table as to be stored by column or by row.
Row store is used when the table has a small number of rows such as configuration tables, when many selects or updates of single records, when application access the complete record and when the aggregations and fast searching are not required.
Column store is used when Calculations are executed on a single column or a few columns only, when the table is searched based on the values of a few columns, when the table has a large number of columns, when the table has a large number of rows, and columnar operations such as aggregate, scan etc
1.5 Persistency Layer
The persistence layer stores data in persistent disk volumes which are either hard disk or solid-state drives. With the use of persistence layer the database can be restored to the most recent committed state after a restart.
Following diagram provided by SAP has an overview of the HANA architecture.
1.6 Programming Languages
SAP HANA programming languages are:
1) SQL Script, which is extended based on the standard SQL.. This is the main interface for applications to access SAP HANA.
2) R is statistical/graphical language, which can be embedded in SQL Script.
3) BFL (Business Function Library) which is a prebuilt parameter driven, basic building block calculations at high performance, such as Depreciation, Capacity Optimization and Time based functions such as YoY (year over year) and Delay
1.7 Programming Options: SQL Script
The main goal of SQL Script is to allow the execution of data intensive calculations inside SAP HANA. Moving calculations to the database layer eliminates the need to transfer large amounts of data from the database to the application.
Calculations need to be executed in the database layer to get the maximum benefit from SAP HANA features such as fast column operations, query optimization and parallel execution. If applications fetch data as sets of rows for processing on application level they will not benefit from these features.
Compared to plain SQL queries, SQL Script has the following advantages:
1.Functions can return multiple results, while a SQL query returns only one result set.
2.Complex functions can be decomposed into smaller functions. This enables modular programming, reuse and a better understandability by functional abstraction. For structuring complex queries, standard SQL only allows the definition of SQL views. However, SQL views have no parameters.
3.SQL Scripts supports local variables for intermediate results with implicitly defined types. With standard SQL, it would be required to define globally visible views even for intermediate steps.
4.SQL Script has control logic such as if/else and throw/catch that is not available in SQL.
These set of SQL extensions for the SAP HANA database which allow developers to push data intensive logic into the database are called SQL Script.These extensions are keys to avoiding massive data copies to the application server and to leverage sophisticated parallel execution strategies of the database. SQL Script V2 supports stored procedures, which provides enhanced control flow capabilities and is positioned to be more suitable for pushing complex parts of application logic to the database. It can meet some simple requirement for reporting, like join, aggregation, etc. When it comes to data mining and statistic analysis, SQL Script is not suitable for implementing complex algorithms.
1.8 Programming Options: Business Function Library
The Business Function Library is a reusable library which is similar to stored procedures, for business applications embedded in the HANA calculation engine. This eliminates the need for developing such calculations from scratch. The business functions are written in C++ and executed in database calculation engine. BFL has a roadmap for data mining and statistical algorithms.
BFL is a prebuilt parameter-driven, basic building block calculations at high performance and offers a solution to complex calculations. Since BFL is based on C++, for numerical operations it can outrun SQL Script by a significant margin. BFL is called from SQL Script It is designed for significant performance improvements for SAP apps by utilizing new hardware such as multi core, built in vector engine, massive parallel main memory processing and changing the boundaries between application server and data management layer. With in the BFL, grouped into a class referred to as the Predictive Analysis Library, a collection of compiled analytic functions for predictive analytics are available. These are Clustering (K-Means), Regression, Association Analysis (Apriori), Decision Tree (C4.5), ABC Classification and Weighted Score Tables
1.9 Programming Options: R integration
R is an open source software language and environment for statistical computing and graphics with over 3000 add-on packages. R supports data handling and storage, numeric, textual, matrix algebra, hash tables and regular expressions, high-level data analytic and statistical functions graphics, programming language, loops and branching, and subroutines.
It has Cluster Analysis & Finite Mixture Models for Probability Distributions, Computational Econometrics, Empirical Finance, Statistical Genetics, Graphic Displays, Dynamic Graphics, Graphic Devices & Visualisation, Machine Learning & Statistical Learning ,Time Series Analysis etc.
R is not a database, but connects to DBMS. The language interpreter can be very slow, but allows to call own C/C++ code. R is a programming language designed for statistical analysis. It is an open source initiative under the GNU Project. R is integrated in HANA DB via TCP/IP.
HANA uses SQL-SHM, a shared memory-based data exchange to incorporate R’s vertical data structure. HANA also introduces R scripts equivalent to native database operations like join or aggregation. HANA developers can write R scripts in SQL and the types are automatically converted in HANA. R scripts can be invoked with HANA tables as both input and output in the SQL Script. R environments need to be deployed to use R within SQL Script. Through the R integration solution, developers can leverage open source R’s 3000+ external packages to perform wide-range data mining and statistical analysis.
1.10 Scalability & Benchmarks
Performance and scalability
SAP has stated that customers have realized gains as high as 100,000x in improved query performance when compared to disk based database systems. There have been no independent audits of such claims.
In March 2011, Wintercorp, an independent testing firm specializing in large scale data management was retained by SAP to audit test specifications and results from test runs. The test used concepts similar to those of the industry standard TPC-H benchmark. The test data had between 600 million and 1.8 billion rows and the test ran five analytical query types and three operational report query types. The combined throughput of analytical and operational report queries ran between 3007 queries/hour and 10,042 queries per hour depending on the volume of data.
To enable scalability in terms of data volumes and the number of application requests, the HANA database supports scale up and scale out. For scale up, all algorithms and data structures are designed to work on large multi-core architectures especially focusing on cache aware data structures and code fragments. For scale out, the HANA database is designed to run on a cluster of individual machines allowing the distribution of data and query processing across multiple nodes.