
Serverless architectures for analytics and BI solutions in the cloud

Last updated: November 15, 2018

Public cloud service portfolios have become more complete, secure, scalable, and cost-effective for analyzing and generating value from your data. This is a revolution that is changing the way Business Intelligence is done and making it easier to extract insight from Big Data to enhance your business.

In the article Six reasons to consider Cloud Analytics, we recommend considering a public cloud platform because:

  1. It is cheaper.
  2. It reduces time to market.
  3. It is flexible, scaling up and down based on business demands.
  4. It is based on platform services managed by the cloud providers, designed for the highest levels of availability and operational simplicity.
  5. Scalability is more granular, simple, and quick.
  6. Extensive functionality is available.

Here we discuss one of the options for deploying Analytics platforms on public clouds: serverless architectures. Characterized by not requiring dedicated servers, they allow BI applications to be built at very low cost, with granular growth, virtually unlimited scalability, and simplified management, eliminating the technological restrictions that limit the projects a company can undertake.

What are serverless analytics technologies like?

This concept refers to analytical solutions where the data is stored using a Data Lake* model and analyzed with serverless tools such as Amazon Redshift Spectrum, Azure Data Lake Analytics, or Google BigQuery.

* A Data Lake is a repository where all of a company's data is stored in its native format, typically as files or blob objects. The Data Lake contains data in the original raw format of the source system, as well as transformed data for tasks such as reporting, visualization, or artificial intelligence.
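To make the Data Lake model concrete, here is a minimal sketch of landing a raw file in an Amazon S3 based Data Lake using Python and boto3. The bucket name, file names, and prefix layout are hypothetical, and it assumes AWS credentials are already configured.

```python
# Minimal sketch: land a raw source file in a Data Lake on Amazon S3.
# Assumes boto3 is installed and AWS credentials are configured;
# the bucket name and key prefix are hypothetical.
import boto3

s3 = boto3.client("s3")

# Raw zone: the file is kept exactly as the source system produced it.
s3.upload_file(
    Filename="sales_export_20181114.csv",         # export from the source system
    Bucket="my-company-datalake",                 # hypothetical Data Lake bucket
    Key="raw/sales/2018/11/14/sales_export.csv",  # date-partitioned prefix
)
```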

Some of the differences between a Data Lake and a traditional Data Warehouse are that in the former:

  • All data is retained, regardless of whether there is a use case defined for it or not.
  • Any type of data can be stored, no matter its source or structure.
  • A single repository is maintained to fill the needs of all kinds of users.
  • It has greater flexibility in adapting to changes.

From a technical point of view, a Data Lake is implemented on object storage using technologies such as Amazon Simple Storage Service (S3), Azure Blob Storage or Azure Data Lake Storage, or Google Cloud Storage, with storage costs below USD 30 per terabyte per month. In addition, specialized file formats such as Parquet or ORC are used to store transformed data optimized for queries.
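As an illustration of the transformed-data layer, here is a minimal sketch that converts a raw CSV file into Parquet using pandas with the pyarrow engine; the file paths are hypothetical.

```python
# Minimal sketch: convert a raw CSV file into Parquet, a columnar format
# optimized for analytical queries. File paths are hypothetical.
import pandas as pd

# Read the raw file as delivered by the source system.
df = pd.read_csv("raw/sales_export.csv")

# Columnar layout plus compression typically reduces both storage cost
# and the amount of data scanned per query.
df.to_parquet("transformed/sales.parquet", engine="pyarrow", compression="snappy")
```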

Once the information is available in your Data Lake, serverless analytical tools such as Amazon Redshift Spectrum, Azure Data Lake Analytics, or Google BigQuery can access it as an external table and start querying the data in the same way queries are run on traditional analytical databases, integrating with visualization tools such as Tableau, Power BI, or QlikSense.
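For illustration, the sketch below registers Parquet files in S3 as an external table in Amazon Redshift Spectrum and queries them with plain SQL, issued here through psycopg2. All connection details, object names, and the S3 location are hypothetical, and it assumes an external schema named spectrum was already created with CREATE EXTERNAL SCHEMA.

```python
# Minimal sketch: expose Data Lake files as an external table in Amazon
# Redshift Spectrum and query them with plain SQL. Connection details,
# names, and the S3 location are hypothetical; the external schema
# "spectrum" is assumed to exist (CREATE EXTERNAL SCHEMA ... FROM DATA CATALOG).
import psycopg2

conn = psycopg2.connect(
    host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",
    port=5439, dbname="analytics", user="bi_user", password="***",
)
conn.autocommit = True  # external-table DDL cannot run inside a transaction

cur = conn.cursor()

# Register the Parquet files in S3 as an external table (run once).
cur.execute("""
    CREATE EXTERNAL TABLE spectrum.sales (
        sale_date DATE,
        store_id  INT,
        amount    DECIMAL(12,2)
    )
    STORED AS PARQUET
    LOCATION 's3://my-company-datalake/transformed/sales/';
""")

# From here on, the data is queried like any other analytical table.
cur.execute("SELECT sale_date, SUM(amount) FROM spectrum.sales GROUP BY sale_date;")
for row in cur.fetchall():
    print(row)
```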

Serverless analytical tools are paid per use rather than for provisioned infrastructure, so they can offer considerable cost advantages for analyzing information that does not have a continuous access load over time, such as daily sales reports that are queried once a day and preloaded into the visualization tool.
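As a back-of-the-envelope example of the pay-per-use model, the sketch below estimates the monthly cost of one daily report under an illustrative price of USD 5 per terabyte scanned; both the price and the data volumes are assumptions, so check current provider pricing.

```python
# Back-of-the-envelope cost estimate for a pay-per-scan serverless tool.
# The price per TB scanned and the data volumes are illustrative only.
PRICE_PER_TB_SCANNED = 5.0  # USD, assumed query price; check provider pricing
SCAN_PER_QUERY_GB = 2.0     # data scanned by the daily report (Parquet helps here)
QUERIES_PER_MONTH = 30      # one preloaded report per day

monthly_cost = QUERIES_PER_MONTH * (SCAN_PER_QUERY_GB / 1024) * PRICE_PER_TB_SCANNED
print(f"Monthly query cost: USD {monthly_cost:.2f}")  # ~USD 0.29
```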

This type of technology normally adopts the massively parallel processing (MPP) model that originated in the Big Data world: a query is broken down into smaller ones that are distributed across multiple servers working in parallel, thus obtaining better response times than with traditional architectures.
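The idea can be illustrated with a small, purely conceptual Python sketch: a sum is split into partial sums over data partitions, the partial sums run in parallel, and a coordinator merges the results, the same divide-and-combine pattern MPP engines apply across server nodes.

```python
# Conceptual sketch of the massively parallel processing (MPP) model:
# a query is split into partial queries over data partitions, executed
# in parallel, and the partial results are combined. Illustrative only.
from multiprocessing import Pool

def partial_sum(partition):
    """Each worker aggregates its own partition of the data."""
    return sum(partition)

if __name__ == "__main__":
    # Four partitions standing in for data blocks spread across nodes.
    partitions = [range(0, 250), range(250, 500), range(500, 750), range(750, 1000)]
    with Pool(processes=4) as pool:
        partial_results = pool.map(partial_sum, partitions)
    # The coordinator merges the partial results into the final answer.
    print(sum(partial_results))  # 499500, same as a serial sum of 0..999
```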

Another advantage of this approach is granular growth: you pay only for the capacity you require, in contrast to traditional architectures, where you must usually grow in large blocks tied to hardware capacity. There are also practically no scalability limits, either for data storage or for processing power.

Finally, since the platforms are managed by the cloud providers, the complexity of managing servers, applications, and capacity is avoided; all of this is covered by the provider's services.

However, the solution's architecture is more sophisticated, as it involves a wider range of tools, and it is necessary to evaluate which one fits each case best. It is the data architect or engineer who defines which data to store on each platform. At Novis we take care of these definitions, simplifying life for the end user, who continues to work with SQL-based BI tools.

We invite you to contact us to discuss your projects. Please use the contact form on our website.

Author: Patricio Renner, Technology Manager.
