
CloudLab tools update

As you may know, Onesait Platform offers CloudLab, a public experimentation environment where you can carry out developments and test the functionality the Platform provides. We periodically update the Platform modules deployed there so that the environment always runs a recent version of the Platform.

This time, we want to talk about the remaining support modules. These do not need to be updated for the Platform to work normally, but we felt it was worth bringing these auxiliary components up to their latest versions as well. Below are the tools that have been updated and what each one is used for in the Platform:

  • Zookeeper + Kafka
  • Onesait Platform DataFlows
  • Notebooks (Zeppelin)
  • RealtimeDB (MongoDB)
  • Identity Manager

Zookeeper + Kafka

Zookeeper is a centralised service for maintaining configuration information (in our case, it is used to discover the other services). Kafka is used for bulk data ingestion and processing, and it manages the different input streams through queues called topics.
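Strictly speaking, a Kafka topic behaves more like an append-only log than a classic queue: reading a record does not remove it, and each consumer group keeps track of its own read position (its offset). The following is a minimal in-memory sketch of that idea. It is not the Kafka client API, and the `ToyTopic` class and the `sensor-readings` topic name are hypothetical. A real topic is also split into partitions and replicated across brokers, which this sketch leaves out.

```python
from collections import defaultdict


class ToyTopic:
    """Illustration only, not the Kafka API: a single-partition, in-memory topic."""

    def __init__(self, name):
        self.name = name
        self.log = []                    # append-only list of records
        self.offsets = defaultdict(int)  # next offset to read, per consumer group

    def produce(self, record):
        """Append a record to the log and return the offset it was stored at."""
        self.log.append(record)
        return len(self.log) - 1

    def poll(self, group, max_records=10):
        """Return up to max_records unread records for a group and advance its offset."""
        start = self.offsets[group]
        batch = self.log[start:start + max_records]
        self.offsets[group] = start + len(batch)
        return batch


topic = ToyTopic("sensor-readings")
for value in (21.5, 22.0, 22.4):
    topic.produce({"sensor": "s1", "temp": value})

print(topic.poll("ingest"))          # the three records
print(topic.poll("ingest"))          # empty, because this group's offset has advanced
print(len(topic.poll("analytics")))  # 3, because a second group reads independently
```

Because records stay in the log, new consumers (for example, a second processing flow) can read the same stream without affecting existing ones.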

The upgrade moved Kafka from version 2.7.0 to version 3.4. For more information on how to use the module, please refer to this article on the Developer Portal.

Onesait Platform DataFlows

Onesait Platform’s DataFlows module is the tool that provides the Platform with the ability to create and configure data flows between sources and destinations for both ETL/ELT type processes and Streaming flows, including transformations and data quality processes in these flows.
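To make the idea of a flow with transformations and data-quality checks more concrete, here is a minimal Python sketch. Records go from a source through one transformation and one quality rule, and records that fail are routed to an error sink instead of the destination. The stage functions and the temperature records are hypothetical. This is not the DataFlow API, where these steps would be configured visually as origins, processors and destinations.

```python
def source():
    """Hypothetical source stage: yields raw records as they might arrive from a file or API."""
    yield {"id": "1", "temp": "21.5", "unit": "C"}
    yield {"id": "2", "temp": "n/a", "unit": "C"}
    yield {"id": "3", "temp": "71.6", "unit": "F"}


def transform(record):
    """Parse the temperature and normalise Fahrenheit readings to Celsius."""
    value = float(record["temp"])  # raises ValueError on malformed input
    if record["unit"] == "F":
        value = (value - 32) * 5 / 9
    return {"id": int(record["id"]), "temp_c": round(value, 1)}


def run_pipeline(records, destination, errors):
    """Push each record through transform and a quality rule; route failures to errors."""
    for raw in records:
        try:
            clean = transform(raw)
        except (ValueError, KeyError) as exc:
            errors.append((raw, str(exc)))
            continue
        # Data-quality rule (assumed for the example): reject implausible readings.
        if not -50 <= clean["temp_c"] <= 60:
            errors.append((raw, "temp out of range"))
            continue
        destination.append(clean)


destination, errors = [], []
run_pipeline(source(), destination, errors)
print(destination)  # [{'id': 1, 'temp_c': 21.5}, {'id': 3, 'temp_c': 22.0}]
print(len(errors))  # 1
```

Keeping rejected records in a separate sink, rather than silently dropping them, makes it possible to inspect and reprocess them later.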

For several years, Onesait Platform relied on StreamSets Data Collector (SDC) as the engine behind the Onesait Platform DataFlow module. SDC was open source software with a low-code approach to developing and monitoring data flows, and during this time we used it successfully in many projects and products. In mid-2021, StreamSets, the company behind SDC, changed its licensing policy: as of version 4.0, SDC is no longer open source.

Because of this license change, Onesait Platform created a fork of the open source SDC repository, starting from version 3.23.0. From that release onwards, the Onesait Platform team takes care of both bug fixes and new features. This new product, derived from SDC, is called Onesait Platform DataFlow and will continue to be licensed under the Apache License 2.0.

For this reason, the module has been updated from StreamSets version 3.18.0 to Onesait Platform DataFlows version 3.23.1. We describe it in detail in this article on the Developer Portal.

Notebooks (Zeppelin)

The Notebooks module provides data scientists with a multi-user web environment where they can interactively build analysis models on the information stored in the Platform, using their favourite languages and tools (Python, Spark, R, TensorFlow, Keras, SQL, etc.).

The version installed in the environment, 0.8.2, has been upgraded to 0.10.1. The new features are detailed in this article (in Spanish for the moment).

RealtimeDB (MongoDB)

MongoDB is used in the CloudLab environment as the RealtimeDB. MongoDB is an open source, document-oriented NoSQL database system: instead of storing data in tables, it stores it in BSON (JSON-like) documents with a dynamic schema. Thanks to its query system, MongoDB strikes a good balance between performance and functionality. We explain it in detail in this article on the Developer Portal.

It has been updated from version 3.4, where the Platform used Quasar to run SQL queries, to version 6.0, where the SQL library is implemented directly in the Platform code. The following graph shows the functionality added in the intermediate versions.
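As an illustration of what a SQL layer on top of MongoDB has to do, the sketch below translates a simple `WHERE` clause into a MongoDB query filter using MongoDB's standard comparison operators (`$eq`, `$ne`, `$gt`, `$gte`, `$lt`, `$lte`). The `where_to_filter` helper is hypothetical and only handles `AND`-joined comparisons against numbers or single-quoted strings. It is not the Platform's SQL library.

```python
import re

# SQL comparison operators mapped to MongoDB query operators.
SQL_TO_MONGO = {"=": "$eq", "!=": "$ne", ">": "$gt", ">=": "$gte", "<": "$lt", "<=": "$lte"}

# A predicate is: field, operator, then a number or a single-quoted string literal.
PREDICATE = re.compile(r"^\s*(\w+)\s*(>=|<=|!=|=|>|<)\s*('([^']*)'|-?\d+(?:\.\d+)?)\s*$")


def where_to_filter(where):
    """Translate an AND-only WHERE clause (no OR, no parentheses) into a MongoDB filter dict.

    Limitation of this sketch: an ' AND ' inside a quoted string would be split incorrectly.
    """
    mongo_filter = {}
    for part in re.split(r"\s+AND\s+", where.strip(), flags=re.IGNORECASE):
        match = PREDICATE.match(part)
        if not match:
            raise ValueError(f"unsupported predicate: {part!r}")
        field, op, literal, text = match.groups()
        if literal.startswith("'"):
            value = text  # quoted string literal, quotes stripped
        else:
            value = float(literal) if "." in literal else int(literal)
        # Several conditions on the same field are merged, e.g. {'a': {'$gt': 1, '$lt': 5}}.
        mongo_filter.setdefault(field, {})[SQL_TO_MONGO[op]] = value
    return mongo_filter


print(where_to_filter("age >= 18 AND city = 'Madrid'"))
# {'age': {'$gte': 18}, 'city': {'$eq': 'Madrid'}}
```

The resulting dictionary has the shape MongoDB expects as a `find` filter. A full SQL layer would also need to handle projections, `OR`, joins and aggregations.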

Identity Manager

The Identity Manager is the Platform module used for authentication and access management. Previously, this function was performed by the OAuth Server module, which has now been replaced by Keycloak.


This decision was made because, as of version 6.0.0-Vegas, the OAuth Server module is removed and Keycloak is the only Identity Manager. To get ahead of that future upgrade, the Identity Manager has already been switched. The login screen has a new look and feel, but logging in works in much the same way.
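Keycloak is an OpenID Connect provider, so applications typically reach its login screen by redirecting the browser to the realm's authorization endpoint. The sketch below builds that URL for the authorization code flow. The server URL, the realm `onesait`, the client `my-app`, the redirect URI and the state value are placeholders, not the actual CloudLab configuration.

```python
from urllib.parse import urlencode


def keycloak_auth_url(base_url, realm, client_id, redirect_uri, state):
    """Build the OpenID Connect authorization URL that sends the browser to Keycloak's login page.

    Keycloak 17+ serves realms under /realms/<realm> by default; older WildFly-based
    distributions used an extra /auth prefix (https://host/auth/realms/<realm>).
    """
    endpoint = f"{base_url.rstrip('/')}/realms/{realm}/protocol/openid-connect/auth"
    params = {
        "client_id": client_id,
        "response_type": "code",  # authorization code flow
        "scope": "openid",
        "redirect_uri": redirect_uri,
        "state": state,           # opaque value echoed back, used to protect against CSRF
    }
    return f"{endpoint}?{urlencode(params)}"


# All values below are placeholders, not real CloudLab settings.
print(keycloak_auth_url("https://keycloak.example.com", "onesait", "my-app",
                        "https://my-app.example.com/callback", "xyz123"))
```

After the user logs in, Keycloak redirects back to `redirect_uri` with a `code` parameter, which the application then exchanges for tokens at the realm's token endpoint.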


Header Image: Erik Mclean at Unsplash.
