Using Datalakes with MinIO and PrestoDB
Onesait Platform, as a data-centric platform, has a Data Lake built around the combination of two widely used open source solutions:
- MinIO, as the file repository.
- PrestoDB, as the SQL query engine.
In a previous post, we commented on how MinIO had been integrated into the Platform as a persistence engine for file storage.
MinIO's distributed storage, file replication, high availability, storage capacity, horizontal scaling, and transfer speed make it well suited as the storage base for the Data Lake. Users can create their own directory structures, store information as files, and automate loading and processing in a way that is integrated with the Platform (S3 API, Dataflow, FlowEngine, Notebooks, etc.).
PrestoDB is just as important: it provides a distributed SQL query engine that can read the files stored in MinIO, so all the information stored as files can be queried and viewed quickly and efficiently through SQL statements.
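As a rough illustration of loading files through the S3 API, here is a minimal Python sketch. It assumes the third-party `boto3` library is installed and that you have a MinIO endpoint URL and credentials. The bucket, directory and file names are hypothetical, and the Platform's own tools (Dataflow, FlowEngine, Notebooks) may well be the more usual way to do this.

```python
from pathlib import PurePosixPath


def object_key(directory: str, filename: str) -> str:
    """Join a Data Lake directory and a file name into an S3 object key."""
    # Object keys have no leading slash, so strip any the caller passed in.
    return str(PurePosixPath(directory.strip("/")) / filename)


def upload(local_path: str, bucket: str, key: str, endpoint_url: str,
           access_key: str, secret_key: str) -> None:
    """Upload one local file to MinIO through its S3-compatible API."""
    import boto3  # third-party; imported lazily so object_key works without it

    s3 = boto3.client(
        "s3",
        endpoint_url=endpoint_url,  # assumption: the URL of your MinIO service
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
    )
    s3.upload_file(local_path, bucket, key)


if __name__ == "__main__":
    # Hypothetical directory and file name, for illustration only.
    print(object_key("/sensors/raw/", "readings.parquet"))
```

Keeping every file of a dataset under a single directory makes it straightforward to later point a table at that directory.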
The Data Lake is made available to users like any other Entity on the Platform. You simply create the Entity, indicating that its information comes from the Historical Database.
Then build the table in PrestoDB, defining its source as files stored in MinIO.
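For readers curious what such a table definition can look like at the SQL level, the sketch below builds a `CREATE TABLE` statement of the kind PrestoDB's Hive connector accepts, using its `format` and `external_location` table properties. It assumes the connector is already configured against MinIO. The catalog, schema, table, column and bucket names are hypothetical, and the statement the Platform actually generates may differ.

```python
import re

_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def create_table_sql(catalog: str, schema: str, table: str,
                     columns: list[tuple[str, str]],
                     bucket: str, directory: str,
                     file_format: str = "PARQUET") -> str:
    """Build a CREATE TABLE statement whose data lives in a MinIO directory."""
    # Reject anything that is not a plain identifier before interpolating it.
    for name in (catalog, schema, table, *(col for col, _ in columns)):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"invalid identifier: {name!r}")
    cols = ",\n  ".join(f"{name} {sql_type}" for name, sql_type in columns)
    # Assumption: the bucket is reachable through the s3a:// scheme.
    location = f"s3a://{bucket}/{directory.strip('/')}/"
    return (
        f"CREATE TABLE {catalog}.{schema}.{table} (\n  {cols}\n)\n"
        f"WITH (\n  format = '{file_format}',\n"
        f"  external_location = '{location}'\n)"
    )


if __name__ == "__main__":
    print(create_table_sql(
        "hive", "datalake", "sensor_readings",
        [("device_id", "varchar"), ("reading", "double"), ("ts", "timestamp")],
        bucket="datalake", directory="sensors/raw",
    ))
```

Because the table points at a directory rather than at individual files, the table definition does not need to change when more files are added there.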
Once the Entity is created, all the information in the MinIO directory, including any added later, can be queried easily via SQL and used by the Platform's other data exploitation tools (Dashboards, Jasper Reports, Notebooks, Dataflows, etc.), making it an important tool for the organization's BI support.
Here is a video showing how to do everything described above:
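To give an idea of what querying the data via SQL might look like from a script or notebook, here is a minimal sketch that talks to PrestoDB directly. It assumes the third-party `presto-python-client` package (imported as `prestodb`) is installed. The host, user, catalog, schema, table and column names are hypothetical, and on the Platform you would normally query through the Entity rather than connect to Presto yourself.

```python
def count_by_sql(table: str, column: str) -> str:
    """Build a simple aggregate query: number of rows per value of a column."""
    return (
        f"SELECT {column}, count(*) AS n "
        f"FROM {table} GROUP BY {column} ORDER BY n DESC"
    )


def fetch_rows(sql: str, host: str, port: int = 8080, user: str = "analyst",
               catalog: str = "hive", schema: str = "datalake") -> list:
    """Run a query against PrestoDB and return every result row."""
    import prestodb  # third-party: presto-python-client, imported lazily

    conn = prestodb.dbapi.connect(
        host=host, port=port, user=user, catalog=catalog, schema=schema,
    )
    cur = conn.cursor()
    cur.execute(sql)
    return cur.fetchall()


if __name__ == "__main__":
    # Hypothetical table and column, for illustration only.
    print(count_by_sql("sensor_readings", "device_id"))
```

Running the same query again after more files have been placed in the directory should include the new rows, without any change to the table.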
YouTube | Presto Integration in the Onesait Platform