This week we continue our series of posts on the Data Lake. So far we have covered what a Data Lake consists of and what benefits it brings, how it differs from a Data Warehouse, and what types of Data Lakes exist.
Today’s post is going to be short: we want to comment briefly on the relationship between a Data Lake and the cloud. Next week we will finish the series with the support provided by Onesait Platform.
Data Lake and the Cloud
Until a few years ago, Data Lakes were mostly implemented on-premises and, as we explained last week, in many cases on Hadoop. But relying on local infrastructure has certain problems:
- Setup: the process of acquiring hardware and setting up a data center is not straightforward, and it can take weeks or months to get up and running.
- Scalability: if storage capacity needs to grow, scaling up takes time and effort (approvals, space in the data center).
- The calculation of requirements: since scaling is complicated in local environments, the hardware requirements have to be calculated correctly at the beginning of the project, and since data grows unpredictably every day, this goal is very difficult to achieve.
- The cost: setting up an environment locally requires an initial outlay, while in the Cloud we pay for the infrastructure as we use it.
In other words, setting up a Data Lake in the Cloud is:
- Easier and faster to start: the cloud allows users to start small and grow gradually.
- Cost-effective, with a pay-per-use model.
- Easier to scale up when needs increase, eliminating the stress of calculating requirements and obtaining approvals.
- Available as a service: the different cloud providers offer Data Lake services, some based on Hadoop, such as GCP Dataproc or Azure HDInsight, and others on their own technologies, such as Amazon S3 or GCP Bigtable.
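To make the sizing problem a little more concrete, here is a minimal sketch in Python. It assumes storage grows at a compound monthly rate, and every figure in it (10 TB to start, 5% estimated growth, 10% actual growth, 12 months) is a made-up placeholder, not data from any real project or provider. The function names are ours, invented for the illustration. It compares a capacity that was sized upfront from the estimate with what a pay-per-use model would have billed, measured in TB-months rather than in money.

```python
def projected_usage(initial_tb, monthly_growth, months):
    """Storage in use (TB) at the end of each month, assuming compound growth."""
    return [initial_tb * (1 + monthly_growth) ** m for m in range(1, months + 1)]


def fixed_capacity_report(usage, provisioned_tb):
    """Compare a capacity bought upfront against the usage that really happened."""
    # TB-months that were paid for upfront but never used
    idle_tb_months = sum(max(provisioned_tb - u, 0.0) for u in usage)
    # Months in which the data no longer fits in the purchased capacity
    short_months = sum(1 for u in usage if u > provisioned_tb)
    return idle_tb_months, short_months


def pay_per_use_tb_months(usage):
    """With pay-per-use, what is billed simply follows what is stored."""
    return sum(usage)


# Hypothetical figures, for illustration only
estimated = projected_usage(10.0, 0.05, 12)  # what we planned for
actual = projected_usage(10.0, 0.10, 12)     # what really happened

provisioned_tb = estimated[-1]  # hardware sized for the estimated month-12 volume
idle_tb_months, short_months = fixed_capacity_report(actual, provisioned_tb)

print(f"Provisioned upfront: {provisioned_tb:.1f} TB")
print(f"Idle capacity paid for: {idle_tb_months:.1f} TB-months")
print(f"Months over capacity: {short_months}")
print(f"Pay-per-use usage billed: {pay_per_use_tb_months(actual):.1f} TB-months")
```

With these placeholder numbers, the on-premises setup sits partly idle during the first months and then runs out of space before the year ends, while the pay-per-use figure just tracks actual usage. Change the growth rates and the picture changes, which is precisely why the initial estimate is so hard to get right.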
As we can see, the cloud offers important advantages, above all because, as in other use cases, it allows us to scale according to our needs.
As we mentioned at the beginning, now that all the concepts are clear, next week we will tell you how we support the Data Lake on the Platform, introducing the concept of Data Fabric. Don’t miss it!