Working with a Data Lake on the Onesait Platform (part 4)

10/12/2021 LuisMi Gracia

Another week, and we continue our series of posts on Data Lakes. We have already seen what a Data Lake is and what benefits it brings, how it differs from a Data Warehouse, and what types of Data Lakes exist.

Today's post is a short one: we want to look briefly at the relationship between a Data Lake and the cloud, so that next week we can close the series with the support we offer in the Onesait Platform.

Data Lake and the Cloud

Until a few years ago, Data Lakes were mostly deployed On-Premise and, as we explained last week, in many cases on top of Hadoop. But relying on local infrastructure has its problems:

Setup: acquiring hardware and setting up data centers is not simple, and it can take weeks or months to get them running.
Scalability: expanding storage capacity takes time and effort (approvals, space in the data center).
Estimating requirements: because scaling is hard in local environments, it is important to size the hardware correctly at the start of the project, which is very difficult because data grows unpredictably from day to day.
Cost: an on-premise deployment requires an up-front investment, whereas in the Cloud we pay as we use the infrastructure.
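
To make the sizing and cost points above a little more concrete, here is a minimal Python sketch. It is not taken from the Onesait Platform or from any provider's price list: the function names, the growth rate and the prices are all hypothetical, chosen only to show the arithmetic behind "buy for the peak up front" versus "pay for what you use each month".

```python
from __future__ import annotations


def projected_storage_tb(initial_tb: float, monthly_growth: float, months: int) -> list[float]:
    """Storage needed at the end of each month, assuming compound growth."""
    return [initial_tb * (1 + monthly_growth) ** m for m in range(1, months + 1)]


def on_premise_cost(capacity_tb: float, price_per_tb: float) -> float:
    """Up-front cost: the whole capacity is bought on day one."""
    return capacity_tb * price_per_tb


def pay_as_you_go_cost(usage_tb: list[float], price_per_tb_month: float) -> float:
    """Cloud-style cost: each month is billed for the storage actually used."""
    return sum(tb * price_per_tb_month for tb in usage_tb)


def first_month_over_capacity(usage_tb: list[float], capacity_tb: float) -> int | None:
    """First month (1-based) in which usage exceeds the purchased capacity, if any."""
    for month, tb in enumerate(usage_tb, start=1):
        if tb > capacity_tb:
            return month
    return None


# Hypothetical scenario: every figure below is an assumption for illustration.
planned = projected_storage_tb(initial_tb=10.0, monthly_growth=0.05, months=24)
actual = projected_storage_tb(initial_tb=10.0, monthly_growth=0.08, months=24)

purchased_tb = max(planned)  # hardware sized for the planned peak
upfront = on_premise_cost(purchased_tb, price_per_tb=500.0)
cloud = pay_as_you_go_cost(actual, price_per_tb_month=20.0)

print(f"Capacity bought up front: {purchased_tb:.1f} TB for {upfront:,.0f}")
print(f"Month in which real growth outruns it: {first_month_over_capacity(actual, purchased_tb)}")
print(f"Pay-as-you-go total over 24 months: {cloud:,.0f}")
```

The sketch does not claim that one model is always cheaper; that depends entirely on the real prices and growth. What it shows is that the on-premise figure depends on a growth estimate made on day one, while the pay-as-you-go figure simply follows whatever the data actually does.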

Image by Nik Shuliahin on Unsplash

Put another way, setting up a Data Lake in the Cloud is:

Easier and faster to get started: the cloud lets users start small and grow gradually.
More cost-effective, with a pay-per-use model.
Easier to scale up as needs grow, which removes the pressure of having to estimate requirements and obtain approvals.
Available as a service: the various Cloud providers offer Data Lake services, some based on Hadoop, such as GCP Dataproc or Azure HDInsight, and others on their own technologies, such as Amazon S3 or GCP Bigtable.

As we can see, the advantages of the cloud are significant since, as in other cases, it lets us scale according to our needs.

Image by Pablo Arroyo on Unsplash

As we mentioned at the beginning, now that all the concepts are clear, next week we will explain how the Platform supports Data Lakes, introducing the concept of Data Fabric. Don't miss it!

Header image by Philipp Katzenberger on Unsplash


