Data Lakes in a Real-time bidding environment - David Garty, Spotad

Because Spotad supports millions of queries per second, a well-designed data lake, one that keeps data reliable and easily accessible, is one of the most important aspects of our business. In this presentation, I'll focus on key aspects of data lake architecture, cost, data-based optimizations, and clusters. Well-partitioned data is known to reduce query costs and improve performance by limiting the amount of data a query must scan to return its results.
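
The abstract does not include code. As a minimal sketch of why partitioning limits the data scanned, assuming a Hive-style `key=value` directory layout, the Python below keeps only the files whose `dt` partition falls inside a query's date range. The bucket path, table name, and the `parse_partition` and `prune` helpers are hypothetical illustrations, not Spotad's implementation.

```python
from __future__ import annotations

# Hypothetical layout: Hive-style "key=value" directories, one per day and hour.
FILES = [
    f"s3://example-bucket/bids/dt=2020-11-{day:02d}/hour={hour:02d}/part-0.parquet"
    for day in (23, 24, 25)
    for hour in (0, 12)
]


def parse_partition(path: str) -> dict[str, str]:
    """Extract Hive-style key=value partition columns from a file path."""
    columns = {}
    for segment in path.split("/"):
        if "=" in segment:
            key, value = segment.split("=", 1)
            columns[key] = value
    return columns


def prune(paths: list[str], start: str, end: str) -> list[str]:
    """Keep only files whose dt partition lies in [start, end].

    ISO-8601 dates (YYYY-MM-DD) sort lexicographically in date order,
    so plain string comparison is enough. Files without a dt partition
    are excluded.
    """
    selected = []
    for path in paths:
        dt = parse_partition(path).get("dt")
        if dt is not None and start <= dt <= end:
            selected.append(path)
    return selected


if __name__ == "__main__":
    to_scan = prune(FILES, "2020-11-25", "2020-11-25")
    print(f"scanning {len(to_scan)} of {len(FILES)} files")
    for path in to_scan:
        print(path)
```

Engines such as Hive, Presto, and Spark apply the same idea automatically when a WHERE clause filters on a partition column: here a one-day query touches 2 of the 6 files instead of all of them.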

In particular, I'll cover known and lesser-known aspects of data partitioning, the idempotency of data workflows, and caching strategies that support your business goals. Planning and optimization are among the strongest tools for maintaining a well-designed data lake while keeping cost at a minimum and performance at its best.
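
An idempotent workflow produces the same final state no matter how many times it runs for the same input, which makes scheduler retries safe. The sketch below, assuming a local directory stands in for the data lake (the `write_partition`, `append_partition`, and `count_rows` helpers and the JSON file format are hypothetical), contrasts overwriting a whole partition atomically with appending to it.

```python
from __future__ import annotations

import json
import os
import tempfile


def write_partition(root: str, dt: str, rows: list[dict]) -> str:
    """Idempotently (re)write one daily partition.

    The output depends only on (dt, rows), and the file is swapped in
    atomically, so re-running the job after a failure or a retry leaves
    the partition in the same final state.
    """
    part_dir = os.path.join(root, f"dt={dt}")
    os.makedirs(part_dir, exist_ok=True)
    target = os.path.join(part_dir, "data.json")
    # Write to a temporary file in the same directory, then atomically
    # replace the target so readers never see a half-written partition.
    fd, tmp_path = tempfile.mkstemp(dir=part_dir, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(rows, f, sort_keys=True)
    os.replace(tmp_path, target)
    return target


def append_partition(root: str, dt: str, rows: list[dict]) -> str:
    """Non-idempotent counterpart: every retry appends duplicate rows."""
    part_dir = os.path.join(root, f"dt={dt}")
    os.makedirs(part_dir, exist_ok=True)
    target = os.path.join(part_dir, "data.jsonl")
    with open(target, "a") as f:
        for row in rows:
            f.write(json.dumps(row, sort_keys=True) + "\n")
    return target


def count_rows(path: str) -> int:
    """Count rows in either a JSON array file or a JSON-lines file."""
    with open(path) as f:
        if path.endswith(".jsonl"):
            return sum(1 for line in f if line.strip())
        return len(json.load(f))


if __name__ == "__main__":
    rows = [{"bid_id": 1, "price": 0.5}, {"bid_id": 2, "price": 0.7}]
    with tempfile.TemporaryDirectory() as root:
        # Simulate a scheduler retrying the same job for the same day.
        for _attempt in range(2):
            idem = write_partition(os.path.join(root, "overwrite"), "2020-11-25", rows)
            dup = append_partition(os.path.join(root, "append"), "2020-11-25", rows)
        print("overwrite:", count_rows(idem), "rows")  # 2
        print("append:", count_rows(dup), "rows")      # 4
```

Overwriting at partition granularity is one common way to get idempotency; on object stores the atomic-swap step usually needs a table format or commit protocol rather than `os.replace`, which only guarantees atomicity on a single local filesystem.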

The most important of these is to always know what is happening with your data. This means continuously monitoring query runtimes, identifying the most and least queried data sources, checking cluster utilization, and optimizing based on the results. I will discuss and demonstrate the importance of developing auto-monitoring tools and using their output for optimization. I will also discuss spot node utilization techniques, such as heterogeneous cluster nodes and setting a maximum spot price, in the context of cost reduction and stability.
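
A minimal sketch of the kind of auto-monitoring described above, assuming query logs can be reduced to (table, runtime) records: it aggregates query counts and runtimes per data source, ranks sources from most to least queried, and flags slow ones. The `QueryRecord` fields, table names, and the 10-second threshold are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import mean


@dataclass
class QueryRecord:
    table: str        # data source the query read from (hypothetical log field)
    runtime_s: float  # wall-clock runtime in seconds


def summarize(records: list[QueryRecord], known_tables: tuple[str, ...] = ()) -> dict[str, dict]:
    """Aggregate query count, mean and max runtime per table.

    Tables listed in known_tables but absent from the log are reported
    with zero queries: these are the least-queried sources.
    """
    runtimes: dict[str, list[float]] = {t: [] for t in known_tables}
    for r in records:
        runtimes.setdefault(r.table, []).append(r.runtime_s)
    return {
        table: {
            "queries": len(rts),
            "mean_s": mean(rts) if rts else 0.0,
            "max_s": max(rts) if rts else 0.0,
        }
        for table, rts in runtimes.items()
    }


def rank_by_usage(summary: dict[str, dict]) -> list[str]:
    """Tables from most to least queried (ties broken by name)."""
    return sorted(summary, key=lambda t: (-summary[t]["queries"], t))


def slow_tables(summary: dict[str, dict], threshold_s: float) -> list[str]:
    """Tables whose mean query runtime exceeds threshold_s."""
    return sorted(t for t, s in summary.items() if s["mean_s"] > threshold_s)


if __name__ == "__main__":
    log = [
        QueryRecord("bids", 12.0),
        QueryRecord("bids", 18.0),
        QueryRecord("bids", 30.0),
        QueryRecord("impressions", 4.0),
        QueryRecord("impressions", 6.0),
        QueryRecord("clicks", 2.0),
    ]
    summary = summarize(log, known_tables=("bids", "impressions", "clicks", "legacy_segments"))
    ranking = rank_by_usage(summary)
    print("most queried:", ranking[0])
    print("least queried:", ranking[-1])
    print("slow (> 10 s mean):", slow_tables(summary, 10.0))
```

Heavily queried, slow sources are natural candidates for better partitioning or caching, while sources that are never queried may be candidates for cheaper storage.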

Video "Data Lakes in a Real-time bidding environment - David Garty, Spotad" from the channel Qubole: The Cost-Efficient Data Lake
Video information
Date: November 25, 2020, 19:22:50
Duration: 00:17:55