Apache Spark Based Reliable Data Ingestion in Datalake with Gagan Agrawal (Paytm)
Quby is the creator and provider of Toon, a leading European smart home platform. We enable Toon users to control and monitor their homes using both an in-home display and an app. As a data-driven company, we use machine learning algorithms to generate actionable insights for our end users. We have developed data-driven services to ensure that users do not needlessly waste energy and can receive real-time alerts about problems with their heating system. In this talk, Erni will describe our journey of productionizing data science algorithms. We'll take a deep dive into our pipeline and describe our streamlined development and deployment workflow. We'll explain how we define and manage dependencies between jobs in multiple environments (test, acceptance and production) and schedule the pipeline computation. We'll delve into scale challenges, metrics, monitoring and data quality. We will also reflect on the lessons learned while building high-volume infrastructure that offers multiple data-driven services to hundreds of thousands of users.
About: Databricks provides a unified data analytics platform, powered by Apache Spark™, that accelerates innovation by unifying data science, engineering and business.
Read more here: https://databricks.com/product/unified-data-analytics-platform
Connect with us:
Website: https://databricks.com
Facebook: https://www.facebook.com/databricksinc
Twitter: https://twitter.com/databricks
LinkedIn: https://www.linkedin.com/company/databricks
Instagram: https://www.instagram.com/databricksinc/
Video: Apache Spark Based Reliable Data Ingestion in Datalake with Gagan Agrawal (Paytm), from the Databricks channel