Near Real Time Analytics with Apache Spark: Ingestion, ETL, and Interactive Queries
Brandon Hamric Ev
Near-real-time analytics has become a common requirement for many data teams as the technology has caught up to the demand. One of the hardest aspects of enabling near-real-time analytics is making sure the source data is ingested and deduplicated often enough to be useful to analysts, while also writing the data in a format that your analytics query engine can use. This usually requires several tools, because the problem has three distinct aspects: streaming ingestion of data, deduplication using an ETL process, and interactive analytics. With Spark, all three can be handled by a single tool. This talk will walk you through how to use Spark Streaming to ingest change-log data, use Spark batch jobs to perform major and minor compaction, and query the results with Spark SQL. At the end of this talk you will know what is required to set up near-real-time analytics at your organization, the common gotchas (including file formats and distributed file systems), and how to handle the unique data integrity issues that arise from near-real-time analytics.
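The deduplication and compaction step can be illustrated with a minimal sketch in plain Python. It is not the speaker's implementation and does not use the Spark API; it only shows the "latest change per key wins" rule that a compaction job applies to change-log data. It assumes a hypothetical change-log schema (key, seq, op, value), where a higher seq means a newer change and delta files only contain changes newer than the base snapshot. The meaning of "minor" and "major" compaction here is also an assumption borrowed from log-structured storage: minor compaction merges many small delta files into one deduplicated delta that still carries delete tombstones, and major compaction folds that delta into the base snapshot and drops the tombstones. Querying base plus pending deltas (merge-on-read) gives analysts a current view before the next major compaction runs.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """One change-log record (hypothetical schema, for illustration only)."""
    key: str                      # primary key of the source row
    seq: int                      # ordering value; higher means newer (assumption)
    op: str                       # "upsert" or "delete"
    value: Optional[dict] = None  # row payload; None for deletes


def minor_compaction(delta_files: Iterable[List[ChangeEvent]]) -> Dict[str, ChangeEvent]:
    """Merge small delta files into one delta holding the latest event per key.

    Deletes are kept as tombstones so they can still remove rows from the base.
    On equal seq values, the first event seen is kept.
    """
    latest: Dict[str, ChangeEvent] = {}
    for events in delta_files:
        for event in events:
            current = latest.get(event.key)
            if current is None or event.seq > current.seq:
                latest[event.key] = event
    return latest


def major_compaction(base: Dict[str, dict], delta: Dict[str, ChangeEvent]) -> Dict[str, dict]:
    """Fold a compacted delta into the base snapshot and drop tombstones."""
    new_base = dict(base)
    for key, event in delta.items():
        if event.op == "delete":
            new_base.pop(key, None)
        else:
            new_base[key] = event.value
    return new_base


def current_view(base: Dict[str, dict], pending: Iterable[List[ChangeEvent]]) -> Dict[str, dict]:
    """Merge-on-read: what a query sees before the next major compaction."""
    return major_compaction(base, minor_compaction(pending))


# Usage with made-up data: a base snapshot plus three small delta files.
base = {"order-1": {"status": "placed"}, "order-2": {"status": "placed"}}
deltas = [
    [ChangeEvent("order-1", 10, "upsert", {"status": "shipped"}),
     ChangeEvent("order-3", 11, "upsert", {"status": "placed"})],
    [ChangeEvent("order-1", 12, "upsert", {"status": "delivered"}),
     ChangeEvent("order-2", 13, "delete")],
    # A late-arriving event with an older seq does not overwrite newer state.
    [ChangeEvent("order-3", 9, "upsert", {"status": "draft"})],
]

view = current_view(base, deltas)
print(view)  # {'order-1': {'status': 'delivered'}, 'order-3': {'status': 'placed'}}
```

In a Spark batch job the same rule is commonly expressed by ranking rows per key with a window function ordered by the sequence column and keeping the top-ranked row; keeping tombstones until major compaction is what prevents a deleted row from reappearing when an older base file is read.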
About: Databricks provides a unified data analytics platform, powered by Apache Spark™, that accelerates innovation by unifying data science, engineering and business.
Read more here: https://databricks.com/product/unified-data-analytics-platform
Connect with us:
Website: https://databricks.com
Facebook: https://www.facebook.com/databricksinc
Twitter: https://twitter.com/databricks
LinkedIn: https://www.linkedin.com/company/databricks
Instagram: https://www.instagram.com/databricksinc/
Video "Near Real Time Analytics with Apache Spark: Ingestion, ETL, and Interactive Queries" (Brandon Hamric Ev) from the Databricks channel
Other videos from the channel:
ETL Is Dead, Long Live Streams: real-time streams w/ Apache Kafka
Get Rid of Traditional ETL, Move to Spark! (Bas Geerdink)
Building Robust ETL Pipelines with Apache Spark - Xiao Li
I Analyze Data - Streaming and Real-time Analytics (Level 300)
Batch Processing vs Stream Processing | System Design Primer | Tech Primers
Advancing Spark - Understanding the Spark UI
What is an ETL Tool?
Real Time Analytics at UBER Scale
Free Energy Using Speaker Magnet Technology For 2019
Shay Banon - ElasticSearch: Big Data, Search, and Analytics
Time Series Analysis with Spark and Cassandra | Christopher Batey
Everyday I'm Shuffling - Tips for Writing Better Apache Spark Programs
Azure Databricks | Consume Streaming Data from Azure Event Hub in Azure Databricks - VID009 #Azure
Scalable Data Ingestion Architecture Using Airflow and Spark | Komodo Health
Hoodie: An Open Source Incremental Processing Framework From Uber | Uber
Apache Spark Based Reliable Data Ingestion in Datalake with Gagan Agrawal (Paytm)
Intro to Apache Spark for Java and Scala Developers - Ted Malaska (Cloudera)
Building a Complete End-to-End Batch and Real-time Recommendation Engine by Chris Fregly
PyCon.DE 2017 Tamara Mendt - Modern ETL-ing with Python and Airflow (and Spark)