Exceptions are the Norm: Dealing with Bad Actors in ETL: Spark Summit East talk by Sameer Agarwal
Stable and robust data pipelines are a critical component of the data infrastructure of enterprises. Most commonly, data pipelines ingest messy data sources with incorrect, incomplete or inconsistent records and produce curated and/or summarized data for consumption by subsequent applications.
In this talk we go over new and upcoming features in Spark that help it serve such workloads better. These include isolation of corrupt input records and files, useful diagnostic feedback to users, and improved handling of nested types, which are common in ETL jobs.
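Spark's JSON and CSV readers expose record-level isolation through the `mode` read option (`PERMISSIVE`, `DROPMALFORMED`, `FAILFAST`) together with `columnNameOfCorruptRecord`, and file-level isolation through the `spark.sql.files.ignoreCorruptFiles` setting. The block below is not Spark code. It is a minimal plain-Python sketch of what the three record modes do, assuming JSON Lines input and a flat list of expected fields. `read_json_lines` and its parameters are hypothetical names chosen for illustration.

```python
import json
from typing import Any, Dict, List, Optional

# Spark's default corrupt-record column name, reused here for familiarity.
CORRUPT_COL = "_corrupt_record"
MODES = {"PERMISSIVE", "DROPMALFORMED", "FAILFAST"}


def read_json_lines(
    lines: List[str], fields: List[str], mode: str = "PERMISSIVE"
) -> List[Dict[str, Optional[Any]]]:
    """Parse JSON Lines records, isolating bad ones instead of crashing.

    Hypothetical helper illustrating Spark-style read modes (not Spark's
    implementation):
      PERMISSIVE    keep the raw text of a bad record in CORRUPT_COL
      DROPMALFORMED silently skip bad records
      FAILFAST      raise on the first bad record
    """
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    rows: List[Dict[str, Optional[Any]]] = []
    for line in lines:
        try:
            obj = json.loads(line)
            if not isinstance(obj, dict):
                raise ValueError("record is not a JSON object")
        except ValueError:  # json.JSONDecodeError subclasses ValueError
            if mode == "FAILFAST":
                raise
            if mode == "DROPMALFORMED":
                continue
            # PERMISSIVE: null out the expected fields, keep the raw input.
            bad: Dict[str, Optional[Any]] = {f: None for f in fields}
            bad[CORRUPT_COL] = line
            rows.append(bad)
            continue
        good: Dict[str, Optional[Any]] = {f: obj.get(f) for f in fields}
        good[CORRUPT_COL] = None
        rows.append(good)
    return rows


if __name__ == "__main__":
    data = ['{"id": 1, "name": "a"}', '{"id": 2, "name":', "[1, 2]"]
    for row in read_json_lines(data, ["id", "name"]):
        print(row)
```

Keeping the raw text of a bad record, rather than dropping it, is what lets a pipeline finish while still leaving the malformed input available for inspection. In PySpark the roughly equivalent read would be `spark.read.option("mode", "PERMISSIVE").option("columnNameOfCorruptRecord", "_corrupt_record").json(path)`.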
Video "Exceptions are the Norm: Dealing with Bad Actors in ETL: Spark Summit East talk by Sameer Agarwal" from the Spark Summit channel
Other videos from the channel
![Keeping Spark on Track: Productionizing Spark for ETL: talk by Kyle Pistor and Miklos Christine](https://i.ytimg.com/vi/1XkUSWbu-C0/default.jpg)
![Spark + Parquet In Depth: Spark Summit East talk by: Emily Curtin and Robbie Strickland](https://i.ytimg.com/vi/_0Wpwj_gvzg/default.jpg)
![Error Handling In Informatica](https://i.ytimg.com/vi/Z65v-2ua5hU/default.jpg)
![Deep Dive into Monitoring Spark Applications Using Web UI and SparkListeners (Jacek Laskowski)](https://i.ytimg.com/vi/mVP9sZ6K__Y/default.jpg)
![RDDs, DataFrames and Datasets in Apache Spark - NE Scala 2016](https://i.ytimg.com/vi/pZQsDloGB4w/default.jpg)
![ETL Is Dead, Long Live Streams: real-time streams w/ Apache Kafka](https://i.ytimg.com/vi/I32hmY4diFY/default.jpg)
![Top 5 Mistakes When Writing Spark Applications](https://i.ytimg.com/vi/WyfHUNnMutg/default.jpg)
![SparkSQL: A Compiler from Queries to RDDs: Spark Summit East talk by Sameer Agarwal](https://i.ytimg.com/vi/AoVmgzontXo/default.jpg)
![try/catch (in Scala)](https://i.ytimg.com/vi/H97YKSV_ygQ/default.jpg)
![Tricks of the Trade to be an Apache Spark Rock Star - Ted Malaska](https://i.ytimg.com/vi/kv8kWvk1xyY/default.jpg)
![A Deeper Understanding of Spark Internals - Aaron Davidson (Databricks)](https://i.ytimg.com/vi/dmL0N3qfSc8/default.jpg)
![A Developer’s View into Spark's Memory Model - Wenchen Fan](https://i.ytimg.com/vi/-Aq1LMpzaKw/default.jpg)
![Functional Data Engineering - A Set of Best Practices | DataEngConf SF '18](https://i.ytimg.com/vi/4Spo2QRTz1k/default.jpg)
![Optimal Strategies for Large Scale Batch ETL Jobs - Emma Tang & Hua Wang, Ph.D.](https://i.ytimg.com/vi/UmBL5RApe-I/default.jpg)
![Hive Bucketing in Apache Spark - Tejas Patil](https://i.ytimg.com/vi/6BD-Vv-ViBw/default.jpg)
![Improving Python and Spark Performance and Interoperability: Spark Summit East talk by Wes McKinney](https://i.ytimg.com/vi/qIKImANLFtE/default.jpg)
![Get Rid of Traditional ETL, Move to Spark! (Bas Geerdink)](https://i.ytimg.com/vi/vZhSbs1xLx4/default.jpg)
![Building Robust ETL Pipelines with Apache Spark - Xiao Li](https://i.ytimg.com/vi/exWGf0aXJF4/default.jpg)
![Auto Scaling Systems With Elastic Spark Streaming: Spark Summit East talk by PhuDuc Nguyen](https://i.ytimg.com/vi/N10k7tRsTUA/default.jpg)
![Top 5 Mistakes When Writing Spark Applications](https://i.ytimg.com/vi/vfiJQ7wg81Y/default.jpg)