Building a scalable focused web crawler with Flink - Ken Krugler
Flink Forward San Francisco, April 2018 #flinkforward
Is it possible to build an efficient, focused web crawler using Flink? That was the question that led to the creation of the flink-crawler open source project. In this talk I’ll discuss how we use Flink’s support for AsyncFunctions and iterations to create a scalable web crawler that continuously and efficiently performs a focused web crawl with no additional infrastructure. I’ll also discuss some of the testing and debugging challenges encountered when using features such as AsyncFunctions and iterations.
https://data-artisans.com/
Video from the Flink Forward channel.