Real-Time Stock Processing With Apache NiFi, Apache Flink and Apache Kafka
Pierre Villard, Timothy Spann
A presentation from ApacheCon @Home 2020
https://apachecon.com/acah2020/
We will ingest a variety of real-time feeds, including stocks, with NiFi, then filter, process and segment the data into Kafka topics. Kafka data will be in Apache Avro format with schemas specified in Cloudera Schema Registry. Apache Flink, Kafka Connect and NiFi will do additional event processing along with machine learning and deep learning. We will store real-time feed data in Apache Kudu for real-time analytics and summaries. Apache OpenNLP, Apache MXNet, CoreNLP, NLTK and SpaCy will be used to analyse stock trend data in streams as well as stock prices and futures. As part of the stream processing we will also be classifying images and stock data with Apache MXNet and DJL. We will also produce cleaned and aggregated data to subscribers via Apache Kafka, Apache Flink SQL and Apache NiFi. We will push to applications, message listeners, web clients, Slack channels and email. To be useful in our enterprise, we will have full authorization, authentication, auditing, data encryption and data lineage via Apache Ranger, Apache Atlas and Apache NiFi. References: https://community.cloudera.com/t5/Community-Articles/Real-Time-Stock-Processing-With-Apache-NiFi-and-Apache-Kafka/ta-p/249221
Pierre Villard is currently a Senior Product Manager at Cloudera in charge of all the products around Apache NiFi and its subprojects, such as the NiFi Registry and MiNiFi agents. He has been active in the Apache NiFi project for the last 4.5 years and is a committer and PMC member of the project. Before joining Cloudera, Pierre worked at Google and Hortonworks, where he helped customers develop solutions on-premises and in the cloud using many technologies, including Apache NiFi.
Tim Spann is a Principal DataFlow Field Engineer at Cloudera, the Big Data Zone leader and blogger at DZone, and a data engineer with 15 years of experience. He runs the Future of Data Princeton meetup as well as other events. He has spoken at Philly Open Source, ApacheCon in Montreal, Strata NYC, Oracle Code NYC, IoT Fusion in Philly, meetups in Princeton, NYC, Philly, Berlin and Prague, and DataWorks Summits in San Jose, Berlin and Sydney.
Video: Real-Time Stock Processing With Apache NiFi, Apache Flink and Apache Kafka, from the channel The ASF
Video information
October 22, 2020, 21:29:13
00:51:38