Working with time-series data at scale by Javier Ramírez
As individuals, we use time series data in everyday life all the time. If you’re trying to improve your health, you may track how many steps you take daily and relate that to your body weight or size over time to understand how well you’re doing.
This is clearly a small-scale example, but on the other end of the spectrum, large-scale time series use cases abound in our current technological landscape. Be it tracking the price of a stock or cryptocurrency that changes every millisecond, performance and health metrics of a video streaming application, sensors for reading temperature, pressure and humidity, or the information generated from millions of IoT devices.
Modern digital applications require collecting, storing, and analyzing time series data at extreme scale, and with performance that a relational database simply cannot provide. We have all seen very creative solutions built to work around this problem, but as throughput needs increase, scaling them becomes a major challenge.
To get the job done, developers end up landing, transforming, and moving data around repeatedly, using multiple components pipelined together. Looking at these solutions really feels like looking at Rube Goldberg machines. It’s staggering to see how complex architectures become in order to satisfy the needs of these workloads.
Most importantly, all of this is something that needed to be built, managed, and maintained, and it still doesn’t meet very high scale and performance needs. Many time series applications can generate enormous volumes of data. One common example here is video streaming.
The act of delivering high quality video content is a very complex process. Understanding load latency, video frame drops, and user activity is something that needs to happen at massive scale and in real time. This process alone can generate several GBs of data every second, while easily running hundreds of thousands, sometimes over a million, queries per hour.
A relational database certainly isn’t the right choice here, which is exactly why we built Timestream at AWS. Timestream started out by decoupling data ingestion, storage, and query so that each can scale independently. The design keeps each sub-system simple, making it easier to achieve unwavering reliability, while also eliminating scaling bottlenecks and reducing the chances of correlated system failures, which becomes more important as the system grows.
At the same time, in order to manage overall growth, the system is cell based: rather than scale the system as a whole, we segment it into multiple smaller copies of itself, so that each cell can be tested at full scale and a problem in one cell can’t affect activity in any of the other cells.
In this session I will introduce the time-series problem, take a look at some architectures that have been used in the past to work around it, and then introduce Amazon Timestream, a purpose-built database to process and analyze time-series data at scale. I will discuss the architecture of Amazon Timestream and demo how it can be used to ingest and process time-series data at scale as a fully managed service, and how it can be easily integrated with open source tools like Apache Flink or Grafana.
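To make the ingestion side of the demo concrete, here is a minimal sketch of how a single time-series record is shaped for Timestream's WriteRecords API: dimensions identify the series, and a measure carries the timestamped value. The database and table names (`metrics`, `video_streaming`) and the `device_id`/`frame_drops` fields are illustrative assumptions, not from the talk.

```python
import time

# Hypothetical names; a Timestream database and table must already exist.
DATABASE = "metrics"
TABLE = "video_streaming"

def make_record(device_id: str, measure: str, value: float, ts_ms: int) -> dict:
    """Shape one record for Timestream's WriteRecords API:
    dimensions identify the series, the measure carries the value."""
    return {
        "Dimensions": [{"Name": "device_id", "Value": device_id}],
        "MeasureName": measure,
        "MeasureValue": str(value),   # Timestream takes measure values as strings
        "MeasureValueType": "DOUBLE",
        "Time": str(ts_ms),
        "TimeUnit": "MILLISECONDS",
    }

record = make_record("player-42", "frame_drops", 3.0, int(time.time() * 1000))

# With the AWS SDK for Python (boto3) and valid credentials, the write itself
# would then be (WriteRecords accepts up to 100 records per call):
#   boto3.client("timestream-write").write_records(
#       DatabaseName=DATABASE, TableName=TABLE, Records=[record])
```

In practice a high-throughput producer would batch many such records per `write_records` call rather than writing them one at a time.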
Video “Working with time-series data at scale” by Javier Ramírez, from the Big Things Conference channel.