
Monitoring machine learning model performance in production systems

BasisAI's Senior Software Engineer, Qiao Han, examines some of the technical challenges of productionising machine learning, focusing on how model performance can be monitored continuously after deployment.

Learn practical techniques for detecting feature and inference drift, and how they can be applied at scale using Prometheus metrics. A live demo guides you through instrumenting your training and model-serving code and crafting PromQL for real-time analysis. He also presents the latency impact of this approach using anonymised data from model servers handling ~100 inference requests per second.
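As a rough illustration of the idea (not the code shown in the talk), the sketch below bins a feature into cumulative buckets, the way a Prometheus histogram stores counts under its `le` label, and then compares the training and serving distributions with a Kolmogorov–Smirnov-style statistic: the largest gap between the two empirical CDFs at the bucket boundaries. The bucket bounds, function names, and sample values are all hypothetical, and the sketch uses only the Python standard library rather than a Prometheus client.

```python
from bisect import bisect_left
from typing import List, Sequence

# Hypothetical bucket upper bounds; the final +Inf bucket mirrors the
# catch-all bucket of a Prometheus histogram.
BUCKET_BOUNDS: List[float] = [10.0, 20.0, 30.0, 40.0, float("inf")]


def cumulative_counts(values: Sequence[float], bounds: Sequence[float]) -> List[int]:
    """Return, for each upper bound, how many values are <= that bound.

    Assumes `bounds` is sorted and ends with +Inf, and that invalid values
    such as NaN have been filtered out beforehand.
    """
    per_bucket = [0] * len(bounds)
    for v in values:
        # Index of the first bound that is >= v, i.e. the smallest bucket holding v.
        per_bucket[bisect_left(bounds, v)] += 1
    cumulative = []
    running = 0
    for count in per_bucket:
        running += count
        cumulative.append(running)
    return cumulative


def ks_statistic(train_cum: Sequence[int], serve_cum: Sequence[int]) -> float:
    """Largest absolute gap between two bucketed CDFs (0 = identical, 1 = disjoint)."""
    train_total, serve_total = train_cum[-1], serve_cum[-1]
    if train_total == 0 or serve_total == 0:
        raise ValueError("both histograms need at least one observation")
    return max(
        abs(t / train_total - s / serve_total)
        for t, s in zip(train_cum, serve_cum)
    )


# Made-up feature values: the serving distribution has shifted upwards.
train_values = [5.0, 15.0, 25.0, 35.0]
serve_values = [25.0, 35.0, 45.0, 45.0]

train_hist = cumulative_counts(train_values, BUCKET_BOUNDS)
serve_hist = cumulative_counts(serve_values, BUCKET_BOUNDS)
drift = ks_statistic(train_hist, serve_hist)
print(f"bucketed KS statistic: {drift:.2f}")
```

Because the comparison only sees bucket boundaries, the result is an approximation whose resolution depends on how the bins are chosen; a statistic near 0 suggests the serving data still resembles the training data, while larger values point to drift.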

BasisAI's ML platform, Bedrock, offers many more monitoring capabilities built on top of these core concepts and DevOps best practices.

Learn more about Bedrock: https://basis-ai.com/product
Read our latest blog post on AI monitoring: https://blog.basis-ai.com/ai-monitoring-with-bedrock

00:14 - Agenda
00:53 - Why is ML monitoring important
02:39 - Quiz
06:44 - Overview of monitoring solutions
09:25 - Key idea
10:04 - Feature distribution
11:38 - Drift detection
12:13 - Why Prometheus?
15:30 - Prometheus histogram
17:42 - Choosing histogram bins
19:00 - Instrumentation - training
20:08 - Instrumentation - serving
21:36 - Demo
24:53 - Querying
25:32 - Kolmogorov–Smirnov test
26:20 - Demo - writing PromQL
33:31 - Tracking invalid values
34:15 - Visualising feature drift
35:35 - Recap
36:13 - Extensions
38:26 - Future work
40:54 - Performance evaluation
42:49 - Q&A
Do concept drift and model decay apply to models responding to new or undiscovered patterns arising from COVID, such as changes in consumer habits? Do developers prefer to design models for a predefined short term (e.g. 2020-2021) or to tweak models with a longer historical tail?
45:30 - Q&A
For identifying outliers, does Prometheus offer any other methods, or does it have to be custom-coded?
46:41 - Q&A
What is the relationship of this problem with incremental learning?
48:11 - Q&A
What are some actual projects you have been working on in applying this model monitoring tool?
49:34 - Q&A
If I convert my model to tflite, does it affect the production accuracy a lot? I'm doing image classification.

Video: Monitoring machine learning model performance in production systems, from the BasisAI channel
Video information
August 12, 2020, 16:08:46
00:49:09