Serving Machine Learning Models at Scale Using KServe - Animesh Singh, IBM - KubeCon North America
KFServing (since renamed KServe) is a serverless, open source solution for serving machine learning models. As machine learning becomes more widely adopted in organizations, the trend is to deploy larger numbers of models. There is also an increasing need to serve models using GPUs, and because GPUs are expensive, engineers are seeking ways to serve multiple models with one GPU. The KFServing community designed a Multi-Model Serving solution to scale the number of models that can be served in a Kubernetes cluster. By sharing a serving container that can host multiple models, Multi-Model Serving addresses three limitations of the current ‘one model, one service’ paradigm: 1) compute resources (including the cost of public cloud), 2) the maximum number of pods, and 3) the maximum number of IP addresses. This talk presents the design of Multi-Model Serving, describes how to use it to serve models for different frameworks, and shares benchmark stats that demonstrate its scalability.
Video from the channel Justin Miller