What is generative ai? : Model Format, Deployments, Runtimes - Part 2

What is generative ai? - Part 2 of the foundational concepts of GenAI series, designed to help you master the core principles of the field. While Part 1 focused on architecture and training phases, this session dives deep into how model files are stored, deployed, and accessed via APIs.
What You Will Learn:
• Model File Formats: Discover the technical differences between formats like PyTorch (.pth), ONNX, and the industry-standard SafeTensors, which provides high security and faster loading. We also explain the GGUF (GPT Generated Unified File) format, which is essential for running models on local CPUs.
• Deployment Options: We break down the six primary ways to deploy a model, ranging from Managed Cloud APIs (like OpenAI and Gemini) to Local Deployment on your own laptop. Learn about the trade-offs between proprietary and open-source models, as well as serverless vs. self-hosted infrastructure.
• AI Runtimes: Understand why there are so many runtimes, including Llama CPP for local development, vLLM for high-throughput production serving, and NVIDIA’s custom TensorRT-LLM.
• API Standards: Learn the difference between the OpenAI Standard, which emerged organically as a runtime protocol, and the OpenAPI specification used by providers like Google and Hugging Face.
0:00 - Introduction to GenAI Fundamentals (Part 2)
0:55 - Model Weights and Storage Formats
3:45 - Choosing the Right Format: Production vs. Local Deployment
7:00 - Deployment Options: Managed Cloud, Vertex AI, and Custom Models
9:40 - Self-Hosted, Serverless, and Local Deployment
11:20 - Understanding Runtimes: Llama CPP, vLLM, and TensorRT
14:35 - API Standards: OpenAI Standard vs. OpenAPI Specification

#arcrajeshkumarar

Видео What is generative ai? : Model Format, Deployments, Runtimes - Part 2 канала RajeshKumar AR | AI Automation