This video was edited with an AI agent. But how?
This talk is about the world's first open-source video editing agent!
Diffusion Studio x Re-Skill technology proposal:
Our Python-based agent starts a browser session using Playwright and opens operator.diffusion.studio.
This web app is a video editing UI optimized for agents, providing access to Diffusion Studio Core—a JavaScript-based engine that renders videos directly in the browser using WebCodecs (fully hardware-accelerated).
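The session bootstrap described above can be sketched with Playwright's sync API. This is a minimal, hypothetical sketch (the function name and options are illustrative, not the agent's actual entry point; see the GitHub repo below for the real implementation):

```python
def open_operator_session(url: str = "https://operator.diffusion.studio") -> str:
    """Launch a Chromium session via Playwright and open the operator UI.

    Sketch only: the real agent keeps the page handle alive so its tools
    can keep executing code in the browser context.
    """
    # Lazy import so this module loads even where Playwright isn't installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url)
        return page.title()  # sanity check that the editing UI loaded
```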
🖥 How it works:
1️⃣ A VideoEditingTool generates code based on user prompts and runs it in the browser.
2️⃣ If additional context is needed, DocsSearchTool uses RAG to pull information from operator.diffusion.studio/llms.txt.
3️⃣ After each execution step, the composition is sampled (currently 1 frame per second) and analyzed using VisualFeedbackTool via a multi-modal model.
4️⃣ The feedback system decides whether to proceed with rendering or refine further.
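The four steps above form a generate → execute → sample → review loop. Here is a hedged, dependency-free sketch of that control flow; every name (`Feedback`, `edit_loop`, the callables standing in for VideoEditingTool, the frame sampler, and VisualFeedbackTool) is hypothetical, and the real tools run against the browser session:

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Feedback:
    approved: bool  # the multi-modal model's verdict on the sampled frames
    notes: str = ""  # refinement hints fed back into the next prompt


def edit_loop(
    generate_code: Callable[[str], str],        # VideoEditingTool: prompt -> code
    run_in_browser: Callable[[str], None],      # executes the code in the page
    sample_frames: Callable[[], List[bytes]],   # samples composition (~1 fps)
    review: Callable[[List[bytes]], Feedback],  # VisualFeedbackTool
    prompt: str,
    max_rounds: int = 5,
) -> str:
    """Iterate edit rounds until the visual feedback approves a render."""
    for _ in range(max_rounds):
        code = generate_code(prompt)
        run_in_browser(code)
        fb = review(sample_frames())
        if fb.approved:
            return "render"  # feedback says proceed with rendering
        prompt = f"{prompt}\nRefine: {fb.notes}"  # refine further
    return "render"  # round budget exhausted; render best effort
```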
📡 File transfers between the browser and Python happen via Chrome DevTools Protocol, and for scalability, the agent can connect to a GPU-accelerated remote browser session via WebSocket (WIP: wss://chrome.diffusion.studio).
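For the remote path, Playwright can attach to an already-running Chromium over CDP. A sketch under stated assumptions: the endpoint is the work-in-progress one mentioned above and may require auth or a different URL in practice, and the function name is illustrative:

```python
def connect_remote_browser(cdp_endpoint: str = "wss://chrome.diffusion.studio"):
    """Attach to a remote, GPU-accelerated Chromium via the DevTools Protocol.

    Sketch only; caller is responsible for stopping the returned Playwright
    driver when done.
    """
    # Lazy import so this module loads even where Playwright isn't installed.
    from playwright.sync_api import sync_playwright

    p = sync_playwright().start()
    browser = p.chromium.connect_over_cdp(cdp_endpoint)
    return p, browser
```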
---
https://github.com/diffusionstudio/agent
https://re-skill.io/
slides: https://docs.google.com/presentation/d/1eipINYiwx3vjwvJXrv4QA0-9t4-uIVPuh112X9pElkM/edit?usp=sharing
Video "This video was edited with AI agent. But how?" from the AI Engineer channel
Published: February 23, 2025 · Duration: 00:05:00