AI Daily Brief@2025-03-21 | OpenAI voice models | openai.fm | Claude web search | NotebookLM mindmap
Yesterday, OpenAI rolled out major updates to its voice models.
For speech-to-text, they've moved from Whisper to models built on GPT-4o and GPT-4o mini, which significantly reduce word error rates, with especially notable gains for less common languages.
On the text-to-speech front, they've moved from TTS-1 to a model built on GPT-4o mini, and the biggest enhancement is steerability. You can now use natural-language instructions to control how the audio sounds, for example by specifying emotion, tone, and speaking speed. This flexibility is a revolutionary advance compared to other voice services.
However, the text-to-speech service currently runs only on the distilled 4o mini model, so the quality might not be optimal. It is also limited to 11 preset voices, which is clearly too few.
OpenAI also launched something interesting called openai.fm. To be clear, this is not a content service of their own. It is a website where developers and users can try the new voice models: select a voice, enter text, choose how the audio should sound, and hit play.
Another piece of OpenAI news is the launch of the o1-pro model API. I've noticed that OpenAI's model pricing always becomes a hot topic because the rates are truly astonishing.
Let me compare them for you, per million tokens. For input, the cost is $15 for o1, $75 for GPT-4.5, and $150 for o1-pro. For output, it's $60 for o1, $150 for GPT-4.5, and $600 for o1-pro.
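To make the price gap concrete, here is a minimal sketch that turns the per-million-token rates quoted above into the cost of a single request. The function name `request_cost` and the example request size (10,000 input tokens, 2,000 output tokens) are assumptions chosen for illustration, not figures from OpenAI.

```python
# Prices in USD per million tokens, as quoted in this brief.
PRICES_PER_MILLION = {
    "o1": {"input": 15.0, "output": 60.0},
    "GPT-4.5": {"input": 75.0, "output": 150.0},
    "o1-pro": {"input": 150.0, "output": 600.0},
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request (hypothetical helper for illustration)."""
    rates = PRICES_PER_MILLION[model]
    total = input_tokens * rates["input"] + output_tokens * rates["output"]
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical request size: 10,000 input tokens and 2,000 output tokens.
    for model in PRICES_PER_MILLION:
        print(f"{model}: ${request_cost(model, 10_000, 2_000):.2f}")
```

For that hypothetical request, the sketch gives $0.27 for o1, $1.05 for GPT-4.5, and $2.70 for o1-pro. Because both o1-pro rates are ten times the o1 rates, any request costs ten times as much on o1-pro as on o1.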
Next, let's continue yesterday's topic about the progress on the JFK files.
I knew AI would definitely tackle the newly released JFK files. Sure enough, Perplexity quickly analyzed the 80,000 pages of documents and published 10 insights on Twitter. Let's see what they found:
Yesterday, I introduced you to a research paper published by METR, which suggests that the length of tasks AI agents can complete doubles every 7 months. This could be seen as the Moore's Law of AI.
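As a rough illustration of what a 7-month doubling time implies, the sketch below computes the growth factor after a given number of months. It assumes the trend continues as a clean exponential, which is a simplification rather than a claim from the paper, and the function names are hypothetical.

```python
import math

# Doubling period in months, as cited from the METR paper in this brief.
DOUBLING_PERIOD_MONTHS = 7.0


def growth_factor(months: float) -> float:
    """How many times longer the completable task length is after `months`."""
    return 2.0 ** (months / DOUBLING_PERIOD_MONTHS)


def months_to_reach(factor: float) -> float:
    """How many months the trend needs to grow task length by `factor`."""
    return DOUBLING_PERIOD_MONTHS * math.log2(factor)


if __name__ == "__main__":
    for months in (7, 14, 21):
        print(f"after {months} months: x{growth_factor(months):.0f}")
    print(f"months to reach x8: {months_to_reach(8):.0f}")
```

Under this assumption, task length grows 4x after 14 months and 8x after 21 months.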
The paper sparked extensive discussion on Twitter. Let's look at how some heavyweight figures in the AI field evaluated this finding.
As a long-awaited feature, Claude has finally launched web search. I believe web search is the most practical among all the dazzling features AI companies roll out because it solves the model's real-time information problem.
However, this feature is currently only available to Claude's paying users in the United States. Users elsewhere or free-tier users might need to wait a bit longer.
Google also has a new feature. NotebookLM can now generate mind maps from notes, allowing users to interact with the model through these mind maps, making learning feel like playing a game. NotebookLM is truly an excellent product!
OpenAI's highly praised but expensive Deep Research service now shows how many uses you have left and how many days remain. For a service that requires careful consideration before each use, this feature is essential.
Video: AI Daily Brief@2025-03-21 | OpenAI voice models | openai.fm | Claude web search | NotebookLM mindmap, from the channel Benjamin Turing
Video information
March 21, 2025, 19:13:34
00:10:48