Нейронный Кот@neural

Нейронный Кот

Qwen3-Omni-30B-A3B-Captioner

Тут Qwen на днях выпустили модель, которая хорошо умеет описывать аудио файлы.

То есть у нее хороший audio understanding, и тут разговор не про распознавание речи. Модель понимает:

multiple speaker emotions, multilingual expressions, and layered intentions. It can also perceive cultural context and implicit information within the audio, enabling a deep comprehension of the underlying meaning behind the spoken words. In non-speech scenarios, the model demonstrates exceptional sound recognition and analysis capabilities, accurately distinguishing and describing intricate layers of real-world sounds, ambient atmospheres, and dynamic audio details in film and media.

Я прогнал через модель звук из видео «Бурановские Бабушки»: В кругу друзей. (всего 223 просмотра — поднажмем!) Получилось достаточно хорошо (см. скрин). Модель даже понимает, к какой секунде относится каждая часть контента.

НО! Нельзя задать промпт, модель принимает только аудио. То есть нельзя, например, попросить оценить акцент вашей речи, — можно только получить полное общее описание.

Вопрос — в каком продукте такая модель могла бы понадобиться?

⛏

модель

😛

демка

Please open Telegram to view this post

VIEW IN TELEGRAM

🔥3

www.tgoop.com/neural_cat/141

923 viewsedited Sep 26 at 13:03

tgoop.com/neural_cat/141

Create: 2025-09-26
Last Update: 2025-10-20 02:06:08

multiple speaker emotions, multilingual expressions, and layered intentions. It can also perceive cultural context and implicit information within the audio, enabling a deep comprehension of the underlying meaning behind the spoken words. In non-speech scenarios, the model demonstrates exceptional sound recognition and analysis capabilities, accurately distinguishing and describing intricate layers of real-world sounds, ambient atmospheres, and dynamic audio details in film and media.

⛏

модель

😛

демка

Telegram News

Qwen3-Omni-30B-A3B-Captioner