<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Video Understanding on Chen Kai Blog</title><link>https://www.chenk.top/en/tags/video-understanding/</link><description>Recent content in Video Understanding on Chen Kai Blog</description><generator>Hugo</generator><language>en</language><lastBuildDate>Fri, 27 Feb 2026 09:00:00 +0000</lastBuildDate><atom:link href="https://www.chenk.top/en/tags/video-understanding/index.xml" rel="self" type="application/rss+xml"/><item><title>Aliyun Bailian (3): Qwen-Omni for Video, Audio, and Image Understanding</title><link>https://www.chenk.top/en/aliyun-bailian/03-qwen-omni-multimodal/</link><pubDate>Fri, 27 Feb 2026 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/aliyun-bailian/03-qwen-omni-multimodal/</guid><description>&lt;p>Of all the Bailian models, Qwen-Omni is the one that has rescued my product roadmap most often. Answering &amp;ldquo;Can you tell me what&amp;rsquo;s happening in this 2-minute promo video?&amp;rdquo; used to be a 3-week project: extract frames, caption each frame, and stitch the captions together. With Qwen-Omni, it&amp;rsquo;s a single HTTP request. The documentation glosses over a few pitfalls, though, such as the streaming requirement, which has cost more than one team half a day. Let&amp;rsquo;s spare you that.&lt;/p></description></item></channel></rss>