VLM | Gunshi Gupta

Recurrent Attention-based Token Selection for Efficient Streaming Video-LLMs

Video Large Language Models (Video-LLMs) excel at understanding videos in-context when they have full access to the video at query time. However, these models face challenges in streaming scenarios where hour-long videos must be processed online, …