Zero-Shot Video Question Answer

This task present the results of Zeroshot Question Answer results on TGIF-QA dataset for LLM powered Video Conversational Models.

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 51–85 of 85 papers

Title	Date	Tasks	Status	Hype
Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization	Feb 5, 2024	Science Question AnsweringText-to-Video Generation	CodeCode Available	4
A Simple LLM Framework for Long-Range Video Question-Answering	Dec 28, 2023	EgoSchemaLanguage Modelling	CodeCode Available	1
Shot2Story20K: A New Benchmark for Comprehensive Understanding of Multi-shot Videos	Dec 16, 2023	Video Captioningvideo narration captioning	CodeCode Available	1
Vista-LLaMA: Reliable Video Narrator via Equal Distance to Visual Tokens	Dec 12, 2023	HallucinationPosition	—Unverified	0
VILA: On Pre-training for Visual Language Models	Dec 12, 2023	In-Context LearningLanguage Modelling	CodeCode Available	4
TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding	Dec 4, 2023	Dense CaptioningHighlight Detection	CodeCode Available	2
Zero-Shot Video Question Answering with Procedural Programs	Dec 1, 2023	Code GenerationLanguage Modeling	—Unverified	0
MVBench: A Comprehensive Multi-modal Video Understanding Benchmark	Nov 28, 2023	3D Question Answering (3D-QA)Diagnostic	CodeCode Available	2
LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models	Nov 28, 2023	Image CaptioningQuestion Answering	CodeCode Available	2
Vamos: Versatile Action Models for Video Understanding	Nov 22, 2023	EgoSchemaHard Attention	CodeCode Available	0
Video-LLaVA: Learning United Visual Representation by Alignment Before Projection	Nov 16, 2023	Language ModelingLanguage Modelling	CodeCode Available	4
Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding	Nov 14, 2023	Image-based Generative Performance BenchmarkingLanguage Modeling	CodeCode Available	2
Mistral 7B	Oct 10, 2023	answerability predictionArithmetic Reasoning	CodeCode Available	6
BT-Adapter: Video Conversation is Feasible Without Video Instruction Tuning	Sep 27, 2023	GPUVideo-based Generative Performance Benchmarking	CodeCode Available	1
Zero-Shot and Few-Shot Video Question Answering with Multi-Modal Prompts	Sep 27, 2023	Few-shot Video Question AnsweringPrompt Learning	CodeCode Available	1
EgoSchema: A Diagnostic Benchmark for Very Long-form Video Language Understanding	Aug 17, 2023	DiagnosticEgoSchema	CodeCode Available	1
OmniDataComposer: A Unified Data Structure for Multimodal Data Fusion and Infinite Data Generation	Aug 8, 2023	Automatic Speech RecognitionAutomatic Speech Recognition (ASR)	CodeCode Available	1
MovieChat: From Dense Token to Sparse Memory for Long Video Understanding	Jul 31, 2023	Multiple-choiceQuestion Answering	CodeCode Available	2
Valley: Video Assistant with Large Language model Enhanced abilitY	Jun 12, 2023	Action RecognitionInstruction Following	CodeCode Available	2
Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models	Jun 8, 2023	Question AnsweringVCGBench-Diverse	CodeCode Available	3
Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding	Jun 5, 2023	Language ModelingLanguage Modelling	CodeCode Available	4
Self-Chained Image-Language Model for Video Localization and Question Answering	May 11, 2023	Language ModelingLanguage Modelling	CodeCode Available	1
VideoChat: Chat-Centric Video Understanding	May 10, 2023	Question AnsweringVideo-based Generative Performance Benchmarking	CodeCode Available	4
LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model	Apr 28, 2023	Instruction Followingmodel	CodeCode Available	5
mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality	Apr 27, 2023	Visual Question Answering (VQA)Zero-Shot Video Question Answer	CodeCode Available	4
Verbs in Action: Improving verb understanding in video-language models	Apr 13, 2023	Contrastive LearningQuestion Answering	CodeCode Available	0
ViperGPT: Visual Inference via Python Execution for Reasoning	Mar 14, 2023	Code GenerationVideo Question Answering	CodeCode Available	3
VIPeR: Provably Efficient Algorithm for Offline RL with Neural Function Approximation	Feb 24, 2023	Computational EfficiencyOffline RL	CodeCode Available	0
InternVideo: General Video Foundation Models via Generative and Discriminative Learning	Dec 6, 2022	Action ClassificationAction Recognition	CodeCode Available	4
0/1 Deep Neural Networks via Block Coordinate Descent	Jun 19, 2022	10-shot image generation	—Unverified	0
Zero-Shot Video Question Answering via Frozen Bidirectional Language Models	Jun 16, 2022	Fill MaskLanguage Modeling	CodeCode Available	1
Flamingo: a Visual Language Model for Few-Shot Learning	Apr 29, 2022	Few-Shot LearningGenerative Visual Question Answering	CodeCode Available	4
MVB: A Large-Scale Dataset for Baggage Re-Identification and Merged Siamese Networks	Jul 26, 2019	Zero-Shot Video Question Answer	CodeCode Available	0
ActivityNet-QA: A Dataset for Understanding Complex Web Videos via Question Answering	Jun 6, 2019	Question AnsweringVideo Question Answering	CodeCode Available	0
TGIF-QA: Toward Spatio-Temporal Reasoning in Visual Question Answering	Apr 14, 2017	Question AnsweringVisual Question Answering	CodeCode Available	0

Show:10 25 50

← PrevPage 2 of 2Next →

No leaderboard results yet.