SOTAVerified

Multimodal Reasoning

Reasoning over multimodal inputs.

Papers

Showing 251–275 of 302 papers

| Title | Status | Hype |
|---|---|---|
| EfficientLLaVA: Generalizable Auto-Pruning for Large Vision-language Models | | 0 |
| Deep Learning and Machine Learning, Advancing Big Data Analytics and Management: Unveiling AI's Potential Through Tools, Techniques, and Applications | | 0 |
| EgoPrune: Efficient Token Pruning for Egomotion Video Reasoning in Embodied Agent | | 0 |
| Elicit and Enhance: Advancing Multimodal Reasoning in Medical Scenarios | | 0 |
| Decoupling the Image Perception and Multimodal Reasoning for Reasoning Segmentation with Digital Twin Representations | | 0 |
| DDCoT: Duty-Distinct Chain-of-Thought Prompting for Multimodal Reasoning in Language Models | | 0 |
| Towards Agentic Recommender Systems in the Era of Multimodal Large Language Models | | 0 |
| Enhancing Scientific Visual Question Answering through Multimodal Reasoning and Ensemble Modeling | | 0 |
| Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization | | 0 |
| EnigmaEval: A Benchmark of Long Multimodal Reasoning Challenges | | 0 |
| EVADE: Multimodal Benchmark for Evasive Content Detection in E-Commerce Applications | | 0 |
| EVLM: Self-Reflective Multimodal Reasoning for Cross-Dimensional Visual Editing | | 0 |
| Evolutionary Prompt Optimization Discovers Emergent Multimodal Reasoning Strategies in Vision-Language Models | | 0 |
| Exploring Advanced Techniques for Visual Question Answering: A Comprehensive Comparison | | 0 |
| Exploring Failure Cases in Multimodal Reasoning About Physical Dynamics | | 0 |
| Exploring the Reasoning Abilities of Multimodal Large Language Models (MLLMs): A Comprehensive Survey on Emerging Trends in Multimodal Reasoning | | 0 |
| Towards Holistic Disease Risk Prediction using Small Language Models | | 0 |
| FedNano: Toward Lightweight Federated Tuning for Pretrained Multimodal Large Language Models | | 0 |
| VLMT: Vision-Language Multimodal Transformer for Multimodal Multi-hop Question Answering | | 0 |
| FinLMM-R1: Enhancing Financial Reasoning in LMM through Scalable Data and Reward Design | | 0 |
| CutPaste&Find: Efficient Multimodal Hallucination Detector with Visual-aid Knowledge Base | | 0 |
| GAM-Agent: Game-Theoretic and Uncertainty-Aware Collaboration for Complex Visual Reasoning | | 0 |
| GeoGuess: Multimodal Reasoning based on Hierarchy of Visual Information in Street View | | 0 |
| GeoSense: Evaluating Identification and Application of Geometric Principles in Multimodal Reasoning | | 0 |
| Critique Before Thinking: Mitigating Hallucination through Rationale-Augmented Instruction Tuning | | 0 |

Benchmark Results

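| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4V | Accuracy | 24 | | Unverified |
| 2 | Gemini Pro | Accuracy | 13.2 | | Unverified |
| 3 | LLaVa-1.5-13B | Accuracy | 1.8 | | Unverified |
| 4 | LLaVa-1.5-7B | Accuracy | 1.5 | | Unverified |
| 5 | BLIP2-FLAN-T5-XXL | Accuracy | 0.9 | | Unverified |
| 6 | QWEN | Accuracy | 0.9 | | Unverified |
| 7 | CogVLM | Accuracy | 0.9 | | Unverified |
| 8 | InstructBLIP | Accuracy | 0.6 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT4V | Accuracy | 22.76 | | Unverified |
| 2 | Gemini Pro | Accuracy | 17.66 | | Unverified |
| 3 | Qwen-VL-Max | Accuracy | 15.59 | | Unverified |
| 4 | InternLM-XComposer2-VL | Accuracy | 14.54 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 | Acc | 30.3 | | Unverified |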