SOTAVerified

Question Answering

Question answering can be segmented into domain-specific tasks like community question answering and knowledge-base question answering. Popular benchmark datasets for evaluating question answering systems include SQuAD, HotPotQA, bAbI, TriviaQA, WikiQA, and many others. Models for question answering are typically evaluated on metrics like exact match (EM) and F1. Some recent top-performing models are T5 and XLNet.
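As a rough illustration of the two metrics named above, the sketch below computes them for a single prediction and reference answer. It assumes SQuAD-style answer normalization (lowercasing, stripping punctuation and the articles "a", "an", "the"); other benchmarks may normalize differently. The function names `normalize_answer`, `exact_match`, and `token_f1` are hypothetical and are not taken from any particular evaluation script.

```python
import re
import string
from collections import Counter


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace.

    Assumption: this mirrors the common SQuAD-style normalization.
    """
    text = text.lower()
    punctuation = set(string.punctuation)
    text = "".join(ch for ch in text if ch not in punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(reference))


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    if not pred_tokens or not ref_tokens:
        # If either side is empty, score 1.0 only when both are empty.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(exact_match("The Eiffel Tower.", "eiffel tower"))
    print(token_f1("Eiffel Tower in Paris", "Eiffel Tower"))
```

Leaderboards usually report these per-example scores averaged over the whole evaluation set and scaled to 0-100; when an example has several reference answers, a common convention is to take the maximum score across references.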


Papers

Showing 4051–4100 of 10817 papers

| Title | Status | Hype |
| --- | --- | --- |
| ESQA: Event Sequences Question Answering |  | 0 |
| KeyVideoLLM: Towards Large-scale Video Keyframe Selection |  | 0 |
| BACON: Improving Clarity of Image Captions via Bag-of-Concept Graphs |  | 0 |
| UnSeenTimeQA: Time-Sensitive Question-Answering Beyond LLMs' Memorization |  | 0 |
| SemioLLM: Assessing Large Language Models for Semiological Analysis in Epilepsy Research |  | 0 |
| MindBench: A Comprehensive Benchmark for Mind Map Structure Recognition and Analysis |  | 0 |
| Certainly Uncertain: A Benchmark and Metric for Multimodal Epistemic and Aleatoric Awareness |  | 0 |
| The Solution for the ICCV 2023 Perception Test Challenge 2023 -- Task 6 -- Grounded videoQA |  | 0 |
| RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs |  | 0 |
| Neurocache: Efficient Vector Retrieval for Long-range Language Modeling | Code | 0 |
| Synthetic Multimodal Question Generation |  | 0 |
| Ground Every Sentence: Improving Retrieval-Augmented LLMs with Interleaved Reference-Claim Generation |  | 0 |
| M2QA: Multi-domain Multilingual Question Answering | Code | 0 |
| Calibrated Large Language Models for Binary Question Answering |  | 0 |
| Dynamic Few-Shot Learning for Knowledge Graph Question Answering |  | 0 |
| An Empirical Comparison of Generative Approaches for Product Attribute-Value Identification | Code | 0 |
| Optimization of Retrieval-Augmented Generation Context with Outlier Detection |  | 0 |
| Hierarchical Memory for Long Video QA |  | 0 |
| Financial Knowledge Large Language Model |  | 0 |
| Too Late to Train, Too Early To Use? A Study on Necessity and Viability of Low-Resource Bengali LLMs |  | 0 |
| BioKGBench: A Knowledge Graph Checking Benchmark of AI Agent for Biomedical Science | Code | 0 |
| BeamAggR: Beam Aggregation Reasoning over Multi-source Knowledge for Multi-hop Question Answering |  | 0 |
| Assistive Image Annotation Systems with Deep Learning and Natural Language Capabilities: A Review |  | 0 |
| Follow-Up Questions Improve Documents Generated by Large Language Models |  | 0 |
| FlowVQA: Mapping Multimodal Logic in Visual Question Answering with Flowcharts |  | 0 |
| The Illusion of Competence: Evaluating the Effect of Explanations on Users' Mental Models of Visual Question Answering Systems | Code | 0 |
| Changing Answer Order Can Decrease MMLU Accuracy |  | 0 |
| TrustUQA: A Trustful Framework for Unified Structured Data Question Answering | Code | 0 |
| Length Optimization in Conformal Prediction | Code | 0 |
| Enhancing Continual Learning in Visual Question Answering with Modality-Aware Feature Distillation | Code | 0 |
| Handling Ontology Gaps in Semantic Parsing | Code | 0 |
| Disentangling Knowledge-based and Visual Reasoning by Question Decomposition in KB-VQA |  | 0 |
| Context Matters: An Empirical Study of the Impact of Contextual Information in Temporal Question Answering Systems |  | 0 |
| Explicit Diversity Conditions for Effective Question Answer Generation with Large Language Models |  | 0 |
| Geode: A Zero-shot Geospatial Question-Answering Agent with Explicit Reasoning and Precise Spatio-Temporal Retrieval |  | 0 |
| Sanskrit Knowledge-based Systems: Annotation and Computational Tools |  | 0 |
| Evaluating Fairness in Large Vision-Language Models Across Diverse Demographic Attributes and Prompts | Code | 0 |
| Entropy-Based Decoding for Retrieval-Augmented Large Language Models | Code | 0 |
| CaLMQA: Exploring culturally specific long-form question answering across 23 languages | Code | 0 |
| Advancing Question Answering on Handwritten Documents: A State-of-the-Art Recognition-Based Model for HW-SQuAD |  | 0 |
| Zero-Shot Long-Form Video Understanding through Screenplay |  | 0 |
| Claude 3.5 Sonnet Model Card Addendum |  | 0 |
| UniPSDA: Unsupervised Pseudo Semantic Data Augmentation for Zero-Shot Cross-Lingual Natural Language Understanding | Code | 0 |
| Is your benchmark truly adversarial? AdvScore: Evaluating Human-Grounded Adversarialness |  | 0 |
| GPT-4V Explorations: Mining Autonomous Driving |  | 0 |
| Directed Domain Fine-Tuning: Tailoring Separate Modalities for Specific Training Tasks |  | 0 |
| Modulating Language Model Experiences through Frictions |  | 0 |
| MM-SpuBench: Towards Better Understanding of Spurious Biases in Multimodal LLMs |  | 0 |
| Attention Instruction: Amplifying Attention in the Middle via Prompting | Code | 0 |
| Training-Free Exponential Context Extension via Cascading KV Cache | Code | 0 |

Benchmark Results

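| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | IE-Net (ensemble) | EM | 90.94 |  | Unverified |
| 2 | FPNet (ensemble) | EM | 90.87 |  | Unverified |
| 3 | IE-NetV2 (ensemble) | EM | 90.86 |  | Unverified |
| 4 | SA-Net on Albert (ensemble) | EM | 90.72 |  | Unverified |
| 5 | SA-Net-V2 (ensemble) | EM | 90.68 |  | Unverified |
| 6 | FPNet (ensemble) | EM | 90.6 |  | Unverified |
| 7 | Retro-Reader (ensemble) | EM | 90.58 |  | Unverified |
| 8 | EntitySpanFocusV2 (ensemble) | EM | 90.52 |  | Unverified |
| 9 | TransNets + SFVerifier + SFEnsembler (ensemble) | EM | 90.49 |  | Unverified |
| 10 | EntitySpanFocus+AT (ensemble) | EM | 90.45 |  | Unverified |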