SOTAVerified

Question Answering

Question answering can be segmented into domain-specific tasks such as community question answering and knowledge-base question answering. Popular benchmark datasets for evaluating question answering systems include SQuAD, HotpotQA, bAbI, TriviaQA, and WikiQA, among many others. Models for question answering are typically evaluated on metrics such as exact match (EM) and F1. Some recent top-performing models are T5 and XLNet.
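As a rough illustration of the two metrics, here is a minimal sketch of exact match and token-level F1 for a single prediction and a single reference answer. It assumes the answer normalization commonly used in SQuAD-style evaluation (lowercasing, stripping punctuation and the English articles "a", "an", "the", collapsing whitespace). The function names are illustrative and are not taken from any particular benchmark's official script.

```python
import re
import string
from collections import Counter


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(reference))


def f1_score(prediction: str, reference: str) -> float:
    """Token-level F1 between the normalized prediction and reference."""
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears on both sides.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(exact_match("The Eiffel Tower", "eiffel tower"))  # 1.0
    print(f1_score("in Paris, France", "Paris"))            # 0.5
```

When a question has several reference answers, a common convention is to score the prediction against each reference and keep the maximum, then average over all questions to obtain the reported percentage.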


Papers

Showing 1851-1900 of 10817 papers

| Title | Status | Hype |
| --- | --- | --- |
| Compositional 4D Dynamic Scenes Understanding with Physics Priors for Video Question Answering | Code | 1 |
| Compositional Exemplars for In-context Learning | Code | 1 |
| Are Multilingual LLMs Culturally-Diverse Reasoners? An Investigation into Multicultural Proverbs and Sayings | Code | 1 |
| InfiBench: Evaluating the Question-Answering Capabilities of Code Large Language Models | Code | 1 |
| Learning Fine-Grained Visual Understanding for Video Question Answering via Decoupling Spatial-Temporal Modeling | Code | 1 |
| Learning to Answer Questions in Dynamic Audio-Visual Scenarios | Code | 1 |
| Learning to Explain: Datasets and Models for Identifying Valid Reasoning Chains in Multihop Question-Answering | Code | 1 |
| Infusing Disease Knowledge into BERT for Health Question Answering, Medical Inference and Disease Name Recognition | Code | 1 |
| Less is More: Data-Efficient Complex Question Answering over Knowledge Bases | Code | 1 |
| Initial Nugget Evaluation Results for the TREC 2024 RAG Track with the AutoNuggetizer Framework | Code | 1 |
| LLaSA: A Multimodal LLM for Human Activity Analysis Through Wearable and Smartphone Sensors | Code | 1 |
| Side-Tuning: A Baseline for Network Adaptation via Additive Side Networks | Code | 1 |
| Characterizing Truthfulness in Large Language Model Generations with Local Intrinsic Dimension | Code | 1 |
| Insights into Alignment: Evaluating DPO and its Variants Across Multiple Tasks | Code | 1 |
| Leaf: Multiple-Choice Question Generation | Code | 1 |
| InsQABench: Benchmarking Chinese Insurance Domain Question Answering with Large Language Models | Code | 1 |
| Fine-tuned LLMs Know More, Hallucinate Less with Few-Shot Sequence-to-Sequence Semantic Parsing over Wikidata | Code | 1 |
| CharBERT: Character-aware Pre-trained Language Model | Code | 1 |
| Simple Questions Generate Named Entity Recognition Datasets | Code | 1 |
| LIVE: Learnable In-Context Vector for Visual Question Answering | Code | 1 |
| Complex Knowledge Base Question Answering: A Survey | Code | 1 |
| IntentionQA: A Benchmark for Evaluating Purchase Intention Comprehension Abilities of Language Models in E-commerce | Code | 1 |
| Interacted Object Grounding in Spatio-Temporal Human-Object Interactions | Code | 1 |
| ChartInstruct: Instruction Tuning for Chart Comprehension and Reasoning | Code | 1 |
| Interactive-KBQA: Multi-Turn Interactions for Knowledge Base Question Answering with Large Language Models | Code | 1 |
| Interactive Language Learning by Question Answering | Code | 1 |
| A Dataset for Interactive Vision-Language Navigation with Unknown Command Feasibility | Code | 1 |
| CompAct: Compressing Retrieved Documents Actively for Question Answering | Code | 1 |
| Layout and Task Aware Instruction Prompt for Zero-shot Document Image Question Answering | Code | 1 |
| ChartQAPro: A More Diverse and Challenging Benchmark for Chart Question Answering | Code | 1 |
| Interpretable Long-Form Legal Question Answering with Retrieval-Augmented Large Language Models | Code | 1 |
| CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge | Code | 1 |
| Lattice CNNs for Matching Based Chinese Question Answering | Code | 1 |
| Interpreting Chest X-rays Like a Radiologist: A Benchmark with Clinical Reasoning | Code | 1 |
| Are Vision Language Models Ready for Clinical Diagnosis? A 3D Medical Benchmark for Tumor-centric Visual Question Answering | Code | 1 |
| LAVENDER: Unifying Video-Language Understanding as Masked Language Modeling | Code | 1 |
| Complex Reasoning over Logical Queries on Commonsense Knowledge Graphs | Code | 1 |
| SPARQL Generation: an analysis on fine-tuning OpenLLaMA for Question Answering over a Life Science Knowledge Graph | Code | 1 |
| A Dataset for Statutory Reasoning in Tax Law Entailment and Question Answering | Code | 1 |
| ChatGPT: Jack of all trades, master of none | Code | 1 |
| Introspective Distillation for Robust Question Answering | Code | 1 |
| Investigating Entity Knowledge in BERT with Simple Neural End-To-End Entity Linking | Code | 1 |
| LaTr: Layout-Aware Transformer for Scene-Text VQA | Code | 1 |
| Invariant Grounding for Video Question Answering | Code | 1 |
| Lawformer: A Pre-trained Language Model for Chinese Legal Long Documents | Code | 1 |
| Investigating Prompting Techniques for Zero- and Few-Shot Visual Question Answering | Code | 1 |
| Invoke Interfaces Only When Needed: Adaptive Invocation for Large Language Models in Question Answering | Code | 1 |
| ChatVLA: Unified Multimodal Understanding and Robot Control with Vision-Language-Action Model | Code | 1 |
| Learning Associative Inference Using Fast Weight Memory | Code | 1 |
| BiomedCLIP: a multimodal biomedical foundation model pretrained from fifteen million scientific image-text pairs | Code | 1 |

Benchmark Results

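| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | IE-Net (ensemble) | EM | 90.94 | | Unverified |
| 2 | FPNet (ensemble) | EM | 90.87 | | Unverified |
| 3 | IE-NetV2 (ensemble) | EM | 90.86 | | Unverified |
| 4 | SA-Net on Albert (ensemble) | EM | 90.72 | | Unverified |
| 5 | SA-Net-V2 (ensemble) | EM | 90.68 | | Unverified |
| 6 | FPNet (ensemble) | EM | 90.6 | | Unverified |
| 7 | Retro-Reader (ensemble) | EM | 90.58 | | Unverified |
| 8 | EntitySpanFocusV2 (ensemble) | EM | 90.52 | | Unverified |
| 9 | TransNets + SFVerifier + SFEnsembler (ensemble) | EM | 90.49 | | Unverified |
| 10 | EntitySpanFocus+AT (ensemble) | EM | 90.45 | | Unverified |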