SOTAVerified

Benchmarking

Papers

Showing 2226–2250 of 5548 papers

| Title | Status | Hype |
|---|---|---|
| Benchmarking Automatic Speech Recognition coupled LLM Modules for Medical Diagnostics | | 0 |
| Text2World: Benchmarking Large Language Models for Symbolic World Model Generation | | 0 |
| A new pathway to generative artificial intelligence by minimizing the maximum entropy | | 0 |
| Benchmarking Post-Training Quantization in LLMs: Comprehensive Taxonomy, Unified Evaluation, and Comparative Analysis | Code | 0 |
| Multilingual European Language Models: Benchmarking Approaches and Challenges | | 0 |
| STEER-ME: Assessing the Microeconomic Reasoning of Large Language Models | | 0 |
| Benchmarking MedMNIST dataset on real quantum hardware | | 0 |
| LLMPopcorn: An Empirical Study of LLMs as Assistants for Popular Micro-video Generation | | 0 |
| Energy-Conscious LLM Decoding: Impact of Text Generation Strategies on GPU Energy Consumption | | 0 |
| Ansatz-free Hamiltonian learning with Heisenberg-limited scaling | | 0 |
| Ad-hoc Concept Forming in the Game Codenames as a Means for Evaluating Large Language Models | | 0 |
| Knowledge-aware contrastive heterogeneous molecular graph learning | | 0 |
| Language Complexity Measurement as a Noisy Zero-Shot Proxy for Evaluating LLM Performance | | 0 |
| Integrating Expert Knowledge into Logical Programs via LLMs | Code | 0 |
| Plant in Cupboard, Orange on Rably, Inat Aphone. Benchmarking Incremental Learning of Situation and Language Model using a Text-Simulated Situated Environment | | 0 |
| Defining and Evaluating Visual Language Models' Basic Spatial Abilities: A Perspective from Psychometrics | | 0 |
| JExplore: Design Space Exploration Tool for Nvidia Jetson Boards | Code | 0 |
| Can't See the Forest for the Trees: Benchmarking Multimodal Safety Awareness for Multimodal LLMs | | 0 |
| TituLLMs: A Family of Bangla LLMs with Comprehensive Benchmarking | | 0 |
| User Profile with Large Language Models: Construction, Updating, and Benchmarking | | 0 |
| Yesil o1 Pro: Evidence-Based AI Model for Health and Benchmarking in Clinical Decision Support | | 0 |
| LaRA: Benchmarking Retrieval-Augmented Generation and Long-Context LLMs - No Silver Bullet for LC or RAG Routing | Code | 0 |
| MIR-Bench: Can Your LLM Recognize Complicated Patterns via Many-Shot In-Context Reasoning? | | 0 |
| Generalized Attention Flow: Feature Attribution for Transformer Models via Maximum Flow | | 0 |
| Benchmarking the rationality of AI decision making using the transitivity axiom | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |