SOTAVerified

Benchmarking

Papers

Showing 901-925 of 5548 papers

Title | Status | Hype
Reinforcement Learning with Graph Attention for Routing and Wavelength Assignment with Lightpath Reuse | - | 0
Synthetic Porous Microstructures: Automatic Design, Simulation, and Permeability Analysis | Code | 0
Beyond Self-Talk: A Communication-Centric Survey of LLM-Based Multi-Agent Systems | - | 0
Position: There are no Champions in Long-Term Time Series Forecasting | - | 0
Benchmarking of Different YOLO Models for CAPTCHAs Detection and Classification | - | 0
Benchmarking Self-Supervised Learning Methods for Accelerated MRI Reconstruction | Code | 0
GIMMICK -- Globally Inclusive Multimodal Multitask Cultural Knowledge Benchmarking | - | 0
A Baseline Method for Removing Invisible Image Watermarks using Deep Image Prior | - | 0
Benchmarking LLMs for Political Science: A United Nations Perspective | Code | 1
VITAL: A New Dataset for Benchmarking Pluralistic Alignment in Healthcare | - | 0
Multilingual European Language Models: Benchmarking Approaches and Challenges | - | 0
A deep learning framework for efficient pathology image analysis | Code | 4
Benchmarking Automatic Speech Recognition coupled LLM Modules for Medical Diagnostics | - | 0
LLMPopcorn: An Empirical Study of LLMs as Assistants for Popular Micro-video Generation | - | 0
STEER-ME: Assessing the Microeconomic Reasoning of Large Language Models | - | 0
Text2World: Benchmarking Large Language Models for Symbolic World Model Generation | - | 0
Reinforcement Learning for Dynamic Resource Allocation in Optical Networks: Hype or Hope? | Code | 1
A new pathway to generative artificial intelligence by minimizing the maximum entropy | - | 0
EquiBench: Benchmarking Large Language Models' Understanding of Program Semantics via Equivalence Checking | - | 0
Benchmarking Post-Training Quantization in LLMs: Comprehensive Taxonomy, Unified Evaluation, and Comparative Analysis | Code | 0
Benchmarking MedMNIST dataset on real quantum hardware | - | 0
Integrating Expert Knowledge into Logical Programs via LLMs | Code | 0
Positional Encoding in Transformer-Based Time Series Models: A Survey | Code | 1
Energy-Conscious LLM Decoding: Impact of Text Generation Strategies on GPU Energy Consumption | - | 0
HintsOfTruth: A Multimodal Checkworthiness Detection Dataset with Real and Synthetic Claims | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified