SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers | 248,326 code links | 4,818 tasks

Papers

Showing 8226-8250 of 474,278 papers

Title | Status | Hype
GraphKAN: Enhancing Feature Extraction with Graph Kolmogorov Arnold Networks | Code | 2
Rethinking Abdominal Organ Segmentation (RAOS) in the clinical scenario: A robustness evaluation benchmark with challenging cases | Code | 2
Adaptable Logical Control for Large Language Models | Code | 2
WATT: Weight Average Test-Time Adaptation of CLIP | Code | 2
StableSemantics: A Synthetic Language-Vision Dataset of Semantic Representations in Naturalistic Images | Code | 2
Encoder vs Decoder: Comparative Analysis of Encoder and Decoder Language Models on Multilingual NLU Tasks | Code | 2
SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words | Code | 2
Breaking the Ceiling of the LLM Community by Treating Token Generation as a Classification for Ensembling | Code | 2
VRSBench: A Versatile Vision-Language Benchmark Dataset for Remote Sensing Image Understanding | Code | 2
Immiscible Diffusion: Accelerating Diffusion Training with Noise Assignment | Code | 2
Can Go AIs be adversarially robust? | Code | 2
Talk With Human-like Agents: Empathetic Dialogue Through Perceptible Acoustic Reception and Reaction | Code | 2
AEM: Attention Entropy Maximization for Multiple Instance Learning based Whole Slide Image Classification | Code | 2
Coding Speech through Vocal Tract Kinematics | Code | 2
From Instance Training to Instruction Learning: Task Adapters Generation from Instructions | Code | 2
ChangeViT: Unleashing Plain Vision Transformers for Change Detection | Code | 2
GeoBench: Benchmarking and Analyzing Monocular Geometry Estimation Models | Code | 2
TroL: Traversal of Layers for Large Language and Vision Models | Code | 2
OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI | Code | 2
Automated MRI Quality Assessment of Brain T1-weighted MRI in Clinical Data Warehouses: A Transfer Learning Approach Relying on Artefact Simulation | Code | 2
Dissecting Adversarial Robustness of Multimodal LM Agents | Code | 2
DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving | Code | 2
Universal Score-based Speech Enhancement with High Content Preservation | Code | 2
AGLA: Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention | Code | 2
AgentReview: Exploring Peer Review Dynamics with LLM Agents | Code | 2