SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers · 248,326 code links · 4,818 tasks

Papers

Showing 5651 to 5675 of 177340 papers

Title | Status | Hype
BRIGHTER: BRIdging the Gap in Human-Annotated Textual Emotion Recognition Datasets for 28 Languages | Code | 2
Analyzing and Boosting the Power of Fine-Grained Visual Recognition for Multi-modal Large Language Models | Code | 2
Source-free Subject Adaptation for EEG-based Visual Recognition | Code | 2
HiddenDetect: Detecting Jailbreak Attacks against Large Vision-Language Models via Monitoring Hidden States | Code | 2
Training-Free Adaptive Diffusion with Bounded Difference Approximation Strategy | Code | 2
LayoutDiffusion: Controllable Diffusion Model for Layout-to-image Generation | Code | 2
CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction | Code | 2
Order Constraints in Optimal Transport | Code | 2
Real-time Scene Text Detection with Differentiable Binarization | Code | 2
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale | Code | 2
VideoLifter: Lifting Videos to 3D with Fast Hierarchical Stereo Alignment | Code | 2
Hopular: Modern Hopfield Networks for Tabular Data | Code | 2
TOD3Cap: Towards 3D Dense Captioning in Outdoor Scenes | Code | 2
Improving the Training of Rectified Flows | Code | 2
A Systematic Study of Joint Representation Learning on Protein Sequences and Structures | Code | 2
Evaluating the Performance of Large Language Models on GAOKAO Benchmark | Code | 2
Vibe-Eval: A hard evaluation suite for measuring progress of multimodal language models | Code | 2
On the Origin of Llamas: Model Tree Heritage Recovery | Code | 2
GPT-NER: Named Entity Recognition via Large Language Models | Code | 2
Interpretable and Generalizable Graph Learning via Stochastic Attention Mechanism | Code | 2
AST-T5: Structure-Aware Pretraining for Code Generation and Understanding | Code | 2
NWPU-Crowd: A Large-Scale Benchmark for Crowd Counting and Localization | Code | 2
RenderOcc: Vision-Centric 3D Occupancy Prediction with 2D Rendering Supervision | Code | 2
MultiChallenge: A Realistic Multi-Turn Conversation Evaluation Benchmark Challenging to Frontier LLMs | Code | 2
Fira: Can We Achieve Full-rank Training of LLMs Under Low-rank Constraint? | Code | 2