SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers | 248,326 code links | 4,818 tasks

Papers

Showing 8126-8150 of 474,278 papers

Title | Status | Hype
Efficient World Models with Context-Aware Tokenization | Code | 2
T-FREE: Subword Tokenizer-Free Generative LLMs via Sparse Representations for Memory-Efficient Embeddings | Code | 2
Human-Aware Vision-and-Language Navigation: Bridging Simulation to Reality with Dynamic Human Interactions | Code | 2
Chat AI: A Seamless Slurm-Native Solution for HPC-Based Services | Code | 2
DiffuseHigh: Training-free Progressive High-Resolution Image Synthesis through Structure Guidance | Code | 2
ResumeAtlas: Revisiting Resume Classification with Large-Scale Datasets and Large Language Models | Code | 2
A Stem-Agnostic Single-Decoder System for Music Source Separation Beyond Four Stems | Code | 2
CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs | Code | 2
KAGNNs: Kolmogorov-Arnold Networks meet Graph Learning | Code | 2
JailbreakZoo: Survey, Landscapes, and Horizons in Jailbreaking Large Language and Vision-Language Models | Code | 2
RetroGFN: Diverse and Feasible Retrosynthesis using GFlowNets | Code | 2
LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context Inference | Code | 2
MathOdyssey: Benchmarking Mathematical Problem-Solving Skills in Large Language Models Using Odyssey Math Data | Code | 2
The Surprising Effectiveness of Multimodal Large Language Models for Video Moment Retrieval | Code | 2
GenRL: Multimodal-foundation world models for generalization in embodied agents | Code | 2
MatchTime: Towards Automatic Soccer Game Commentary Generation | Code | 2
WildTeaming at Scale: From In-the-Wild Jailbreaks to (Adversarially) Safer Language Models | Code | 2
Stable Diffusion Segmentation for Biomedical Images with Single-step Reverse Process | Code | 2
Denoising as Adaptation: Noise-Space Domain Adaptation for Image Restoration | Code | 2
WildGuard: Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs | Code | 2
A Closer Look into Mixture-of-Experts in Large Language Models | Code | 2
SynRS3D: A Synthetic Dataset for Global 3D Semantic Understanding from Monocular Remote Sensing Imagery | Code | 2
EmT: A Novel Transformer for Generalized Cross-subject EEG Emotion Recognition | Code | 2
EgoVideo: Exploring Egocentric Foundation Model and Downstream Adaptation | Code | 2
Dynamic Gaussian Marbles for Novel View Synthesis of Casual Monocular Videos | Code | 2