SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

659,983 papers, 248,104 code links, 4,818 tasks

Papers

Showing 901-925 of 659,983 papers

Title | Status | Hype
Fake News Detection: It's All in the Data! | Code | 5
LiveBench: A Challenging, Contamination-Limited LLM Benchmark | Code | 5
OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding | Code | 5
Mamba or RWKV: Exploring High-Quality and High-Efficiency Segment Anything Model | Code | 5
ChronoMagic-Bench: A Benchmark for Metamorphic Evaluation of Text-to-Time-lapse Video Generation | Code | 5
MedCare: Advancing Medical LLMs through Decoupling Clinical Alignment and Knowledge Aggregation | Code | 5
MixTex: Unambiguous Recognition Should Not Rely Solely on Real Data | Code | 5
Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs | Code | 5
LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training | Code | 5
ESC-Eval: Evaluating Emotion Support Conversations in Large Language Models | Code | 5
Uni-Mol2: Exploring Molecular Pretraining Model at Scale | Code | 5
aeon: a Python toolkit for learning from time series | Code | 5
EvTexture: Event-driven Texture Enhancement for Video Super-Resolution | Code | 5
Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts | Code | 5
Improving Text-To-Audio Models with Synthetic Captions | Code | 5
Autoregressive Image Generation without Vector Quantization | Code | 5
τ-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains | Code | 5
From Crowdsourced Data to High-Quality Benchmarks: Arena-Hard and BenchBuilder Pipeline | Code | 5
PyramidMamba: Rethinking Pyramid Feature Fusion with Selective Space State Model for Semantic Segmentation of Remote Sensing Imagery | Code | 5
EMMA: Your Text-to-Image Diffusion Model Can Secretly Accept Multi-Modal Prompts | Code | 5
4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities | Code | 5
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks | Code | 5
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs | Code | 5
FLUX: Fast Software-based Communication Overlap On GPUs Through Kernel Fusion | Code | 5
Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B | Code | 5