SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

659,983 papers, 248,104 code links, 4,818 tasks

Papers

Showing 901-950 of 659,983 papers

Title | Status | Hype
LiveBench: A Challenging, Contamination-Limited LLM Benchmark | Code | 5
OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding | Code | 5
Mamba or RWKV: Exploring High-Quality and High-Efficiency Segment Anything Model | Code | 5
ChronoMagic-Bench: A Benchmark for Metamorphic Evaluation of Text-to-Time-lapse Video Generation | Code | 5
MedCare: Advancing Medical LLMs through Decoupling Clinical Alignment and Knowledge Aggregation | Code | 5
MixTex: Unambiguous Recognition Should Not Rely Solely on Real Data | Code | 5
Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs | Code | 5
LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training | Code | 5
ESC-Eval: Evaluating Emotion Support Conversations in Large Language Models | Code | 5
Uni-Mol2: Exploring Molecular Pretraining Model at Scale | Code | 5
aeon: a Python toolkit for learning from time series | Code | 5
EvTexture: Event-driven Texture Enhancement for Video Super-Resolution | Code | 5
Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts | Code | 5
Improving Text-To-Audio Models with Synthetic Captions | Code | 5
Autoregressive Image Generation without Vector Quantization | Code | 5
τ-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains | Code | 5
From Crowdsourced Data to High-Quality Benchmarks: Arena-Hard and BenchBuilder Pipeline | Code | 5
PyramidMamba: Rethinking Pyramid Feature Fusion with Selective Space State Model for Semantic Segmentation of Remote Sensing Imagery | Code | 5
4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities | Code | 5
EMMA: Your Text-to-Image Diffusion Model Can Secretly Accept Multi-Modal Prompts | Code | 5
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks | Code | 5
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs | Code | 5
FLUX: Fast Software-based Communication Overlap On GPUs Through Kernel Fusion | Code | 5
Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B | Code | 5
Zero-shot Image Editing with Reference Imitation | Code | 5
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation | Code | 5
PatchRefiner: Leveraging Synthetic Data for Real-Domain High-Resolution Monocular Metric Depth Estimation | Code | 5
The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models | Code | 5
Matching Anything by Segmenting Anything | Code | 5
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions | Code | 5
Text-to-Image Rectified Flow as Plug-and-Play Priors | Code | 5
Wings: Learning Multimodal LLMs without Text-only Forgetting | Code | 5
StreamSpeech: Simultaneous Speech-to-Speech Translation with Multi-task Learning | Code | 5
Parrot: Multilingual Visual Instruction Tuning | Code | 5
PyramidKV: Dynamic KV Cache Compression based on Pyramidal Information Funneling | Code | 5
AudioLCM: Text-to-Audio Generation with Latent Consistency Models | Code | 5
Ovis: Structural Embedding Alignment for Multimodal Large Language Model | Code | 5
Enhancing Efficiency of Safe Reinforcement Learning via Sample Manipulation | Code | 5
Very Low Complexity Speech Synthesis Using Framewise Autoregressive GAN (FARGAN) with Pitch Prediction | Code | 5
Xwin-LM: Strong and Scalable Alignment Practice for LLMs | Code | 5
SpinQuant: LLM quantization with learned rotations | Code | 5
CPsyCoun: A Report-based Multi-turn Dialogue Reconstruction and Evaluation Framework for Chinese Psychological Counseling | Code | 5
DeTikZify: Synthesizing Graphics Programs for Scientific Figures and Sketches with TikZ | Code | 5
Focus Anywhere for Fine-grained Multi-page Document Understanding | Code | 5
TimeMixer: Decomposable Multiscale Mixing for Time Series Forecasting | Code | 5
Improved Distribution Matching Distillation for Fast Image Synthesis | Code | 5
PV-Tuning: Beyond Straight-Through Estimation for Extreme LLM Compression | Code | 5
Awesome Multi-modal Object Tracking | Code | 5
Diffusion for World Modeling: Visual Details Matter in Atari | Code | 5
Uni-Mol Docking V2: Towards Realistic and Accurate Binding Pose Prediction | Code | 5
Page 19 of 13,200