SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers · 248,326 code links · 4,818 tasks

Papers

Showing 3901-3925 of 661570 papers

Title | Status | Hype
ChartX & ChartVLM: A Versatile Benchmark and Foundation Model for Complicated Chart Reasoning | Code | 3
3D Diffuser Actor: Policy Diffusion with 3D Scene Representations | Code | 3
ALLaVA: Harnessing GPT4V-Synthesized Data for Lite Vision-Language Models | Code | 3
EventRL: Enhancing Event Extraction with Outcome Supervision for Large Language Models | Code | 3
GenAD: Generative End-to-End Autonomous Driving | Code | 3
OneBit: Towards Extremely Low-bit Large Language Models | Code | 3
LLMDFA: Analyzing Dataflow in Code with Large Language Models | Code | 3
3D Diffuser Actor: Policy Diffusion with 3D Scene Representations | Code | 3
Discovering and exploring cases of educational source code plagiarism with Dolos | Code | 3
Selective Reflection-Tuning: Student-Selected Data Recycling for LLM Instruction-Tuning | Code | 3
Spike-driven Transformer V2: Meta Spiking Neural Network Architecture Inspiring the Design of Next-generation Neuromorphic Chips | Code | 3
OptiMUS: Scalable Optimization Modeling with (MI)LP Solvers and Large Language Models | Code | 3
GES: Generalized Exponential Splatting for Efficient Radiance Field Rendering | Code | 3
Data Engineering for Scaling Language Models to 128K Context | Code | 3
BitDelta: Your Fine-Tune May Only Be Worth One Bit | Code | 3
QuRating: Selecting High-Quality Data for Training Language Models | Code | 3
Magic-Me: Identity-Specific Video Customized Diffusion | Code | 3
Traj-LIO: A Resilient Multi-LiDAR Multi-IMU State Estimator Through Sparse Gaussian Process | Code | 3
PreFLMR: Scaling Up Fine-Grained Late-Interaction Multi-modal Retrievers | Code | 3
VerMCTS: Synthesizing Multi-Step Programs using a Verifier, a Large Language Model, and Tree Search | Code | 3
Diffusion of Thoughts: Chain-of-Thought Reasoning in Diffusion Language Models | Code | 3
SPO: Sequential Monte Carlo Policy Optimisation | Code | 3
PoisonedRAG: Knowledge Corruption Attacks to Retrieval-Augmented Generation of Large Language Models | Code | 3
Scaling Laws for Fine-Grained Mixture of Experts | Code | 3
X-LoRA: Mixture of Low-Rank Adapter Experts, a Flexible Framework for Large Language Models with Applications in Protein Mechanics and Molecular Design | Code | 3
Page 157 of 26463