SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers | 248,326 code links | 4,818 tasks

Papers

Showing 15051-15100 of 474278 papers

Title | Status | Hype
Cache Me If You Can: How Many KVs Do You Need for Effective Long-Context LMs? | Code | 1
Generative Modeling of Full-Atom Protein Conformations using Latent Diffusion on Graph Embeddings | Code | 1
A Large-Scale Real-World Evaluation of LLM-Based Virtual Teaching Assistant | Code | 1
UniFork: Exploring Modality Alignment for Unified Multimodal Understanding and Generation | Code | 1
R3eVision: A Survey on Robust Rendering, Restoration, and Enhancement for 3D Low-Level Vision | Code | 1
Large Language Models are Near-Optimal Decision-Makers with a Non-Human Learning Behavior | Code | 1
Adversarial Attacks and Detection in Visual Place Recognition for Safer Robot Navigation | Code | 1
DiffO: Single-step Diffusion for Image Compression at Ultra-Low Bitrates | Code | 1
Dense 3D Displacement Estimation for Landslide Monitoring via Fusion of TLS Point Clouds and Embedded RGB Images | Code | 1
The Condition Number as a Scale-Invariant Proxy for Information Encoding in Neural Units | Code | 1
Probe before You Talk: Towards Black-box Defense against Backdoor Unalignment for Large Language Models | Code | 1
EndoMUST: Monocular Depth Estimation for Robotic Endoscopy via End-to-end Multi-step Self-supervised Training | Code | 1
Probing the Robustness of Large Language Models Safety to Latent Perturbations | Code | 1
LMR-BENCH: Evaluating LLM Agent's Ability on Reproducing Language Modeling Research | Code | 1
InstructTTSEval: Benchmarking Complex Natural-Language Instruction Following in Text-to-Speech Systems | Code | 1
OJBench: A Competition Level Code Benchmark For Large Language Models | Code | 1
On using AI for EEG-based BCI applications: problems, current challenges and future trends | Code | 1
StoryWriter: A Multi-Agent Framework for Long Story Generation | Code | 1
Diffusion-based Counterfactual Augmentation: Towards Robust and Interpretable Knee Osteoarthritis Grading | Code | 1
Evolutionary Caching to Accelerate Your Off-the-Shelf Diffusion Model | Code | 1
All is Not Lost: LLM Recovery without Checkpoints | Code | 1
GRAM: A Generative Foundation Reward Model for Reward Generalization | Code | 1
Equivariance Everywhere All At Once: A Recipe for Graph Foundation Models | Code | 1
Refining music sample identification with a self-supervised graph neural network | Code | 1
Sampling from Your Language Model One Byte at a Time | Code | 1
TGDPO: Harnessing Token-Level Reward Guidance for Enhancing Direct Preference Optimization | Code | 1
A Variational Framework for Improving Naturalness in Generative Spoken Language Models | Code | 1
Optimizing Length Compression in Large Reasoning Models | Code | 1
Unsupervised Imaging Inverse Problems with Diffusion Distribution Matching | Code | 1
Déjà Vu: Efficient Video-Language Query Engine with Learning-based Inter-Frame Computation Reuse | Code | 1
MOL: Joint Estimation of Micro-Expression, Optical Flow, and Landmark via Transformer-Graph-Style Convolution | Code | 1
GUI-Robust: A Comprehensive Dataset for Testing GUI Agent Robustness in Real-World Anomalies | Code | 1
3DGS-IEval-15K: A Large-scale Image Quality Evaluation Database for 3D Gaussian-Splatting | Code | 1
AgentSynth: Scalable Task Generation for Generalist Computer-Use Agents | Code | 1
Xolver: Multi-Agent Reasoning with Holistic Experience Learning Just Like an Olympiad Team | Code | 1
RMIT-ADM+S at the SIGIR 2025 LiveRAG Challenge | Code | 1
SeqPE: Transformer with Sequential Position Encoding | Code | 1
COME: Adding Scene-Centric Forecasting Control to Occupancy World Model | Code | 1
PeakWeather: MeteoSwiss Weather Station Measurements for Spatiotemporal Deep Learning | Code | 1
Self-Supervised Enhancement for Depth from a Lightweight ToF Sensor with Monocular Images | Code | 1
TR2M: Transferring Monocular Relative Depth to Metric Depth with Language Descriptions and Scale-Oriented Contrast | Code | 1
The Price of Freedom: Exploring Expressivity and Runtime Tradeoffs in Equivariant Tensor Products | Code | 1
Unlearning Isn't Invisible: Detecting Unlearning Traces in LLMs from Model Outputs | Code | 1
Tady: A Neural Disassembler without Structural Constraint Violations | Code | 1
Verifying the Verifiers: Unveiling Pitfalls and Potentials in Fact Verifiers | Code | 1
Steering LLM Thinking with Budget Guidance | Code | 1
SimpleDoc: Multi-Modal Document Understanding with Dual-Cue Page Retrieval and Iterative Refinement | Code | 1
Rectifying Privacy and Efficacy Measurements in Machine Unlearning: A New Inference Attack Perspective | Code | 1
RealHiTBench: A Comprehensive Realistic Hierarchical Table Benchmark for Evaluating LLM-Based Table Analysis | Code | 1
Curriculum Learning for Biological Sequence Prediction: The Case of De Novo Peptide Sequencing | Code | 1
Page 302 of 9486