SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers · 248,326 code links · 4,818 tasks

Papers

Showing 20201–20250 of 474,278 papers

Title | Status | Hype
Autoencoding Random Forests |  | 0
LPOI: Listwise Preference Optimization for Vision Language Models | Code | 1
Simple yet Effective Graph Distillation via Clustering |  | 0
The Multilingual Divide and Its Impact on Global AI Safety |  | 0
OASIS: Online Sample Selection for Continual Visual Instruction Tuning |  | 0
AI Approach for Predicting Superhyrophobicity of Thermal Sprayed Copper Coated Aluminum Surfaces | Code | 0
AZT1D: A Real-World Dataset for Type 1 Diabetes |  | 0
Rendering-Aware Reinforcement Learning for Vector Graphics Generation |  | 0
Code Researcher: Deep Research Agent for Large Systems Code and Commit History |  | 0
DriveRX: A Vision-Language Reasoning Model for Cross-Task Autonomous Driving |  | 0
FRAMES-VQA: Benchmarking Fine-Tuning Robustness across Multi-Modal Shifts in Visual Question Answering | Code | 0
MUSEG: Reinforcing Video Temporal Understanding via Timestamp-Aware Multi-Segment Grounding | Code | 1
Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning? | Code | 2
Inverse Virtual Try-On: Generating Multi-Category Product-Style Images from Clothed Individuals | Code | 1
Explainability of Large Language Models using SMILE: Statistical Model-agnostic Interpretability with Local Explanations | Code | 1
Non-invasive two-step strategy BCI: brain-muscle-hand interface |  | 0
MME-Reasoning: A Comprehensive Benchmark for Logical Reasoning in MLLMs |  | 0
Active-O3: Empowering Multimodal Large Language Models with Active Perception via GRPO |  | 0
Constructing a bridge between functioning of oscillatory neuronal networks and quantum-like cognition along with quantum-inspired computation and AI |  | 0
MoE-Gyro: Self-Supervised Over-Range Reconstruction and Denoising for MEMS Gyroscopes |  | 0
Bencher: Simple and Reproducible Benchmarking for Black-Box Optimization | Code | 1
Don't Think Longer, Think Wisely: Optimizing Thinking Dynamics for Large Reasoning Models |  | 0
Measuring Representational Shifts in Continual Learning: A Linear Transformation Perspective |  | 0
WDMIR: Wavelet-Driven Multimodal Intent Recognition |  | 0
ChemHAS: Hierarchical Agent Stacking for Enhancing Chemistry Tools |  | 0
Long Context Scaling: Divide and Conquer via Multi-Agent Question-driven Collaboration |  | 0
Rethinking Information Synthesis in Multimodal Question Answering A Multi-Agent Perspective |  | 0
Towards Safety Reasoning in LLMs: AI-agentic Deliberation for Policy-embedded CoT Data Creation |  | 0
A Reinforcement-Learning-Enhanced LLM Framework for Automated A/B Testing in Personalized Marketing |  | 0
Let Me Think! A Long Chain-of-Thought Can Be Worth Exponentially Many Short Ones | Code | 0
UI-Genie: A Self-Improving Approach for Iteratively Boosting MLLM-based Mobile GUI Agents | Code | 2
AutoJudger: An Agent-Driven Framework for Efficient Benchmarking of MLLMs | Code | 0
HoliTom: Holistic Token Merging for Fast Video Large Language Models | Code | 2
ZigzagPointMamba: Spatial-Semantic Mamba for Point Cloud Understanding |  | 0
Aligning Proteins and Language: A Foundation Model for Protein Retrieval |  | 0
E2E Process Automation Leveraging Generative AI and IDP-Based Automation Agent: A Case Study on Corporate Expense Processing |  | 0
Hardware-Efficient Attention for Fast Decoding | Code | 2
SpecExtend: A Drop-in Enhancement for Speculative Decoding of Long Sequences | Code | 0
Think Before You Diffuse: LLMs-Guided Physics-Aware Video Generation |  | 0
Semi-Supervised Conformal Prediction With Unlabeled Nonconformity Score |  | 0
Beyond Chemical QA: Evaluating LLM's Chemical Reasoning with Modular Chemical Operations |  | 0
PoisonSwarm: Universal Harmful Information Synthesis via Model Crowdsourcing |  | 0
Pretraining Language Models to Ponder in Continuous Space | Code | 1
TimePro: Efficient Multivariate Long-term Time Series Forecasting with Variable- and Time-Aware Hyper-state | Code | 2
LLaMEA-BO: A Large Language Model Evolutionary Algorithm for Automatically Generating Bayesian Optimization Algorithms | Code | 2
Reinforcing General Reasoning without Verifiers | Code | 2
EasyDistill: A Comprehensive Toolkit for Effective Knowledge Distillation of Large Language Models |  | 0
Walk Before You Run! Concise LLM Reasoning via Reinforcement Learning |  | 0
Self-Route: Automatic Mode Switching via Capability Estimation for Efficient Reasoning |  | 0
Roboflow100-VL: A Multi-Domain Object Detection Benchmark for Vision-Language Models | Code | 2
Page 405 of 9486