SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers · 248,326 code links · 4,818 tasks

Papers

Showing 3351-3375 of 661570 papers

Title | Status | Hype
ProactiveBench: Benchmarking Proactiveness in Multimodal Large Language Models |  | 1
A Framework for Formalizing LLM Agent Security |  | 0
Reinforcement-guided generative protein language models enable de novo design of highly diverse AAV capsids |  | 0
Narrative Aligned Long Form Video Question Answering |  | 0
Instruction-Free Tuning of Large Vision Language Models for Medical Instruction Following |  | 0
Any-Subgroup Equivariant Networks via Symmetry Breaking |  | 0
ICLAD: In-Context Learning for Unified Tabular Anomaly Detection Across Supervision Regimes |  | 0
Teaching an Agent to Sketch One Part at a Time |  | 0
Stochastic Sequential Decision Making over Expanding Networks with Graph Filtering |  | 0
Vision Tiny Recursion Model (ViTRM): Parameter-Efficient Image Classification via Recursive State Refinement |  | 0
Beyond the Desk: Barriers and Future Opportunities for AI to Assist Scientists in Embodied Physical Tasks |  | 0
Linear Social Choice with Few Queries: A Moment-Based Approach |  | 0
FedAgain: A Trust-Based and Robust Federated Learning Strategy for an Automated Kidney Stone Identification in Ureteroscopy |  | 0
Learning to Disprove: Formal Counterexample Generation with Large Language Models |  | 0
ItinBench: Benchmarking Planning Across Multiple Cognitive Dimensions with Large Language Models |  | 0
Gastric-X: A Multimodal Multi-Phase Benchmark Dataset for Advancing Vision-Language Models in Gastric Cancer Analysis |  | 0
ReXInTheWild: A Unified Benchmark for Medical Photograph Understanding |  | 0
Inducing Sustained Creativity and Diversity in Large Language Models |  | 0
Recognising BSL Fingerspelling in Continuous Signing Sequences |  | 0
SurfaceXR: Fusing Smartwatch IMUs and Egocentric Hand Pose for Seamless Surface Interactions |  | 0
AURORA: Adaptive Unified Representation for Robust Ultrasound Analysis | Code | 0
Cooperation and Exploitation in LLM Policy Synthesis for Sequential Social Dilemmas | Code | 0
TRACE: Trajectory Recovery with State Propagation Diffusion for Urban Mobility | Code | 0
End-to-End QGAN-Based Image Synthesis via Neural Noise Encoding and Intensity Calibration |  | 0
Detecting Basic Values in A Noisy Russian Social Media Text Data: A Multi-Stage Classification Framework |  | 0
Page 135 of 26463