SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers | 248,326 code links | 4,818 tasks

Papers

Showing 10001-10025 of 177340 papers

Title | Status | Hype
High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning | Code | 2
Seeing World Dynamics in a Nutshell | Code | 2
Step Back to Leap Forward: Self-Backtracking for Boosting Reasoning of Language Models | Code | 2
KET-RAG: A Cost-Efficient Multi-Granular Indexing Framework for Graph-RAG | Code | 2
Training Turn-by-Turn Verifiers for Dialogue Tutoring Agents: The Curious Case of LLMs as Your Coding Tutors | Code | 2
Re-Align: Aligning Vision Language Models via Retrieval-Augmented Direct Preference Optimization | Code | 2
E2EC: An End-to-End Contour-based Method for High-Quality High-Speed Instance Segmentation | Code | 2
A Survey on Data Contamination for Large Language Models | Code | 2
MaterialFusion: High-Quality, Zero-Shot, and Controllable Material Transfer with Diffusion Models | Code | 2
PIP-KAG: Mitigating Knowledge Conflicts in Knowledge-Augmented Generation via Parametric Pruning | Code | 2
voc2vec: A Foundation Model for Non-Verbal Vocalization | Code | 2
WebGames: Challenging General-Purpose Web-Browsing AI Agents | Code | 2
FlexVAR: Flexible Visual Autoregressive Modeling without Residual Prediction | Code | 2
AgentSociety Challenge: Designing LLM Agents for User Modeling and Recommendation on Web Platforms | Code | 2
Automatic database description generation for Text-to-SQL | Code | 2
UL-UNAS: Ultra-Lightweight U-Nets for Real-Time Speech Enhancement via Network Architecture Search | Code | 2
LongProLIP: A Probabilistic Vision-Language Model with Long Context Text | Code | 2
An Approach for Air Drawing Using Background Subtraction and Contour Extraction | Code | 2
Interactive Debugging and Steering of Multi-Agent AI Systems | Code | 2
MPO: Boosting LLM Agents with Meta Plan Optimization | Code | 2
Text2LIVE: Text-Driven Layered Image and Video Editing | Code | 2
Similarity-Guided Layer-Adaptive Vision Transformer for UAV Tracking | Code | 2
GigaSLAM: Large-Scale Monocular SLAM with Hierarchical Gaussian Splats | Code | 2
Is CLIP ideal? No. Can we fix it? Yes! | Code | 2
Word2World: Generating Stories and Worlds through Large Language Models | Code | 2
Page 401 of 7094