SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers, 248,326 code links, 4,818 tasks

Papers

Showing 5,551–5,600 of 661,570 papers

Title | Status | Hype
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations | Code | 2
MT-R1-Zero: Advancing LLM-based Machine Translation via R1-Zero-like Reinforcement Learning | Code | 2
FLOWR: Flow Matching for Structure-Aware De Novo, Interaction- and Fragment-Based Ligand Generation | Code | 2
Prior Does Matter: Visual Navigation via Denoising Diffusion Bridge Models | Code | 2
SocioVerse: A World Model for Social Simulation Powered by LLM Agents and A Pool of 10 Million Real-World Users | Code | 2
NaviDiffusor: Cost-Guided Diffusion Model for Visual Navigation | Code | 2
LLaVA-ReID: Selective Multi-image Questioner for Interactive Person Re-Identification | Code | 2
LLM-SRBench: A New Benchmark for Scientific Equation Discovery with Large Language Models | Code | 2
Two Heads are Better Than One: Test-time Scaling of Multi-agent Collaborative Reasoning | Code | 2
FUSION: Fully Integration of Vision-Language Representations for Deep Cross-Modal Understanding | Code | 2
OctGPT: Octree-based Multiscale Autoregressive Models for 3D Shape Generation | Code | 2
Software package for simulations using the coarse-grained CALVADOS model | Code | 2
The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer | Code | 2
ClinicalGPT-R1: Pushing reasoning capability of generalist disease diagnosis with large language model | Code | 2
Leveraging Reasoning Model Answers to Enhance Non-Reasoning Model Capability | Code | 2
TinyLLaVA-Video-R1: Towards Smaller LMMs for Video Reasoning | Code | 2
HM-RAG: Hierarchical Multi-Agent Multimodal Retrieval Augmented Generation | Code | 2
Vision-Language Model for Object Detection and Segmentation: A Review and Evaluation | Code | 2
Tokenize Image Patches: Global Context Fusion for Effective Haze Removal in Large Images | Code | 2
SegEarth-R1: Geospatial Pixel Reasoning via Large Language Model | Code | 2
Learning Occlusion-Robust Vision Transformers for Real-Time UAV Tracking | Code | 2
BlockGaussian: Efficient Large-Scale Scene Novel View Synthesis via Adaptive Block-Based Gaussian Splatting | Code | 2
Flux Already Knows -- Activating Subject-Driven Image Generation without Training | Code | 2
A Comprehensive Survey of Reward Models: Taxonomy, Applications, Challenges, and Future | Code | 2
TorchFX: A modern approach to Audio DSP with PyTorch and GPU acceleration | Code | 2
RealCam-Vid: High-resolution Video Dataset with Dynamic Scenes and Metric-scale Camera Movements | Code | 2
Genius: A Generalizable and Purely Unsupervised Self-Training Framework For Advanced Reasoning | Code | 2
DataMap: A Portable Application for Visualizing High-Dimensional Data | Code | 2
Self-Prompting Analogical Reasoning for UAV Object Detection | Code | 2
PACT: Pruning and Clustering-Based Token Reduction for Faster Visual Language Models | Code | 2
SpecReason: Fast and Accurate Inference-Time Compute via Speculative Reasoning | Code | 2
Why We Feel: Breaking Boundaries in Emotional Reasoning with Multimodal Large Language Models | Code | 2
VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning | Code | 2
P2Object: Single Point Supervised Object Detection and Instance Segmentation | Code | 2
GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmentation | Code | 2
SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models | Code | 2
MM-IFEngine: Towards Multimodal Instruction Following | Code | 2
Findings of the BabyLM Challenge: Sample-Efficient Pretraining on Developmentally Plausible Corpora | Code | 2
LoRI: Reducing Cross-Task Interference in Multi-Task Low-Rank Adaptation | Code | 2
SoTA with Less: MCTS-Guided Sample Selection for Data-Efficient Visual Reasoning Self-Improvement | Code | 2
Compositional Flows for 3D Molecule and Synthesis Pathway Co-design | Code | 2
LLM4Ranking: An Easy-to-use Framework of Utilizing Large Language Models for Document Reranking | Code | 2
OmniCaptioner: One Captioner to Rule Them All | Code | 2
AssistanceZero: Scalably Solving Assistance Games | Code | 2
ColorizeDiffusion v2: Enhancing Reference-based Sketch Colorization Through Separating Utilities | Code | 2
Generalized Semantic Contrastive Learning via Embedding Side Information for Few-Shot Object Detection | Code | 2
Objaverse++: Curated 3D Object Dataset with Quality Annotations | Code | 2
InteractRank: Personalized Web-Scale Search Pre-Ranking with Cross Interaction Features | Code | 2
TASTE: Text-Aligned Speech Tokenization and Embedding for Spoken Language Modeling | Code | 2
Rethinking LayerNorm in Image Restoration Transformers | Code | 2
Page 112 of 13,232