SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers | 248,326 code links | 4,818 tasks

Papers

Showing 3,451 to 3,475 of 661,570 papers

| Title | Status | Hype |
|---|---|---|
| A Survey on Text-guided 3D Visual Grounding: Elements, Recent Advances, and Future Directions | Code | 3 |
| TopoBench: A Framework for Benchmarking Topological Deep Learning | Code | 3 |
| Probabilistic Weather Forecasting with Hierarchical Graph Neural Networks | Code | 3 |
| VISTA3D: Versatile Imaging SegmenTation and Annotation model for 3D Computed Tomography | Code | 3 |
| FedLLM-Bench: Realistic Benchmarks for Federated Learning of Large Language Models | Code | 3 |
| CRAG -- Comprehensive RAG Benchmark | Code | 3 |
| WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild | Code | 3 |
| GameBench: Evaluating Strategic Reasoning Abilities of LLM Agents | Code | 3 |
| Multi-Head RAG: Solving Multi-Aspect Problems with LLMs | Code | 3 |
| Improving Alignment and Robustness with Circuit Breakers | Code | 3 |
| Physics3D: Learning Physical Properties of 3D Gaussians via Video Diffusion | Code | 3 |
| VideoTetris: Towards Compositional Text-to-Video Generation | Code | 3 |
| Flash3D: Feed-Forward Generalisable 3D Scene Reconstruction from a Single Image | Code | 3 |
| Vision-LSTM: xLSTM as Generic Vision Backbone | Code | 3 |
| Are We Done with MMLU? | Code | 3 |
| MLVU: Benchmarking Multi-task Long Video Understanding | Code | 3 |
| Aesthetic Post-Training Diffusion Models from Generic Preferences with Step-by-step Preference Optimization | Code | 3 |
| Docs2KG: Unified Knowledge Graph Construction from Heterogeneous Documents Assisted by Large Language Models | Code | 3 |
| FusionBench: A Comprehensive Benchmark of Deep Model Fusion | Code | 3 |
| Computation-Efficient Era: A Comprehensive Survey of State Space Models in Medical Image Analysis | Code | 3 |
| Open-YOLO 3D: Towards Fast and Accurate Open-Vocabulary 3D Instance Segmentation | Code | 3 |
| Improved Modelling of Federated Datasets using Mixtures-of-Dirichlet-Multinomials | Code | 3 |
| FedMKT: Federated Mutual Knowledge Transfer for Large and Small Language Models | Code | 3 |
| Description Boosting for Zero-Shot Entity and Relation Classification | Code | 3 |
| MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark | Code | 3 |
Page 139 of 26,463