SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers | 248,326 code links | 4,818 tasks

Papers

Showing 6901-6925 of 474,278 papers

Title | Status | Hype
BERTrend: Neural Topic Modeling for Emerging Trends Detection | Code | 2
Online-LoRA: Task-free Online Continual Learning via Low Rank Adaptation | Code | 2
LLM-PySC2: Starcraft II learning environment for Large Language Models | Code | 2
End-to-End Navigation with Vision Language Models: Transforming Spatial Reasoning into Question-Answering | Code | 2
Improved Multi-Task Brain Tumour Segmentation with Synthetic Data Augmentation | Code | 2
Brain Tumour Removing and Missing Modality Generation using 3D WDM | Code | 2
Rethinking Bradley-Terry Models in Preference-Based Reward Modeling: Foundations, Theory, and Alternatives | Code | 2
PhoneLM: an Efficient and Capable Small Language Model Family through Principled Pre-training | Code | 2
HourVideo: 1-Hour Video-Language Understanding | Code | 2
Lightning IR: Straightforward Fine-tuning and Inference of Transformer-based Language Models for Information Retrieval | Code | 2
AlignXIE: Improving Multilingual Information Extraction by Cross-Lingual Alignment | Code | 2
Scaling Laws for Precision | Code | 2
Dialectal Coverage And Generalization in Arabic Speech Recognition | Code | 2
Touchstone Benchmark: Are We on the Right Way for Evaluating AI Algorithms for Medical Segmentation? | Code | 2
Structure Consistent Gaussian Splatting with Matching Prior for Few-shot Novel View Synthesis | Code | 2
VQA^2: Visual Question Answering for Video Quality Assessment | Code | 2
StreamingBench: Assessing the Gap for MLLMs to Achieve Streaming Video Understanding | Code | 2
Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding | Code | 2
AdaSociety: An Adaptive Environment with Social Structures for Multi-Agent Decision-Making | Code | 2
3DGS-CD: 3D Gaussian Splatting-based Change Detection for Physical Object Rearrangement | Code | 2
GIS Copilot: Towards an Autonomous GIS Agent for Spatial Analysis | Code | 2
FlexCAD: Unified and Versatile Controllable CAD Generation with Fine-tuned Large Language Models | Code | 2
V-DPO: Mitigating Hallucination in Large Vision Language Models via Vision-Guided Direct Preference Optimization | Code | 2
Interaction2Code: Benchmarking MLLM-based Interactive Webpage Code Generation from Interactive Prototyping | Code | 2
Foundations and Recent Trends in Multimodal Mobile Agents: A Survey | Code | 2
Page 277 of 18,972