SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers | 248,326 code links | 4,818 tasks

Papers

Showing 5601-5625 of 177340 papers

Title | Status | Hype
Human Preference Score: Better Aligning Text-to-Image Models with Human Preference | Code | 2
Rotation Invariant Graph Neural Networks using Spin Convolutions | Code | 2
UDAPDR: Unsupervised Domain Adaptation via LLM Prompting and Distillation of Rerankers | Code | 2
ActionFormer: Localizing Moments of Actions with Transformers | Code | 2
Learning Hazing to Dehazing: Towards Realistic Haze Generation for Real-World Image Dehazing | Code | 2
Multiview Compressive Coding for 3D Reconstruction | Code | 2
Improving Text Embeddings for Smaller Language Models Using Contrastive Fine-tuning | Code | 2
Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate | Code | 2
BigTranslate: Augmenting Large Language Models with Multilingual Translation Capability over 100 Languages | Code | 2
Retrieval Augmented Visual Question Answering with Outside Knowledge | Code | 2
Towards Zero-Shot Scale-Aware Monocular Depth Estimation | Code | 2
A Dynamic Points Removal Benchmark in Point Cloud Maps | Code | 2
MT-Eval: A Multi-Turn Capabilities Evaluation Benchmark for Large Language Models | Code | 2
Occ3D: A Large-Scale 3D Occupancy Prediction Benchmark for Autonomous Driving | Code | 2
OpenESS: Event-based Semantic Scene Understanding with Open Vocabularies | Code | 2
What Can Natural Language Processing Do for Peer Review? | Code | 2
Mixed-Curvature Decision Trees and Random Forests | Code | 2
SimVG: A Simple Framework for Visual Grounding with Decoupled Multi-modal Fusion | Code | 2
GeoGround: A Unified Large Vision-Language Model for Remote Sensing Visual Grounding | Code | 2
RecFlow: An Industrial Full Flow Recommendation Dataset | Code | 2
LightGen: Efficient Image Generation through Knowledge Distillation and Direct Preference Optimization | Code | 2
ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware | Code | 2
PerAct2: Benchmarking and Learning for Robotic Bimanual Manipulation Tasks | Code | 2
GPQA: A Graduate-Level Google-Proof Q&A Benchmark | Code | 2
PruneVid: Visual Token Pruning for Efficient Video Large Language Models | Code | 2