SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers | 248,326 code links | 4,818 tasks

Papers

Showing 6901 to 6950 of 661,570 papers

Title | Status | Hype
TabLLM: Few-shot Classification of Tabular Data with Large Language Models | Code | 2
Human Preference Score: Better Aligning Text-to-Image Models with Human Preference | Code | 2
Rotation Invariant Graph Neural Networks using Spin Convolutions | Code | 2
UDAPDR: Unsupervised Domain Adaptation via LLM Prompting and Distillation of Rerankers | Code | 2
ActionFormer: Localizing Moments of Actions with Transformers | Code | 2
Learning Hazing to Dehazing: Towards Realistic Haze Generation for Real-World Image Dehazing | Code | 2
Multiview Compressive Coding for 3D Reconstruction | Code | 2
Improving Text Embeddings for Smaller Language Models Using Contrastive Fine-tuning | Code | 2
Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate | Code | 2
BigTranslate: Augmenting Large Language Models with Multilingual Translation Capability over 100 Languages | Code | 2
Retrieval Augmented Visual Question Answering with Outside Knowledge | Code | 2
Towards Zero-Shot Scale-Aware Monocular Depth Estimation | Code | 2
A Dynamic Points Removal Benchmark in Point Cloud Maps | Code | 2
MT-Eval: A Multi-Turn Capabilities Evaluation Benchmark for Large Language Models | Code | 2
Occ3D: A Large-Scale 3D Occupancy Prediction Benchmark for Autonomous Driving | Code | 2
OpenESS: Event-based Semantic Scene Understanding with Open Vocabularies | Code | 2
What Can Natural Language Processing Do for Peer Review? | Code | 2
Mixed-Curvature Decision Trees and Random Forests | Code | 2
SimVG: A Simple Framework for Visual Grounding with Decoupled Multi-modal Fusion | Code | 2
GeoGround: A Unified Large Vision-Language Model for Remote Sensing Visual Grounding | Code | 2
RecFlow: An Industrial Full Flow Recommendation Dataset | Code | 2
LightGen: Efficient Image Generation through Knowledge Distillation and Direct Preference Optimization | Code | 2
ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware | Code | 2
PerAct2: Benchmarking and Learning for Robotic Bimanual Manipulation Tasks | Code | 2
GPQA: A Graduate-Level Google-Proof Q&A Benchmark | Code | 2
PruneVid: Visual Token Pruning for Efficient Video Large Language Models | Code | 2
Voice Conversion With Just Nearest Neighbors | Code | 2
Learning Affinity from Attention: End-to-End Weakly-Supervised Semantic Segmentation with Transformers | Code | 2
DreamLLM: Synergistic Multimodal Comprehension and Creation | Code | 2
On-Device Domain Generalization | Code | 2
Dynamic Early Exit in Reasoning Models | Code | 2
Medical Vision Generalist: Unifying Medical Imaging Tasks in Context | Code | 2
AIR-Bench: Automated Heterogeneous Information Retrieval Benchmark | Code | 2
Revisiting Adversarial Training under Long-Tailed Distributions | Code | 2
Many-Shot In-Context Learning in Multimodal Foundation Models | Code | 2
Towards Unified Keyframe Propagation Models | Code | 2
Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks | Code | 2
OS-Harm: A Benchmark for Measuring Safety of Computer Use Agents | Code | 2
A Versatile Framework for Multi-scene Person Re-identification | Code | 2
Measuring Massive Multitask Language Understanding | Code | 2
CombatVLA: An Efficient Vision-Language-Action Model for Combat Tasks in 3D Action Role-Playing Games | Code | 2
Think or Not Think: A Study of Explicit Thinking in Rule-Based Visual Reinforcement Fine-Tuning | Code | 2
Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer | Code | 2
YOLO-UniOW: Efficient Universal Open-World Object Detection | Code | 2
Voxurf: Voxel-based Efficient and Accurate Neural Surface Reconstruction | Code | 2
DySLIM: Dynamics Stable Learning by Invariant Measure for Chaotic Systems | Code | 2
CLRerNet: Improving Confidence of Lane Detection with LaneIoU | Code | 2
Do we actually understand the impact of renewables on electricity prices? A causal inference approach | Code | 2
Transformer Circuit Faithfulness Metrics are not Robust | Code | 2
Retinexmamba: Retinex-based Mamba for Low-light Image Enhancement | Code | 2
Page 139 of 13,232