SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers | 248,326 code links | 4,818 tasks

Papers

Showing 8251–8300 of 177,340 papers

Title | Status | Hype
BioDiscoveryAgent: An AI Agent for Designing Genetic Perturbation Experiments | Code | 2
3DFaceShop: Explicitly Controllable 3D-Aware Portrait Generation | Code | 2
ExpeL: LLM Agents Are Experiential Learners | Code | 2
SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words | Code | 2
MuMA-ToM: Multi-modal Multi-Agent Theory of Mind | Code | 2
Retrieval-Augmented Diffusion Models for Time Series Forecasting | Code | 2
Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient | Code | 2
Time Will Tell: New Outlooks and A Baseline for Temporal Multi-View 3D Object Detection | Code | 2
Brax -- A Differentiable Physics Engine for Large Scale Rigid Body Simulation | Code | 2
SynRS3D: A Synthetic Dataset for Global 3D Semantic Understanding from Monocular Remote Sensing Imagery | Code | 2
Machine learning interatomic potential can infer electrical response | Code | 2
HeuriGym: An Agentic Benchmark for LLM-Crafted Heuristics in Combinatorial Optimization | Code | 2
Fully Sparse 3D Occupancy Prediction | Code | 2
SensorLLM: Human-Intuitive Alignment of Multivariate Sensor Data with LLMs for Activity Recognition | Code | 2
MambaMorph: a Mamba-based Framework for Medical MR-CT Deformable Registration | Code | 2
Common Objects in 3D: Large-Scale Learning and Evaluation of Real-life 3D Category Reconstruction | Code | 2
Human Pose as Compositional Tokens | Code | 2
Dense Distinct Query for End-to-End Object Detection | Code | 2
Deduplicating Training Data Makes Language Models Better | Code | 2
Approximate Convex Decomposition for 3D Meshes with Collision-Aware Concavity and Tree Search | Code | 2
Autonomous GIS: the next-generation AI-powered GIS | Code | 2
The Surprising Effectiveness of Negative Reinforcement in LLM Reasoning | Code | 2
DIRECT-3D: Learning Direct Text-to-3D Generation on Massive Noisy 3D Data | Code | 2
TinyVLA: Towards Fast, Data-Efficient Vision-Language-Action Models for Robotic Manipulation | Code | 2
Graph Neural Network Surrogates to leverage Mechanistic Expert Knowledge towards Reliable and Immediate Pandemic Response | Code | 2
UniHDSA: A Unified Relation Prediction Approach for Hierarchical Document Structure Analysis | Code | 2
LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching | Code | 2
ST-P3: End-to-end Vision-based Autonomous Driving via Spatial-Temporal Feature Learning | Code | 2
PoseFormerV2: Exploring Frequency Domain for Efficient and Robust 3D Human Pose Estimation | Code | 2
SIDA: Social Media Image Deepfake Detection, Localization and Explanation with Large Multimodal Model | Code | 2
Bracketing Image Restoration and Enhancement with High-Low Frequency Decomposition | Code | 2
LLM4EDA: Emerging Progress in Large Language Models for Electronic Design Automation | Code | 2
Overview of the PromptCBLUE Shared Task in CHIP2023 | Code | 2
DebugBench: Evaluating Debugging Capability of Large Language Models | Code | 2
SMACv2: An Improved Benchmark for Cooperative Multi-Agent Reinforcement Learning | Code | 2
Competition Report: Finding Universal Jailbreak Backdoors in Aligned LLMs | Code | 2
PMFSNet: Polarized Multi-scale Feature Self-attention Network For Lightweight Medical Image Segmentation | Code | 2
CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion | Code | 2
VLFM: Vision-Language Frontier Maps for Zero-Shot Semantic Navigation | Code | 2
STEVE-1: A Generative Model for Text-to-Behavior in Minecraft | Code | 2
An Efficient and Mixed Heterogeneous Model for Image Restoration | Code | 2
Open-LLM-Leaderboard: From Multi-choice to Open-style Questions for LLMs Evaluation, Benchmark, and Arena | Code | 2
DreamLIP: Language-Image Pre-training with Long Captions | Code | 2
ECLIPSE: Efficient Continual Learning in Panoptic Segmentation with Visual Prompt Tuning | Code | 2
Therapeutics Data Commons: Machine Learning Datasets and Tasks for Drug Discovery and Development | Code | 2
Unleashing the Power of Multi-Task Learning: A Comprehensive Survey Spanning Traditional, Deep, and Pretrained Foundation Model Eras | Code | 2
2nd Place Winning Solution for the CVPR2023 Visual Anomaly and Novelty Detection Challenge: Multimodal Prompting for Data-centric Anomaly Detection | Code | 2
TeCH: Text-guided Reconstruction of Lifelike Clothed Humans | Code | 2
BMFM-RNA: An Open Framework for Building and Evaluating Transcriptomic Foundation Models | Code | 2
LeapVAD: A Leap in Autonomous Driving via Cognitive Perception and Dual-Process Thinking | Code | 2
Page 166 of 3547