SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers | 248,326 code links | 4,818 tasks

Papers

Showing 16701-16750 of 474278 papers

Title | Status | Hype
Times2D: Multi-Period Decomposition and Derivative Mapping for General Time Series Forecasting | Code | 1
TuRTLe: A Unified Evaluation of LLMs for RTL Generation | Code | 1
ZeroMimic: Distilling Robotic Manipulation Skills from Web Videos | Code | 1
MaintainCoder: Maintainable Code Generation Under Dynamic Requirements | Code | 1
Thinking Longer, Not Larger: Enhancing Software Engineering Agents via Scaling Test-Time Compute | Code | 1
IMPACT: A Generic Semantic Loss for Multimodal Medical Image Registration | Code | 1
SciReplicate-Bench: Benchmarking LLMs in Agent-driven Algorithmic Reproduction from Research Papers | Code | 1
InteractiveSurvey: An LLM-based Personalized and Interactive Survey Paper Generation System | Code | 1
Rethinking Key-Value Cache Compression Techniques for Large Language Model Serving | Code | 1
3D Dental Model Segmentation with Geometrical Boundary Preserving | Code | 1
Boosting MLLM Reasoning with Text-Debiased Hint-GRPO | Code | 1
Exploring Temporal Dynamics in Event-based Eye Tracker | Code | 1
Spectral-Adaptive Modulation Networks for Visual Perception | Code | 1
Towards Understanding How Knowledge Evolves in Large Vision-Language Models | Code | 1
Can Test-Time Scaling Improve World Foundation Model? | Code | 1
It's a (Blind) Match! Towards Vision-Language Correspondence without Parallel Data | Code | 1
Universal Zero-shot Embedding Inversion | Code | 1
GenSwarm: Scalable Multi-Robot Code-Policy Generation and Deployment via Language Models | Code | 1
EagleVision: Object-level Attribute Multimodal LLM for Remote Sensing | Code | 1
DASH: Detection and Assessment of Systematic Hallucinations of VLMs | Code | 1
Enhancing Creative Generation on Stable Diffusion-based Models | Code | 1
LaViC: Adapting Large Vision-Language Models to Visually-Aware Conversational Recommendation | Code | 1
Whisper-LM: Improving ASR Models with Language Models for Low-Resource Languages | Code | 1
COSMIC: Clique-Oriented Semantic Multi-space Integration for Robust CLIP Test-Time Adaptation | Code | 1
A Survey on Unlearnable Data | Code | 1
Boosting Omnidirectional Stereo Matching with a Pre-trained Depth Foundation Model | Code | 1
A Constrained Multi-Agent Reinforcement Learning Approach to Autonomous Traffic Signal Control | Code | 1
POINT^2: A Polymer Informatics Training and Testing Database | Code | 1
LIRA: A Learning-based Query-aware Partition Framework for Large-scale ANN Search | Code | 1
Language Guided Concept Bottleneck Models for Interpretable Continual Learning | Code | 1
BiPVL-Seg: Bidirectional Progressive Vision-Language Fusion with Global-Local Alignment for Medical Image Segmentation | Code | 1
Uncertainty-Instructed Structure Injection for Generalizable HD Map Construction | Code | 1
AstroAgents: A Multi-Agent AI for Hypothesis Generation from Mass Spectrometry Data | Code | 1
Imagine All The Relevance: Scenario-Profiled Indexing with Knowledge Expansion for Dense Retrieval | Code | 1
SuperEIO: Self-Supervised Event Feature Learning for Event Inertial Odometry | Code | 1
ShiftLIC: Lightweight Learned Image Compression with Spatial-Channel Shift Operations | Code | 1
Aurelia: Test-time Reasoning Distillation in Audio-Visual LLMs | Code | 1
STSA: Spatial-Temporal Semantic Alignment for Visual Dubbing | Code | 1
RefChartQA: Grounding Visual Answer on Chart Images through Instruction Tuning | Code | 1
Enhance Generation Quality of Flow Matching V2A Model via Multi-Step CoT-Like Guidance and Combined Preference Optimization | Code | 1
QuestBench: Can LLMs ask the right question to acquire information in reasoning tasks? | Code | 1
tempdisagg: A Python Framework for Temporal Disaggregation of Time Series Data | Code | 1
Multi-modal Knowledge Distillation-based Human Trajectory Forecasting | Code | 1
Mitigating Trade-off: Stream and Query-guided Aggregation for Efficient and Effective 3D Occupancy Prediction | Code | 1
Unveiling the Mist over 3D Vision-Language Understanding: Object-centric Evaluation with Chain-of-Analysis | Code | 1
DIFFER: Disentangling Identity Features via Semantic Cues for Clothes-Changing Person Re-ID | Code | 1
Baseline Systems and Evaluation Metrics for Spatial Semantic Segmentation of Sound Scenes | Code | 1
FLIP: Towards Comprehensive and Reliable Evaluation of Federated Prompt Learning | Code | 1
EgoToM: Benchmarking Theory of Mind Reasoning from Egocentric Videos | Code | 1
VoteFlow: Enforcing Local Rigidity in Self-Supervised Scene Flow | Code | 1
Page 335 of 9486