SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers · 248,326 code links · 4,818 tasks

Papers

Showing 15,601–15,650 of 474,278 papers

Title | Status | Hype
VTBench: Comprehensive Benchmark Suite Towards Real-World Virtual Try-on Models | Code | 1
One Surrogate to Fool Them All: Universal, Transferable, and Targeted Adversarial Attacks with CLIP | Code | 1
MMPerspective: Do MLLMs Understand Perspective? A Comprehensive Benchmark for Perspective Perception, Reasoning, and Robustness | Code | 1
Large Language Models for Planning: A Comprehensive and Systematic Survey | Code | 1
Unifying Multimodal Large Language Model Capabilities and Modalities via Model Merging | Code | 1
Prot2Token: A Unified Framework for Protein Modeling via Next-Token Prediction | Code | 1
KnowTrace: Bootstrapping Iterative Retrieval-Augmented Generation with Structured Knowledge Tracing | Code | 1
PolyPose: Localizing Deformable Anatomy in 3D from Sparse 2D X-ray Images using Polyrigid Transforms | Code | 1
PATS: Process-Level Adaptive Thinking Mode Switching | Code | 1
ScreenExplorer: Training a Vision-Language Model for Diverse Exploration in Open GUI World | Code | 1
CMoS: Rethinking Time Series Prediction Through the Lens of Chunk-wise Spatial Correlations | Code | 1
CoreMatching: A Co-adaptive Sparse Inference Framework with Token and Neuron Pruning for Comprehensive Acceleration of Vision-Language Models | Code | 1
On the Role of Label Noise in the Feature Learning Process | Code | 1
SATORI-R1: Incentivizing Multimodal Reasoning with Spatial Grounding and Verifiable Rewards | Code | 1
Optimized Text Embedding Models and Benchmarks for Amharic Passage Retrieval | Code | 1
MedITok: A Unified Tokenizer for Medical Image Synthesis and Interpretation | Code | 1
ADGSyn: Dual-Stream Learning for Efficient Anticancer Drug Synergy Prediction | Code | 1
MMP-2K: A Benchmark Multi-Labeled Macro Photography Image Quality Assessment Database | Code | 1
ReadBench: Measuring the Dense Text Visual Reading Ability of Vision-Language Models | Code | 1
Universal Reasoner: A Single, Composable Plug-and-Play Reasoner for Frozen LLMs | Code | 1
SeRL: Self-Play Reinforcement Learning for Large Language Models with Limited Data | Code | 1
How Do Images Align and Complement LiDAR? Towards a Harmonized Multi-modal 3D Panoptic Segmentation | Code | 1
Are Vision Language Models Ready for Clinical Diagnosis? A 3D Medical Benchmark for Tumor-centric Visual Question Answering | Code | 1
Speech-IFEval: Evaluating Instruction-Following and Quantifying Catastrophic Forgetting in Speech-Aware Language Models | Code | 1
BAH Dataset for Ambivalence/Hesitancy Recognition in Videos for Behavioural Change | Code | 1
POQD: Performance-Oriented Query Decomposer for Multi-vector retrieval | Code | 1
Freqformer: Image-Demoiréing Transformer via Efficient Frequency Decomposition | Code | 1
Structured Reinforcement Learning for Combinatorial Decision-Making | Code | 1
Step-level Reward for Free in RL-based T2I Diffusion Model Fine-tuning | Code | 1
Behavior Injection: Preparing Language Models for Reinforcement Learning | Code | 1
SeePhys: Does Seeing Help Thinking? -- Benchmarking Vision-Based Physics Reasoning | Code | 1
FlashMD: long-stride, universal prediction of molecular dynamics | Code | 1
FP4 All the Way: Fully Quantized Training of LLMs | Code | 1
DISTA-Net: Dynamic Closely-Spaced Infrared Small Target Unmixing | Code | 1
Can Multimodal Large Language Models Understand Spatial Relations? | Code | 1
LIMOPro: Reasoning Refinement for Efficient and Effective Test-time Scaling | Code | 1
STRICT: Stress Test of Rendering Images Containing Text | Code | 1
Smoothie: Smoothing Diffusion on Token Embeddings for Text Generation | Code | 1
VORTA: Efficient Video Diffusion via Routing Sparse Attention | Code | 1
Signal, Image, or Symbolic: Exploring the Best Input Representation for Electrocardiogram-Language Models Through a Unified Framework | Code | 1
PM-KVQ: Progressive Mixed-precision KV Cache Quantization for Long-CoT LLMs | Code | 1
GRE Suite: Geo-localization Inference via Fine-Tuned Vision-Language Models and Enhanced Reasoning Chains | Code | 1
Learning Fluid-Structure Interaction Dynamics with Physics-Informed Neural Networks and Immersed Boundary Methods | Code | 1
Removal of Hallucination on Hallucination: Debate-Augmented RAG | Code | 1
Enhancing Training Data Attribution with Representational Optimization | Code | 1
GainRAG: Preference Alignment in Retrieval-Augmented Generation through Gain Signal Synthesis | Code | 1
Mind the Gap: A Practical Attack on GGUF Quantization | Code | 1
LAMDA: A Longitudinal Android Malware Benchmark for Concept Drift Analysis | Code | 1
Audio Jailbreak Attacks: Exposing Vulnerabilities in SpeechGPT in a White-Box Framework | Code | 1
DVD-Quant: Data-free Video Diffusion Transformers Quantization | Code | 1
Page 313 of 9486