SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers, 248,326 code links, 4,818 tasks

Papers

Showing 17751-17800 of 474,278 papers

Title | Status | Hype
Thinking Preference Optimization | Code | 1
M-ABSA: A Multilingual Dataset for Aspect-Based Sentiment Analysis | Code | 1
Learning Dexterous Bimanual Catch Skills through Adversarial-Cooperative Heterogeneous-Agent Reinforcement Learning | Code | 1
Model Generalization on Text Attribute Graphs: Principles with Large Language Models | Code | 1
Can LLM Watermarks Robustly Prevent Unauthorized Knowledge Distillation? | Code | 1
Table-Critic: A Multi-Agent Framework for Collaborative Criticism and Refinement in Table Reasoning | Code | 1
Event-based Solutions for Human-centered Applications: A Comprehensive Review | Code | 1
DLFR-VAE: Dynamic Latent Frame Rate VAE for Video Generation | Code | 1
AdaSplash: Adaptive Sparse Flash Attention | Code | 1
Revisiting the Test-Time Scaling of o1-like Models: Do they Truly Possess Test-Time Scaling Capabilities? | Code | 1
Range and Bird's Eye View Fused Cross-Modal Visual Place Recognition | Code | 1
LaM-SLidE: Latent Space Modeling of Spatial Dynamical Systems via Linked Entities | Code | 1
A Novel Unified Parametric Assumption for Nonconvex Optimization | Code | 1
Code-Vision: Evaluating Multimodal LLMs Logic Understanding and Code Generation Capabilities | Code | 1
Logic.py: Bridging the Gap between LLMs and Constraint Solvers | Code | 1
Uncovering the Impact of Chain-of-Thought Reasoning for Direct Preference Optimization: Lessons from Text-to-SQL | Code | 1
Understanding In-Context Machine Translation for Low-Resource Languages: A Case Study on Manchu | Code | 1
VRoPE: Rotary Position Embedding for Video Large Language Models | Code | 1
SMART: Self-Aware Agent for Tool Overuse Mitigation | Code | 1
Towards Mechanistic Interpretability of Graph Transformers via Attention Graphs | Code | 1
Deep Learning of Proteins with Local and Global Regions of Disorder | Code | 1
APB: Accelerating Distributed Long-Context Inference by Passing Compressed Context Blocks across GPUs | Code | 1
Following the Autoregressive Nature of LLM Embeddings via Compression and Alignment | Code | 1
Market-Derived Financial Sentiment Analysis: Context-Aware Language Models for Crypto Forecasting | Code | 1
Learning to Sample Effective and Diverse Prompts for Text-to-Image Generation | Code | 1
A Physics-Informed Blur Learning Framework for Imaging Systems | Code | 1
VANPY: Voice Analysis Framework | Code | 1
Causal Inference for Qualitative Outcomes | Code | 1
Small Models Struggle to Learn from Strong Reasoners | Code | 1
ILIAS: Instance-Level Image retrieval At Scale | Code | 1
Masked Latent Prediction and Classification for Self-Supervised Audio Representation Learning | Code | 1
VLM2-Bench: A Closer Look at How Well VLMs Implicitly Link Explicit Matching Visual Cues | Code | 1
HintsOfTruth: A Multimodal Checkworthiness Detection Dataset with Real and Synthetic Claims | Code | 1
ECG-Expert-QA: A Benchmark for Evaluating Medical Large Language Models in Heart Disease Diagnosis | Code | 1
VLMs as GeoGuessr Masters: Exceptional Performance, Hidden Biases, and Privacy Risks | Code | 1
Exposing Numeracy Gaps: A Benchmark to Evaluate Fundamental Numerical Abilities in Large Language Models | Code | 1
DA-Mamba: Domain Adaptive Hybrid Mamba-Transformer Based One-Stage Object Detection | Code | 1
Knowing Your Target: Target-Aware Transformer Makes Better Spatio-Temporal Video Grounding | Code | 1
Reasoning-Augmented Conversation for Multi-Turn Jailbreak Attacks on Large Language Models | Code | 1
Dyve: Thinking Fast and Slow for Dynamic Process Verification | Code | 1
Learning to Reason from Feedback at Test-Time | Code | 1
DuplexMamba: Enhancing Real-time Speech Conversations with Duplex and Streaming Capabilities | Code | 1
Enhancing Cross-Tokenizer Knowledge Distillation with Contextual Dynamical Mapping | Code | 1
Knowledge Graph-Driven Retrieval-Augmented Generation: Integrating Deepseek-R1 with Weaviate for Advanced Chatbot Applications | Code | 1
ControlText: Unlocking Controllable Fonts in Multilingual Text Rendering without Font Annotations | Code | 1
SafeDialBench: A Fine-Grained Safety Benchmark for Large Language Models in Multi-Turn Dialogues with Diverse Jailbreak Attacks | Code | 1
MMUnlearner: Reformulating Multimodal Machine Unlearning in the Era of Multimodal Large Language Models | Code | 1
ReLearn: Unlearning via Learning for Large Language Models | Code | 1
The Mirage of Model Editing: Revisiting Evaluation in the Wild | Code | 1
GRIFFIN: Effective Token Alignment for Faster Speculative Decoding | Code | 1