SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers | 248,326 code links | 4,818 tasks

Papers

Showing 15651-15700 of 474278 papers

Title | Status | Hype
DB-KSVD: Scalable Alternating Optimization for Disentangling High-Dimensional Embedding Spaces | Code | 1
MSLAU-Net: A Hybird CNN-Transformer Network for Medical Image Segmentation | Code | 1
Cross-Lingual Pitfalls: Automatic Probing Cross-Lingual Weakness of Multilingual Large Language Models | Code | 1
OmniGenBench: A Benchmark for Omnipotent Multimodal Generation across 50+ Tasks | Code | 1
MonarchAttention: Zero-Shot Conversion to Fast, Hardware-Aware Structured Attention | Code | 1
MLRan: A Behavioural Dataset for Ransomware Analysis and Detection | Code | 1
Audio Jailbreak Attacks: Exposing Vulnerabilities in SpeechGPT in a White-Box Framework | Code | 1
Flex-Judge: Think Once, Judge Anywhere | Code | 1
VORTA: Efficient Video Diffusion via Routing Sparse Attention | Code | 1
Smoothie: Smoothing Diffusion on Token Embeddings for Text Generation | Code | 1
Removal of Hallucination on Hallucination: Debate-Augmented RAG | Code | 1
Signal, Image, or Symbolic: Exploring the Best Input Representation for Electrocardiogram-Language Models Through a Unified Framework | Code | 1
GainRAG: Preference Alignment in Retrieval-Augmented Generation through Gain Signal Synthesis | Code | 1
Breaking Silos: Adaptive Model Fusion Unlocks Better Time Series Forecasting | Code | 1
ThanoRA: Task Heterogeneity-Aware Multi-Task Low-Rank Adaptation | Code | 1
Generative Distribution Embeddings | Code | 1
VEAttack: Downstream-agnostic Vision Encoder Attack against Large Vision Language Models | Code | 1
PoseBH: Prototypical Multi-Dataset Training Beyond Human Pose Estimation | Code | 1
IDA-Bench: Evaluating LLMs on Interactive Guided Data Analysis | Code | 1
UniTTS: An end-to-end TTS system without decoupling of acoustic and semantic information | Code | 1
BEDI: A Comprehensive Benchmark for Evaluating Embodied Agents on UAVs | Code | 1
Co-Reinforcement Learning for Unified Multimodal Understanding and Generation | Code | 1
Value-Guided Search for Efficient Chain-of-Thought Reasoning | Code | 1
RaDeR: Reasoning-aware Dense Retrieval Models | Code | 1
Beyond Prompt Engineering: Robust Behavior Control in LLMs via Steering Target Atoms | Code | 1
Frankentext: Stitching random text fragments into long-form narratives | Code | 1
MetaGen Blended RAG: Higher Accuracy for Domain-Specific Q&A Without Fine-Tuning | Code | 1
HRSim: An agent-based simulation platform for high-capacity ride-sharing services | Code | 1
RePrompt: Reasoning-Augmented Reprompting for Text-to-Image Generation via Reinforcement Learning | Code | 1
Taming Diffusion for Dataset Distillation with High Representativeness | Code | 1
The Cell Must Go On: Agar.io for Continual Reinforcement Learning | Code | 1
CENet: Context Enhancement Network for Medical Image Segmentation | Code | 1
Benchmarking Recommendation, Classification, and Tracing Based on Hugging Face Knowledge Graph | Code | 1
Towards Revealing the Effectiveness of Small-Scale Fine-tuning in R1-style Reinforcement Learning | Code | 1
Revisiting Feature Interactions from the Perspective of Quadratic Neural Networks for Click-through Rate Prediction | Code | 1
Object-level Cross-view Geo-localization with Location Enhancement and Multi-Head Cross Attention | Code | 1
T2VUnlearning: A Concept Erasing Method for Text-to-Video Diffusion Models | Code | 1
Structured Linear CDEs: Maximally Expressive and Parallel-in-Time Sequence Models | Code | 1
Reinforcement Learning for Ballbot Navigation in Uneven Terrain | Code | 1
REN: Fast and Efficient Region Encodings from Patch-Based Image Encoders | Code | 1
Universal Biological Sequence Reranking for Improved De Novo Peptide Sequencing | Code | 1
Center-aware Residual Anomaly Synthesis for Multi-class Industrial Anomaly Detection | Code | 1
Knot So Simple: A Minimalistic Environment for Spatial Reasoning | Code | 1
Daily-Omni: Towards Audio-Visual Reasoning with Temporal Alignment across Modalities | Code | 1
Towards Dynamic Theory of Mind: Evaluating LLM Adaptation to Temporal Evolution of Human States | Code | 1
Semantic Correspondence: Unified Benchmarking and a Strong Baseline | Code | 1
ManuSearch: Democratizing Deep Search in Large Language Models with a Transparent and Open Multi-Agent Framework | Code | 1
The Origins of Representation Manifolds in Large Language Models | Code | 1
Twin-2K-500: A dataset for building digital twins of over 2,000 people based on their answers to over 500 questions | Code | 1
Think or Not? Exploring Thinking Efficiency in Large Reasoning Models via an Information-Theoretic Lens | Code | 1
Page 314 of 9486