SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers · 248,326 code links · 4,818 tasks

Papers

Showing 19951-20000 of 474,278 papers

| Title | Status | Hype |
| --- | --- | --- |
| Navigating the Latent Space Dynamics of Neural Models | | 0 |
| CLUE: Neural Networks Calibration via Learning Uncertainty-Error alignment | | 0 |
| Scaling Offline RL via Efficient and Expressive Shortcut Models | | 0 |
| Highly Efficient and Effective LLMs with Multi-Boolean Architectures | | 0 |
| Structured Memory Mechanisms for Stable Context Representation in Large Language Models | | 0 |
| What Has Been Lost with Synthetic Evaluation? | | 0 |
| ER-REASON: A Benchmark Dataset for LLM-Based Clinical Reasoning in the Emergency Room | | 0 |
| Improving QA Efficiency with DistilBERT: Fine-Tuning and Inference on mobile Intel CPUs | | 0 |
| VScan: Rethinking Visual Token Reduction for Efficient Large Vision-Language Models | | 0 |
| Enhancing Study-Level Inference from Clinical Trial Papers via RL-based Numeric Reasoning | | 0 |
| Design and testing of an agent chatbot supporting decision making with public transport data | | 0 |
| Predicting Human Depression with Hybrid Data Acquisition utilizing Physical Activity Sensing and Social Media Feeds | | 0 |
| From Controlled Scenarios to Real-World: Cross-Domain Degradation Pattern Matching for All-in-One Image Restoration | | 0 |
| Seeing the Threat: Vulnerabilities in Vision-Language Models to Adversarial Attack | | 0 |
| Event-based Egocentric Human Pose Estimation in Dynamic Environment | | 0 |
| SAM-R1: Leveraging SAM for Reward Feedback in Multimodal Segmentation via Reinforcement Learning | | 0 |
| InfoSAM: Fine-Tuning the Segment Anything Model from An Information-Theoretic Perspective | | 0 |
| OmniAD: Detect and Understand Industrial Anomaly via Multimodal Reasoning | | 0 |
| Adversarially Robust AI-Generated Image Detection for Free: An Information Theoretic Perspective | | 0 |
| Adaptive Detoxification: Safeguarding General Capabilities of LLMs through Toxicity-Aware Knowledge Editing | | 0 |
| Rethinking Gradient-based Adversarial Attacks on Point Cloud Classification | | 0 |
| TabXEval: Why this is a Bad Table? An eXhaustive Rubric for Table Evaluation | | 0 |
| LLMs Struggle to Reject False Presuppositions when Misinformation Stakes are High | | 0 |
| A Survey on Training-free Open-Vocabulary Semantic Segmentation | | 0 |
| Yambda-5B -- A Large-Scale Multi-modal Dataset for Ranking And Retrieval | | 0 |
| BiasFilter: An Inference-Time Debiasing Framework for Large Language Models | | 0 |
| Let Them Talk: Audio-Driven Multi-Person Conversational Video Generation | Code | 7 |
| Speculative Decoding Meets Quantization: Compatibility Evaluation and Hierarchical Framework Design | Code | 1 |
| Improving Continual Pre-training Through Seamless Data Packing | Code | 0 |
| Logical Consistency is Vital: Neural-Symbolic Information Retrieval for Negative-Constraint Queries | Code | 0 |
| MoRE: A Mixture of Low-Rank Experts for Adaptive Multi-Task Learning | Code | 0 |
| StarBASE-GP: Biologically-Guided Automated Machine Learning for Genotype-to-Phenotype Association Analysis | Code | 0 |
| MEDAL: A Framework for Benchmarking LLMs as Multilingual Open-Domain Chatbots and Dialogue Evaluators | Code | 0 |
| LiTEx: A Linguistic Taxonomy of Explanations for Understanding Within-Label Variation in Natural Language Inference | Code | 0 |
| OWL: Probing Cross-Lingual Recall of Memorized Texts via World Literature | Code | 0 |
| When Does Neuroevolution Outcompete Reinforcement Learning in Transfer Learning Tasks? | Code | 0 |
| Unraveling LoRA Interference: Orthogonal Subspaces for Robust Model Merging | Code | 0 |
| RC-AutoCalib: An End-to-End Radar-Camera Automatic Calibration Network | Code | 0 |
| ASyMOB: Algebraic Symbolic Mathematical Operations Benchmark | Code | 0 |
| Adapting Segment Anything Model for Power Transmission Corridor Hazard Segmentation | Code | 0 |
| The WHY in Business Processes: Unification of Causal Process Models | | 0 |
| Automatic Scene Generation: State-of-the-Art Techniques, Models, Datasets, Challenges, and Future Prospects | | 0 |
| Talent or Luck? Evaluating Attribution Bias in Large Language Models | Code | 0 |
| Towards a More Generalized Approach in Open Relation Extraction | Code | 0 |
| Is Noise Conditioning Necessary? A Unified Theory of Unconditional Graph Diffusion Models | | 0 |
| Budget-Adaptive Adapter Tuning in Orthogonal Subspaces for Continual Learning in LLMs | | 0 |
| Decomposing Elements of Problem Solving: What "Math" Does RL Teach? | Code | 0 |
| Advancing Multimodal Reasoning via Reinforcement Learning with Cold Start | Code | 1 |
| How Do Diffusion Models Improve Adversarial Robustness? | | 0 |
| SplitLoRA: Balancing Stability and Plasticity in Continual Learning Through Gradient Space Splitting | | 0 |