SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

659,983 papers | 248,104 code links | 4,818 tasks

Papers

Showing 3001–3050 of 659,983 papers

Title | Status | Hype
ANTS: Adaptive Negative Textual Space Shaping for OOD Detection via Test-Time MLLM Understanding and Reasoning | Code | 0
A Practical Algorithm for Feature-Rich, Non-Stationary Bandit Problems |  | 0
TurnWise: The Gap between Single- and Multi-turn Language Model Capabilities |  | 0
Conservative Continuous-Time Treatment Optimization |  | 0
Designing for Disagreement: Front-End Guardrails for Assistance Allocation in LLM-Enabled Robots |  | 0
Tarab: A Multi-Dialect Corpus of Arabic Lyrics and Poetry |  | 0
Evaluating Ill-Defined Tasks in Large Language Models |  | 0
Why the Valuable Capabilities of LLMs Are Precisely the Unexplainable Ones |  | 0
Controlling Fish Schools via Reinforcement Learning of Virtual Fish Movement |  | 0
Ontological foundations for contrastive explanatory narration of robot plans |  | 0
VQKV: High-Fidelity and High-Ratio Cache Compression via Vector-Quantization |  | 0
TempCore: Are Video QA Benchmarks Temporally Grounded? A Frame Selection Sensitivity Analysis and Benchmark |  | 0
Good Arguments Against the People Pleasers: How Reasoning Mitigates (Yet Masks) LLM Sycophancy |  | 0
What DINO saw: ALiBi positional encoding reduces positional bias in Vision Transformers |  | 0
BenchPreS: A Benchmark for Context-Aware Personalized Preference Selectivity of Persistent-Memory LLMs |  | 0
From Passive to Persuasive: Localized Activation Injection for Empathy and Negotiation |  | 0
LLM-Guided Reinforcement Learning for Audio-Visual Speech Enhancement |  | 0
Scalable Feature Learning on Huge Knowledge Graphs for Downstream Machine Learning |  | 0
When Machine Learning Gets Personal: Evaluating Prediction and Explanation |  | 0
Feature Attribution in 5G Intrusion Detection: A Statistical vs. Logic-Based Comparison |  | 0
WildCap: Facial Albedo Capture in the Wild via Hybrid Inverse Rendering |  | 0
LANCE: Low Rank Activation Compression for Efficient On-Device Continual Learning |  | 0
Representing Beauty: Towards a Participatory but Objective Latent Aesthetics |  | 0
When a Robot is More Capable than a Human: Learning from Constrained Demonstrators |  | 0
Distributional Consistency Loss: Beyond Pointwise Data Terms in Inverse Problems |  | 0
Strategic Costs of Perceived Bias in Fair Selection |  | 0
Evontree: Ontology Rule-Guided Self-Evolution of Large Language Models |  | 0
S2WMamba: A Wavelet-Assisted Mamba-Based Dual-Branch Network For Pansharpening |  | 0
Analyzing Planner Design Trade-offs for MAPF under ADG-based Realistic Execution |  | 0
On Geometric Understanding and Learned Priors in Feed-forward 3D Reconstruction Models |  | 0
Toward Better Temporal Structures for Geopolitical Events Forecasting |  | 0
A Novel Patch-Based TDA Approach for Computed Tomography Imaging |  | 0
DiG: Differential Grounding for Enhancing Fine-Grained Perception in Multimodal Large Language Model |  | 0
Diffusion-DRF: Free, Rich, and Differentiable Reward for Video Diffusion Fine-Tuning |  | 0
Large Language Models Approach Expert Pedagogical Quality in Math Tutoring but Differ in Instructional and Linguistic Profiles |  | 0
Few-Shot Video Object Segmentation in X-Ray Angiography Using Local Matching and Spatio-Temporal Consistency Loss | Code | 0
SentGraph: Hierarchical Sentence Graph for Multi-hop Retrieval-Augmented Question Answering |  | 0
Aletheia: What Makes RLVR For Code Verifiers Tick? |  | 0
VisTIRA: Closing the Image-Text Modality Gap in Visual Math Reasoning via Structured Tool Integration |  | 0
Think3D: Thinking with Space for Spatial Reasoning | Code | 0
Building a Correct-by-Design Lakehouse. Data Contracts, Versioning, and Transactional Pipelines for Humans and Agents |  | 0
LogicSkills: A Structured Benchmark for Formal Reasoning in Large Language Models |  | 0
Fluids You Can Trust: Property-Preserving Operator Learning for Incompressible Flows |  | 0
Synergizing Understanding and Generation with Interleaved Analyzing-Drafting Thinking |  | 0
Efficient Continual Learning in Language Models via Thalamically Routed Cortical Columns |  | 0
Ask don't tell: Reducing sycophancy in large language models |  | 0
Fixed Anchors Are Not Enough: Dynamic Retrieval and Persistent Homology for Dataset Distillation |  | 0
Transit Network Design with Two-Level Demand Uncertainties: A Machine Learning and Contextual Stochastic Optimization Framework |  | 0
Is Seeing Believing? Evaluating Human Sensitivity to Synthetic Video |  | 0
Model Medicine: A Clinical Framework for Understanding, Diagnosing, and Treating AI Models |  | 0
Page 61 of 13200