SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers · 248,326 code links · 4,818 tasks

Papers

Showing 4551–4575 of 661570 papers

Title | Status | Hype
Who Benchmarks the Benchmarks? A Case Study of LLM Evaluation in Icelandic | - | 0
SparkVSR: Interactive Video Super-Resolution via Sparse Keyframe Propagation | - | 3
Noise-Response Calibration: A Causal Intervention Protocol for LLM-Judges | - | 0
Arabic Morphosyntactic Tagging and Dependency Parsing with Large Language Models | - | 0
Cascade-Aware Multi-Agent Routing: Spatio-Temporal Sidecars and Geometry-Switching | - | 0
ANTS: Adaptive Negative Textual Space Shaping for OOD Detection via Test-Time MLLM Understanding and Reasoning | Code | 0
A Practical Algorithm for Feature-Rich, Non-Stationary Bandit Problems | - | 0
TurnWise: The Gap between Single- and Multi-turn Language Model Capabilities | - | 0
Conservative Continuous-Time Treatment Optimization | - | 0
Designing for Disagreement: Front-End Guardrails for Assistance Allocation in LLM-Enabled Robots | - | 0
Tarab: A Multi-Dialect Corpus of Arabic Lyrics and Poetry | - | 0
Evaluating Ill-Defined Tasks in Large Language Models | - | 0
Why the Valuable Capabilities of LLMs Are Precisely the Unexplainable Ones | - | 0
Controlling Fish Schools via Reinforcement Learning of Virtual Fish Movement | - | 0
Ontological foundations for contrastive explanatory narration of robot plans | - | 0
VQKV: High-Fidelity and High-Ratio Cache Compression via Vector-Quantization | - | 0
TempCore: Are Video QA Benchmarks Temporally Grounded? A Frame Selection Sensitivity Analysis and Benchmark | - | 0
Good Arguments Against the People Pleasers: How Reasoning Mitigates (Yet Masks) LLM Sycophancy | - | 0
What DINO saw: ALiBi positional encoding reduces positional bias in Vision Transformers | - | 0
BenchPreS: A Benchmark for Context-Aware Personalized Preference Selectivity of Persistent-Memory LLMs | - | 0
From Passive to Persuasive: Localized Activation Injection for Empathy and Negotiation | - | 0
LLM-Guided Reinforcement Learning for Audio-Visual Speech Enhancement | - | 0
Scalable Feature Learning on Huge Knowledge Graphs for Downstream Machine Learning | - | 0
When Machine Learning Gets Personal: Evaluating Prediction and Explanation | - | 0
Feature Attribution in 5G Intrusion Detection: A Statistical vs. Logic-Based Comparison | - | 0