SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers · 248,326 code links · 4,818 tasks

Papers

Showing 16,701–16,750 of 474,278 papers

Title | Status | Hype
MAC: An Efficient Gradient Preconditioning using Mean Activation Approximated Curvature | Code | 0
Urban Incident Prediction with Graph Neural Networks: Integrating Government Ratings and Crowdsourced Reports | Code | 0
IMAGIC-500: IMputation benchmark on A Generative Imaginary Country (500k samples) | Code | 0
Agile Reinforcement Learning for Real-Time Task Scheduling in Edge Computing | Code | 0
InfoDPCCA: Information-Theoretic Dynamic Probabilistic Canonical Correlation Analysis | Code | 0
On Finetuning Tabular Foundation Models | Code | 1
StreamSplat: Towards Online Dynamic 3D Reconstruction from Uncalibrated Video Streams | Code | 2
Image Demoiréing Using Dual Camera Fusion on Mobile Phones | Code | 1
SAMSelect: A Spectral Index Search for Marine Debris Visualization using Segment Anything | Code | 0
HSG-12M: A Large-Scale Spatial Multigraph Dataset | Code | 1
EtiCor++: Towards Understanding Etiquettical Bias in LLMs | Code | 0
On Reasoning Strength Planning in Large Reasoning Models | Code | 1
Paths to Causality: Finding Informative Subgraphs Within Knowledge Graphs for Knowledge-Based Causal Discovery | Code | 0
Offline RL with Smooth OOD Generalization in Convex Hull and its Neighborhood | Code | 0
Evaluating LLMs Across Multi-Cognitive Levels: From Medical Knowledge Mastery to Scenario-Based Problem Solving | Code | 0
Time-Aware World Model for Adaptive Prediction and Control | Code | 0
GFRIEND: Generative Few-shot Reward Inference through EfficieNt DPO | Code | 0
Tailored Architectures for Time Series Forecasting: Evaluating Deep Learning Models on Gaussian Process-Generated Data | Code | 0
Router-R1: Teaching LLMs Multi-Round Routing and Aggregation via Reinforcement Learning | Code | 2
ClimateViz: A Benchmark for Statistical Reasoning and Fact Verification on Scientific Charts | Code | 0
Institutional Books 1.0: A 242B token dataset from Harvard Library's collections, refined for accuracy and usability | Code | 2
Dense Retrievers Can Fail on Simple Queries: Revealing The Granularity Dilemma of Embeddings | Code | 0
Multi-Teacher Language-Aware Knowledge Distillation for Multilingual Speech Emotion Recognition | Code | 0
Do MIL Models Transfer? | Code | 2
A Privacy-Preserving Federated Learning Framework for Generalizable CBCT to Synthetic CT Translation in Head and Neck | - | 0
JoFormer (Journey-based Transformer): Theory and Empirical Analysis on the Tiny Shakespeare Dataset | Code | 0
FZOO: Fast Zeroth-Order Optimizer for Fine-Tuning Large Language Models towards Adam-Scale Speed | - | 0
Network Threat Detection: Addressing Class Imbalanced Data with Deep Forest | - | 0
Asymptotic Normality of Infinite Centered Random Forests - Application to Imbalanced Classification | - | 0
AbstentionBench: Reasoning LLMs Fail on Unanswerable Questions | Code | 2
MagCache: Fast Video Generation with Magnitude-Aware Cache | Code | 3
ArrowPose: Segmentation, Detection, and 5 DoF Pose Estimation Network for Colorless Point Clouds | - | 0
LiftVSR: Lifting Image Diffusion to Video Super-Resolution via Hybrid Temporal Modeling with Only 4×RTX 4090s | - | 0
SwS: Self-aware Weakness-driven Problem Synthesis in Reinforcement Learning for LLM Reasoning | Code | 1
PlantBert: An Open Source Language Model for Plant Science | - | 0
CAF-I: A Collaborative Multi-Agent Framework for Enhanced Irony Detection with Large Language Models | - | 0
Improved Scaling Laws in Linear Regression via Data Reuse | - | 0
RAISE: Enhancing Scientific Reasoning in LLMs via Step-by-Step Retrieval | Code | 0
midr: Learning from Black-Box Models by Maximum Interpretation Decomposition | Code | 0
Dialect Normalization using Large Language Models and Morphological Rules | Code | 0
Learnable Spatial-Temporal Positional Encoding for Link Prediction | Code | 0
PropMEND: Hypernetworks for Knowledge Propagation in LLMs | Code | 0
Autoregressive Semantic Visual Reconstruction Helps VLMs Understand Better | Code | 2
EIFBENCH: Extremely Complex Instruction Following Benchmark for Large Language Models | Code | 0
Olica: Efficient Structured Pruning of Large Language Models without Retraining | Code | 0
The Decoupled Risk Landscape in Performative Prediction | Code | 0
Towards Class-wise Fair Adversarial Training via Anti-Bias Soft Label Distillation | Code | 0
VReST: Enhancing Reasoning in Large Vision-Language Models through Tree Search and Self-Reward Mechanism | Code | 0
Solving excited states for long-range interacting trapped ions with neural networks | - | 0
Factors affecting the in-context learning abilities of LLMs for dialogue state tracking | - | 0