SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers · 248,326 code links · 4,818 tasks

Papers

Showing 9851–9900 of 661,570 papers

| Title | Status | Hype |
| GS-2M: Material-aware Gaussian Splatting for High-fidelity Mesh Reconstruction | | 0 |
| Interpretable Maximum Margin Deep Anomaly Detection | | 0 |
| Looking Back and Forth: Cross-Image Attention Calibration and Attentive Preference Learning for Multi-Image Hallucination Mitigation | | 0 |
| Conditional Rank-Rank Regression via Deep Conditional Transformation Models | | 0 |
| Idiom Understanding as a Tool to Measure the Dialect Gap | | 0 |
| Adaptive Discovery of Interpretable Audio Attributes with Multimodal LLMs for Low-Resource Classification | | 0 |
| Retrieval-Augmented Multi-scale Framework for County-Level Crop Yield Prediction Across Large Regions | | 0 |
| Human-Centered LLM-Agent System for Detecting Anomalous Digital Asset Transactions | | 0 |
| Bi-directional digital twin prototype anchoring with multi-periodicity learning for few-shot fault diagnosis | | 0 |
| Retinex Meets Language: A Physics-Semantics-Guided Underwater Image Enhancement Network | | 0 |
| Fast and Flexible Audio Bandwidth Extension via Vocos | | 0 |
| Retrieval-Augmented Generation for Predicting Cellular Responses to Gene Perturbation | Code | 0 |
| ECHO: Frequency-aware Hierarchical Encoding for Variable-length Signals | Code | 0 |
| S^2Q-VDiT: Accurate Quantized Video Diffusion Transformer with Salient Data and Sparse Token Distillation | Code | 0 |
| QuantSparse: Comprehensively Compressing Video Diffusion Transformer with Model Quantization and Attention Sparsification | Code | 0 |
| Self-Supervised Multi-Modal World Model with 4D Space-Time Embedding | Code | 0 |
| Countdown-Code: A Testbed for Studying The Emergence and Generalization of Reward Hacking in RLVR | Code | 0 |
| The Model Knows Which Tokens Matter: Automatic Token Selection via Noise Gating | Code | 0 |
| To Predict or Not to Predict? Towards reliable uncertainty estimation in the presence of noise | Code | 0 |
| Learning Concept Bottleneck Models from Mechanistic Explanations | Code | 0 |
| Empowering Microscopic Traffic Simulators with Realistic Perception using Surrogate Sensor Models | Code | 0 |
| CyclicReflex: Improving Reasoning Models via Cyclical Reflection Token Scheduling | Code | 0 |
| Benchmark Leakage Trap: Can We Trust LLM-based Recommendation? | Code | 0 |
| Can a Lightweight Automated AI Pipeline Solve Research-Level Mathematical Problems? | Code | 0 |
| PHASE-Net: Physics-Grounded Harmonic Attention System for Efficient Remote Photoplethysmography Measurement | Code | 0 |
| Batch-of-Thought: Cross-Instance Learning for Enhanced LLM Reasoning | Code | 0 |
| From Static Inference to Dynamic Interaction: A Survey of Streaming Large Language Models | Code | 0 |
| AutoChecklist: Composable Pipelines for Checklist Generation and Scoring with LLM-as-a-Judge | Code | 0 |
| OV-DEIM: Real-time DETR-Style Open-Vocabulary Object Detection with GridSynthetic Augmentation | Code | 0 |
| MedSteer: Counterfactual Endoscopic Synthesis via Training-Free Activation Steering | Code | 0 |
| Combining Adam and its Inverse Counterpart to Enhance Generalization of Deep Learning Optimizers | Code | 0 |
| Emotion Transcription in Conversation: A Benchmark for Capturing Subtle and Complex Emotional States through Natural Language | Code | 0 |
| PDD: Manifold-Prior Diverse Distillation for Medical Anomaly Detection | Code | 0 |
| CanoVerse: 3D Object Scalable Canonicalization and Dataset for Generation and Pose | Code | 0 |
| Variational Flow Maps: Make Some Noise for One-Step Conditional Generation | Code | 0 |
| A Component-Based Survey of Interactions between Large Language Models and Multi-Armed Bandits | Code | 0 |
| MipSLAM: Alias-Free Gaussian Splatting SLAM | Code | 0 |
| Rethinking Driving World Model as Synthetic Data Generator for Perception Tasks | Code | 0 |
| WISER: Wider Search, Deeper Thinking, and Adaptive Fusion for Training-Free Zero-Shot Composed Image Retrieval | Code | 0 |
| Quantized Visual Geometry Grounded Transformer | Code | 0 |
| HY-WU (Part I): An Extensible Functional Neural Memory Framework and An Instantiation in Text-Guided Image Editing | | 3 |
| DrivingGen: A Comprehensive Benchmark for Generative Video World Models in Autonomous Driving | | 1 |
| ARLArena: A Unified Framework for Stable Agentic Reinforcement Learning | | 2 |
| FinSheet-Bench: From Simple Lookups to Complex Reasoning, Where LLMs Break on Financial Spreadsheets | | 0 |
| Traffic-MLLM: Curiosity-Regularized Supervised Learning for Traffic Scenario Case-Based Reasoning | | 0 |
| Yo'City: Personalized and Boundless 3D Realistic City Scene Generation via Self-Critic Expansion | | 0 |
| How Much Noise Can BERT Handle? Insights from Multilingual Sentence Difficulty Detection | Code | 0 |
| Evaluating Human-AI Safety: A Framework for Measuring Harmful Capability Uplift | | 0 |
| Model2Kernel: Model-Aware Symbolic Execution For Safe CUDA Kernels | | 0 |
| MedMT-Bench: Can LLMs Memorize and Understand Long Multi-Turn Conversations in Medical Scenarios? | | 0 |
Page 198 of 13232