SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers · 248,326 code links · 4,818 tasks

Papers

Showing 11,551 to 11,600 of 661,570 papers

| Title | Status | Hype |
|---|---|---|
| InstMeter: An Instruction-Level Method to Predict Energy and Latency of DL Model Inference on MCUs | | 0 |
| Scalable Second-order Riemannian Optimization for K-means Clustering | | 0 |
| Erase or Hide? Suppressing Spurious Unlearning Neurons for Robust Unlearning | | 0 |
| ceLLMate: Sandboxing Browser AI Agents | | 0 |
| Multimodal Large Language Models for Low-Resource Languages: A Case Study for Basque | | 0 |
| Dripper: Token-Efficient Main HTML Extraction with a Lightweight LM | | 0 |
| Test Case Prioritization: A Snowballing Literature Review and TCPFramework with Approach Combinators | | 0 |
| Harmonic Dataset Distillation for Time Series Forecasting | | 0 |
| Vector-Quantized Soft Label Compression for Dataset Distillation | | 0 |
| Large-Margin Hyperdimensional Computing: A Learning-Theoretical Perspective | | 0 |
| Yolo-Key-6D: Single Stage Monocular 6D Pose Estimation with Keypoint Enhancements | | 0 |
| Real Eyes Realize Faster: Gaze Stability and Pupil Novelty for Efficient Egocentric Learning | | 0 |
| Exploiting Subgradient Sparsity in Max-Plus Neural Networks | | 0 |
| Bridging Pedagogy and Play: Introducing a Language Mapping Interface for Human-AI Co-Creation in Educational Game Design | | 0 |
| GeoSeg: Training-Free Reasoning-Driven Segmentation in Remote Sensing Imagery | | 0 |
| Who Judges the Judge? Evaluating LLM-as-a-Judge for French Medical open-ended QA | | 0 |
| A Consensus-Bayesian Framework for Detecting Malicious Activity in Enterprise Directory Access Graphs | | 0 |
| Rethinking the Efficiency and Effectiveness of Reinforcement Learning for Radiology Report Generation | | 0 |
| A Copula Based Supervised Filter for Feature Selection in Diabetes Risk Prediction Using Machine Learning | | 0 |
| Factuality Matters: When Image Generation and Editing Meet Structured Visuals | | 1 |
| It's TIME: Towards the Next Generation of Time Series Forecasting Benchmarks | | 0 |
| CAM-LDS: Cyber Attack Manifestations for Automatic Interpretation of System Logs and Security Alerts | | 0 |
| Weight Space Representation Learning via Neural Field Adaptation | | 0 |
| Extending Neural Operators: Robust Handling of Functions Beyond the Training Set | | 0 |
| EvalMVX: A Unified Benchmarking for Neural 3D Reconstruction under Diverse Multiview Setups | | 0 |
| Architecture and evaluation protocol for transformer-based visual object tracking in UAV applications | | 0 |
| Rethinking Role-Playing Evaluation: Anonymous Benchmarking and a Systematic Study of Personality Effects | | 0 |
| A Multi-Agent Framework for Interpreting Multivariate Physiological Time Series | | 0 |
| Riemannian Langevin Dynamics: Strong Convergence of Geometric Euler-Maruyama Scheme | | 0 |
| Towards Generalizable AI-Generated Image Detection via Image-Adaptive Prompt Learning | Code | 0 |
| The Lie of the Average: How Class Incremental Learning Evaluation Deceives You? | Code | 0 |
| Non-Collaborative User Simulators for Tool Agents | Code | 0 |
| Vision-Zero: Scalable VLM Self-Improvement via Strategic Gamified Self-Play | Code | 0 |
| Dutch Metaphor Extraction from Cancer Patients' Interviews and Forum Data using LLMs and Human in the Loop | Code | 0 |
| Re-coding for Uncertainties: Edge-awareness Semantic Concordance for Resilient Event-RGB Segmentation | Code | 0 |
| Training High-Level Schedulers with Execution-Feedback Reinforcement Learning for Long-Horizon GUI Automation | Code | 0 |
| Soft Quality-Diversity Optimization | Code | 0 |
| MoECLIP: Patch-Specialized Experts for Zero-shot Anomaly Detection | Code | 0 |
| Specificity-aware reinforcement learning for fine-grained open-world classification | Code | 0 |
| Code2Math: Can Your Code Agent Effectively Evolve Math Problems Through Exploration? | Code | 0 |
| MAGE: Meta-Reinforcement Learning for Language Agents toward Strategic Exploration and Exploitation | Code | 0 |
| Relational In-Context Learning via Synthetic Pre-training with Structural Prior | Code | 0 |
| From Misclassifications to Outliers: Joint Reliability Assessment in Classification | Code | 0 |
| DISC: Dense Integrated Semantic Context for Large-Scale Open-Set Semantic Mapping | Code | 0 |
| Cross-Modal Mapping and Dual-Branch Reconstruction for 2D-3D Multimodal Industrial Anomaly Detection | Code | 0 |
| Discriminative Perception via Anchored Description for Reasoning Segmentation | Code | 0 |
| LifeBench: A Benchmark for Long-Horizon Multi-Source Memory | Code | 0 |
| Efficient Point Cloud Processing with High-Dimensional Positional Encoding and Non-Local MLPs | Code | 0 |
| RAGTrack: Language-aware RGBT Tracking with Retrieval-Augmented Generation | Code | 0 |
| MeanFlowSE: one-step generative speech enhancement via conditional mean flow | Code | 0 |
Page 232 of 13,232