SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers, 248,326 code links, 4,818 tasks

Papers

Showing 16001–16050 of 474,278 papers

Title | Status | Hype
Rethinking Repetition Problems of LLMs in Code Generation | Code | 1
Learned Lightweight Smartphone ISP with Unpaired Data | Code | 1
Large Wireless Localization Model (LWLM): A Foundation Model for Positioning in 6G Networks | Code | 1
Evaluating Robustness of Deep Reinforcement Learning for Autonomous Surface Vehicle Control in Field Tests | Code | 1
Sparse Point Cloud Patches Rendering via Splitting 2D Gaussians | Code | 1
Empirical Performance Evaluation of Lane Keeping Assist on Modern Production Vehicles | Code | 1
MAKE: Multi-Aspect Knowledge-Enhanced Vision-Language Pretraining for Zero-shot Dermatological Assessment | Code | 1
OpenLKA: An Open Dataset of Lane Keeping Assist from Recent Car Models under Real-world Driving Conditions | Code | 1
BiECVC: Gated Diversification of Bidirectional Contexts for Learned Video Compression | Code | 1
EDBench: Large-Scale Electron Density Data for Molecular Modeling | Code | 1
Slow Transition to Low-Dimensional Chaos in Heavy-Tailed Recurrent Neural Networks | Code | 1
Bridging Human Oversight and Black-box Driver Assistance: Vision-Language Models for Predictive Alerting in Lane Keeping Assist Systems | Code | 1
TopoDiT-3D: Topology-Aware Diffusion Transformer with Bottleneck Structure for 3D Point Cloud Generation | Code | 1
DRA-GRPO: Exploring Diversity-Aware Reward Adjustment for R1-Zero-Like Training of Large Language Models | Code | 1
Online Isolation Forest | Code | 1
Llama See, Llama Do: A Mechanistic Perspective on Contextual Entrainment and Distraction in LLMs | Code | 1
UMotion: Uncertainty-driven Human Motion Estimation from Inertial and Ultra-wideband Units | Code | 1
InvDesFlow-AL: Active Learning-based Workflow for Inverse Design of Functional Materials | Code | 1
Mini Diffuser: Fast Multi-task Diffusion Policy Training Using Two-level Mini-batches | Code | 1
Evaluation in EEG Emotion Recognition: State-of-the-Art Review and Unified Framework | Code | 1
Examining Deployment and Refinement of the VIOLA-AI Intracranial Hemorrhage Model Using an Interactive NeoMedSys Platform | Code | 1
Analog Foundation Models | Code | 1
Achieving Tokenizer Flexibility in Language Models through Heuristic Adaptation and Supertoken Learning | Code | 1
AdaFortiTran: An Adaptive Transformer Model for Robust OFDM Channel Estimation | Code | 1
Introducing voice timbre attribute detection | Code | 1
Towards scalable surrogate models based on Neural Fields for large scale aerodynamic simulations | Code | 1
LatticeVision: Image to Image Networks for Modeling Non-Stationary Spatial Data | Code | 1
TT-DF: A Large-Scale Diffusion-Based Dataset and Benchmark for Human Body Forgery Detection | Code | 1
From Seeing to Doing: Bridging Reasoning and Decision for Robotic Manipulation | Code | 1
Foundation Models Knowledge Distillation For Battery Capacity Degradation Forecast | Code | 1
Large Language Models for Computer-Aided Design: A Survey | Code | 1
WaveGuard: Robust Deepfake Detection and Source Tracing via Dual-Tree Complex Wavelet and Graph Neural Networks | Code | 1
ADC-GS: Anchor-Driven Deformable and Compressed Gaussian Splatting for Dynamic Scene Reconstruction | Code | 1
ReSurgSAM2: Referring Segment Anything in Surgical Video via Credible Long-term Tracking | Code | 1
TiMo: Spatiotemporal Foundation Model for Satellite Image Time Series | Code | 1
FlashMLA-ETAP: Efficient Transpose Attention Pipeline for Accelerating MLA Inference on NVIDIA H20 GPUs | Code | 1
Total Variation-Based Image Decomposition and Denoising for Microscopy Images | Code | 1
Rejoining fragmented ancient bamboo slips with physics-driven deep learning | Code | 1
PrePrompt: Predictive prompting for class incremental learning | Code | 1
Benchmarking AI scientists in omics data-driven biological research | Code | 1
Domain Knowledge Integrated CNN-xLSTM-xAtt Network with Multi Stream Feature Fusion for Cuffless Blood Pressure Estimation from Photoplethysmography Signals | Code | 1
Hyperbolic Contrastive Learning with Model-augmentation for Knowledge-aware Recommendation | Code | 1
DeepMath-Creative: A Benchmark for Evaluating Mathematical Creativity of Large Language Models | Code | 1
A Head to Predict and a Head to Question: Pre-trained Uncertainty Quantification Heads for Hallucination Detection in LLM Outputs | Code | 1
Extending Large Vision-Language Model for Diverse Interactive Tasks in Autonomous Driving | Code | 1
Codifying Character Logic in Role-Playing | Code | 1
Towards Actionable Pedagogical Feedback: A Multi-Perspective Analysis of Mathematics Teaching and Tutoring Dialogue | Code | 1
Symbolic Regression with Multimodal Large Language Models and Kolmogorov Arnold Networks | Code | 1
Chronocept: Instilling a Sense of Time in Machines | Code | 1
A Multi-Dimensional Constraint Framework for Evaluating and Improving Instruction Following in Large Language Models | Code | 1
Page 321 of 9486