SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers | 248,326 code links | 4,818 tasks

Papers

Showing 17,851 to 17,900 of 474,278 papers

Title | Status | Hype
Towards provable probabilistic safety for scalable embodied AI systems | | 0
Learning to Plan via Supervised Contrastive Learning and Strategic Interpolation: A Chess Case Study | Code | 0
Improved Regret Bounds for Linear Bandits with Heavy-Tailed Rewards | | 0
MMSU: A Massive Multi-task Spoken Language Understanding and Reasoning Benchmark | Code | 7
Distributional encoding for Gaussian process regression with qualitative inputs | | 0
Generating Synthetic Stereo Datasets using 3D Gaussian Splatting and Expert Knowledge Transfer | | 0
Truth in the Few: High-Value Data Selection for Efficient Multi-Modal Reasoning | Code | 0
Using In-Context Learning for Automatic Defect Labelling of Display Manufacturing Data | | 0
Causal Effect Identification in lvLiNGAM from Higher-Order Cumulants | Code | 0
Invisible Backdoor Triggers in Image Editing Model via Deep Watermarking | Code | 0
LotusFilter: Fast Diverse Nearest Neighbor Search via a Learned Cutoff Table | Code | 1
LLMs for sensory-motor control: Combining in-context and iterative learning | Code | 0
Unsupervised Machine Learning for Scientific Discovery: Workflow and Best Practices | Code | 0
Towards Holistic Visual Quality Assessment of AI-Generated Videos: A LLM-Based Multi-Dimensional Evaluation Model | Code | 1
Spatiotemporal Contrastive Learning for Cross-View Video Localization in Unstructured Off-road Terrains | | 0
Cloud-Based Interoperability in Residential Energy Systems | | 0
Feature-Based Lie Group Transformer for Real-World Applications | | 0
LESS: Large Language Model Enhanced Semi-Supervised Learning for Speech Foundational Models | | 0
Deep learning image burst stacking to reconstruct high-resolution ground-based solar observations | | 0
Toward Better SSIM Loss for Unsupervised Monocular Depth Estimation | | 0
A Multi-Dataset Evaluation of Models for Automated Vulnerability Repair | | 0
NIMO: a Nonlinear Interpretable MOdel | | 0
Unfolding Spatial Cognition: Evaluating Multimodal Models on Visual Simulations | Code | 1
MegaHan97K: A Large-Scale Dataset for Mega-Category Chinese Character Recognition with over 97K Categories | Code | 2
SupeRANSAC: One RANSAC to Rule Them All | Code | 3
Text-Aware Real-World Image Super-Resolution via Diffusion Model with Joint Segmentation Decoders | Code | 0
HypeVPR: Exploring Hyperbolic Space for Perspective to Equirectangular Visual Place Recognition | Code | 0
AV-Reasoner: Improving and Benchmarking Clue-Grounded Audio-Visual Counting for MLLMs | | 0
Membership Inference Attacks on Sequence Models | | 0
Why LLM Safety Guardrails Collapse After Fine-tuning: A Similarity Analysis Between Alignment and Fine-tuning Datasets | | 0
Artificial Intelligence Should Genuinely Support Clinical Reasoning and Decision Making To Bridge the Translational Gap | | 0
Robust Few-Shot Vision-Language Model Adaptation | | 0
SeedVR2: One-Step Video Restoration via Diffusion Adversarial Post-Training | | 0
Towards LLM-Centric Multimodal Fusion: A Survey on Integration Strategies and Techniques | | 0
hdl2v: A Code Translation Dataset for Enhanced LLM Verilog Generation | | 0
On Automating Security Policies with Contemporary LLMs | | 0
Urania: Differentially Private Insights into AI Use | | 0
Clustering and Median Aggregation Improve Differentially Private Inference | | 0
Intentionally Unintentional: GenAI Exceptionalism and the First Amendment | | 0
Federated Isolation Forest for Efficient Anomaly Detection on Edge IoT Systems | | 0
TQml Simulator: Optimized Simulation of Quantum Machine Learning | | 0
Handle-based Mesh Deformation Guided By Vision Language Model | | 0
VoxDet: Rethinking 3D Semantic Occupancy Prediction as Dense Object Detection | | 0
Conservative classifiers do consistently well with improving agents: characterizing statistical and online learning | | 0
Privacy Amplification Through Synthetic Data: Insights from Linear Regression | | 0
GOLFer: Smaller LM-Generated Documents Hallucination Filter & Combiner for Query Expansion in Information Retrieval | Code | 0
Unifying Appearance Codes and Bilateral Grids for Driving Scene Gaussian Splatting | Code | 2
Time to Talk: LLM Agents for Asynchronous Group Communication in Mafia Games | Code | 1
Exp4Fuse: A Rank Fusion Framework for Enhanced Sparse Retrieval using Large Language Model-based Query Expansion | Code | 0
Sparse Autoencoders, Again? | | 0
Page 358 of 9,486