SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

659,983 papers · 248,104 code links · 4,818 tasks

Papers

Showing 1801-1850 of 659,983 papers

Title | Status | Hype
In-the-Wild Camouflage Attack on Vehicle Detectors through Controllable Image Editing | | 0
GeoLAN: Geometric Learning of Latent Explanatory Directions in Large Language Models | | 0
Deep Hilbert–Galerkin Methods for Infinite-Dimensional PDEs and Optimal Control | | 0
Hyperagents | | 4
Global Convergence of Multiplicative Updates for the Matrix Mechanism: A Collaborative Proof with Gemini 3 | | 0
ProactiveBench: Benchmarking Proactiveness in Multimodal Large Language Models | | 1
A Framework for Formalizing LLM Agent Security | | 0
Reinforcement-guided generative protein language models enable de novo design of highly diverse AAV capsids | | 0
Narrative Aligned Long Form Video Question Answering | | 0
Instruction-Free Tuning of Large Vision Language Models for Medical Instruction Following | | 0
Any-Subgroup Equivariant Networks via Symmetry Breaking | | 0
ICLAD: In-Context Learning for Unified Tabular Anomaly Detection Across Supervision Regimes | | 0
Teaching an Agent to Sketch One Part at a Time | | 0
Stochastic Sequential Decision Making over Expanding Networks with Graph Filtering | | 0
Vision Tiny Recursion Model (ViTRM): Parameter-Efficient Image Classification via Recursive State Refinement | | 0
Beyond the Desk: Barriers and Future Opportunities for AI to Assist Scientists in Embodied Physical Tasks | | 0
Linear Social Choice with Few Queries: A Moment-Based Approach | | 0
FedAgain: A Trust-Based and Robust Federated Learning Strategy for an Automated Kidney Stone Identification in Ureteroscopy | | 0
Learning to Disprove: Formal Counterexample Generation with Large Language Models | | 0
ItinBench: Benchmarking Planning Across Multiple Cognitive Dimensions with Large Language Models | | 0
Gastric-X: A Multimodal Multi-Phase Benchmark Dataset for Advancing Vision-Language Models in Gastric Cancer Analysis | | 0
ReXInTheWild: A Unified Benchmark for Medical Photograph Understanding | | 0
Inducing Sustained Creativity and Diversity in Large Language Models | | 0
Recognising BSL Fingerspelling in Continuous Signing Sequences | | 0
SurfaceXR: Fusing Smartwatch IMUs and Egocentric Hand Pose for Seamless Surface Interactions | | 0
AURORA: Adaptive Unified Representation for Robust Ultrasound Analysis | Code | 0
Cooperation and Exploitation in LLM Policy Synthesis for Sequential Social Dilemmas | Code | 0
TRACE: Trajectory Recovery with State Propagation Diffusion for Urban Mobility | Code | 0
End-to-End QGAN-Based Image Synthesis via Neural Noise Encoding and Intensity Calibration | | 0
Detecting Basic Values in A Noisy Russian Social Media Text Data: A Multi-Stage Classification Framework | | 0
Balancing Performance and Fairness in Explainable AI for Anomaly Detection in Distributed Power Plants Monitoring | | 0
Sheaf Neural Networks and biomedical applications | | 0
Image2Gcode: Image-to-G-code Generation for Additive Manufacturing Using Diffusion-Transformer Model | | 0
Score Reversal Is Not Free for Quantum Diffusion Models | | 0
To See or To Please: Uncovering Visual Sycophancy and Split Beliefs in VLMs | | 0
Do VLMs Need Vision Transformers? Evaluating State Space Models as Vision Encoders | | 0
Generalization of Long-Range Machine Learning Potentials in Complex Chemical Spaces | | 0
All-in-One Slider for Attribute Manipulation in Diffusion Models | Code | 0
PlanTwin: Privacy-Preserving Planning Abstractions for Cloud-Assisted LLM Agents | | 0
Language Model Maps for Prompt-Response Distributions via Log-Likelihood Vectors | | 0
WarPGNN: A Parametric Thermal Warpage Analysis Framework with Physics-aware Graph Neural Network | | 0
Bridging Network Fragmentation: A Semantic-Augmented DRL Framework for UAV-aided VANETs | | 0
AU Codes, Language, and Synthesis: Translating Anatomy to Text for Facial Behavior Synthesis | | 0
Student views in AI Ethics and Social Impact | | 0
ITKIT: Feasible CT Image Analysis based on SimpleITK and MMEngine | | 0
Investigating Faithfulness in Large Audio Language Models | | 0
From Workflow Automation to Capability Closure: A Formal Framework for Safe and Revenue-Aware Customer Service AI | | 0
Redundancy-as-Masking: Formalizing the Artificial Age Score (AAS) to Model Memory Aging in Generative AI | | 0
Augmenting Rating-Scale Measures with Text-Derived Items Using the Information-Determined Scoring (IDS) Framework | | 0
REST: Receding Horizon Explorative Steiner Tree for Zero-Shot Object-Goal Navigation | | 0
Page 37 of 13200