SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers · 248,326 code links · 4,818 tasks

Papers

Showing 18751–18800 of 474278 papers

Title | Status | Hype
CODEMENV: Benchmarking Large Language Models on Code Migration | Code | 1
Higher-Order Responsibility | - | 0
CoBRA: Quantifying Strategic Language Use and LLM Pragmatics | Code | 0
PromptVFX: Text-Driven Fields for Open-World 3D Gaussian Animation | Code | 0
Mamba Drafters for Speculative Decoding | - | 0
A Large Language Model-Supported Threat Modeling Framework for Transportation Cyber-Physical Systems | - | 0
SafeGenes: Evaluating the Adversarial Robustness of Genomic Foundation Models | - | 0
FedRPCA: Enhancing Federated LoRA Aggregation Using Robust PCA | - | 0
IRT-Router: Effective and Interpretable Multi-LLM Routing via Item Response Theory | Code | 1
Mispronunciation Detection Without L2 Pronunciation Dataset in Low-Resource Setting: A Case Study in Finland Swedish | Code | 0
Graph Neural Networks for Jamming Source Localization | Code | 0
EvoGit: Decentralized Code Evolution via Git-Based Multi-Agent Collaboration | Code | 5
Probing the Geometry of Truth: Consistency and Generalization of Truth Directions in LLMs Across Logical Transformations and Question Answering Tasks | Code | 0
Will Agents Replace Us? Perceptions of Autonomous Multi-Agent AI | Code | 0
ACCESS DENIED INC: The First Benchmark Environment for Sensitivity Awareness | Code | 0
Learning DNF through Generalized Fourier Representations | - | 0
No Soundness in the Real World: On the Challenges of the Verification of Deployed Neural Networks | Code | 0
Unfolding Boxes with Local Constraints | Code | 0
RARE: Retrieval-Aware Robustness Evaluation for Retrieval-Augmented Generation Systems | Code | 0
CAPAA: Classifier-Agnostic Projector-Based Adversarial Attack | Code | 0
Regulatory Graphs and GenAI for Real-Time Transaction Monitoring and Compliance Explanation in Banking | - | 0
MedBookVQA: A Systematic and Comprehensive Medical Benchmark Derived from Open-Access Book | Code | 0
Quantization-based Bounds on the Wasserstein Metric | - | 0
LEMONADE: A Large Multilingual Expert-Annotated Abstractive Event Dataset for the Real World | Code | 1
How Neural Networks Organize Concepts: Introducing Concept Trajectory Analysis for Deep Learning Interpretability | Code | 0
Adapting General-Purpose Embedding Models to Private Datasets Using Keyword-based Retrieval | Code | 0
Concept-Centric Token Interpretation for Vector-Quantized Generative Models | Code | 0
ChartGen: Scaling Chart Understanding Via Code-Guided Synthetic Chart Generation | Code | 0
Mitigating Plasticity Loss in Continual Reinforcement Learning by Reducing Churn | - | 0
iDPA: Instance Decoupled Prompt Attention for Incremental Medical Object Detection | - | 0
Revisiting LLMs as Zero-Shot Time-Series Forecasters: Small Noise Can Break Large Models | Code | 0
LoRA as a Flexible Framework for Securing Large Vision Systems | - | 0
Enhancing Multimodal Continual Instruction Tuning with BranchLoRA | - | 0
ARIA: Training Language Agents with Intention-Driven Reward Aggregation | - | 0
Goal-Aware Identification and Rectification of Misinformation in Multi-Agent Systems | Code | 0
An application of machine learning to the motion response prediction of floating assets | - | 0
CoQuIR: A Comprehensive Benchmark for Code Quality-Aware Information Retrieval | - | 0
Organizational Adaptation to Generative AI in Cybersecurity: A Systematic Review | - | 0
CineMA: A Foundation Model for Cine Cardiac MRI | Code | 2
The Coupling Effect of Sensing Targets on the Environment for 3GPP ISAC Channels: Observation, Modeling, and Validation | - | 0
Look mom, no experimental data! Learning to score protein-ligand interactions from simulations | Code | 1
ABCDEFGH: An Adaptation-Based Convolutional Neural Network-CycleGAN Disease-Courses Evolution Framework Using Generative Models in Health Education | Code | 0
Latent Wavelet Diffusion: Enabling 4K Image Synthesis for Free | - | 0
MR2US-Pro: Prostate MR to Ultrasound Image Translation and Registration Based on Diffusion Models | - | 0
Image Restoration Learning via Noisy Supervision in the Fourier Domain | - | 0
Joint Activity Detection and Channel Estimation for Massive Connectivity: Where Message Passing Meets Score-Based Generative Priors | - | 0
Integrated Sensing, Computing and Semantic Communication for Vehicular Networks | - | 0
A Family of Robust Generalized Adaptive Filters and Application for Time-series Prediction | - | 0
Power-of-Two (PoT) Weights in Large Language Models (LLMs) | - | 0
Active Learning via Regression Beyond Realizability | - | 0
Page 376 of 9486