SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers · 248,326 code links · 4,818 tasks

Papers

Showing 18151-18200 of 474278 papers

Title | Status | Hype
Pruning Everything, Everywhere, All at Once | Code | 0
UniCUE: Unified Recognition and Generation Framework for Chinese Cued Speech Video-to-Speech Generation | - | 0
An Efficient Task-Oriented Dialogue Policy: Evolutionary Reinforcement Learning Injected by Elite Individuals | - | 0
ConsistentChat: Building Skeleton-Guided Consistent Dialogues for Large Language Models from Scratch | - | 0
The Cost of Dynamic Reasoning: Demystifying AI Agents and Test-Time Scaling from an AI Infrastructure Perspective | - | 0
CORE: Constraint-Aware One-Step Reinforcement Learning for Simulation-Guided Neural Network Accelerator Design | - | 0
Learning to Diagnose Privately: DP-Powered LLMs for Radiology Report Classification | - | 0
Privacy and Security Threat for OpenAI GPTs | - | 0
Evaluating Apple Intelligence's Writing Tools for Privacy Against Large Language Model-Based Inference Attacks: Insights from Early Datasets | - | 0
Understanding and Meeting Practitioner Needs When Measuring Representational Harms Caused by LLM-Based Systems | - | 0
AgentMisalignment: Measuring the Propensity for Misaligned Behaviour in LLM-Based Agents | - | 0
Facts are Harder Than Opinions -- A Multilingual, Comparative Analysis of LLM-Based Fact-Checking Reliability | - | 0
Crowd-SFT: Crowdsourcing for LLM Alignment | - | 0
Preface to the Special Issue of the TAL Journal on Scholarly Document Processing | - | 0
Does Prompt Design Impact Quality of Data Imputation by LLMs? | - | 0
Photoreal Scene Reconstruction from an Egocentric Device | Code | 2
SplArt: Articulation Estimation and Part-Level Reconstruction with 3D Gaussian Splatting | Code | 1
Assessing Intersectional Bias in Representations of Pre-Trained Image Recognition Models | Code | 0
Training Cross-Morphology Embodied AI Agents: From Practical Challenges to Theoretical Foundations | Code | 0
Watermarking Degrades Alignment in Language Models: Analysis and Mitigation | Code | 0
TracLLM: A Generic Framework for Attributing Long Context LLMs | Code | 1
POLARIS: A High-contrast Polarimetric Imaging Benchmark Dataset for Exoplanetary Disk Representation Learning | Code | 0
Rethinking the Stability-Plasticity Trade-off in Continual Learning from an Architectural Perspective | Code | 0
Survey of Active Learning Hyperparameters: Insights from a Large-Scale Experimental Grid | Code | 0
RAID: A Dataset for Testing the Adversarial Robustness of AI-Generated Image Detectors | Code | 0
TextAtari: 100K Frames Game Playing with Language Agents | Code | 0
Understanding challenges to the interpretation of disaggregated evaluations of algorithmic fairness | - | 0
An Expansion-Based Approach for Quantified Integer Programming | Code | 0
Think Like a Person Before Responding: A Multi-Faceted Evaluation of Persona-Guided LLMs for Countering Hate | Code | 0
Gradient Inversion Attacks on Parameter-Efficient Fine-Tuning | Code | 0
Diffusion Domain Teacher: Diffusion Guided Domain Adaptive Object Detector | Code | 1
Matching Markets Meet LLMs: Algorithmic Reasoning with Ranked Preferences | - | 0
ReXVQA: A Large-scale Visual Question Answering Benchmark for Generalist Chest X-ray Understanding | - | 0
ViTSGMM: A Robust Semi-Supervised Image Recognition Network Using Sparse Labels | Code | 0
CogniPair: From LLM Chatbots to Conscious AI Agents -- GNWT-Based Multi-Agent Digital Twins for Social Pairing -- Dating & Hiring Applications | - | 0
VLMs Can Aggregate Scattered Training Patches | Code | 1
Facial Appearance Capture at Home with Patch-Level Reflectance Prior | Code | 2
TokAlign: Efficient Vocabulary Adaptation via Token Alignment | Code | 1
HtFLlib: A Comprehensive Heterogeneous Federated Learning Library and Benchmark | Code | 3
CHEER-Ekman: Fine-grained Embodied Emotion Classification | Code | 0
Multi-level Mixture of Experts for Multimodal Entity Linking | Code | 0
ByteMorph: Benchmarking Instruction-Guided Image Editing with Non-Rigid Motions | Code | 1
Contrast & Compress: Learning Lightweight Embeddings for Short Trajectories | - | 0
Investigating Quantum Feature Maps in Quantum Support Vector Machines for Lung Cancer Classification | - | 0
Multimodal DeepResearcher: Generating Text-Chart Interleaved Reports From Scratch with Agentic Framework | - | 0
Bridging Neural ODE and ResNet: A Formal Error Bound for Safety Verification | Code | 0
RefEdit: A Benchmark and Method for Improving Instruction-based Image Editing Model on Referring Expressions | - | 0
Mitigating Manipulation and Enhancing Persuasion: A Reflective Multi-Agent Approach for Legal Argument Generation | - | 0
DIAMOND: An LLM-Driven Agent for Context-Aware Baseball Highlight Summarization | - | 0
A Multi-Agent Framework for Mitigating Dialect Biases in Privacy Policy Question-Answering Systems | - | 0
Page 364 of 9486