SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers | 248,326 code links | 4,818 tasks

Papers

Showing 16451-16500 of 474,278 papers

Title | Status | Hype
Breaking the Data Barrier -- Building GUI Agents Through Task Generalization | Code | 1
BO-SA-PINNs: Self-adaptive physics-informed neural networks based on Bayesian optimization for automatically designing PDE solvers | Code | 1
MultiLoKo: a multilingual local knowledge benchmark for LLMs spanning 31 languages | Code | 1
The Jailbreak Tax: How Useful are Your Jailbreak Outputs? | Code | 1
Beyond Degradation Redundancy: Contrastive Prompt Learning for All-in-One Image Restoration | Code | 1
LEMUR Neural Network Dataset: Towards Seamless AutoML | Code | 1
Unveiling Contrastive Learning's Capability of Neighborhood Aggregation for Collaborative Filtering | Code | 1
ReasonDrive: Efficient Visual Question Answering for Autonomous Vehicles with Reasoning-Enhanced Small Vision-Language Models | Code | 1
Invariance Matters: Empowering Social Recommendation via Graph Invariant Learning | Code | 1
Hearing Anywhere in Any Environment | Code | 1
SilVar-Med: A Speech-Driven Visual Language Model for Explainable Abnormality Detection in Medical Imaging | Code | 1
Omni-Dish: Photorealistic and Faithful Image Generation and Editing for Arbitrary Chinese Dishes | Code | 1
GenTe: Generative Real-world Terrains for General Legged Robot Locomotion Control | Code | 1
Enhanced Semantic Extraction and Guidance for UGC Image Super Resolution | Code | 1
The Mirage of Performance Gains: Why Contrastive Decoding Fails to Address Multimodal Hallucination | Code | 1
GeoUni: A Unified Model for Generating Geometry Diagrams, Problems and Problem Solutions | Code | 1
Better Estimation of the KL Divergence Between Language Models | Code | 1
SemiETS: Integrating Spatial and Content Consistencies for Semi-Supervised End-to-end Text Spotting | Code | 1
CHARM: Calibrating Reward Models With Chatbot Arena Scores | Code | 1
DUMP: Automated Distribution-Level Curriculum Learning for RL-based LLM Post-training | Code | 1
Can LLM feedback enhance review quality? A randomized study of 20K reviews at ICLR 2025 | Code | 1
Uncertainty Guided Refinement for Fine-Grained Salient Object Detection | Code | 1
A Survey on Efficient Vision-Language Models | Code | 1
GRPO-LEAD: A Difficulty-Aware Reinforcement Learning Approach for Concise Mathematical Reasoning in Language Models | Code | 1
Rethinking the generalization of drug target affinity prediction algorithms via similarity aware evaluation | Code | 1
AdaSteer: Your Aligned LLM is Inherently an Adaptive Jailbreak Defender | Code | 1
SPICE: A Synergistic, Precise, Iterative, and Customizable Image Editing Workflow | Code | 1
EmoAgent: Assessing and Safeguarding Human-AI Interaction for Mental Health Safety | Code | 1
Integrating Textual Embeddings from Contrastive Learning with Generative Recommender for Enhanced Personalization | Code | 1
Fine-tuning a Large Language Model for Automating Computational Fluid Dynamics Simulations | Code | 1
CMCRD: Cross-Modal Contrastive Representation Distillation for Emotion Recognition | Code | 1
BioChemInsight: An Open-Source Toolkit for Automated Identification and Recognition of Optical Chemical Structures and Activity Data in Scientific Publications | Code | 1
NetTAG: A Multimodal RTL-and-Layout-Aligned Netlist Foundation Model via Text-Attributed Graph | Code | 1
Beyond Degradation Conditions: All-in-One Image Restoration via HOG Transformers | Code | 1
Speculative Thinking: Enhancing Small-Model Reasoning with Large Model Guidance at Inference Time | Code | 1
Parameterized Synthetic Text Generation with SimpleStories | Code | 1
Pneuma: Leveraging LLMs for Tabular Data Representation and Retrieval in an End-to-End System | Code | 1
RT-DATR:Real-time Unsupervised Domain Adaptive Detection Transformer with Adversarial Feature Learning | Code | 1
On Oversquashing in Graph Neural Networks Through the Lens of Dynamical Systems | Code | 1
SN-LiDAR: Semantic Neural Fields for Novel Space-time View LiDAR Synthesis | Code | 1
Mimic In-Context Learning for Multimodal Tasks | Code | 1
PMNI: Pose-free Multi-view Normal Integration for Reflective and Textureless Surface Reconstruction | Code | 1
F^3Set: Towards Analyzing Fast, Frequent, and Fine-grained Events from Videos | Code | 1
Task Memory Engine (TME): A Structured Memory Framework with Graph-Aware Extensions for Multi-Step LLM Agent Tasks | Code | 1
Towards generalizable single-cell perturbation modeling via the Conditional Monge Gap | Code | 1
Single View Garment Reconstruction Using Diffusion Mapping Via Pattern Coordinates | Code | 1
MooseAgent: A LLM Based Multi-agent Framework for Automating Moose Simulation | Code | 1
Boosting the Class-Incremental Learning in 3D Point Clouds via Zero-Collection-Cost Basic Shape Pre-Training | Code | 1
LMM4LMM: Benchmarking and Evaluating Large-multimodal Image Generation with LMMs | Code | 1
Latent Diffusion Autoencoders: Toward Efficient and Meaningful Unsupervised Representation Learning in Medical Imaging | Code | 1
Page 330 of 9486