SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers | 248,326 code links | 4,818 tasks

Papers

Showing 20,751 to 20,800 of 474,278 papers

Title | Status | Hype
TrustEMG-Net: Using Representation-Masking Transformer with U-Net for Surface Electromyography Enhancement | Code | 1
CliMedBench: A Large-Scale Chinese Benchmark for Evaluating Medical Large Language Models in Clinical Scenarios | Code | 1
You Know What I'm Saying: Jailbreak Attack via Implicit Reference | Code | 1
Beyond correlation: The Impact of Human Uncertainty in Measuring the Effectiveness of Automatic Evaluation and LLM-as-a-Judge | Code | 1
High-Efficiency Neural Video Compression via Hierarchical Predictive Learning | Code | 1
SuperGS: Super-Resolution 3D Gaussian Splatting Enhanced by Variational Residual Features and Uncertainty-Augmented Learning | Code | 1
Capturing complex hand movements and object interactions using machine learning-powered stretchable smart textile gloves | Code | 1
Dog-IQA: Standard-guided Zero-shot MLLM for Mix-grained Image Quality Assessment | Code | 1
Enhanced MRI brain tumor detection and classification via topological data analysis and low-rank tensor decomposition | Code | 1
BACKTIME: Backdoor Attacks on Multivariate Time Series Forecasting | Code | 1
Custom Non-Linear Model Predictive Control for Obstacle Avoidance in Indoor and Outdoor Environments | Code | 1
Spiking Neural Network as Adaptive Event Stream Slicer | Code | 1
Adversarial Decoding: Generating Readable Documents for Adversarial Objectives | Code | 1
Tutor CoPilot: A Human-AI Approach for Scaling Real-Time Expertise | Code | 1
Med-TTT: Vision Test-Time Training model for Medical Image Segmentation | Code | 1
General Preference Modeling with Preference Representations for Aligning Language Models | Code | 1
MetaMetrics: Calibrating Metrics For Generation Tasks Using Human Preferences | Code | 1
Disentangling Textual and Acoustic Features of Neural Speech Representations | Code | 1
C-MORL: Multi-Objective Reinforcement Learning through Efficient Discovery of Pareto Front | Code | 1
Boosting Masked ECG-Text Auto-Encoders as Discriminative Learners | Code | 1
Inductive Generative Recommendation via Retrieval-based Speculation | Code | 1
Long-Sequence Recommendation Models Need Decoupled Embeddings | Code | 1
Unleashing the Potential of the Diffusion Model in Few-shot Semantic Segmentation | Code | 1
ReLIC: A Recipe for 64k Steps of In-Context Reinforcement Learning for Embodied AI | Code | 1
POSIX: A Prompt Sensitivity Index For Large Language Models | Code | 1
L-CiteEval: Do Long-Context Models Truly Leverage Context for Responding? | Code | 1
Searching for Efficient Linear Layers over a Continuous Space of Structured Matrices | Code | 1
DivScene: Benchmarking LVLMs for Object Navigation with Diverse Scenes and Objects | Code | 1
ColaCare: Enhancing Electronic Health Record Modeling through Large Language Model-Driven Multi-Agent Collaboration | Code | 1
Understanding and Mitigating Miscalibration in Prompt Tuning for Vision-Language Models | Code | 1
Training Language Models on Synthetic Edit Sequences Improves Code Synthesis | Code | 1
Why Sample Space Matters: Keyframe Sampling Optimization for LiDAR-based Place Recognition | Code | 1
BadCM: Invisible Backdoor Attack Against Cross-Modal Learning | Code | 1
PixelShuffler: A Simple Image Translation Through Pixel Rearrangement | Code | 1
DaWin: Training-free Dynamic Weight Interpolation for Robust Adaptation | Code | 1
Agents' Room: Narrative Generation through Multi-step Collaboration | Code | 1
CriSPO: Multi-Aspect Critique-Suggestion-guided Automatic Prompt Optimization for Text Generation | Code | 1
Spatial-Temporal Multi-Cuts for Online Multiple-Camera Vehicle Tracking | Code | 1
Parameter Competition Balancing for Model Merging | Code | 1
A New Benchmark In Vivo Paired Dataset for Laparoscopic Image De-smoking | Code | 1
Collective Critics for Creative Story Generation | Code | 1
Mitigating Memorization In Language Models | Code | 1
LLM-Pilot: Characterize and Optimize Performance of your LLM Inference Services | Code | 1
Immunogenicity Prediction with Dual Attention Enables Vaccine Target Selection | Code | 1
Agent-Oriented Planning in Multi-Agent Systems | Code | 1
Lightweight Diffusion Models for Resource-Constrained Semantic Communication | Code | 1
FastAdaSP: Multitask-Adapted Efficient Inference for Large Speech Language Model | Code | 1
EmbedLLM: Learning Compact Representations of Large Language Models | Code | 1
MA-RLHF: Reinforcement Learning from Human Feedback with Macro Actions | Code | 1
SymmetricDiffusers: Learning Discrete Diffusion on Finite Symmetric Groups | Code | 1
Page 416 of 9486