SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers · 248,326 code links · 4,818 tasks

Papers

Showing 9876–9900 of 474,278 papers

Title | Status | Hype
Revitalizing Multivariate Time Series Forecasting: Learnable Decomposition with Inter-Series Dependencies and Intra-Series Variations Modeling | Code | 2
RealCompo: Balancing Realism and Compositionality Improves Text-to-Image Diffusion Models | Code | 2
A Touch, Vision, and Language Dataset for Multimodal Alignment | Code | 2
StyleDubber: Towards Multi-Scale Style Learning for Movie Dubbing | Code | 2
Advancing Large Language Models to Capture Varied Speaking Styles and Respond Properly in Spoken Conversations | Code | 2
Me LLaMA: Foundation Large Language Models for Medical Applications | Code | 2
ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs | Code | 2
Universal Physics Transformers: A Framework For Efficiently Scaling Neural Operators | Code | 2
Small Models, Big Insights: Leveraging Slim Proxy Models To Decide When and What to Retrieve for LLMs | Code | 2
EmoBench: Evaluating the Emotional Intelligence of Large Language Models | Code | 2
Robust CLIP: Unsupervised Adversarial Fine-Tuning of Vision Embeddings for Robust Large Vision-Language Models | Code | 2
UnlearnCanvas: Stylized Image Dataset for Enhanced Machine Unlearning Evaluation in Diffusion Models | Code | 2
Event-Based Motion Magnification | Code | 2
Class-incremental Learning for Time Series: Benchmark and Evaluation | Code | 2
Open3DSG: Open-Vocabulary 3D Scene Graphs from Point Clouds with Queryable Objects and Open-Set Relationships | Code | 2
CausalGym: Benchmarking causal interpretability methods on linguistic tasks | Code | 2
EVOR: Evolving Retrieval for Code Generation | Code | 2
Same Task, More Tokens: the Impact of Input Length on the Reasoning Performance of Large Language Models | Code | 2
Generative Semi-supervised Graph Anomaly Detection | Code | 2
Pan-Mamba: Effective pan-sharpening with State Space Model | Code | 2
The Revolution of Multimodal Large Language Models: A Survey | Code | 2
Language Models are Homer Simpson! Safety Re-Alignment of Fine-tuned Language Models through Task Arithmetic | Code | 2
Spatio-Temporal Few-Shot Learning via Diffusive Neural Network Generation | Code | 2
Reformatted Alignment | Code | 2
A Critical Evaluation of AI Feedback for Aligning Large Language Models | Code | 2