SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers · 248,326 code links · 4,818 tasks

Papers

Showing 10,901–10,950 of 661,570 papers

Title | Status | Hype
Harnessing Administrative Data Inventories to Create a Reliable Transnational Reference Database for Crop Type Monitoring | Code | 2
MuggleMath: Assessing the Impact of Query and Response Augmentation on Math Reasoning | Code | 2
Compressing Context to Enhance Inference Efficiency of Large Language Models | Code | 2
Causal structure learning with momentum: Sampling distributions over Markov Equivalence Classes of DAGs | Code | 2
Distributional Soft Actor-Critic with Three Refinements | Code | 2
OptiMUS: Optimization Modeling Using MIP Solvers and large language models | Code | 2
DiffuSeq-v2: Bridging Discrete and Continuous Text Spaces for Accelerated Seq2Seq Diffusion Models | Code | 2
FLATTEN: optical FLow-guided ATTENtion for consistent text-to-video editing | Code | 2
Generative Judge for Evaluating Alignment | Code | 2
HyperLips: Hyper Control Lips with High Resolution Decoder for Talking Face Generation | Code | 2
Humanoid Agents: Platform for Simulating Human-like Generative Agents | Code | 2
Colmap-PCD: An Open-source Tool for Fine Image-to-point cloud Registration | Code | 2
Interpreting CLIP's Image Representation via Text-Based Decomposition | Code | 2
ZooPFL: Exploring Black-box Foundation Models for Personalized Federated Learning | Code | 2
ZSC-Eval: An Evaluation Toolkit and Benchmark for Multi-agent Zero-shot Coordination | Code | 2
Fast protein backbone generation with SE(3) flow matching | Code | 2
LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT | Code | 2
Crystal-GFN: sampling crystals with desirable properties and constraints | Code | 2
Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models | Code | 2
Towards Foundational Models for Molecular Learning on Large-Scale Multi-Task Datasets | Code | 2
Towards Foundation Models for Knowledge Graph Reasoning | Code | 2
DISTFLASHATTN: Distributed Memory-efficient Attention for Long-context LLMs Training | Code | 2
FreeReg: Image-to-Point Cloud Registration Leveraging Pretrained Diffusion Models and Monocular Depth Estimators | Code | 2
GoLLIE: Annotation Guidelines improve Zero-Shot Information-Extraction | Code | 2
Aligning Text-to-Image Diffusion Models with Reward Backpropagation | Code | 2
Smoothing Methods for Automatic Differentiation Across Conditional Branches | Code | 2
Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To! | Code | 2
MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning | Code | 2
MLAgentBench: Evaluating Language Agents on Machine Learning Experimentation | Code | 2
FreshLLMs: Refreshing Large Language Models with Search Engine Augmentation | Code | 2
SweetDreamer: Aligning Geometric Priors in 2D Diffusion for Consistent Text-to-3D | Code | 2
LibriSpeech-PC: Benchmark for Evaluation of Punctuation and Capitalization Capabilities of end-to-end ASR Models | Code | 2
CoDA: Collaborative Novel Box Discovery and Cross-modal Alignment for Open-vocabulary 3D Object Detection | Code | 2
Ring Attention with Blockwise Transformers for Near-Infinite Context | Code | 2
AutoDAN: Generating Stealthy Jailbreak Prompts on Aligned Large Language Models | Code | 2
SE(3)-Stochastic Flow Matching for Protein Backbone Generation | Code | 2
ACE: A fast, skillful learned global atmospheric model for climate prediction | Code | 2
Can large language models provide useful feedback on research papers? A large-scale empirical analysis | Code | 2
Driving with LLMs: Fusing Object-Level Vector Modality for Explainable Autonomous Driving | Code | 2
MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens | Code | 2
MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts | Code | 2
Direct Inversion: Boosting Diffusion-based Editing with 3 Lines of Code | Code | 2
Controlling Vision-Language Models for Multi-Task Image Restoration | Code | 2
Quantifying the Plausibility of Context Reliance in Neural Machine Translation | Code | 2
You Only Look at Once for Real-time and Generic Multi-Task | Code | 2
GPT-Driver: Learning to Drive with GPT | Code | 2
CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction | Code | 2
Making LLaMA SEE and Draw with SEED Tokenizer | Code | 2
GRID: A Platform for General Robot Intelligence Development | Code | 2
GenSim: Generating Robotic Simulation Tasks via Large Language Models | Code | 2