SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers · 248,326 code links · 4,818 tasks

Papers

Showing 3901 to 3950 of 177,340 papers

Title | Status | Hype
MUSt3R: Multi-view Network for Stereo 3D Reconstruction | Code | 3
SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference | Code | 3
Inversion-Free Image Editing with Language-Guided Diffusion Models | Code | 3
Beyond Quacking: Deep Integration of Language Models and RAG into DuckDB | Code | 3
OpenSpiel: A Framework for Reinforcement Learning in Games | Code | 3
Scikit-fingerprints: easy and efficient computation of molecular fingerprints in Python | Code | 3
NeRF-SLAM: Real-Time Dense Monocular SLAM with Neural Radiance Fields | Code | 3
CLIMB: Class-imbalanced Learning Benchmark on Tabular Data | Code | 3
Around the World in 80 Timesteps: A Generative Approach to Global Visual Geolocation | Code | 3
Take the aTrain. Introducing an Interface for the Accessible Transcription of Interviews | Code | 3
Meta-Transformer: A Unified Framework for Multimodal Learning | Code | 3
GroundingGPT: Language Enhanced Multi-modal Grounding Model | Code | 3
Evaluating Large Language Models with fmeval | Code | 3
Harmful Fine-tuning Attacks and Defenses for Large Language Models: A Survey | Code | 3
HuatuoGPT-Vision, Towards Injecting Medical Visual Knowledge into Multimodal LLMs at Scale | Code | 3
APPL: A Prompt Programming Language for Harmonious Integration of Programs and Large Language Model Prompts | Code | 3
Rethinking Early Stopping: Refine, Then Calibrate | Code | 3
CORE-Bench: Fostering the Credibility of Published Research Through a Computational Reproducibility Agent Benchmark | Code | 3
Better by Default: Strong Pre-Tuned MLPs and Boosted Trees on Tabular Data | Code | 3
Automatic Gradient Estimation for Calibrating Crowd Models with Discrete Decision Making | Code | 3
DRT-o1: Optimized Deep Reasoning Translation via Long Chain-of-Thought | Code | 3
Graph-Mamba: Towards Long-Range Graph Sequence Modeling with Selective State Spaces | Code | 3
MVGS: Multi-view-regulated Gaussian Splatting for Novel View Synthesis | Code | 3
Classification Done Right for Vision-Language Pre-Training | Code | 3
Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation | Code | 3
ComfyBench: Benchmarking LLM-based Agents in ComfyUI for Autonomously Designing Collaborative AI Systems | Code | 3
AutoScraper: A Progressive Understanding Web Agent for Web Scraper Generation | Code | 3
Anatomy-informed Data Augmentation for Enhanced Prostate Cancer Detection | Code | 3
Improving Model Evaluation using SMART Filtering of Benchmark Datasets | Code | 3
ZO2: Scalable Zeroth-Order Fine-Tuning for Extremely Large Language Models with Limited GPU Memory | Code | 3
A new face swap method for image and video domains: a technical report | Code | 3
MooER: LLM-based Speech Recognition and Translation Models from Moore Threads | Code | 3
Reinforcement Learning Enhanced LLMs: A Survey | Code | 3
PF3plat: Pose-Free Feed-Forward 3D Gaussian Splatting | Code | 3
RT-DETRv3: Real-time End-to-End Object Detection with Hierarchical Dense Positive Supervision | Code | 3
AesBench: An Expert Benchmark for Multimodal Large Language Models on Image Aesthetics Perception | Code | 3
One-Prompt-One-Story: Free-Lunch Consistent Text-to-Image Generation Using a Single Prompt | Code | 3
An Imitative Reinforcement Learning Framework for Autonomous Dogfight | Code | 3
FusionBench: A Comprehensive Benchmark of Deep Model Fusion | Code | 3
FedMKT: Federated Mutual Knowledge Transfer for Large and Small Language Models | Code | 3
Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought Reasoning | Code | 3
From Panels to Prose: Generating Literary Narratives from Comics | Code | 3
TorchCP: A Python Library for Conformal Prediction | Code | 3
RAG and RAU: A Survey on Retrieval-Augmented Language Model in Natural Language Processing | Code | 3
Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs | Code | 3
Oryx MLLM: On-Demand Spatial-Temporal Understanding at Arbitrary Resolution | Code | 3
X-LoRA: Mixture of Low-Rank Adapter Experts, a Flexible Framework for Large Language Models with Applications in Protein Mechanics and Molecular Design | Code | 3
Segment Any Medical Model Extended | Code | 3
An Image is Worth 32 Tokens for Reconstruction and Generation | Code | 3
Intelligent Grimm - Open-ended Visual Storytelling via Latent Diffusion Models | Code | 3
Page 79 of 3,547