SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers · 248,326 code links · 4,818 tasks

Papers

Showing 4001–4050 of 661,570 papers

Title | Status | Hype
HtFLlib: A Comprehensive Heterogeneous Federated Learning Library and Benchmark | Code | 3
Motion Anything: Any to Motion Generation | Code | 3
RAP-SAM: Towards Real-Time All-Purpose Segment Anything | Code | 3
AvaTaR: Optimizing LLM Agents for Tool Usage via Contrastive Reasoning | Code | 3
EnvGS: Modeling View-Dependent Appearance with Environment Gaussian | Code | 3
A Survey on Data Selection for Language Models | Code | 3
MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions | Code | 3
FlashVideo: Flowing Fidelity to Detail for Efficient High-Resolution Video Generation | Code | 3
A Survey on Deep Learning for Theorem Proving | Code | 3
APOLLO: SGD-like Memory, AdamW-level Performance | Code | 3
MUSt3R: Multi-view Network for Stereo 3D Reconstruction | Code | 3
SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference | Code | 3
Inversion-Free Image Editing with Language-Guided Diffusion Models | Code | 3
Beyond Quacking: Deep Integration of Language Models and RAG into DuckDB | Code | 3
OpenSpiel: A Framework for Reinforcement Learning in Games | Code | 3
Scikit-fingerprints: easy and efficient computation of molecular fingerprints in Python | Code | 3
NeRF-SLAM: Real-Time Dense Monocular SLAM with Neural Radiance Fields | Code | 3
CLIMB: Class-imbalanced Learning Benchmark on Tabular Data | Code | 3
Around the World in 80 Timesteps: A Generative Approach to Global Visual Geolocation | Code | 3
Take the aTrain. Introducing an Interface for the Accessible Transcription of Interviews | Code | 3
Meta-Transformer: A Unified Framework for Multimodal Learning | Code | 3
GroundingGPT: Language Enhanced Multi-modal Grounding Model | Code | 3
Evaluating Large Language Models with fmeval | Code | 3
Harmful Fine-tuning Attacks and Defenses for Large Language Models: A Survey | Code | 3
HuatuoGPT-Vision, Towards Injecting Medical Visual Knowledge into Multimodal LLMs at Scale | Code | 3
APPL: A Prompt Programming Language for Harmonious Integration of Programs and Large Language Model Prompts | Code | 3
Rethinking Early Stopping: Refine, Then Calibrate | Code | 3
CORE-Bench: Fostering the Credibility of Published Research Through a Computational Reproducibility Agent Benchmark | Code | 3
Better by Default: Strong Pre-Tuned MLPs and Boosted Trees on Tabular Data | Code | 3
Automatic Gradient Estimation for Calibrating Crowd Models with Discrete Decision Making | Code | 3
DRT-o1: Optimized Deep Reasoning Translation via Long Chain-of-Thought | Code | 3
Graph-Mamba: Towards Long-Range Graph Sequence Modeling with Selective State Spaces | Code | 3
MVGS: Multi-view-regulated Gaussian Splatting for Novel View Synthesis | Code | 3
Classification Done Right for Vision-Language Pre-Training | Code | 3
Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation | Code | 3
ComfyBench: Benchmarking LLM-based Agents in ComfyUI for Autonomously Designing Collaborative AI Systems | Code | 3
AutoScraper: A Progressive Understanding Web Agent for Web Scraper Generation | Code | 3
Anatomy-informed Data Augmentation for Enhanced Prostate Cancer Detection | Code | 3
Improving Model Evaluation using SMART Filtering of Benchmark Datasets | Code | 3
ZO2: Scalable Zeroth-Order Fine-Tuning for Extremely Large Language Models with Limited GPU Memory | Code | 3
A new face swap method for image and video domains: a technical report | Code | 3
MooER: LLM-based Speech Recognition and Translation Models from Moore Threads | Code | 3
Reinforcement Learning Enhanced LLMs: A Survey | Code | 3
PF3plat: Pose-Free Feed-Forward 3D Gaussian Splatting | Code | 3
RT-DETRv3: Real-time End-to-End Object Detection with Hierarchical Dense Positive Supervision | Code | 3
AesBench: An Expert Benchmark for Multimodal Large Language Models on Image Aesthetics Perception | Code | 3
One-Prompt-One-Story: Free-Lunch Consistent Text-to-Image Generation Using a Single Prompt | Code | 3
An Imitative Reinforcement Learning Framework for Autonomous Dogfight | Code | 3
FusionBench: A Comprehensive Benchmark of Deep Model Fusion | Code | 3
FedMKT: Federated Mutual Knowledge Transfer for Large and Small Language Models | Code | 3
Page 81 of 13,232