SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

658,356 papers · 258,216 code links · 4,818 tasks

Papers

Showing 401-450 of 658,356 papers

Title | Status | Hype
PowerPM: Foundation Model for Power Systems | Code | 7
CodexGraph: Bridging Large Language Models and Code Repositories via Code Graph Databases | Code | 7
Segment Anything in Medical Images and Videos: Benchmark and Deployment | Code | 7
Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining | Code | 7
Global Structure-from-Motion Revisited | Code | 7
RT-DETRv2: Improved Baseline with Bag-of-Freebies for Real-Time Detection Transformer | Code | 7
ECCO: Can We Improve Model-Generated Code Efficiency Without Sacrificing Functional Correctness? | Code | 7
Stable Audio Open | Code | 7
MMAU: A Holistic Benchmark of Agent Capabilities Across Diverse Domains | Code | 7
Qwen2-Audio Technical Report | Code | 7
EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditions | Code | 7
LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models | Code | 7
MambaVision: A Hybrid Mamba-Transformer Vision Backbone | Code | 7
PEER: Expertizing Domain-Specific Tasks with a Multi-Agent Framework and Tuning Methods | Code | 7
Agentless: Demystifying LLM-based Software Engineering Agents | Code | 7
ColPali: Efficient Document Retrieval with Vision Language Models | Code | 7
RouteLLM: Learning to Route LLMs with Preference Data | Code | 7
BricksRL: A Platform for Democratizing Robotics and Reinforcement Learning Research and Education with LEGO | Code | 7
Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving | Code | 7
EAGLE-2: Faster Inference of Language Models with Dynamic Draft Trees | Code | 7
Segment Any Text: A Universal Approach for Robust, Efficient and Adaptable Sentence Segmentation | Code | 7
NAVSIM: Data-Driven Non-Reactive Autonomous Vehicle Simulation and Benchmarking | Code | 7
Grants4Companies: Applying Declarative Methods for Recommending and Reasoning About Business Grants in the Austrian Public Administration (System Description) | Code | 7
DataComp-LM: In search of the next generation of training sets for language models | Code | 7
Grounding Image Matching in 3D with MASt3R | Code | 7
MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers | Code | 7
Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback | Code | 7
TextGrad: Automatic "Differentiation" via Text | Code | 7
Mixture-of-Agents Enhances Large Language Model Capabilities | Code | 7
M&M VTO: Multi-Garment Virtual Try-On and Editing | Code | 7
The Prompt Report: A Systematic Survey of Prompting Techniques | Code | 7
Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT | Code | 7
Scalable MatMul-free Language Modeling | Code | 7
Seed-TTS: A Family of High-Quality Versatile Speech Generation Models | Code | 7
The Scandinavian Embedding Benchmarks: Comprehensive Assessment of Multilingual and Monolingual Text Embedding | Code | 7
Unique3D: High-Quality and Efficient 3D Mesh Generation from a Single Image | Code | 7
TotalSegmentator MRI: Robust Sequence-independent Segmentation of Multiple Anatomic Structures in MRI | Code | 7
Adaptive In-conversation Team Building for Language Model Agents | Code | 7
EasyAnimate: A High-Performance Long Video Generation Method based on Transformer Architecture | Code | 7
PromptWizard: Task-Aware Prompt Optimization Framework | Code | 7
Efficient multi-prompt evaluation of LLMs | Code | 7
Vista: A Generalizable Driving World Model with High Fidelity and Versatile Controllability | Code | 7
The Road Less Scheduled | Code | 7
HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models | Code | 7
Revisiting MoE and Dense Speed-Accuracy Comparisons for LLM Training | Code | 7
Learning Multi-dimensional Human Preference for Text-to-Image Generation | Code | 7
Dynamic data sampler for cross-language transfer learning in large language models | Code | 7
Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection | Code | 7
Chameleon: Mixed-Modal Early-Fusion Foundation Models | Code | 7
When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models | Code | 7
Page 9 of 13,168