SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers, 248,326 code links, 4,818 tasks

Papers

Showing 6951-7000 of 177,340 papers

Title | Status | Hype
VideoREPA: Learning Physics for Video Generation through Relational Alignment with Foundation Models | Code | 2
Collaborative Gym: A Framework for Enabling and Evaluating Human-Agent Collaboration | Code | 2
HENet: Hybrid Encoding for End-to-end Multi-task 3D Perception from Multi-view Cameras | Code | 2
MedM-VL: What Makes a Good Medical LVLM? | Code | 2
Self-Explore: Enhancing Mathematical Reasoning in Language Models with Fine-grained Rewards | Code | 2
MathBench: Evaluating the Theory and Application Proficiency of LLMs with a Hierarchical Mathematics Benchmark | Code | 2
ScaleCrafter: Tuning-free Higher-Resolution Visual Generation with Diffusion Models | Code | 2
All for One and One for All: Improving Music Separation by Bridging Networks | Code | 2
Swin2SR: SwinV2 Transformer for Compressed Image Super-Resolution and Restoration | Code | 2
MAS-GPT: Training LLMs to Build LLM-based Multi-Agent Systems | Code | 2
Mixture of LoRA Experts | Code | 2
Neighboring Autoregressive Modeling for Efficient Visual Generation | Code | 2
The Calysto Scheme Project | Code | 2
ANAH-v2: Scaling Analytical Hallucination Annotation of Large Language Models | Code | 2
Exploring Plain Vision Transformer Backbones for Object Detection | Code | 2
Twin-Merging: Dynamic Integration of Modular Expertise in Model Merging | Code | 2
Hidden Biases of End-to-End Driving Models | Code | 2
LaserMix for Semi-Supervised LiDAR Semantic Segmentation | Code | 2
IPDnet: A Universal Direct-Path IPD Estimation Network for Sound Source Localization | Code | 2
GestureLSM: Latent Shortcut based Co-Speech Gesture Generation with Spatial-Temporal Modeling | Code | 2
Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering | Code | 2
SR-LIVO: LiDAR-Inertial-Visual Odometry and Mapping with Sweep Reconstruction | Code | 2
Can Language Models Solve Olympiad Programming? | Code | 2
Improving Autoformalization using Type Checking | Code | 2
Prototype based Masked Audio Model for Self-Supervised Learning of Sound Event Detection | Code | 2
Masked Autoencoders for Point Cloud Self-supervised Learning | Code | 2
MMEarth: Exploring Multi-Modal Pretext Tasks For Geospatial Representation Learning | Code | 2
Mitigate the Gap: Investigating Approaches for Improving Cross-Modal Alignment in CLIP | Code | 2
SmartRefine: A Scenario-Adaptive Refinement Framework for Efficient Motion Prediction | Code | 2
ZooPFL: Exploring Black-box Foundation Models for Personalized Federated Learning | Code | 2
Fancy123: One Image to High-Quality 3D Mesh Generation via Plug-and-Play Deformation | Code | 2
Fast-Poly: A Fast Polyhedral Framework For 3D Multi-Object Tracking | Code | 2
Attention Concatenation Volume for Accurate and Efficient Stereo Matching | Code | 2
Crafting Interpretable Embeddings by Asking LLMs Questions | Code | 2
PodAgent: A Comprehensive Framework for Podcast Generation | Code | 2
FLAME: Financial Large-Language Model Assessment and Metrics Evaluation | Code | 2
Octopus: Embodied Vision-Language Programmer from Environmental Feedback | Code | 2
Unsupervised Information Refinement Training of Large Language Models for Retrieval-Augmented Generation | Code | 2
Learning Human-Inspired Force Strategies for Robotic Assembly | Code | 2
Self-Supervised Learning for Real-World Super-Resolution from Dual and Multiple Zoomed Observations | Code | 2
MFTCoder: Boosting Code LLMs with Multitask Fine-Tuning | Code | 2
When is Tree Search Useful for LLM Planning? It Depends on the Discriminator | Code | 2
ScreenAI: A Vision-Language Model for UI and Infographics Understanding | Code | 2
Learning to Prompt for Vision-Language Models | Code | 2
EmoFace: Audio-driven Emotional 3D Face Animation | Code | 2
OmniBench: Towards The Future of Universal Omni-Language Models | Code | 2
ADATIME: A Benchmarking Suite for Domain Adaptation on Time Series Data | Code | 2
ConceptExpress: Harnessing Diffusion Models for Single-image Unsupervised Concept Extraction | Code | 2
InteractRank: Personalized Web-Scale Search Pre-Ranking with Cross Interaction Features | Code | 2
Specializing Smaller Language Models towards Multi-Step Reasoning | Code | 2
Page 140 of 3547