SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers | 248,326 code links | 4,818 tasks

Papers

Showing 20301–20350 of 474278 papers

Title | Status | Hype
Steering Your Generalists: Improving Robotic Foundation Models via Value Guidance | Code | 1
Mitigating Hallucinations in Large Vision-Language Models via Summary-Guided Decoding | Code | 1
Interpreting Temporal Graph Neural Networks with Koopman Theory | Code | 1
EH-MAM: Easy-to-Hard Masked Acoustic Modeling for Self-Supervised Speech Representation Learning | Code | 1
ControlAgent: Automating Control System Design via Novel Integration of LLM Agents and Domain Expertise | Code | 1
Unlocking the Capabilities of Masked Generative Models for Image Synthesis via Self-Guidance | Code | 1
LESS: Label-Efficient and Single-Stage Referring 3D Segmentation | Code | 1
Starbucks: Improved Training for 2D Matryoshka Embeddings | Code | 1
Router-Tuning: A Simple and Effective Approach for Enabling Dynamic-Depth in Transformers | Code | 1
Reward-free World Models for Online Imitation Learning | Code | 1
MIRAGE-Bench: Automatic Multilingual Benchmark Arena for Retrieval-Augmented Generation Systems | Code | 1
Hybrid bundle-adjusting 3D Gaussians for view consistent rendering with pose optimization | Code | 1
TCP-Diffusion: A Multi-modal Diffusion Model for Global Tropical Cyclone Precipitation Forecasting with Change Awareness | Code | 1
RAMPA: Robotic Augmented Reality for Machine Programming by DemonstrAtion | Code | 1
MobA: Multifaceted Memory-Enhanced Adaptive Planning for Efficient Mobile Task Automation | Code | 1
SiamSeg: Self-Training with Contrastive Learning for Unsupervised Domain Adaptation Semantic Segmentation in Remote Sensing | Code | 1
FaithBench: A Diverse Hallucination Benchmark for Summarization by Modern LLMs | Code | 1
PORTAL: Scalable Tabular Foundation Models via Content-Specific Tokenization | Code | 1
DN-4DGS: Denoised Deformable Network with Temporal-Spatial Aggregation for Dynamic Scene Rendering | Code | 1
FIRE: Fact-checking with Iterative Retrieval and Verification | Code | 1
Diffusing States and Matching Scores: A New Framework for Imitation Learning | Code | 1
EP-SAM: Weakly Supervised Histopathology Segmentation via Enhanced Prompt with Segment Anything | Code | 1
Benchmarking Transcriptomics Foundation Models for Perturbation Analysis : one PCA still rules them all | Code | 1
Can MLLMs Understand the Deep Implication Behind Chinese Images? | Code | 1
Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs | Code | 1
Learning Graph Quantized Tokenizers | Code | 1
UniGS: Modeling Unitary 3D Gaussians for Novel View Synthesis from Sparse-view Images | Code | 1
A Simulation System Towards Solving Societal-Scale Manipulation | Code | 1
Preference Diffusion for Recommendation | Code | 1
Looking Inward: Language Models Can Learn About Themselves by Introspection | Code | 1
Interpret and Control Dense Retrieval with Sparse Latent Features | Code | 1
Diffusion Curriculum: Synthetic-to-Real Generative Curriculum Learning via Image-Guided Diffusion | Code | 1
Web Agents with World Models: Learning and Leveraging Environment Dynamics in Web Navigation | Code | 1
Sliding Puzzles Gym: A Scalable Benchmark for State Representation in Visual Reinforcement Learning | Code | 1
Interpreting and Analysing CLIP's Zero-Shot Image Classification via Mutual Knowledge | Code | 1
CREAM: Consistency Regularized Self-Rewarding Language Models | Code | 1
Rethinking Token Reduction for State Space Models | Code | 1
FragNet: A Graph Neural Network for Molecular Property Prediction with Four Levels of Interpretability | Code | 1
HumanEval-V: Evaluating Visual Understanding and Reasoning Abilities of Large Multimodal Models Through Coding Tasks | Code | 1
VividMed: Vision Language Model with Versatile Visual Grounding for Medicine | Code | 1
LoRA Soups: Merging LoRAs for Practical Skill Composition Tasks | Code | 1
HerO at AVeriTeC: The Herd of Open Large Language Models for Verifying Real-World Claims | Code | 1
Counterfactual Generative Modeling with Variational Causal Inference | Code | 1
Meta-Unlearning on Diffusion Models: Preventing Relearning Unlearned Concepts | Code | 1
In-vivo high-resolution χ-separation at 7T | Code | 1
Open Materials 2024 (OMat24) Inorganic Materials Dataset and Models | Code | 1
Expand and Compress: Exploring Tuning Principles for Continual Spatio-Temporal Graph Forecasting | Code | 1
Facilitating Multi-turn Function Calling for LLMs via Compositional Instruction Tuning | Code | 1
Dual Prototype Evolving for Test-Time Generalization of Vision-Language Models | Code | 1
Revealing the Barriers of Language Agents in Planning | Code | 1
Page 407 of 9486