SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers · 248,326 code links · 4,818 tasks

Papers

Showing 15501–15550 of 474,278 papers

Title | Status | Hype
CLIPGaussian: Universal and Multimodal Style Transfer Based on Gaussian Splatting | Code | 1
IMTS is Worth Time Channel Patches: Visual Masked Autoencoders for Irregular Multivariate Time Series Prediction | Code | 1
Scalable Parameter and Memory Efficient Pretraining for LLM: Recent Algorithmic Advances and Benchmarking | Code | 1
Do You See Me : A Multidimensional Benchmark for Evaluating Visual Perception in Multimodal LLMs | Code | 1
Large Language Models for Depression Recognition in Spoken Language Integrating Psychological Knowledge | Code | 1
GoMatching++: Parameter- and Data-Efficient Arbitrary-Shaped Video Text Spotting and Benchmarking | Code | 1
Update Your Transformer to the Latest Release: Re-Basin of Task Vectors | Code | 1
FALCON: An ML Framework for Fully Automated Layout-Constrained Analog Circuit Design | Code | 1
SVRPBench: A Realistic Benchmark for Stochastic Vehicle Routing Problem | Code | 1
Pre-Training Curriculum for Multi-Token Prediction in Language Models | Code | 1
ChatVLA-2: Vision-Language-Action Model with Open-World Embodied Reasoning from Pretrained Knowledge | Code | 1
Self-orthogonalizing attractor neural networks emerging from the free energy principle | Code | 1
ChatCFD: an End-to-End CFD Agent with Domain-specific Structured Thinking | Code | 1
Training Language Models to Generate Quality Code with Program Analysis Feedback | Code | 1
GeoLLaVA-8K: Scaling Remote-Sensing Multimodal Large Language Models to 8K Resolution | Code | 1
See through the Dark: Learning Illumination-affined Representations for Nighttime Occupancy Prediction | Code | 1
DeSocial: Blockchain-based Decentralized Social Networks | Code | 1
R1-Code-Interpreter: Training LLMs to Reason with Code via Supervised and Reinforcement Learning | Code | 1
MedSentry: Understanding and Mitigating Safety Risks in Medical LLM Multi-Agent Systems | Code | 1
Empowering Vector Graphics with Consistently Arbitrary Viewing and View-dependent Visibility | Code | 1
REAL-Prover: Retrieval Augmented Lean Prover for Mathematical Reasoning | Code | 1
RefAV: Towards Planning-Centric Scenario Mining | Code | 1
Breaking the Ceiling: Exploring the Potential of Jailbreak Attacks through Expanding Strategy Space | Code | 1
ConText-CIR: Learning from Concepts in Text for Composed Image Retrieval | Code | 1
Explainability of Large Language Models using SMILE: Statistical Model-agnostic Interpretability with Local Explanations | Code | 1
MUSEG: Reinforcing Video Temporal Understanding via Timestamp-Aware Multi-Segment Grounding | Code | 1
CogniBench: A Legal-inspired Framework and Dataset for Assessing Cognitive Faithfulness of Large Language Models | Code | 1
LPOI: Listwise Preference Optimization for Vision Language Models | Code | 1
AgriFM: A Multi-source Temporal Remote Sensing Foundation Model for Crop Mapping | Code | 1
Inverse Virtual Try-On: Generating Multi-Category Product-Style Images from Clothed Individuals | Code | 1
Taylor expansion-based Kolmogorov-Arnold network for blind image quality assessment | Code | 1
Minute-Long Videos with Dual Parallelisms | Code | 1
Bencher: Simple and Reproducible Benchmarking for Black-Box Optimization | Code | 1
FinTagging: An LLM-ready Benchmark for Extracting and Structuring Financial Information | Code | 1
Dual-Polarization Stacked Intelligent Metasurfaces for Holographic MIMO | Code | 1
FM-Planner: Foundation Model Guided Path Planning for Autonomous Drone Navigation | Code | 1
Scaling External Knowledge Input Beyond Context Windows of LLMs via Multi-Agent Collaboration | Code | 1
AutoReproduce: Automatic AI Experiment Reproduction with Paper Lineage | Code | 1
Cross from Left to Right Brain: Adaptive Text Dreamer for Vision-and-Language Navigation | Code | 1
DiMoSR: Feature Modulation via Multi-Branch Dilated Convolutions for Efficient Image Super-Resolution | Code | 1
RoBiS: Robust Binary Segmentation for High-Resolution Industrial Images | Code | 1
FastFace: Tuning Identity Preservation in Distilled Diffusion via Guidance and Attention | Code | 1
Pretraining Language Models to Ponder in Continuous Space | Code | 1
Music Source Restoration | Code | 1
FlowCut: Rethinking Redundancy via Information Flow for Efficient Vision-Language Models | Code | 1
OB3D: A New Dataset for Benchmarking Omnidirectional 3D Reconstruction Using Blender | Code | 1
Efficient Multi-modal Long Context Learning for Training-free Adaptation | Code | 1
Lifelong Safety Alignment for Language Models | Code | 1
REARANK: Reasoning Re-ranking Agent via Reinforcement Learning | Code | 1
Win Fast or Lose Slow: Balancing Speed and Accuracy in Latency-Sensitive Decisions of LLMs | Code | 1
Page 311 of 9486