SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers · 248,326 code links · 4,818 tasks

Papers

Showing 8151-8200 of 661,570 papers

| Title | Status | Hype |
| --- | --- | --- |
| Derail Yourself: Multi-turn LLM Jailbreak Attack through Self-discovered Clues | Code | 2 |
| MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding | Code | 2 |
| Evaluating Morphological Compositional Generalization in Large Language Models | Code | 2 |
| IntersectionZoo: Eco-driving for Benchmarking Multi-Agent Contextual Reinforcement Learning | Code | 2 |
| DM-Codec: Distilling Multimodal Representations for Speech Tokenization | Code | 2 |
| GPT or BERT: why not both? | Code | 2 |
| Model merging with SVD to tie the Knots | Code | 2 |
| SciPIP: An LLM-based Scientific Paper Idea Proposer | Code | 2 |
| Ada-MSHyper: Adaptive Multi-Scale Hypergraph Transformer for Time Series Forecasting | Code | 2 |
| DPU: Dynamic Prototype Updating for Multimodal Out-of-Distribution Detection | Code | 2 |
| MetaOpenFOAM: an LLM-based multi-agent framework for CFD | Code | 2 |
| PyGen: A Collaborative Human-AI Approach to Python Package Creation | Code | 2 |
| Disentangling Memory and Reasoning Ability in Large Language Models | Code | 2 |
| MMGenBench: Evaluating the Limits of LMMs from the Text-to-Image Generation Perspective | Code | 2 |
| vesselFM: A Foundation Model for Universal 3D Blood Vessel Segmentation | Code | 2 |
| TryOffDiff: Virtual-Try-Off via High-Fidelity Garment Reconstruction using Diffusion Models | Code | 2 |
| TexGaussian: Generating High-quality PBR Material via Octree-based 3D Gaussian Splatting | Code | 2 |
| Lost & Found: Tracking Changes from Egocentric Observations in 3D Dynamic Scene Graphs | Code | 2 |
| X-Prompt: Towards Universal In-Context Image Generation in Auto-Regressive Vision Language Foundation Models | Code | 2 |
| CoRNStack: High-Quality Contrastive Data for Better Code Retrieval and Reranking | Code | 2 |
| FLAIR: VLM with Fine-grained Language-informed Image Representations | Code | 2 |
| ComplexFuncBench: Exploring Multi-Step and Constrained Function Calling under Long-Context Scenario | Code | 2 |
| SoRA: Singular Value Decomposed Low-Rank Adaptation for Domain Generalizable Representation Learning | Code | 2 |
| Divot: Diffusion Powers Video Tokenizer for Comprehension and Generation | Code | 2 |
| JPC: Flexible Inference for Predictive Coding Networks in JAX | Code | 2 |
| MESA: Effective Matching Redundancy Reduction by Semantic Area Segmentation | Code | 2 |
| DriveMM: All-in-One Large Multimodal Model for Autonomous Driving | Code | 2 |
| MAC-Ego3D: Multi-Agent Gaussian Consensus for Real-Time Collaborative Ego-Motion and Photorealistic 3D Reconstruction | Code | 2 |
| MMLU-CF: A Contamination-free Multi-task Language Understanding Benchmark | Code | 2 |
| MR-GDINO: Efficient Open-World Continual Object Detection | Code | 2 |
| Scenario-Wise Rec: A Multi-Scenario Recommendation Benchmark | Code | 2 |
| EvalMuse-40K: A Reliable and Fine-Grained Benchmark with Comprehensive Human Annotations for Text-to-Image Generation Model Evaluation | Code | 2 |
| Test-time Computing: from System-1 Thinking to System-2 Thinking | Code | 2 |
| TakuNet: an Energy-Efficient CNN for Real-Time Inference on Embedded UAV systems in Emergency Response Scenarios | Code | 2 |
| Russian Financial Statements Database: A firm-level collection of the universe of financial statements | Code | 2 |
| ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation | Code | 2 |
| Prompt-Free Diffusion: Taking "Text" out of Text-to-Image Diffusion Models | Code | 2 |
| ScoreFlow: Mastering LLM Agent Workflows via Score-based Preference Optimization | Code | 2 |
| SalM2: An Extremely Lightweight Saliency Mamba Model for Real-Time Cognitive Awareness of Driver Attention | Code | 2 |
| TritonBench: Benchmarking Large Language Model Capabilities for Generating Triton Operators | Code | 2 |
| A Survey of Safety on Large Vision-Language Models: Attacks, Defenses and Evaluations | Code | 2 |
| Sanity Checking Causal Representation Learning on a Simple Real-World System | Code | 2 |
| Enhanced Contrastive Learning with Multi-view Longitudinal Data for Chest X-ray Report Generation | Code | 2 |
| A Training-free LLM-based Approach to General Chinese Character Error Correction | Code | 2 |
| SemiSAM+: Rethinking Semi-Supervised Medical Image Segmentation in the Era of Foundation Models | Code | 2 |
| Neural Posterior Estimation for Cataloging Astronomical Images with Spatially Varying Backgrounds and Point Spread Functions | Code | 2 |
| AnalogGenie: A Generative Engine for Automatic Discovery of Analog Circuit Topologies | Code | 2 |
| Patch-wise Structural Loss for Time Series Forecasting | Code | 2 |
| Find First, Track Next: Decoupling Identification and Propagation in Referring Video Object Segmentation | Code | 2 |
| MM-OR: A Large Multimodal Operating Room Dataset for Semantic Understanding of High-Intensity Surgical Environments | Code | 2 |