SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers · 248,326 code links · 4,818 tasks

Papers

Showing 17451-17500 of 474,278 papers

Title | Status | Hype
UDora: A Unified Red Teaming Framework against LLM Agents by Dynamically Hijacking Their Own Reasoning | Code | 1
Contextualizing biological perturbation experiments through language | Code | 1
Dynamic Markov Blanket Detection for Macroscopic Physics Discovery | Code | 1
Algebraic Machine Learning: Learning as computing an algebraic decomposition of a task | Code | 1
CLIP Under the Microscope: A Fine-Grained Analysis of Multi-Object Representation | Code | 1
Shifting the Paradigm: A Diffeomorphism Between Time Series Data Manifolds for Achieving Shift-Invariancy in Deep Learning | Code | 1
Your contrastive learning problem is secretly a distribution alignment problem | Code | 1
Multi-Turn Code Generation Through Single-Step Rewards | Code | 1
Erasing Without Remembering: Implicit Knowledge Forgetting in Large Language Models | Code | 1
Playing Pokémon Red via Deep Reinforcement Learning | Code | 1
A2-GNN: Angle-Annular GNN for Visual Descriptor-free Camera Relocalization | Code | 1
ColorDynamic: Generalizable, Scalable, Real-time, End-to-end Local Planner for Unstructured and Dynamic Environments | Code | 1
PhantomWiki: On-Demand Datasets for Reasoning and Retrieval Evaluation | Code | 1
Order-Robust Class Incremental Learning: Graph-Driven Dynamic Similarity Grouping | Code | 1
Can Textual Gradient Work in Federated Learning? | Code | 1
ProAPO: Progressively Automatic Prompt Optimization for Visual Classification | Code | 1
FINEREASON: Evaluating and Improving LLMs' Deliberate Reasoning through Reflective Puzzle Solving | Code | 1
Long-Context Inference with Retrieval-Augmented Speculative Decoding | Code | 1
Adaptive Attacks Break Defenses Against Indirect Prompt Injection Attacks on LLM Agents | Code | 1
SkipPipe: Partial and Reordered Pipelining Framework for Training LLMs in Heterogeneous Networks | Code | 1
Self-Training Elicits Concise Reasoning in Large Language Models | Code | 1
PrimeK-Net: Multi-scale Spectral Learning via Group Prime-Kernel Convolutional Neural Networks for Single Channel Speech Enhancement | Code | 1
Time-Varying Identification of Structural Vector Autoregressions | Code | 1
Exponential Topology-enabled Scalable Communication in Multi-agent Reinforcement Learning | Code | 1
Bridging the PLC Binary Analysis Gap: A Cross-Compiler Dataset and Neural Framework for Industrial Control Systems | Code | 1
Emergent Symbolic Mechanisms Support Abstract Reasoning in Large Language Models | Code | 1
ChineseEcomQA: A Scalable E-commerce Concept Evaluation Benchmark for Large Language Models | Code | 1
Mixture of Structural-and-Textual Retrieval over Text-rich Graph Knowledge Bases | Code | 1
Mixtera: A Data Plane for Foundation Model Training | Code | 1
Implicit Search via Discrete Diffusion: A Study on Chess | Code | 1
Gradient-Guided Annealing for Domain Generalization | Code | 1
Generative augmentations for improved cardiac ultrasound segmentation using diffusion models | Code | 1
Foot-In-The-Door: A Multi-turn Jailbreak for LLMs | Code | 1
MMKE-Bench: A Multimodal Editing Benchmark for Diverse Visual Knowledge | Code | 1
EgoNormia: Benchmarking Physical Social Norm Understanding | Code | 1
QPM: Discrete Optimization for Globally Interpretable Image Classification | Code | 1
Vector-Quantized Vision Foundation Models for Object-Centric Learning | Code | 1
ReCon: Enhancing True Correspondence Discrimination through Relation Consistency for Robust Noisy Correspondence Learning | Code | 1
Bridging Legal Knowledge and AI: Retrieval-Augmented Generation with Vector Stores, Knowledge Graphs, and Hierarchical Non-negative Matrix Factorization | Code | 1
SeisMoLLM: Advancing Seismic Monitoring via Cross-modal Transfer with Pre-trained Large Language Model | Code | 1
CirT: Global Subseasonal-to-Seasonal Forecasting with Geometry-inspired Transformer | Code | 1
Mixmamba-fewshot: mamba and attention mixer-based method with few-shot learning for bearing fault diagnosis | Code | 1
New Dataset and Methods for Fine-Grained Compositional Referring Expression Comprehension via Specialist-MLLM Collaboration | Code | 1
Scalable Signature Kernel Computations for Long Time Series via Local Neumann Series Expansions | Code | 1
RouteRL: Multi-agent reinforcement learning framework for urban route choice with autonomous vehicles | Code | 1
SegLocNet: Multimodal Localization Network for Autonomous Driving via Bird's-Eye-View Segmentation | Code | 1
Spiideo SoccerNet SynLoc: Single Frame World Coordinate Athlete Detection and Localization with Synthetic Data | Code | 1
Collab-Overcooked: Benchmarking and Evaluating Large Language Models as Collaborative Agents | Code | 1
R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts | Code | 1
Multi-Keypoint Affordance Representation for Functional Dexterous Grasping | Code | 1