SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers · 248,326 code links · 4,818 tasks

Papers

Showing 17901-17950 of 474,278 papers

Title | Status | Hype
Identifying and Understanding Cross-Class Features in Adversarial Training | Code | 0
From Play to Replay: Composed Video Retrieval for Temporally Fine-Grained Videos | Code | 0
A Unified Framework for Provably Efficient Algorithms to Estimate Shapley Values | | 0
Tight analyses of first-order methods with error feedback | Code | 0
Inference economics of language models | Code | 0
User Altruism in Recommendation Systems | Code | 0
Olfactory Inertial Odometry: Sensor Calibration and Drift Compensation | | 0
When Thinking LLMs Lie: Unveiling the Strategic Deception in Representations of Reasoning Models | | 0
FlashDMoE: Fast Distributed MoE in a Single Kernel | Code | 3
Progressive Tempering Sampler with Diffusion | Code | 1
OGGSplat: Open Gaussian Growing for Generalizable Reconstruction with Expanded Field-of-View | Code | 1
EMBER2024 -- A Benchmark Dataset for Holistic Evaluation of Malware Classifiers | Code | 2
Learning Monotonic Probabilities with a Generative Cost Model | Code | 0
Video, How Do Your Tokens Merge? | Code | 0
Pre^3: Enabling Deterministic Pushdown Automata for Faster Structured LLM Generation | Code | 7
Debate, Reflect, and Distill: Multi-Agent Feedback with Tree-Structured Preference Optimization for Efficient Language Model Enhancement | | 0
Go-Browse: Training Web Agents with Structured Exploration | | 0
Prompt Candidates, then Distill: A Teacher-Student Framework for LLM-driven Data Annotation | Code | 1
OSGNet @ Ego4D Episodic Memory Challenge 2025 | Code | 1
Self-Composing Policies for Scalable Continual Reinforcement Learning | | 0
RedDebate: Safer Responses through Multi-Agent Red Teaming Debates | Code | 0
MANBench: Is Your Multimodal Model Smarter than Human? | Code | 0
Delta-KNN: Improving Demonstration Selection in In-Context Learning for Alzheimer's Disease Detection | | 0
AI Agents for Conversational Patient Triage: Preliminary Simulation-Based Evaluation with Real-World EHR Data | | 0
Impact of Hill coefficient and time delay on a perceptual decision-making model | | 0
Fifteen Years of Child-Centered Long-Form Recordings: Promises, Resources, and Remaining Challenges to Validity | | 0
Challenges in Automated Processing of Speech from Child Wearables: The Case of Voice Type Classifier | | 0
Improving Child Speech Recognition and Reading Mistake Detection by Using Prompts | | 0
MMR-V: What's Left Unsaid? A Benchmark for Multimodal Deep Reasoning in Videos | | 0
Rex-Thinker: Grounded Object Referring via Chain-of-Thought Reasoning | | 0
Long or short CoT? Investigating Instance-level Switch of Large Reasoning Models | | 0
AD-EE: Early Exiting for Fast and Reliable Vision-Language Models in Autonomous Driving | | 0
SuperWriter: Reflection-Driven Long-Form Generation with Large Language Models | Code | 1
LeanExplore: A search engine for Lean 4 declarations | Code | 2
Similarity-based fuzzy clustering scientific articles: potentials and challenges from mathematical and computational perspectives | | 0
Frame-Level Real-Time Assessment of Stroke Rehabilitation Exercises from Video-Level Labeled Data: Task-Specific vs. Foundation Models | | 0
A Comprehensive Study on Medical Image Segmentation using Deep Neural Networks | | 0
MudiNet: Task-guided Disentangled Representation Learning for 5G Indoor Multipath-assisted Positioning | | 0
SVD-Based Graph Fractional Fourier Transform on Directed Graphs and Its Application | | 0
Spatiotemporal Prediction of Electric Vehicle Charging Load Based on Large Language Models | | 0
High-Speed Ultra-Energy-Efficient Memristor-Based Massive MIMO SIC Detector Circuit with Hybrid Analog-Digital Computing Architecture | | 0
Learning Fair And Effective Points-Based Rewards Programs | | 0
A note on metapopulation models | | 0
Generalized Lotka-Volterra systems with quenched random interactions and saturating functional response | | 0
HiFiTTS-2: A Large-Scale High Bandwidth Speech Dataset | | 0
Phi-Omni-ST: A multimodal language model for direct speech-to-speech translation | | 0
The mutual exclusivity bias of bilingual visually grounded speech models | Code | 0
Latent Guided Sampling for Combinatorial Optimization | Code | 0
Identification of RIS-Assisted Paths for Wireless Integrated Sensing and Communication | | 0
ControlThinker: Unveiling Latent Semantics for Controllable Image Generation through Visual Reasoning | Code | 0
Page 359 of 9486