SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers | 248,326 code links | 4,818 tasks

Papers

Showing 8901–8950 of 177,340 papers

Title | Status | Hype
RuleKit 2: Faster and simpler rule learning | Code | 2
Segment Anything for Histopathology | Code | 2
High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning | Code | 2
Seeing World Dynamics in a Nutshell | Code | 2
Step Back to Leap Forward: Self-Backtracking for Boosting Reasoning of Language Models | Code | 2
KET-RAG: A Cost-Efficient Multi-Granular Indexing Framework for Graph-RAG | Code | 2
Training Turn-by-Turn Verifiers for Dialogue Tutoring Agents: The Curious Case of LLMs as Your Coding Tutors | Code | 2
Re-Align: Aligning Vision Language Models via Retrieval-Augmented Direct Preference Optimization | Code | 2
E2EC: An End-to-End Contour-based Method for High-Quality High-Speed Instance Segmentation | Code | 2
A Survey on Data Contamination for Large Language Models | Code | 2
MaterialFusion: High-Quality, Zero-Shot, and Controllable Material Transfer with Diffusion Models | Code | 2
PIP-KAG: Mitigating Knowledge Conflicts in Knowledge-Augmented Generation via Parametric Pruning | Code | 2
voc2vec: A Foundation Model for Non-Verbal Vocalization | Code | 2
WebGames: Challenging General-Purpose Web-Browsing AI Agents | Code | 2
FlexVAR: Flexible Visual Autoregressive Modeling without Residual Prediction | Code | 2
AgentSociety Challenge: Designing LLM Agents for User Modeling and Recommendation on Web Platforms | Code | 2
Automatic database description generation for Text-to-SQL | Code | 2
UL-UNAS: Ultra-Lightweight U-Nets for Real-Time Speech Enhancement via Network Architecture Search | Code | 2
LongProLIP: A Probabilistic Vision-Language Model with Long Context Text | Code | 2
An Approach for Air Drawing Using Background Subtraction and Contour Extraction | Code | 2
Interactive Debugging and Steering of Multi-Agent AI Systems | Code | 2
MPO: Boosting LLM Agents with Meta Plan Optimization | Code | 2
Text2LIVE: Text-Driven Layered Image and Video Editing | Code | 2
Similarity-Guided Layer-Adaptive Vision Transformer for UAV Tracking | Code | 2
GigaSLAM: Large-Scale Monocular SLAM with Hierarchical Gaussian Splats | Code | 2
Is CLIP ideal? No. Can we fix it? Yes! | Code | 2
Word2World: Generating Stories and Worlds through Large Language Models | Code | 2
LLM-FP4: 4-Bit Floating-Point Quantized Transformers | Code | 2
OVTR: End-to-End Open-Vocabulary Multiple Object Tracking with Transformer | Code | 2
RouterEval: A Comprehensive Benchmark for Routing LLMs to Explore Model-level Scaling Up in LLMs | Code | 2
A Comprehensive Survey on Knowledge Distillation | Code | 2
TimberTrek: Exploring and Curating Sparse Decision Trees with Interactive Visualization | Code | 2
LeanVAE: An Ultra-Efficient Reconstruction VAE for Video Diffusion Models | Code | 2
MambaIC: State Space Models for High-Performance Learned Image Compression | Code | 2
Single Image Iterative Subject-driven Generation and Editing | Code | 2
NuiScene: Exploring Efficient Generation of Unbounded Outdoor Scenes | Code | 2
SaMam: Style-aware State Space Model for Arbitrary Image Style Transfer | Code | 2
Splat-LOAM: Gaussian Splatting LiDAR Odometry and Mapping | Code | 2
Correcting Deviations from Normality: A Reformulated Diffusion Model for Multi-Class Unsupervised Anomaly Detection | Code | 2
Datasets for Depression Modeling in Social Media: An Overview | Code | 2
AutoEval: Autonomous Evaluation of Generalist Robot Manipulation Policies in the Real World | Code | 2
On-device Sora: Enabling Training-Free Diffusion-based Text-to-Video Generation for Mobile Devices | Code | 2
Efficient Federated Learning Tiny Language Models for Mobile Network Feature Prediction | Code | 2
An Illusion of Progress? Assessing the Current State of Web Agents | Code | 2
Re-thinking Temporal Search for Long-Form Video Understanding | Code | 2
A Decade of Deep Learning for Remote Sensing Spatiotemporal Fusion: Advances, Challenges, and Opportunities | Code | 2
Caption Anything in Video: Fine-grained Object-centric Captioning via Spatiotemporal Multimodal Prompting | Code | 2
VocalNet: Speech LLM with Multi-Token Prediction for Faster and High-Quality Generation | Code | 2
Chain-of-Tools: Utilizing Massive Unseen Tools in the CoT Reasoning of Frozen Language Models | Code | 2
LLM-SRBench: A New Benchmark for Scientific Equation Discovery with Large Language Models | Code | 2