SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers · 248,326 code links · 4,818 tasks

Papers

Showing 5626-5650 of 474278 papers

Title | Status | Hype
SAM2MOT: A Novel Paradigm of Multi-Object Tracking by Segmentation | Code | 2
UniToken: Harmonizing Multimodal Understanding and Generation through Unified Visual Encoding | Code | 2
VocalNet: Speech LLM with Multi-Token Prediction for Faster and High-Quality Generation | Code | 2
Investigating Affective Use and Emotional Well-being on ChatGPT | Code | 2
RWKVTTS: Yet another TTS based on RWKV-7 | Code | 2
MultiMed-ST: Large-scale Many-to-many Multilingual Medical Speech Translation | Code | 2
Agentic Knowledgeable Self-awareness | Code | 2
Mamba as a Bridge: Where Vision Foundation Models Meet Vision Language Models for Domain-Generalized Semantic Segmentation | Code | 2
GPG: A Simple and Strong Reinforcement Learning Baseline for Model Reasoning | Code | 2
Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing | Code | 2
Sparse Autoencoders Learn Monosemantic Features in Vision-Language Models | Code | 2
GPTAQ: Efficient Finetuning-Free Quantization for Asymmetric Calibration | Code | 2
ZClip: Adaptive Spike Mitigation for LLM Pre-Training | Code | 2
CrystalFormer-RL: Reinforcement Fine-Tuning for Materials Design | Code | 2
MegaMath: Pushing the Limits of Open Math Corpora | Code | 2
Delineate Anything: Resolution-Agnostic Field Boundary Delineation on Satellite Imagery | Code | 2
Exploration-Driven Generative Interactive Environments | Code | 2
Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme | Code | 2
Re-thinking Temporal Search for Long-Form Video Understanding | Code | 2
Scaling Video-Language Models to 10K Frames via Hierarchical Differential Distillation | Code | 2
ILLUME+: Illuminating Unified MLLM with Dual Visual Tokenization and Diffusion Refinement | Code | 2
An Illusion of Progress? Assessing the Current State of Web Agents | Code | 2
SpaceR: Reinforcing MLLMs in Video Spatial Reasoning | Code | 2
Benchmarking Synthetic Tabular Data: A Multi-Dimensional Evaluation Framework | Code | 2
Efficient Federated Learning Tiny Language Models for Mobile Network Feature Prediction | Code | 2