SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers | 248,326 code links | 4,818 tasks

Papers

Showing 8251-8300 of 661,570 papers

Title | Status | Hype
Talk With Human-like Agents: Empathetic Dialogue Through Perceptible Acoustic Reception and Reaction | Code | 2
VRSBench: A Versatile Vision-Language Benchmark Dataset for Remote Sensing Image Understanding | Code | 2
Holmes-VAD: Towards Unbiased and Explainable Video Anomaly Detection via Multi-modal LLM | Code | 2
Universal Score-based Speech Enhancement with High Content Preservation | Code | 2
MegaScenes: Scene-Level View Synthesis at Scale | Code | 2
Task Me Anything | Code | 2
Scaling Efficient Masked Image Modeling on Large Remote Sensing Dataset | Code | 2
Solving the Inverse Problem of Electrocardiography for Cardiac Digital Twins: A Survey | Code | 2
DiTTo-TTS: Diffusion Transformers for Scalable Text-to-Speech without Domain-Specific Factors | Code | 2
mDPO: Conditional Preference Optimization for Multimodal Large Language Models | Code | 2
A Robust Online Multi-Camera People Tracking System With Geometric Consistency and State-aware Re-ID Correction | Code | 2
Residual and bidirectional LSTM for epileptic seizure detection | Code | 2
Watch Every Step! LLM Agent Learning via Iterative Step-Level Process Refinement | Code | 2
Twin-Merging: Dynamic Integration of Modular Expertise in Model Merging | Code | 2
Learn Beyond The Answer: Training Language Models with Reflection for Mathematical Reasoning | Code | 2
MedCalc-Bench: Evaluating Large Language Models for Medical Calculations | Code | 2
GUICourse: From General Vision Language Models to Versatile GUI Agents | Code | 2
Transcoders Find Interpretable LLM Feature Circuits | Code | 2
In-Context Editing: Learning Knowledge from Self-Induced Distributions | Code | 2
ISR-DPO: Aligning Large Multimodal Models for Videos by Iterative Self-Retrospective DPO | Code | 2
Scaling the Codebook Size of VQGAN to 100,000 with a Utilization Rate of 99% | Code | 2
Understanding Multi-Granularity for Open-Vocabulary Part Segmentation | Code | 2
Zero-Shot Scene Change Detection | Code | 2
Interactive Evolution: A Neural-Symbolic Self-Training Framework For Large Language Models | Code | 2
Frozen CLIP: A Strong Backbone for Weakly Supervised Semantic Segmentation | Code | 2
MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and Instruction-Tuning Dataset for LVLMs | Code | 2
GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities | Code | 2
OGNI-DC: Robust Depth Completion with Optimization-Guided Neural Iterations | Code | 2
Large Scale Transfer Learning for Tabular Data via Language Modeling | Code | 2
Multimodal Needle in a Haystack: Benchmarking Long-Context Capability of Multimodal Large Language Models | Code | 2
DistPred: A Distribution-Free Probabilistic Inference Method for Regression and Forecasting | Code | 2
Duoduo CLIP: Efficient 3D Understanding with Multi-View Images | Code | 2
DiffMM: Multi-Modal Diffusion Model for Recommendation | Code | 2
Ontology Embedding: A Survey of Methods, Applications and Resources | Code | 2
STAR: Scale-wise Text-to-image generation via Auto-Regressive representations | Code | 2
Kolmogorov Arnold Informed neural network: A physics-informed deep learning framework for solving forward and inverse problems based on Kolmogorov Arnold Networks | Code | 2
RWKU: Benchmarking Real-World Knowledge Unlearning for Large Language Models | Code | 2
ViD-GPT: Introducing GPT-style Autoregressive Generation in Video Diffusion Models | Code | 2
CrossFuse: A Novel Cross Attention Mechanism based Infrared and Visible Image Fusion Approach | Code | 2
Voxel Mamba: Group-Free State Space Models for Point Cloud based 3D Object Detection | Code | 2
Unveiling the Ignorance of MLLMs: Seeing Clearly, Answering Incorrectly | Code | 2
CoMM: A Coherent Interleaved Image-Text Dataset for Multimodal Understanding and Generation | Code | 2
Text-space Graph Foundation Models: Comprehensive Benchmarks and New Insights | Code | 2
PUP 3D-GS: Principled Uncertainty Pruning for 3D Gaussian Splatting | Code | 2
SkySenseGPT: A Fine-Grained Instruction Tuning Dataset and Model for Remote Sensing Vision-Language Understanding | Code | 2
QQQ: Quality Quattuor-Bit Quantization for Large Language Models | Code | 2
GradeADreamer: Enhanced Text-to-3D Generation Using Gaussian Splatting and Multi-View Diffusion | Code | 2
SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages | Code | 2
Evolving Self-Assembling Neural Networks: From Spontaneous Activity to Experience-Dependent Learning | Code | 2
EFM3D: A Benchmark for Measuring Progress Towards 3D Egocentric Foundation Models | Code | 2
Page 166 of 13232