SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers, 248,326 code links, 4,818 tasks

Papers

Showing 12,601 to 12,650 of 474,278 papers

Title | Status | Hype
Graph-ToolFormer: To Empower LLMs with Graph Reasoning Ability via Prompt Augmented by ChatGPT | Code | 2
RecBole: Towards a Unified, Comprehensive and Efficient Framework for Recommendation Algorithms | Code | 2
CompassJudger-1: All-in-one Judge Model Helps Model Evaluation and Evolution | Code | 2
Trusted Multi-View Classification with Dynamic Evidential Fusion | Code | 2
Reading Between the Frames: Multi-Modal Depression Detection in Videos from Non-Verbal Cues | Code | 2
Deep Differentiable Logic Gate Networks | Code | 2
SALAD-Bench: A Hierarchical and Comprehensive Safety Benchmark for Large Language Models | Code | 2
The Reversal Curse: LLMs trained on "A is B" fail to learn "B is A" | Code | 2
Prometheus-Vision: Vision-Language Model as a Judge for Fine-Grained Evaluation | Code | 2
Efficient Heatmap-Guided 6-Dof Grasp Detection in Cluttered Scenes | Code | 2
Epidemiology-Aware Neural ODE with Continuous Disease Transmission Graph | Code | 2
BixBench: a Comprehensive Benchmark for LLM-based Agents in Computational Biology | Code | 2
Synthetic continued pretraining | Code | 2
OpenLane-V2: A Topology Reasoning Benchmark for Unified 3D HD Mapping | Code | 2
Towards a Unified Copernicus Foundation Model for Earth Vision | Code | 2
Mask-DPO: Generalizable Fine-grained Factuality Alignment of LLMs | Code | 2
DOVE: Efficient One-Step Diffusion Model for Real-World Video Super-Resolution | Code | 2
Reproducibility Study of "Cooperate or Collapse: Emergence of Sustainable Cooperation in a Society of LLM Agents" | Code | 2
E2E-MFD: Towards End-to-End Synchronous Multimodal Fusion Detection | Code | 2
MuggleMath: Assessing the Impact of Query and Response Augmentation on Math Reasoning | Code | 2
The Language Interpretability Tool: Extensible, Interactive Visualizations and Analysis for NLP Models | Code | 2
Neural Combinatorial Optimization Algorithms for Solving Vehicle Routing Problems: A Comprehensive Survey with Perspectives | Code | 2
Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models | Code | 2
Jailbreaking Attack against Multimodal Large Language Model | Code | 2
IFRNet: Intermediate Feature Refine Network for Efficient Frame Interpolation | Code | 2
AnnaAgent: Dynamic Evolution Agent System with Multi-Session Memory for Realistic Seeker Simulation | Code | 2
PromptIR: Prompting for All-in-One Blind Image Restoration | Code | 2
Convolutional Neural Operators for robust and accurate learning of PDEs | Code | 2
Grappa -- A Machine Learned Molecular Mechanics Force Field | Code | 2
The Belebele Benchmark: a Parallel Reading Comprehension Dataset in 122 Language Variants | Code | 2
A Machine Learning Approach That Beats Large Rubik's Cubes | Code | 2
Orbeez-SLAM: A Real-time Monocular Visual SLAM with ORB Features and NeRF-realized Mapping | Code | 2
CAPO: Cost-Aware Prompt Optimization | Code | 2
BakedAvatar: Baking Neural Fields for Real-Time Head Avatar Synthesis | Code | 2
MV-FCOS3D++: Multi-View Camera-Only 4D Object Detection with Pretrained Monocular Backbones | Code | 2
Artificial Intelligence of Things: A Survey | Code | 2
BianCang: A Traditional Chinese Medicine Large Language Model | Code | 2
Fast Dynamic Radiance Fields with Time-Aware Neural Voxels | Code | 2
Automatically Bounding the Taylor Remainder Series: Tighter Bounds and New Applications | Code | 2
LAION-SG: An Enhanced Large-Scale Dataset for Training Complex Image-Text Models with Structural Annotations | Code | 2
Fraud Dataset Benchmark and Applications | Code | 2
Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision | Code | 2
DeBERTa: Decoding-enhanced BERT with Disentangled Attention | Code | 2
Weak-to-Strong Search: Align Large Language Models via Searching over Small Language Models | Code | 2
Streaming Active Learning with Deep Neural Networks | Code | 2
StyleCLIPDraw: Coupling Content and Style in Text-to-Drawing Translation | Code | 2
Frequency-Adaptive Dilated Convolution for Semantic Segmentation | Code | 2
LIR-LIVO: A Lightweight,Robust LiDAR/Vision/Inertial Odometry with Illumination-Resilient Deep Features | Code | 2
On Meta-Prompting | Code | 2
Reducing Hallucinations in Vision-Language Models via Latent Space Steering | Code | 2