SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers | 248,326 code links | 4,818 tasks

Papers

Showing 11401-11450 of 474278 papers

Title | Status | Hype
AIM 2022 Challenge on Super-Resolution of Compressed Image and Video: Dataset, Methods and Results | Code | 2
VenusFactory: A Unified Platform for Protein Engineering Data Retrieval and Language Model Fine-Tuning | Code | 2
Grounded 3D-LLM with Referent Tokens | Code | 2
Generative Planning with 3D-vision Language Pre-training for End-to-End Autonomous Driving | Code | 2
GraphMAE2: A Decoding-Enhanced Masked Self-Supervised Graph Learner | Code | 2
ConflictBank: A Benchmark for Evaluating the Influence of Knowledge Conflicts in LLM | Code | 2
Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Model | Code | 2
Medical Image Segmentation Review: The success of U-Net | Code | 2
A Survey on Protein Representation Learning: Retrospect and Prospect | Code | 2
Tissue Concepts: supervised foundation models in computational pathology | Code | 2
Real-time Scene Text Detection with Differentiable Binarization | Code | 2
Characterization of Large Language Model Development in the Datacenter | Code | 2
Scalable telomere-to-telomere assembly for diploid and polyploid genomes with double graph | Code | 2
Generative Semi-supervised Graph Anomaly Detection | Code | 2
Reconstructing Personalized Semantic Facial NeRF Models From Monocular Video | Code | 2
Language-driven Semantic Segmentation | Code | 2
dtaianomaly: A Python library for time series anomaly detection | Code | 2
Benchmarking Deep Reinforcement Learning for Continuous Control | Code | 2
Model Comparison and Calibration Assessment: User Guide for Consistent Scoring Functions in Machine Learning and Actuarial Practice | Code | 2
MPAX: Mathematical Programming in JAX | Code | 2
Multi-View Reasoning: Consistent Contrastive Learning for Math Word Problem | Code | 2
LargeKernel3D: Scaling up Kernels in 3D Sparse CNNs | Code | 2
ColorVideoVDP: A visual difference predictor for image, video and display distortions | Code | 2
MHNet: Multi-view High-order Network for Diagnosing Neurodevelopmental Disorders Using Resting-state fMRI | Code | 2
Dungeons and Data: A Large-Scale NetHack Dataset | Code | 2
DataSciBench: An LLM Agent Benchmark for Data Science | Code | 2
Navigating the Shadows: Unveiling Effective Disturbances for Modern AI Content Detectors | Code | 2
Trieste: Efficiently Exploring The Depths of Black-box Functions with TensorFlow | Code | 2
Foundation Policies with Hilbert Representations | Code | 2
ReNO: Enhancing One-step Text-to-Image Models through Reward-based Noise Optimization | Code | 2
Needle In A Video Haystack: A Scalable Synthetic Evaluator for Video MLLMs | Code | 2
IRef-VLA: A Benchmark for Interactive Referential Grounding with Imperfect Language in 3D Scenes | Code | 2
Explore the Limits of Omni-modal Pretraining at Scale | Code | 2
Noise-Consistent Siamese-Diffusion for Medical Image Synthesis and Segmentation | Code | 2
Learning to Generate Explainable Stock Predictions using Self-Reflective Large Language Models | Code | 2
FlowTurbo: Towards Real-time Flow-Based Image Generation with Velocity Refiner | Code | 2
MC-Calib: A generic and robust calibration toolbox for multi-camera systems | Code | 2
Yggdrasil Decision Forests: A Fast and Extensible Decision Forests Library | Code | 2
NUDT4MSTAR: A Large Dataset and Benchmark Towards Remote Sensing Object Recognition in the Wild | Code | 2
Imp: Highly Capable Large Multimodal Models for Mobile Devices | Code | 2
DurLAR: A High-fidelity 128-channel LiDAR Dataset with Panoramic Ambient and Reflectivity Imagery for Multi-modal Autonomous Driving Applications | Code | 2
Graph-ToolFormer: To Empower LLMs with Graph Reasoning Ability via Prompt Augmented by ChatGPT | Code | 2
RecBole: Towards a Unified, Comprehensive and Efficient Framework for Recommendation Algorithms | Code | 2
CompassJudger-1: All-in-one Judge Model Helps Model Evaluation and Evolution | Code | 2
Trusted Multi-View Classification with Dynamic Evidential Fusion | Code | 2
Reading Between the Frames: Multi-Modal Depression Detection in Videos from Non-Verbal Cues | Code | 2
Deep Differentiable Logic Gate Networks | Code | 2
SALAD-Bench: A Hierarchical and Comprehensive Safety Benchmark for Large Language Models | Code | 2
The Reversal Curse: LLMs trained on "A is B" fail to learn "B is A" | Code | 2
Prometheus-Vision: Vision-Language Model as a Judge for Fine-Grained Evaluation | Code | 2
Page 229 of 9486