SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers · 248,326 code links · 4,818 tasks

Papers

Showing 3876-3900 of 661,570 papers

Title | Status | Hype
IEPile: Unearthing Large-Scale Schema-Based Information Extraction Corpus | Code | 3
Multi-HMR: Multi-Person Whole-Body Human Mesh Recovery in a Single Shot | Code | 3
OmniPred: Language Models as Universal Regressors | Code | 3
MAPE-PPI: Towards Effective and Efficient Protein-Protein Interaction Prediction via Microenvironment-Aware Protein Embedding | Code | 3
Cleaner Pretraining Corpus Curation with Neural Web Scraping | Code | 3
Towards Seamless Adaptation of Pre-trained Models for Visual Place Recognition | Code | 3
Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping | Code | 3
Towards Building Multilingual Language Model for Medicine | Code | 3
LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens | Code | 3
∞Bench: Extending Long Context Evaluation Beyond 100K Tokens | Code | 3
Visual Style Prompting with Swapping Self-Attention | Code | 3
Video ReCap: Recursive Captioning of Hour-Long Videos | Code | 3
TorchCP: A Python Library for Conformal Prediction | Code | 3
Smaug: Fixing Failure Modes of Preference Optimisation with DPO-Positive | Code | 3
Codec-SUPERB: An In-Depth Analysis of Sound Codec Models | Code | 3
FiT: Flexible Vision Transformer for Diffusion Model | Code | 3
A Chinese Dataset for Evaluating the Safeguards in Large Language Models | Code | 3
UniST: A Prompt-Empowered Universal Model for Urban Spatio-Temporal Prediction | Code | 3
DiLightNet: Fine-grained Lighting Control for Diffusion-based Image Generation | Code | 3
Language-Codec: Bridging Discrete Codec Representations and Speech Language Models | Code | 3
Sequoia: Scalable, Robust, and Hardware-aware Speculative Decoding | Code | 3
ChartX & ChartVLM: A Versatile Benchmark and Foundation Model for Complicated Chart Reasoning | Code | 3
GTBench: Uncovering the Strategic Reasoning Limitations of LLMs via Game-Theoretic Evaluations | Code | 3
Major TOM: Expandable Datasets for Earth Observation | Code | 3
Query-Based Adversarial Prompt Generation | Code | 3
Page 156 of 26463