SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers · 248,326 code links · 4,818 tasks

Papers

Showing 1876-1900 of 661,570 papers

Title | Status | Hype
PointMamba: A Simple State Space Model for Point Cloud Analysis | Code | 4
LLM Comparator: Visual Analytics for Side-by-Side Evaluation of Large Language Models | Code | 4
Generative Representational Instruction Tuning | Code | 4
TIAViz: A Browser-based Visualization Tool for Computational Pathology Models | Code | 4
OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset | Code | 4
OmniMedVQA: A New Large-Scale Comprehensive Evaluation Benchmark for Medical LVLM | Code | 4
DoRA: Weight-Decomposed Low-Rank Adaptation | Code | 4
G-Retriever: Retrieval-Augmented Generation for Textual Graph Understanding and Question Answering | Code | 4
Dólares or Dollars? Unraveling the Bilingual Prowess of Financial LLMs Between Spanish and English | Code | 4
Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models | Code | 4
Semi-Mamba-UNet: Pixel-Level Contrastive and Pixel-Level Cross-Supervised Visual Mamba-based UNet for Semi-Supervised Medical Image Segmentation | Code | 4
ScreenAgent: A Vision Language Model-driven Computer Control Agent | Code | 4
Bryndza at ClimateActivism 2024: Stance, Target and Hate Event Detection via Retrieval-Augmented GPT-4 and LLaMA | Code | 4
InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning | Code | 4
InkSight: Offline-to-Online Handwriting Conversion by Learning to Read and Write | Code | 4
MIGC: Multi-Instance Generation Controller for Text-to-Image Synthesis | Code | 4
Spirit LM: Interleaved Spoken and Written Language Model | Code | 4
You Only Need One Color Space: An Efficient Network for Low-light Image Enhancement | Code | 4
AlphaFold Meets Flow Matching for Generating Protein Ensembles | Code | 4
JAX-Fluids 2.0: Towards HPC for Differentiable CFD of Compressible Two-phase Flows | Code | 4
Amortized Planning with Large-Scale Transformers: A Case Study on Chess | Code | 4
Mamba-UNet: UNet-Like Pure Visual Mamba for Medical Image Segmentation | Code | 4
QuIP#: Even Better LLM Quantization with Hadamard Incoherence and Lattice Codebooks | Code | 4
LESS: Selecting Influential Data for Targeted Instruction Tuning | Code | 4
HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal | Code | 4