SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

659,983 papers · 248,104 code links · 4,818 tasks

Papers

Showing 1851-1900 of 659,983 papers

Title | Status | Hype
Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models | Code | 4
LLM Inference Unveiled: Survey and Roofline Model Insights | Code | 4
RepoAgent: An LLM-Powered Open-Source Framework for Repository-level Code Documentation Generation | Code | 4
MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT | Code | 4
Chain-of-Discussion: A Multi-Model Framework for Complex Evidence-Based Question Answering | Code | 4
Neural Operators with Localized Integral and Differential Kernels | Code | 4
Debug like a Human: A Large Language Model Debugger via Verifying Runtime Execution Step-by-step | Code | 4
Knowledge Fusion of Chat LLMs: A Preliminary Technical Report | Code | 4
AgentOhana: Design Unified Data and Training Pipeline for Effective Agent Learning | Code | 4
AgentLite: A Lightweight Library for Building and Advancing Task-Oriented LLM Agent System | Code | 4
Self-Supervised Pre-Training for Table Structure Recognition Transformer | Code | 4
Cameras as Rays: Pose Estimation via Ray Diffusion | Code | 4
2D Matryoshka Sentence Embeddings | Code | 4
TinyLLaVA: A Framework of Small-scale Large Multimodal Models | Code | 4
Large Language Models for Data Annotation and Synthesis: A Survey | Code | 4
Benchmarking Retrieval-Augmented Generation for Medicine | Code | 4
Neural Network Diffusion | Code | 4
FinBen: A Holistic Financial Benchmark for Large Language Models | Code | 4
Aria Everyday Activities Dataset | Code | 4
AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling | Code | 4
Towards Cross-Tokenizer Distillation: the Universal Logit Distillation Loss for LLMs | Code | 4
GIM: Learning Generalizable Image Matcher From Internet Videos | Code | 4
In Search of Needles in a 11M Haystack: Recurrent Memory Finds What LLMs Miss | Code | 4
Weak-Mamba-UNet: Visual Mamba Makes CNN and ViT Work Better for Scribble-based Medical Image Segmentation | Code | 4
BitDistiller: Unleashing the Potential of Sub-4-Bit LLMs via Self-Distillation | Code | 4
PointMamba: A Simple State Space Model for Point Cloud Analysis | Code | 4
LLM Comparator: Visual Analytics for Side-by-Side Evaluation of Large Language Models | Code | 4
Generative Representational Instruction Tuning | Code | 4
TIAViz: A Browser-based Visualization Tool for Computational Pathology Models | Code | 4
OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset | Code | 4
OmniMedVQA: A New Large-Scale Comprehensive Evaluation Benchmark for Medical LVLM | Code | 4
DoRA: Weight-Decomposed Low-Rank Adaptation | Code | 4
G-Retriever: Retrieval-Augmented Generation for Textual Graph Understanding and Question Answering | Code | 4
Dólares or Dollars? Unraveling the Bilingual Prowess of Financial LLMs Between Spanish and English | Code | 4
Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models | Code | 4
Semi-Mamba-UNet: Pixel-Level Contrastive and Pixel-Level Cross-Supervised Visual Mamba-based UNet for Semi-Supervised Medical Image Segmentation | Code | 4
ScreenAgent: A Vision Language Model-driven Computer Control Agent | Code | 4
Bryndza at ClimateActivism 2024: Stance, Target and Hate Event Detection via Retrieval-Augmented GPT-4 and LLaMA | Code | 4
InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning | Code | 4
InkSight: Offline-to-Online Handwriting Conversion by Learning to Read and Write | Code | 4
MIGC: Multi-Instance Generation Controller for Text-to-Image Synthesis | Code | 4
Spirit LM: Interleaved Spoken and Written Language Model | Code | 4
You Only Need One Color Space: An Efficient Network for Low-light Image Enhancement | Code | 4
AlphaFold Meets Flow Matching for Generating Protein Ensembles | Code | 4
JAX-Fluids 2.0: Towards HPC for Differentiable CFD of Compressible Two-phase Flows | Code | 4
Amortized Planning with Large-Scale Transformers: A Case Study on Chess | Code | 4
Mamba-UNet: UNet-Like Pure Visual Mamba for Medical Image Segmentation | Code | 4
QuIP#: Even Better LLM Quantization with Hadamard Incoherence and Lattice Codebooks | Code | 4
LESS: Selecting Influential Data for Targeted Instruction Tuning | Code | 4
HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal | Code | 4