The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

659,983 papers | 248,104 code links | 4,818 tasks

Papers

Showing 1851–1900 of 659,983 papers

Title	Date	Tasks	Status	Hype
Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models	Feb 27, 2024	Marketing, Video Generation	Code Available	4
LLM Inference Unveiled: Survey and Roofline Model Insights	Feb 26, 2024	Knowledge Distillation, Language Modelling	Code Available	4
RepoAgent: An LLM-Powered Open-Source Framework for Repository-level Code Documentation Generation	Feb 26, 2024	Code Documentation Generation, Code Generation	Code Available	4
MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT	Feb 26, 2024		Code Available	4
Chain-of-Discussion: A Multi-Model Framework for Complex Evidence-Based Question Answering	Feb 26, 2024	Evidence Selection, Open-Ended Question Answering	Code Available	4
Neural Operators with Localized Integral and Differential Kernels	Feb 26, 2024	Operator learning	Code Available	4
Debug like a Human: A Large Language Model Debugger via Verifying Runtime Execution Step-by-step	Feb 25, 2024	Code Generation, HumanEval	Code Available	4
Knowledge Fusion of Chat LLMs: A Preliminary Technical Report	Feb 25, 2024		Code Available	4
AgentOhana: Design Unified Data and Training Pipeline for Effective Agent Learning	Feb 23, 2024		Code Available	4
AgentLite: A Lightweight Library for Building and Advancing Task-Oriented LLM Agent System	Feb 23, 2024	AI Agent	Code Available	4
Self-Supervised Pre-Training for Table Structure Recognition Transformer	Feb 23, 2024	Representation Learning	Code Available	4
Cameras as Rays: Pose Estimation via Ray Diffusion	Feb 22, 2024	3D Reconstruction, Camera Pose Estimation	Code Available	4
2D Matryoshka Sentence Embeddings	Feb 22, 2024	RAG, Representation Learning	Code Available	4
TinyLLaVA: A Framework of Small-scale Large Multimodal Models	Feb 22, 2024	Visual Question Answering	Code Available	4
Large Language Models for Data Annotation and Synthesis: A Survey	Feb 21, 2024	Survey	Code Available	4
Benchmarking Retrieval-Augmented Generation for Medicine	Feb 20, 2024	Benchmarking, Information Retrieval	Code Available	4
Neural Network Diffusion	Feb 20, 2024	Decoder	Code Available	4
FinBen: A Holistic Financial Benchmark for Large Language Models	Feb 20, 2024	Question Answering, RAG	Code Available	4
Aria Everyday Activities Dataset	Feb 20, 2024		Code Available	4
AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling	Feb 19, 2024	Language Modeling, Language Modelling	Code Available	4
Towards Cross-Tokenizer Distillation: the Universal Logit Distillation Loss for LLMs	Feb 19, 2024	Knowledge Distillation	Code Available	4
GIM: Learning Generalizable Image Matcher From Internet Videos	Feb 16, 2024	3D Reconstruction, Camera Pose Estimation	Code Available	4
In Search of Needles in a 11M Haystack: Recurrent Memory Finds What LLMs Miss	Feb 16, 2024	RAG	Code Available	4
Weak-Mamba-UNet: Visual Mamba Makes CNN and ViT Work Better for Scribble-based Medical Image Segmentation	Feb 16, 2024	Cardiac Segmentation, Decoder	Code Available	4
BitDistiller: Unleashing the Potential of Sub-4-Bit LLMs via Self-Distillation	Feb 16, 2024	Knowledge Distillation, Quantization	Code Available	4
PointMamba: A Simple State Space Model for Point Cloud Analysis	Feb 16, 2024	GPU, Mamba	Code Available	4
LLM Comparator: Visual Analytics for Side-by-Side Evaluation of Large Language Models	Feb 16, 2024		Code Available	4
Generative Representational Instruction Tuning	Feb 15, 2024	Language Modeling, Language Modelling	Code Available	4
TIAViz: A Browser-based Visualization Tool for Computational Pathology Models	Feb 15, 2024	whole slide images	Code Available	4
OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset	Feb 15, 2024	Arithmetic Reasoning, GSM8K	Code Available	4
OmniMedVQA: A New Large-Scale Comprehensive Evaluation Benchmark for Medical LVLM	Feb 14, 2024	Medical Visual Question Answering, Question Answering	Code Available	4
DoRA: Weight-Decomposed Low-Rank Adaptation	Feb 14, 2024	parameter-efficient fine-tuning	Code Available	4
G-Retriever: Retrieval-Augmented Generation for Textual Graph Understanding and Question Answering	Feb 12, 2024	Common Sense Reasoning, Graph Classification	Code Available	4
Dólares or Dollars? Unraveling the Bilingual Prowess of Financial LLMs Between Spanish and English	Feb 12, 2024		Code Available	4
Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models	Feb 12, 2024	Hallucination, Object Localization	Code Available	4
Semi-Mamba-UNet: Pixel-Level Contrastive and Pixel-Level Cross-Supervised Visual Mamba-based UNet for Semi-Supervised Medical Image Segmentation	Feb 11, 2024	Cardiac Segmentation, Contrastive Learning	Code Available	4
ScreenAgent: A Vision Language Model-driven Computer Control Agent	Feb 9, 2024	Language Modeling, Language Modelling	Code Available	4
Bryndza at ClimateActivism 2024: Stance, Target and Hate Event Detection via Retrieval-Augmented GPT-4 and LLaMA	Feb 9, 2024	Event Detection, Hate Speech Detection	Code Available	4
InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning	Feb 9, 2024	Data Augmentation, GSM8K	Code Available	4
InkSight: Offline-to-Online Handwriting Conversion by Learning to Read and Write	Feb 8, 2024	Derendering	Code Available	4
MIGC: Multi-Instance Generation Controller for Text-to-Image Synthesis	Feb 8, 2024	Attribute, Conditional Text-to-Image Synthesis	Code Available	4
Spirit LM: Interleaved Spoken and Written Language Model	Feb 8, 2024	Language Modeling, Language Modelling	Code Available	4
You Only Need One Color Space: An Efficient Network for Low-light Image Enhancement	Feb 8, 2024	Image Enhancement, Low-light Image Deblurring and Enhancement	Code Available	4
AlphaFold Meets Flow Matching for Generating Protein Ensembles	Feb 7, 2024	Diversity	Code Available	4
JAX-Fluids 2.0: Towards HPC for Differentiable CFD of Compressible Two-phase Flows	Feb 7, 2024	GPU	Code Available	4
Amortized Planning with Large-Scale Transformers: A Case Study on Chess	Feb 7, 2024	Memorization	Code Available	4
Mamba-UNet: UNet-Like Pure Visual Mamba for Medical Image Segmentation	Feb 7, 2024	Cardiac Segmentation, Computational Efficiency	Code Available	4
QuIP#: Even Better LLM Quantization with Hadamard Incoherence and Lattice Codebooks	Feb 6, 2024	Quantization	Code Available	4
LESS: Selecting Influential Data for Targeted Instruction Tuning	Feb 6, 2024		Code Available	4
HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal	Feb 6, 2024	Red Teaming	Code Available	4