The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers | 248,326 code links | 4,818 tasks

Papers

Showing 3951–4000 of 661,570 papers

Title	Date	Tasks	Status	Hype
EscherNet: A Generative Model for Scalable View Synthesis	Feb 6, 2024	3D Reconstruction, GPU	Code Available	3
Self-Discover: Large Language Models Self-Compose Reasoning Structures	Feb 6, 2024	Math	Code Available	3
CogCoM: Train Large Vision-Language Models Diving into Details through Chain of Manipulations	Feb 6, 2024	Visual Reasoning	Code Available	3
BiLLM: Pushing the Limit of Post-Training Quantization for LLMs	Feb 6, 2024	Binarization, GPU	Code Available	3
V-IRL: Grounding Virtual Intelligence in Real Life	Feb 5, 2024	Decision Making	Code Available	3
Open RL Benchmark: Comprehensive Tracked Experiments for Reinforcement Learning	Feb 5, 2024	reinforcement-learning, Reinforcement Learning	Code Available	3
Swin-UMamba: Mamba-based UNet with ImageNet-based pretraining	Feb 5, 2024	Image Segmentation, Mamba	Code Available	3
SGS-SLAM: Semantic Gaussian Splatting For Neural Dense SLAM	Feb 5, 2024	3D Semantic Segmentation, Camera Pose Estimation	Code Available	3
KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache	Feb 5, 2024	Quantization	Code Available	3
Neural networks for abstraction and reasoning: Towards broad generalization in machines	Feb 5, 2024	ARC, Visual Reasoning	Code Available	3
Pathformer: Multi-scale Transformers with Adaptive Pathways for Time Series Forecasting	Feb 4, 2024	Time Series, Time Series Forecasting	Code Available	3
A Survey of Large Language Models in Finance (FinLLMs)	Feb 4, 2024	Named Entity Recognition (NER), Question Answering	Code Available	3
AutoTimes: Autoregressive Time Series Forecasters via Large Language Models	Feb 4, 2024	Decoder, In-Context Learning	Code Available	3
Transolver: A Fast Transformer Solver for PDEs on General Geometries	Feb 4, 2024		Code Available	3
SIMPL: A Simple and Efficient Multi-agent Motion Prediction Baseline for Autonomous Driving	Feb 4, 2024	Autonomous Driving, Autonomous Vehicles	Code Available	3
TopoX: A Suite of Python Packages for Machine Learning on Topological Domains	Feb 4, 2024		Code Available	3
Position: Graph Foundation Models are Already Here	Feb 3, 2024	Position	Code Available	3
PokeLLMon: A Human-Parity Agent for Pokemon Battles with Large Language Models	Feb 2, 2024	Action Generation, Decision Making	Code Available	3
cmaes : A Simple yet Practical Python Library for CMA-ES	Feb 2, 2024	Transfer Learning	Code Available	3
GaMeS: Mesh-Based Adapting and Modification of Gaussian Splatting	Feb 2, 2024		Code Available	3
TravelPlanner: A Benchmark for Real-World Planning with Language Agents	Feb 2, 2024		Code Available	3
ReEvo: Large Language Models as Hyper-Heuristics with Reflective Evolution	Feb 2, 2024	Combinatorial Optimization, Evolutionary Algorithms	Code Available	3
A Survey on Self-Supervised Learning for Non-Sequential Tabular Data	Feb 2, 2024	Contrastive Learning, Descriptive	Code Available	3
BlackMamba: Mixture of Experts for State-Space Models	Feb 1, 2024	Language Modeling, Language Modelling	Code Available	3
Safety of Multimodal Large Language Models on Images and Texts	Feb 1, 2024	Survey	Code Available	3
StopThePop: Sorted Gaussian Splatting for View-Consistent Real-time Rendering	Feb 1, 2024	Novel View Synthesis	Code Available	3
On the Error Analysis of 3D Gaussian Splatting and an Optimal Projection Strategy	Feb 1, 2024	Neural Rendering	Code Available	3
Graph-Mamba: Towards Long-Range Graph Sequence Modeling with Selective State Spaces	Feb 1, 2024	Computational Efficiency, GPU	Code Available	3
PirateNets: Physics-informed Deep Learning with Residual Adaptive Networks	Feb 1, 2024	Deep Learning	Code Available	3
Repeat After Me: Transformers are Better than State Space Models at Copying	Feb 1, 2024	State Space Models	Code Available	3
KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization	Jan 31, 2024	GPU, Quantization	Code Available	3
LongAlign: A Recipe for Long Context Alignment of Large Language Models	Jan 31, 2024	Diversity, Instruction Following	Code Available	3
Hi-SAM: Marrying Segment Anything Model for Hierarchical Text Segmentation	Jan 31, 2024	Hierarchical Text Segmentation, parameter-efficient fine-tuning	Code Available	3
Common Sense Reasoning for Deepfake Detection	Jan 31, 2024	Binary Classification, Common Sense Reasoning	Code Available	3
MuSc: Zero-Shot Industrial Anomaly Classification and Segmentation with Mutual Scoring of the Unlabeled Images	Jan 30, 2024	Anomaly Classification, Anomaly Detection	Code Available	3
ESPnet-SPK: full pipeline speaker embedding toolkit with reproducible recipes, self-supervised front-ends, and off-the-shelf models	Jan 30, 2024	Self-Supervised Learning, Speaker Recognition	Code Available	3
Towards Urban General Intelligence: A Review and Outlook of Urban Foundation Models	Jan 30, 2024		Code Available	3
When Large Language Models Meet Vector Databases: A Survey	Jan 30, 2024	Hallucination, Information Retrieval	Code Available	3
CRUD-RAG: A Comprehensive Chinese Benchmark for Retrieval-Augmented Generation of Large Language Models	Jan 30, 2024	Knowledge Base Construction, Question Answering	Code Available	3
Corrective Retrieval Augmented Generation	Jan 29, 2024	RAG, Retrieval	Code Available	3
StableIdentity: Inserting Anybody into Anywhere at First Sight	Jan 29, 2024	3D Generation	Code Available	3
DeFlow: Decoder of Scene Flow Network in Autonomous Driving	Jan 29, 2024	Autonomous Driving, Decoder	Code Available	3
BrepGen: A B-rep Generative Diffusion Model with Structured Latent Geometry	Jan 28, 2024		Code Available	3
FengWu-GHR: Learning the Kilometer-scale Medium-range Global Weather Forecasting	Jan 28, 2024	Weather Forecasting	Code Available	3
MultiHop-RAG: Benchmarking Retrieval-Augmented Generation for Multi-Hop Queries	Jan 27, 2024	Benchmarking, RAG	Code Available	3
A Practical Probabilistic Benchmark for AI Weather Models	Jan 27, 2024	Diagnostic, Weather Forecasting	Code Available	3
Scientific Large Language Models: A Survey on Biological & Chemical Domains	Jan 26, 2024	scientific discovery, Survey	Code Available	3
SliceGPT: Compress Large Language Models by Deleting Rows and Columns	Jan 26, 2024		Code Available	3
FP6-LLM: Efficiently Serving Large Language Models Through FP6-Centric Algorithm-System Co-Design	Jan 25, 2024	GPU, Quantization	Code Available	3
pix2gestalt: Amodal Segmentation by Synthesizing Wholes	Jan 25, 2024	3D Reconstruction, Object Recognition	Code Available	3