The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers, 248,326 code links, 4,818 tasks

Papers

Showing 9851–9900 of 661,570 papers

Title	Date	Tasks	Status	Hype
Stable Neural Stochastic Differential Equations in Analyzing Irregular Time Series Data	Feb 22, 2024	Irregular Time Series, Missing Values	Code Available	2
tinyBenchmarks: evaluating LLMs with fewer examples	Feb 22, 2024	MMLU, Multiple-choice	Code Available	2
HyperFast: Instant Classification for Tabular Data	Feb 22, 2024	AutoML, Classification	Code Available	2
Less is More: Mitigating Multimodal Hallucination from an EOS Decision Perspective	Feb 22, 2024	Hallucination, Sentence	Code Available	2
MT-Bench-101: A Fine-Grained Benchmark for Evaluating Large Language Models in Multi-Turn Dialogues	Feb 22, 2024		Code Available	2
PALO: A Polyglot Large Multimodal Model for 5B People	Feb 22, 2024	Language Modeling, Language Modelling	Code Available	2
T-Stitch: Accelerating Sampling in Pre-Trained Diffusion Models with Trajectory Stitching	Feb 21, 2024	Image Generation	Code Available	2
ActiveRAG: Autonomously Knowledge Assimilation and Accommodation through Retrieval-Augmented Agents	Feb 21, 2024	Active Learning, Position	Code Available	2
Ouroboros: Generating Longer Drafts Phrase by Phrase for Faster Speculative Decoding	Feb 21, 2024	Text Generation	Code Available	2
Geometry-Informed Neural Networks	Feb 21, 2024	Diversity	Code Available	2
Full-Atom Peptide Design with Geometric Latent Diffusion	Feb 21, 2024		Code Available	2
Self-Distillation Bridges Distribution Gap in Language Model Fine-Tuning	Feb 21, 2024	Instruction Following, Language Modeling	Code Available	2
A Comprehensive Study of Jailbreak Attack versus Defense for Large Language Models	Feb 21, 2024		Code Available	2
D-Flow: Differentiating through Flows for Controlled Generation	Feb 21, 2024		Code Available	2
OlympiadBench: A Challenging Benchmark for Promoting AGI with Olympiad-Level Bilingual Multimodal Scientific Problems	Feb 21, 2024	Logical Fallacies	Code Available	2
Deep Generative Models for Offline Policy Learning: Tutorial, Survey, and Perspectives on Future Directions	Feb 21, 2024	Decision Making, Imitation Learning	Code Available	2
Neeko: Leveraging Dynamic LoRA for Efficient Multi-Character Role-Playing Agent	Feb 21, 2024	Incremental Learning	Code Available	2
Coercing LLMs to do and reveal (almost) anything	Feb 21, 2024		Code Available	2
VOOM: Robust Visual Object Odometry and Mapping using Hierarchical Landmarks	Feb 21, 2024	Computational Efficiency, Object	Code Available	2
FanOutQA: A Multi-Hop, Multi-Document Question Answering Benchmark for Large Language Models	Feb 21, 2024	Question Answering	Code Available	2
GradSafe: Detecting Jailbreak Prompts for LLMs via Safety-Critical Gradient Analysis	Feb 21, 2024		Code Available	2
PCA-Bench: Evaluating Multimodal Large Language Models in Perception-Cognition-Action Chain	Feb 21, 2024	Autonomous Driving, Decision Making	Code Available	2
A Touch, Vision, and Language Dataset for Multimodal Alignment	Feb 20, 2024	Language Modeling, Language Modelling	Code Available	2
Transformer tricks: Precomputing the first layer	Feb 20, 2024		Code Available	2
EMO-SUPERB: An In-depth Look at Speech Emotion Recognition	Feb 20, 2024	Emotion Recognition, Self-Supervised Learning	Code Available	2
RealCompo: Balancing Realism and Compositionality Improves Text-to-Image Diffusion Models	Feb 20, 2024	Denoising, Image Generation	Code Available	2
Revitalizing Multivariate Time Series Forecasting: Learnable Decomposition with Inter-Series Dependencies and Intra-Series Variations Modeling	Feb 20, 2024	Multivariate Time Series Forecasting, Time Series	Code Available	2
Advancing Large Language Models to Capture Varied Speaking Styles and Respond Properly in Spoken Conversations	Feb 20, 2024	Sentence	Code Available	2
RhythmFormer: Extracting Patterned rPPG Signals based on Periodic Sparse Attention	Feb 20, 2024		Code Available	2
Me LLaMA: Foundation Large Language Models for Medical Applications	Feb 20, 2024	Few-Shot Learning, GPU	Code Available	2
StyleDubber: Towards Multi-Scale Style Learning for Movie Dubbing	Feb 20, 2024	Voice Cloning	Code Available	2
Event-Based Motion Magnification	Feb 19, 2024	Benchmarking, Motion Detection	Code Available	2
UnlearnCanvas: Stylized Image Dataset for Enhanced Machine Unlearning Evaluation in Diffusion Models	Feb 19, 2024	Image Generation, Machine Unlearning	Code Available	2
Class-incremental Learning for Time Series: Benchmark and Evaluation	Feb 19, 2024	Activity Recognition, Benchmarking	Code Available	2
A Critical Evaluation of AI Feedback for Aligning Large Language Models	Feb 19, 2024	Instruction Following, reinforcement-learning	Code Available	2
Robust CLIP: Unsupervised Adversarial Fine-Tuning of Vision Embeddings for Robust Large Vision-Language Models	Feb 19, 2024	Adversarial Defense, Multimodal Deep Learning	Code Available	2
EmoBench: Evaluating the Emotional Intelligence of Large Language Models	Feb 19, 2024	Emotional Intelligence, Emotion Recognition	Code Available	2
The Revolution of Multimodal Large Language Models: A Survey	Feb 19, 2024	Image Generation, Instruction Following	Code Available	2
Reformatted Alignment	Feb 19, 2024	GSM8K, Hallucination	Code Available	2
ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs	Feb 19, 2024	Safety Alignment	Code Available	2
EVOR: Evolving Retrieval for Code Generation	Feb 19, 2024	Code Generation, RAG	Code Available	2
Generative Semi-supervised Graph Anomaly Detection	Feb 19, 2024	Anomaly Detection, Graph Anomaly Detection	Code Available	2
Small Models, Big Insights: Leveraging Slim Proxy Models To Decide When and What to Retrieve for LLMs	Feb 19, 2024	Question Answering	Code Available	2
Language Models are Homer Simpson! Safety Re-Alignment of Fine-tuned Language Models through Task Arithmetic	Feb 19, 2024	Instruction Following, Math	Code Available	2
Spatio-Temporal Few-Shot Learning via Diffusive Neural Network Generation	Feb 19, 2024	Denoising, Few-Shot Learning	Code Available	2
Universal Physics Transformers: A Framework For Efficiently Scaling Neural Operators	Feb 19, 2024		Code Available	2
CausalGym: Benchmarking causal interpretability methods on linguistic tasks	Feb 19, 2024	Benchmarking, Interpretability Techniques for Deep Learning	Code Available	2
Pan-Mamba: Effective pan-sharpening with State Space Model	Feb 19, 2024	Mamba, Pansharpening	Code Available	2
Open3DSG: Open-Vocabulary 3D Scene Graphs from Point Clouds with Queryable Objects and Open-Set Relationships	Feb 19, 2024	3d scene graph generation, Object	Code Available	2
Same Task, More Tokens: the Impact of Input Length on the Reasoning Performance of Large Language Models	Feb 19, 2024		Code Available	2