The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers | 248,326 code links | 4,818 tasks

Papers


Showing 8,176–8,200 of 474,278 papers

Title	Date	Tasks	Status	Hype
FreeTraj: Tuning-Free Trajectory Control in Video Diffusion Models	Jun 24, 2024	Video Generation	Code Available	2
Are Vision xLSTM Embedded UNet More Reliable in Medical 3D Image Segmentation?	Jun 24, 2024	Image Segmentation, Medical Image Segmentation	Code Available	2
Finding Transformer Circuits with Edge Pruning	Jun 24, 2024	In-Context Learning, Language Modelling	Code Available	2
One Thousand and One Pairs: A "novel" challenge for long-context language models	Jun 24, 2024	Retrieval, Sentence	Code Available	2
OmAgent: A Multi-modal Agent Framework for Complex Video Understanding with Task Divide-and-Conquer	Jun 24, 2024	AI Agent, Large Language Model	Code Available	2
LangSuitE: Planning, Controlling and Interacting with Large Language Models in Embodied Text Environments	Jun 24, 2024	World Knowledge	Code Available	2
From Perfect to Noisy World Simulation: Customizable Embodied Multi-modal Perturbations for SLAM Robustness Benchmarking	Jun 24, 2024	Benchmarking, NeRF	Code Available	2
CausalFormer: An Interpretable Transformer for Temporal Causal Discovery	Jun 24, 2024	Causal Discovery, Time Series	Code Available	2
SegNet4D: Efficient Instance-Aware 4D Semantic Segmentation for LiDAR Point Cloud	Jun 24, 2024	Autonomous Driving, Autonomous Navigation	Code Available	2
LGS: A Light-weight 4D Gaussian Splatting for Efficient Surgical Scene Reconstruction	Jun 23, 2024		Code Available	2
Towards Open Respiratory Acoustic Foundation Models: Pretraining and Benchmarking	Jun 23, 2024	Benchmarking	Code Available	2
DV-3DLane: End-to-end Multi-modal 3D Lane Detection with Dual-view Representation	Jun 23, 2024	3D Lane Detection, Autonomous Driving	Code Available	2
Efficient Evolutionary Search Over Chemical Space with Large Language Models	Jun 23, 2024	Drug Design, Evolutionary Algorithms	Code Available	2
EDGE-LLM: Enabling Efficient Large Language Model Adaptation on Edge Devices via Layerwise Unified Compression and Adaptive Layer Tuning and Voting	Jun 22, 2024	Language Modeling, Language Modelling	Code Available	2
Semantic Entropy Probes: Robust and Cheap Hallucination Detection in LLMs	Jun 22, 2024	Hallucination, Uncertainty Quantification	Code Available	2
Soft Masked Mamba Diffusion Model for CT to MRI Conversion	Jun 22, 2024	Computed Tomography (CT), Image Generation	Code Available	2
What Matters in Transformers? Not All Attention is Needed	Jun 22, 2024	All, MMLU	Code Available	2
PointDreamer: Zero-shot 3D Textured Mesh Reconstruction from Colored Point Cloud	Jun 22, 2024	Image Inpainting	Code Available	2
Ladder: A Model-Agnostic Framework Boosting LLM-based Machine Translation to the Next Level	Jun 22, 2024	Machine Translation, Translation	Code Available	2
TorchSpatial: A Location Encoding Framework and Benchmark for Spatial Representation Learning	Jun 21, 2024	Fairness, Geographic Question Answering	Code Available	2
Leveraging Passage Embeddings for Efficient Listwise Reranking with Large Language Models	Jun 21, 2024	Learning-To-Rank, Passage Ranking	Code Available	2
Direct Multi-Turn Preference Optimization for Language Agents	Jun 21, 2024	Reinforcement Learning (RL)	Code Available	2
RouteFinder: Towards Foundation Models for Vehicle Routing Problems	Jun 21, 2024	Attribute, Multi-Task Learning	Code Available	2
GenoTEX: An LLM Agent Benchmark for Automated Gene Expression Data Analysis	Jun 21, 2024	AI Agent, AutoML	Code Available	2
Cross-Modality Safety Alignment	Jun 21, 2024	Safety Alignment	Code Available	2