SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers · 248,326 code links · 4,818 tasks

Papers

Showing 8176–8200 of 474,278 papers

Title | Status | Hype
FreeTraj: Tuning-Free Trajectory Control in Video Diffusion Models | Code | 2
Are Vision xLSTM Embedded UNet More Reliable in Medical 3D Image Segmentation? | Code | 2
Finding Transformer Circuits with Edge Pruning | Code | 2
One Thousand and One Pairs: A "novel" challenge for long-context language models | Code | 2
OmAgent: A Multi-modal Agent Framework for Complex Video Understanding with Task Divide-and-Conquer | Code | 2
LangSuitE: Planning, Controlling and Interacting with Large Language Models in Embodied Text Environments | Code | 2
From Perfect to Noisy World Simulation: Customizable Embodied Multi-modal Perturbations for SLAM Robustness Benchmarking | Code | 2
CausalFormer: An Interpretable Transformer for Temporal Causal Discovery | Code | 2
SegNet4D: Efficient Instance-Aware 4D Semantic Segmentation for LiDAR Point Cloud | Code | 2
LGS: A Light-weight 4D Gaussian Splatting for Efficient Surgical Scene Reconstruction | Code | 2
Towards Open Respiratory Acoustic Foundation Models: Pretraining and Benchmarking | Code | 2
DV-3DLane: End-to-end Multi-modal 3D Lane Detection with Dual-view Representation | Code | 2
Efficient Evolutionary Search Over Chemical Space with Large Language Models | Code | 2
EDGE-LLM: Enabling Efficient Large Language Model Adaptation on Edge Devices via Layerwise Unified Compression and Adaptive Layer Tuning and Voting | Code | 2
Semantic Entropy Probes: Robust and Cheap Hallucination Detection in LLMs | Code | 2
Soft Masked Mamba Diffusion Model for CT to MRI Conversion | Code | 2
What Matters in Transformers? Not All Attention is Needed | Code | 2
PointDreamer: Zero-shot 3D Textured Mesh Reconstruction from Colored Point Cloud | Code | 2
Ladder: A Model-Agnostic Framework Boosting LLM-based Machine Translation to the Next Level | Code | 2
TorchSpatial: A Location Encoding Framework and Benchmark for Spatial Representation Learning | Code | 2
Leveraging Passage Embeddings for Efficient Listwise Reranking with Large Language Models | Code | 2
Direct Multi-Turn Preference Optimization for Language Agents | Code | 2
RouteFinder: Towards Foundation Models for Vehicle Routing Problems | Code | 2
GenoTEX: An LLM Agent Benchmark for Automated Gene Expression Data Analysis | Code | 2
Cross-Modality Safety Alignment | Code | 2