SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers | 248,326 code links | 4,818 tasks

Papers

Showing 7301–7350 of 661,570 papers

Title | Status | Hype
Mixed-curvature decision trees and random forests | Code | 2
Towards Comprehensive Detection of Chinese Harmful Memes | Code | 2
AutoML-Agent: A Multi-Agent LLM Framework for Full-Pipeline AutoML | Code | 2
PnP-Flow: Plug-and-Play Image Restoration with Flow Matching | Code | 2
Curvature Diversity-Driven Deformation and Domain Alignment for Point Cloud | Code | 2
A Comprehensive Survey of Mamba Architectures for Medical Image Analysis: Classification, Segmentation, Restoration and Beyond | Code | 2
CodeJudge: Evaluating Code Generation with Large Language Models | Code | 2
CAnDOIT: Causal Discovery with Observational and Interventional Data from Time-Series | Code | 2
Interpreting and Editing Vision-Language Representations to Mitigate Hallucinations | Code | 2
NNetscape Navigator: Complex Demonstrations for Web Agents Without a Demonstrator | Code | 2
LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations | Code | 2
MiraGe: Editable 2D Images using Gaussian Splatting | Code | 2
3DGS-DET: Empower 3D Gaussian Splatting with Boundary Guidance and Box-Focused Sampling for 3D Object Detection | Code | 2
From Code to Correctness: Closing the Last Mile of Code Generation with Hierarchical Debugging | Code | 2
Interpretable Contrastive Monte Carlo Tree Search Reasoning | Code | 2
Fira: Can We Achieve Full-rank Training of LLMs Under Low-rank Constraint? | Code | 2
Towards Generalizable Vision-Language Robotic Manipulation: A Benchmark and LLM-guided 3D Policy | Code | 2
FlipAttack: Jailbreak LLMs via Flipping | Code | 2
Leopard: A Vision Language Model For Text-Rich Multi-Image Tasks | Code | 2
Open-RAG: Enhanced Retrieval-Augmented Reasoning with Open-Source Large Language Models | Code | 2
Codev-Bench: How Do LLMs Understand Developer-Centric Code Completion? | Code | 2
A Spark of Vision-Language Intelligence: 2-Dimensional Autoregressive Transformer for Efficient Finegrained Image Generation | Code | 2
Selective Aggregation for Low-Rank Adaptation in Federated Learning | Code | 2
VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment | Code | 2
Peeling Back the Layers: An In-Depth Evaluation of Encoder Architectures in Neural News Recommenders | Code | 2
EnzymeFlow: Generating Reaction-specific Enzyme Catalytic Pockets through Flow Matching and Co-Evolutionary Dynamics | Code | 2
Generative causal testing to bridge data-driven models and scientific theories in language neuroscience | Code | 2
PointAD: Comprehending 3D Anomalies from Points and Pixels for Zero-shot 3D Anomaly Detection | Code | 2
GSPR: Multimodal Place Recognition Using 3D Gaussian Splatting for Autonomous Driving | Code | 2
EmoKnob: Enhance Voice Cloning with Fine-Grained Emotion Control | Code | 2
MOSEL: 950,000 Hours of Speech Data for Open-Source Speech Foundation Model Training on EU Languages | Code | 2
CaRtGS: Computational Alignment for Real-Time Gaussian Splatting SLAM | Code | 2
Uncertainty Modelling and Robust Observer Synthesis using the Koopman Operator | Code | 2
Recent Advances in Speech Language Models: A Survey | Code | 2
Beyond Prompts: Dynamic Conversational Benchmarking of Large Language Models | Code | 2
LLMEmb: Large Language Model Can Be a Good Embedding Generator for Sequential Recommendation | Code | 2
FaithEval: Can Your Language Model Stay Faithful to Context, Even If "The Moon is Made of Marshmallows" | Code | 2
PerCo (SD): Open Perceptual Compression | Code | 2
Frequency Adaptive Normalization For Non-stationary Time Series Forecasting | Code | 2
Robin3D: Improving 3D Large Language Model via Robust Instruction Tuning | Code | 2
HazyDet: Open-source Benchmark for Drone-view Object Detection with Depth-cues in Hazy Scenes | Code | 2
DAOcc: 3D Object Detection Assisted Multi-Sensor Fusion for 3D Occupancy Prediction | Code | 2
ForecastBench: A Dynamic Benchmark of AI Forecasting Capabilities | Code | 2
KV-Compress: Paged KV-Cache Compression with Variable Compression Rates per Attention Head | Code | 2
RouterDC: Query-Based Router by Dual Contrastive Learning for Assembling Large Language Models | Code | 2
DeSTA2: Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data | Code | 2
Melody-Guided Music Generation | Code | 2
Procedure-Aware Surgical Video-language Pretraining with Hierarchical Knowledge Augmentation | Code | 2
End-to-end Piano Performance-MIDI to Score Conversion with Transformers | Code | 2
Towards Robust Multimodal Sentiment Analysis with Incomplete Data | Code | 2