SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers · 248,326 code links · 4,818 tasks

Papers

Showing 11351-11400 of 474,278 papers

Title | Status | Hype
OptiMUS: Optimization Modeling Using MIP Solvers and large language models | Code | 2
Fast Calibrated Explanations: Efficient and Uncertainty-Aware Explanations for Machine Learning Models | Code | 2
GEM: A Generalizable Ego-Vision Multimodal World Model for Fine-Grained Ego-Motion, Object Dynamics, and Scene Composition Control | Code | 2
ChemReasoner: Heuristic Search over a Large Language Model's Knowledge Space using Quantum-Chemical Feedback | Code | 2
InteractVLM: 3D Interaction Reasoning from 2D Foundational Models | Code | 2
Delta-CoMe: Training-Free Delta-Compression with Mixed-Precision for Large Language Models | Code | 2
ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings | Code | 2
PointAvatar: Deformable Point-based Head Avatars from Videos | Code | 2
QuasiSim: Parameterized Quasi-Physical Simulators for Dexterous Manipulations Transfer | Code | 2
Large Scale Radio Frequency Signal Classification | Code | 2
CLIPA-v2: Scaling CLIP Training with 81.1% Zero-shot ImageNet Accuracy within a $10,000 Budget; An Extra $4,000 Unlocks 81.8% Accuracy | Code | 2
ICAFusion: Iterative Cross-Attention Guided Feature Fusion for Multispectral Object Detection | Code | 2
Temporal Action Segmentation: An Analysis of Modern Techniques | Code | 2
HybridGS: Decoupling Transients and Statics with 2D and 3D Gaussian Splatting | Code | 2
Splat-SLAM: Globally Optimized RGB-only SLAM with 3D Gaussians | Code | 2
SPINACH: SPARQL-Based Information Navigation for Challenging Real-World Questions | Code | 2
Recent Advances of Multimodal Continual Learning: A Comprehensive Survey | Code | 2
KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark of Long Context Capable Approaches | Code | 2
Panoptic NeRF: 3D-to-2D Label Transfer for Panoptic Urban Scene Segmentation | Code | 2
Spatial Mental Modeling from Limited Views | Code | 2
Quamba: A Post-Training Quantization Recipe for Selective State Space Models | Code | 2
SegViT: Semantic Segmentation with Plain Vision Transformers | Code | 2
Process Reward Models for LLM Agents: Practical Framework and Directions | Code | 2
Efficiently Computing Local Lipschitz Constants of Neural Networks via Bound Propagation | Code | 2
Slideflow: Deep Learning for Digital Histopathology with Real-Time Whole-Slide Visualization | Code | 2
Dereflection Any Image with Diffusion Priors and Diversified Data | Code | 2
pyABC: Efficient and robust easy-to-use approximate Bayesian computation | Code | 2
Spoof Diarization: "What Spoofed When" in Partially Spoofed Audio | Code | 2
Generating Images with Multimodal Language Models | Code | 2
Sparse Autoencoders for Hypothesis Generation | Code | 2
Open-World Semantic Segmentation Including Class Similarity | Code | 2
Cerebrum (AIOS SDK): A Platform for Agent Development, Deployment, Distribution, and Discovery | Code | 2
Tamil-Llama: A New Tamil Language Model Based on Llama 2 | Code | 2
PnP-Flow: Plug-and-Play Image Restoration with Flow Matching | Code | 2
VoxelPrompt: A Vision-Language Agent for Grounded Medical Image Analysis | Code | 2
MotifBench: A standardized protein design benchmark for motif-scaffolding problems | Code | 2
Real-Time Metric-Semantic Mapping for Autonomous Navigation in Outdoor Environments | Code | 2
Generative Artificial Intelligence for Navigating Synthesizable Chemical Space | Code | 2
Moto: Latent Motion Token as the Bridging Language for Learning Robot Manipulation from Videos | Code | 2
Ag2Manip: Learning Novel Manipulation Skills with Agent-Agnostic Visual and Action Representations | Code | 2
Powershap: A Power-full Shapley Feature Selection Method | Code | 2
Skinned Motion Retargeting with Residual Perception of Motion Semantics & Geometry | Code | 2
Effective Data Augmentation With Diffusion Models | Code | 2
Ref-GS: Directional Factorization for 2D Gaussian Splatting | Code | 2
DiffRect: Latent Diffusion Label Rectification for Semi-supervised Medical Image Segmentation | Code | 2
HiCMAE: Hierarchical Contrastive Masked Autoencoder for Self-Supervised Audio-Visual Emotion Recognition | Code | 2
DrivingSphere: Building a High-fidelity 4D World for Closed-loop Simulation | Code | 2
GenoTEX: An LLM Agent Benchmark for Automated Gene Expression Data Analysis | Code | 2
V-Max: A Reinforcement Learning Framework for Autonomous Driving | Code | 2
Do As I Can, Not As I Say: Grounding Language in Robotic Affordances | Code | 2
Page 228 of 9486