SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers | 248,326 code links | 4,818 tasks

Papers

Showing 6151-6200 of 661,570 papers

| Title | Status | Hype |
| --- | --- | --- |
| Fino1: On the Transferability of Reasoning Enhanced LLMs to Finance | Code | 2 |
| ViLa-MIL: Dual-scale Vision-Language Multiple Instance Learning for Whole Slide Image Classification | Code | 2 |
| SARChat-Bench-2M: A Multi-Task Vision-Language Benchmark for SAR Image Interpretation | Code | 2 |
| Human-Centric Foundation Models: Perception, Generation and Agentic Modeling | Code | 2 |
| Cluster and Predict Latents Patches for Improved Masked Image Modeling | Code | 2 |
| Brain Latent Progression: Individual-based Spatiotemporal Disease Progression on 3D Brain MRIs via Latent Diffusion | Code | 2 |
| The Danger of Overthinking: Examining the Reasoning-Action Dilemma in Agentic Tasks | Code | 2 |
| LIR-LIVO: A Lightweight,Robust LiDAR/Vision/Inertial Odometry with Illumination-Resilient Deep Features | Code | 2 |
| TLOB: A Novel Transformer Model with Dual Attention for Price Trend Prediction with Limit Order Book Data | Code | 2 |
| WorldGUI: An Interactive Benchmark for Desktop GUI Automation from Any Starting Point | Code | 2 |
| A Systematic Review on the Evaluation of Large Language Models in Theory of Mind Tasks | Code | 2 |
| mmE5: Improving Multimodal Multilingual Embeddings via High-quality Synthetic Data | Code | 2 |
| TextAtlas5M: A Large-scale Dataset for Dense Text Image Generation | Code | 2 |
| LASP-2: Rethinking Sequence Parallelism for Linear Attention and Its Hybrid | Code | 2 |
| Training Deep Learning Models with Norm-Constrained LMOs | Code | 2 |
| MeshSplats: Mesh-Based Rendering with Gaussian Splatting Initialization | Code | 2 |
| Semi-Supervised Vision-Centric 3D Occupancy World Model for Autonomous Driving | Code | 2 |
| DPO-Shift: Shifting the Distribution of Direct Preference Optimization | Code | 2 |
| Less is More: Masking Elements in Image Condition Features Avoids Content Leakages in Style Transfer Diffusion Models | Code | 2 |
| Automated Capability Discovery via Model Self-Exploration | Code | 2 |
| RoboBERT: An End-to-end Multimodal Robotic Manipulation Model | Code | 2 |
| SAMRefiner: Taming Segment Anything Model for Universal Mask Refinement | Code | 2 |
| KARMA: Leveraging Multi-Agent LLMs for Automated Knowledge Graph Enrichment | Code | 2 |
| Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning | Code | 2 |
| On the Emergence of Thinking in LLMs I: Searching for the Right Intuition | Code | 2 |
| TimeKAN: KAN-based Frequency Decomposition Learning Architecture for Long-term Time Series Forecasting | Code | 2 |
| MaterialFusion: High-Quality, Zero-Shot, and Controllable Material Transfer with Diffusion Models | Code | 2 |
| Saving 77% of the Parameters in Large Language Models Technical Report | Code | 2 |
| Skill Expansion and Composition in Parameter Space | Code | 2 |
| 3CAD: A Large-Scale Real-World 3C Product Dataset for Unsupervised Anomaly | Code | 2 |
| Differentially Private Synthetic Data via APIs 3: Using Simulators Instead of Foundation Model | Code | 2 |
| Event Stream-based Visual Object Tracking: HDETrack V2 and A High-Definition Benchmark | Code | 2 |
| CODESIM: Multi-Agent Code Generation and Problem Solving through Simulation-Driven Planning and Debugging | Code | 2 |
| Knowledge Graph-Guided Retrieval Augmented Generation | Code | 2 |
| Towards Trustworthy Retrieval Augmented Generation for Large Language Models: A Survey | Code | 2 |
| Adaptive Graph of Thoughts: Test-Time Adaptive Reasoning Unifying Chain, Tree, and Graph Structures | Code | 2 |
| NoLiMa: Long-Context Evaluation Beyond Literal Matching | Code | 2 |
| GaussRender: Learning 3D Occupancy with Gaussian Rendering | Code | 2 |
| QuEST: Stable Training of LLMs with 1-Bit Weights and Activations | Code | 2 |
| MHAF-YOLO: Multi-Branch Heterogeneous Auxiliary Fusion YOLO for accurate object detection | Code | 2 |
| GSM-Infinite: How Do Your LLMs Behave over Infinitely Increasing Context Length and Reasoning Complexity? | Code | 2 |
| SiriuS: Self-improving Multi-agent Systems via Bootstrapped Reasoning | Code | 2 |
| Cross the Gap: Exposing the Intra-modal Misalignment in CLIP via Modality Inversion | Code | 2 |
| Step Back to Leap Forward: Self-Backtracking for Boosting Reasoning of Language Models | Code | 2 |
| Training Language Models to Reason Efficiently | Code | 2 |
| SoK: Benchmarking Poisoning Attacks and Defenses in Federated Learning | Code | 2 |
| WaferLLM: Large Language Model Inference at Wafer Scale | Code | 2 |
| ScoreFlow: Mastering LLM Agent Workflows via Score-based Preference Optimization | Code | 2 |
| Sparse Autoencoders for Hypothesis Generation | Code | 2 |
| On-device Sora: Enabling Training-Free Diffusion-based Text-to-Video Generation for Mobile Devices | Code | 2 |
Page 124 of 13,232