SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers · 248,326 code links · 4,818 tasks

Papers

Showing 8601–8650 of 661,570 papers

Title | Status | Hype
Aligning language models with human preferences | Code | 2
Towards Universal Sequence Representation Learning for Recommender Systems | Code | 2
Agent Planning with World Knowledge Model | Code | 2
HAKE: A Knowledge Engine Foundation for Human Activity Understanding | Code | 2
SfM-Free 3D Gaussian Splatting via Hierarchical Training | Code | 2
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations | Code | 2
CellViT++: Energy-Efficient and Adaptive Cell Segmentation and Classification Using Foundation Models | Code | 2
Fast Feedforward Networks | Code | 2
Practical Membership Inference Attacks against Fine-tuned Large Language Models via Self-prompt Calibration | Code | 2
GradeADreamer: Enhanced Text-to-3D Generation Using Gaussian Splatting and Multi-View Diffusion | Code | 2
Ensembling Prioritized Hybrid Policies for Multi-agent Pathfinding | Code | 2
Flow of Reasoning: Training LLMs for Divergent Problem Solving with Minimal Examples | Code | 2
FACT: Frame-Action Cross-Attention Temporal Modeling for Efficient Action Segmentation | Code | 2
The Decades Progress on Code-Switching Research in NLP: A Systematic Survey on Trends and Challenges | Code | 2
Generative replay with feedback connections as a general strategy for continual learning | Code | 2
DiffCSE: Difference-based Contrastive Learning for Sentence Embeddings | Code | 2
Smooth Diffusion: Crafting Smooth Latent Spaces in Diffusion Models | Code | 2
UniMoMo: Unified Generative Modeling of 3D Molecules for De Novo Binder Design | Code | 2
Cluster and Predict Latents Patches for Improved Masked Image Modeling | Code | 2
NTIRE 2025 Challenge on Cross-Domain Few-Shot Object Detection: Methods and Results | Code | 2
Parting with Misconceptions about Learning-based Vehicle Motion Planning | Code | 2
CVT-Occ: Cost Volume Temporal Fusion for 3D Occupancy Prediction | Code | 2
An AI-Ready Multiplex Staining Dataset for Reproducible and Accurate Characterization of Tumor Immune Microenvironment | Code | 2
A New Frontier of AI: On-Device AI Training and Personalization | Code | 2
MetaOpenFOAM 2.0: Large Language Model Driven Chain of Thought for Automating CFD Simulation and Post-Processing | Code | 2
Towards Understanding and Boosting Adversarial Transferability from a Distribution Perspective | Code | 2
EA-LSS: Edge-aware Lift-splat-shot Framework for 3D BEV Object Detection | Code | 2
Self-playing Adversarial Language Game Enhances LLM Reasoning | Code | 2
A Pytorch Reproduction of Masked Generative Image Transformer | Code | 2
Unlocking the Capabilities of Thought: A Reasoning Boundary Framework to Quantify and Optimize Chain-of-Thought | Code | 2
Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO | Code | 2
DiffCLIP: Differential Attention Meets CLIP | Code | 2
Compute-Constrained Data Selection | Code | 2
VAU-R1: Advancing Video Anomaly Understanding via Reinforcement Fine-Tuning | Code | 2
Neural Rendering for Stereo 3D Reconstruction of Deformable Tissues in Robotic Surgery | Code | 2
MultiCorrupt: A Multi-Modal Robustness Dataset and Benchmark of LiDAR-Camera Fusion for 3D Object Detection | Code | 2
Large Language Model Guided Tree-of-Thought | Code | 2
HalOmi: A Manually Annotated Benchmark for Multilingual Hallucination and Omission Detection in Machine Translation | Code | 2
Tamper-Resistant Safeguards for Open-Weight LLMs | Code | 2
Deep Reinforcement Learning with Enhanced PPO for Safe Mobile Robot Navigation | Code | 2
SC-GS: Sparse-Controlled Gaussian Splatting for Editable Dynamic Scenes | Code | 2
Fast ODE-based Sampling for Diffusion Models in Around 5 Steps | Code | 2
CleanMel: Mel-Spectrogram Enhancement for Improving Both Speech Quality and ASR | Code | 2
FullSubNet+: Channel Attention FullSubNet with Complex Spectrograms for Speech Enhancement | Code | 2
MLLM-Tool: A Multimodal Large Language Model For Tool Agent Learning | Code | 2
Accelerated Quality-Diversity through Massive Parallelism | Code | 2
Anomaly Detection with Conditioned Denoising Diffusion Models | Code | 2
VQGAN-CLIP: Open Domain Image Generation and Editing with Natural Language Guidance | Code | 2
MetaDrive: Composing Diverse Driving Scenarios for Generalizable Reinforcement Learning | Code | 2
S-STE: Continuous Pruning Function for Efficient 2:4 Sparse Pre-training | Code | 2
Page 173 of 13,232