SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers · 248,326 code links · 4,818 tasks

Papers

Showing 11251-11300 of 177340 papers

| Title | Status | Hype |
| --- | --- | --- |
| Boosting the Generalization and Reasoning of Vision Language Models with Curriculum Reinforcement Learning | Code | 2 |
| Generative AI for Character Animation: A Comprehensive Survey of Techniques, Applications, and Future Directions | Code | 2 |
| VMA: Divide-and-Conquer Vectorized Map Annotation System for Large-Scale Driving Scene | Code | 2 |
| Accelerating Certifiable Estimation with Preconditioned Eigensolvers | Code | 2 |
| Efficient Video Object Segmentation via Modulated Cross-Attention Memory | Code | 2 |
| Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To! | Code | 2 |
| Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding | Code | 2 |
| Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training | Code | 2 |
| Concept Induction: Analyzing Unstructured Text with High-Level Concepts Using LLooM | Code | 2 |
| Evaluating LLM Reasoning in the Operations Research Domain with ORQA | Code | 2 |
| Knowledge Conflicts for LLMs: A Survey | Code | 2 |
| DualPrompt: Complementary Prompting for Rehearsal-free Continual Learning | Code | 2 |
| GenAI Content Detection Task 3: Cross-Domain Machine-Generated Text Detection Challenge | Code | 2 |
| AdaShield: Safeguarding Multimodal Large Language Models from Structure-based Attack via Adaptive Shield Prompting | Code | 2 |
| EfficientZero V2: Mastering Discrete and Continuous Control with Limited Data | Code | 2 |
| Making Large Language Models Perform Better in Knowledge Graph Completion | Code | 2 |
| SSCBench: A Large-Scale 3D Semantic Scene Completion Benchmark for Autonomous Driving | Code | 2 |
| SUNet: Swin Transformer UNet for Image Denoising | Code | 2 |
| GSM-Infinite: How Do Your LLMs Behave over Infinitely Increasing Context Length and Reasoning Complexity? | Code | 2 |
| SinNeRF: Training Neural Radiance Fields on Complex Scenes from a Single Image | Code | 2 |
| Towards Knowledge-driven Autonomous Driving | Code | 2 |
| Ring Attention with Blockwise Transformers for Near-Infinite Context | Code | 2 |
| Interactive Evolution: A Neural-Symbolic Self-Training Framework For Large Language Models | Code | 2 |
| TokenSHAP: Interpreting Large Language Models with Monte Carlo Shapley Value Estimation | Code | 2 |
| Language models scale reliably with over-training and on downstream tasks | Code | 2 |
| Large Language Model Psychometrics: A Systematic Review of Evaluation, Validation, and Enhancement | Code | 2 |
| Editing Language Model-based Knowledge Graph Embeddings | Code | 2 |
| Exploring the Roles of Large Language Models in Reshaping Transportation Systems: A Survey, Framework, and Roadmap | Code | 2 |
| STAMP: Scalable Task And Model-agnostic Collaborative Perception | Code | 2 |
| Dual Diffusion Implicit Bridges for Image-to-Image Translation | Code | 2 |
| PartGS: Learning Part-aware 3D Representations by Fusing 2D Gaussians and Superquadrics | Code | 2 |
| SAFDNet: A Simple and Effective Network for Fully Sparse 3D Object Detection | Code | 2 |
| Diffusion Model as a Noise-Aware Latent Reward Model for Step-Level Preference Optimization | Code | 2 |
| Simple Online and Realtime Tracking | Code | 2 |
| Forecasting Global Weather with Graph Neural Networks | Code | 2 |
| Towards Generating Realistic 3D Semantic Training Data for Autonomous Driving | Code | 2 |
| Learning representations of learning representations | Code | 2 |
| DanmakuTPPBench: A Multi-modal Benchmark for Temporal Point Process Modeling and Understanding | Code | 2 |
| Non-stationary Diffusion For Probabilistic Time Series Forecasting | Code | 2 |
| Rethinking Efficient Lane Detection via Curve Modeling | Code | 2 |
| Generative Auto-Bidding with Value-Guided Explorations | Code | 2 |
| MonoCD: Monocular 3D Object Detection with Complementary Depths | Code | 2 |
| ChatTime: A Unified Multimodal Time Series Foundation Model Bridging Numerical and Textual Data | Code | 2 |
| Distilled Decoding 1: One-step Sampling of Image Auto-regressive Models with Flow Matching | Code | 2 |
| OSSO: Obtaining Skeletal Shape from Outside | Code | 2 |
| Composed Video Retrieval via Enriched Context and Discriminative Embeddings | Code | 2 |
| Improved Few-Shot Jailbreaking Can Circumvent Aligned Language Models and Their Defenses | Code | 2 |
| BRIO: Bringing Order to Abstractive Summarization | Code | 2 |
| Towards Measuring and Modeling "Culture" in LLMs: A Survey | Code | 2 |
| Vript: A Video Is Worth Thousands of Words | Code | 2 |