SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers | 248,326 code links | 4,818 tasks

Papers

Showing 12451-12500 of 474,278 papers

Title | Status | Hype
Improving Causal Reasoning in Large Language Models: A Survey | Code | 2
Parameter-Inverted Image Pyramid Networks for Visual Perception and Multimodal Understanding | Code | 2
The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World | Code | 2
LibreFace: An Open-Source Toolkit for Deep Facial Expression Analysis | Code | 2
RelationField: Relate Anything in Radiance Fields | Code | 2
Effector: A Python package for regional explanations | Code | 2
FastDiff: A Fast Conditional Diffusion Model for High-Quality Speech Synthesis | Code | 2
Structure-informed Language Models Are Protein Designers | Code | 2
RETA-LLM: A Retrieval-Augmented Large Language Model Toolkit | Code | 2
Scaling Multi-Camera 3D Object Detection through Weak-to-Strong Eliciting | Code | 2
Reviving The Classics: Active Reward Modeling in Large Language Model Alignment | Code | 2
Decoupling Knowledge from Memorization: Retrieval-augmented Prompt Learning | Code | 2
TimeMarker: A Versatile Video-LLM for Long and Short Video Understanding with Superior Temporal Localization Ability | Code | 2
Vox-Fusion: Dense Tracking and Mapping with Voxel-based Neural Implicit Representation | Code | 2
LeanVec: Searching vectors faster by making them fit | Code | 2
BIGCity: A Universal Spatiotemporal Model for Unified Trajectory and Traffic State Data Analysis | Code | 2
CoVO-MPC: Theoretical Analysis of Sampling-based MPC and Optimal Covariance Design | Code | 2
Incremental Sequence Labeling: A Tale of Two Shifts | Code | 2
Comprehensive Verilog Design Problems: A Next-Generation Benchmark Dataset for Evaluating Large Language Models and Agents on RTL Design and Verification | Code | 2
mLoRA: Fine-Tuning LoRA Adapters via Highly-Efficient Pipeline Parallelism in Multiple GPUs | Code | 2
OTAvatar: One-shot Talking Face Avatar with Controllable Tri-plane Rendering | Code | 2
Graphic Design with Large Multimodal Model | Code | 2
Humanoid Agents: Platform for Simulating Human-like Generative Agents | Code | 2
What Are Expected Queries in End-to-End Object Detection? | Code | 2
Woodpecker: Hallucination Correction for Multimodal Large Language Models | Code | 2
Mini Honor of Kings: A Lightweight Environment for Multi-Agent Reinforcement Learning | Code | 2
radarODE-MTL: A Multi-Task Learning Framework with Eccentric Gradient Alignment for Robust Radar-Based ECG Reconstruction | Code | 2
A Model or 603 Exemplars: Towards Memory-Efficient Class-Incremental Learning | Code | 2
SNP-S3: Shared Network Pre-training and Significant Semantic Strengthening for Various Video-Text Tasks | Code | 2
SPA-RL: Reinforcing LLM Agents via Stepwise Progress Attribution | Code | 2
Off-Policy Evaluation for Large Action Spaces via Embeddings | Code | 2
Interaction2Code: Benchmarking MLLM-based Interactive Webpage Code Generation from Interactive Prototyping | Code | 2
Aerial Lifting: Neural Urban Semantic and Building Instance Lifting from Aerial Imagery | Code | 2
DPoser: Diffusion Model as Robust 3D Human Pose Prior | Code | 2
Asynchronous Large Language Model Enhanced Planner for Autonomous Driving | Code | 2
BanditPAM++: Faster k-medoids Clustering | Code | 2
TransST: Transfer Learning Embedded Spatial Factor Modeling of Spatial Transcriptomics Data | Code | 2
Recent Advances in Medical Imaging Segmentation: A Survey | Code | 2
Cross the Gap: Exposing the Intra-modal Misalignment in CLIP via Modality Inversion | Code | 2
Model and Data Transfer for Cross-Lingual Sequence Labelling in Zero-Resource Settings | Code | 2
EarthLoc: Astronaut Photography Localization by Indexing Earth from Space | Code | 2
Revisiting Generative Policies: A Simpler Reinforcement Learning Algorithmic Perspective | Code | 2
Revisiting Tampered Scene Text Detection in the Era of Generative AI | Code | 2
LinSATNet: The Positive Linear Satisfiability Neural Networks | Code | 2
OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI | Code | 2
Discover and Mitigate Multiple Biased Subgroups in Image Classifiers | Code | 2
LinK3D: Linear Keypoints Representation for 3D LiDAR Point Cloud | Code | 2
HANet: A Hierarchical Attention Network for Change Detection With Bitemporal Very-High-Resolution Remote Sensing Images | Code | 2
Digital Player: Evaluating Large Language Models based Human-like Agent in Games | Code | 2
RM-Bench: Benchmarking Reward Models of Language Models with Subtlety and Style | Code | 2