SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

659,983 papers · 248,104 code links · 4,818 tasks

Papers

Showing 2001-2050 of 659,983 papers

Title | Status | Hype
TerraTorch: The Geospatial Foundation Models Toolkit | Code | 4
Video-R1: Reinforcing Video Reasoning in MLLMs | Code | 4
SongBloom: Coherent Song Generation via Interleaved Autoregressive Sketching and Diffusion Refinement | Code | 4
SpatialTrackerV2: 3D Point Tracking Made Easy | Code | 4
Proactive Detection of Voice Cloning with Localized Watermarking | Code | 4
Eliciting Latent Predictions from Transformers with the Tuned Lens | Code | 4
REFINE: Inversion-Free Backdoor Defense via Model Reprogramming | Code | 4
Relationships are Complicated! An Analysis of Relationships Between Datasets on the Web | Code | 4
Benchmarking Graphormer on Large-Scale Molecular Modeling Datasets | Code | 4
LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment | Code | 4
SWE-Search: Enhancing Software Agents with Monte Carlo Tree Search and Iterative Refinement | Code | 4
R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization | Code | 4
Recurrent Partial Kernel Network for Efficient Optical Flow Estimation | Code | 4
DeXtreme: Transfer of Agile In-hand Manipulation from Simulation to Reality | Code | 4
Fin-R1: A Large Language Model for Financial Reasoning through Reinforcement Learning | Code | 4
Are Transformers Effective for Time Series Forecasting? | Code | 4
Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models | Code | 4
Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation | Code | 4
SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot | Code | 4
AlignScore: Evaluating Factual Consistency with a Unified Alignment Function | Code | 4
TRAM: Global Trajectory and Motion of 3D Humans from in-the-wild Videos | Code | 4
TableGPT2: A Large Multimodal Model with Tabular Data Integration | Code | 4
Human-Humanoid Robots Cross-Embodiment Behavior-Skill Transfer Using Decomposed Adversarial Learning from Demonstration | Code | 4
FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds | Code | 4
MovieChat+: Question-aware Sparse Memory for Long Video Question Answering | Code | 4
Knowledge Fusion of Chat LLMs: A Preliminary Technical Report | Code | 4
R1-Searcher++: Incentivizing the Dynamic Knowledge Acquisition of LLMs via Reinforcement Learning | Code | 4
The case for 4-bit precision: k-bit Inference Scaling Laws | Code | 4
ActiveAnno3D -- An Active Learning Framework for Multi-Modal 3D Object Detection | Code | 4
DepGraph: Towards Any Structural Pruning | Code | 4
Improving Training Stability for Multitask Ranking Models in Recommender Systems | Code | 4
High-Resolution Image Synthesis with Latent Diffusion Models | Code | 4
How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources | Code | 4
Decoder Tuning: Efficient Language Understanding as Decoding | Code | 4
Programming Is Hard -- Or at Least It Used to Be: Educational Opportunities And Challenges of AI Code Generation | Code | 4
The CLRS-Text Algorithmic Reasoning Language Benchmark | Code | 4
PointMamba: A Simple State Space Model for Point Cloud Analysis | Code | 4
Reducing Activation Recomputation in Large Transformer Models | Code | 4
Learning to Generate Instruction Tuning Datasets for Zero-Shot Task Adaptation | Code | 4
ReChorus2.0: A Modular and Task-Flexible Recommendation Library | Code | 4
FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects | Code | 4
ChangeMamba: Remote Sensing Change Detection With Spatiotemporal State Space Model | Code | 4
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach | Code | 4
SARDet-100K: Towards Open-Source Benchmark and ToolKit for Large-Scale SAR Object Detection | Code | 4
SNAC: Multi-Scale Neural Audio Codec | Code | 4
BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions | Code | 4
Boximator: Generating Rich and Controllable Motions for Video Synthesis | Code | 4
Phoenix: Democratizing ChatGPT across Languages | Code | 4
Blendify -- Python rendering framework for Blender | Code | 4
Benchmarking Retrieval-Augmented Generation for Medicine | Code | 4
Page 41 of 13,200