SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers · 248,326 code links · 4,818 tasks

Papers

Showing 9926–9950 of 177340 papers

Title | Status | Hype
Optimus-1: Hybrid Multimodal Memory Empowered Agents Excel in Long-Horizon Tasks | Code | 2
CAS-ViT: Convolutional Additive Self-attention Vision Transformers for Efficient Mobile Applications | Code | 2
SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models | Code | 2
Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment | Code | 2
AgentCourt: Simulating Court with Adversarial Evolvable Lawyer Agents | Code | 2
Accelerating Giant Impact Simulations with Machine Learning | Code | 2
GR-MG: Leveraging Partially Annotated Data via Multi-Modal Goal-Conditioned Policy | Code | 2
UTrack: Multi-Object Tracking with Uncertain Detections | Code | 2
AgentRE: An Agent-Based Framework for Navigating Complex Information Landscapes in Relation Extraction | Code | 2
PlantSeg: A Large-Scale In-the-wild Dataset for Plant Disease Segmentation | Code | 2
A Survey on Mixup Augmentations and Beyond | Code | 2
PiEEG-16 to Measure 16 EEG Channels with Raspberry Pi for Brain-Computer Interfaces and EEG devices | Code | 2
The CMA Evolution Strategy: A Tutorial | Code | 2
Scaling Smart: Accelerating Large Language Model Pre-training with Small Model Initialization | Code | 2
MOSS: Enabling Code-Driven Evolution and Context Management for AI Agents | Code | 2
A Survey on the Honesty of Large Language Models | Code | 2
Robot See Robot Do: Imitating Articulated Object Manipulation with Monocular 4D Reconstruction | Code | 2
Spiking Transformer with Spatial-Temporal Attention | Code | 2
Brain-JEPA: Brain Dynamics Foundation Model with Gradient Positioning and Spatiotemporal Masking | Code | 2
Codev-Bench: How Do LLMs Understand Developer-Centric Code Completion? | Code | 2
PointAD: Comprehending 3D Anomalies from Points and Pixels for Zero-shot 3D Anomaly Detection | Code | 2
End-to-end Piano Performance-MIDI to Score Conversion with Transformers | Code | 2
Mamba in Vision: A Comprehensive Survey of Techniques and Applications | Code | 2
Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates | Code | 2
Prompting DirectSAM for Semantic Contour Extraction in Remote Sensing Images | Code | 2