SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers, 248,326 code links, 4,818 tasks

Papers

Showing 6,526 to 6,550 of 474,278 papers

| Title | Status | Hype |
|---|---|---|
| Agent-SafetyBench: Evaluating the Safety of LLM Agents | Code | 2 |
| LeviTor: 3D Trajectory Oriented Image-to-Video Synthesis | Code | 2 |
| Preventing Local Pitfalls in Vector Quantization via Optimal Transport | Code | 2 |
| Learning charges and long-range interactions from energies and forces | Code | 2 |
| FlowAR: Scale-wise Autoregressive Image Generation Meets Flow Matching | Code | 2 |
| Mesoscopic Insights: Orchestrating Multi-scale & Hybrid Architecture for Image Manipulation Localization | Code | 2 |
| InstructSeg: Unifying Instructed Visual Segmentation with Multi-modal Large Language Models | Code | 2 |
| RelationField: Relate Anything in Radiance Fields | Code | 2 |
| Nullu: Mitigating Object Hallucinations in Large Vision-Language Models via HalluSpace Projection | Code | 2 |
| Alignment faking in large language models | Code | 2 |
| ChinaTravel: A Real-World Benchmark for Language Agents in Chinese Travel Planning | Code | 2 |
| AnySat: One Earth Observation Model for Many Resolutions, Scales, and Modalities | Code | 2 |
| Large Language Model Enhanced Recommender Systems: A Survey | Code | 2 |
| Joint Perception and Prediction for Autonomous Driving: A Survey | Code | 2 |
| Open Universal Arabic ASR Leaderboard | Code | 2 |
| A Survey on LLM Inference-Time Self-Improvement | Code | 2 |
| Learnable Prompting SAM-induced Knowledge Distillation for Semi-supervised Medical Image Segmentation | Code | 2 |
| Modality-Independent Graph Neural Networks with Global Transformers for Multimodal Recommendation | Code | 2 |
| Efficient Diffusion Transformer Policies with Mixture of Expert Denoisers for Multitask Learning | Code | 2 |
| ArchesWeather & ArchesWeatherGen: a deterministic and generative model for efficient ML weather forecasting | Code | 2 |
| AIR-Bench: Automated Heterogeneous Information Retrieval Benchmark | Code | 2 |
| OmniEval: An Omnidirectional and Automatic RAG Evaluation Benchmark in Financial Domain | Code | 2 |
| Streaming Keyword Spotting Boosted by Cross-layer Discrimination Consistency | Code | 2 |
| Guiding Generative Protein Language Models with Reinforcement Learning | Code | 2 |
| SafeAgentBench: A Benchmark for Safe Task Planning of Embodied LLM Agents | Code | 2 |
Page 262 of 18,972