SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers · 248,326 code links · 4,818 tasks

Papers

Showing 6501–6550 of 661,570 papers

Title | Status | Hype
Text2midi: Generating Symbolic Music from Captions | Code | 2
Mamba-SEUNet: Mamba UNet for Monaural Speech Enhancement | Code | 2
PruneVid: Visual Token Pruning for Efficient Video Large Language Models | Code | 2
Personalized Representation from Personalized Generation | Code | 2
Offline Reinforcement Learning for LLM Multi-Step Reasoning | Code | 2
MR-GDINO: Efficient Open-World Continual Object Detection | Code | 2
FedRLHF: A Convergence-Guaranteed Federated Framework for Privacy-Preserving and Personalized RLHF | Code | 2
fluke: Federated Learning Utility frameworK for Experimentation and research | Code | 2
PyBOP: A Python package for battery model optimisation and parameterisation | Code | 2
Collaborative Gym: A Framework for Enabling and Evaluating Human-Agent Collaboration | Code | 2
XRAG: eXamining the Core -- Benchmarking Foundational Components in Advanced Retrieval-Augmented Generation | Code | 2
ChangeDiff: A Multi-Temporal Change Detection Data Generator with Flexible Text Prompts via Diffusion Model | Code | 2
Mapping the Mind of an Instruction-based Image Editing using SMILE | Code | 2
Exploiting Multimodal Spatial-temporal Patterns for Video Object Tracking | Code | 2
Can We Get Rid of Handcrafted Feature Extractors? SparseViT: Nonsemantics-Centered, Parameter-Efficient Image Manipulation Localization through Spare-Coding Transformer | Code | 2
Multi-Sensor Object Anomaly Detection: Unifying Appearance, Geometry, and Internal Properties | Code | 2
PsyDraw: A Multi-Agent Multimodal System for Mental Health Screening in Left-Behind Children | Code | 2
MMLU-CF: A Contamination-free Multi-task Language Understanding Benchmark | Code | 2
LeviTor: 3D Trajectory Oriented Image-to-Video Synthesis | Code | 2
A Light-Weight Framework for Open-Set Object Detection with Decoupled Feature Alignment in Joint Space | Code | 2
Preventing Local Pitfalls in Vector Quantization via Optimal Transport | Code | 2
Tests for model misspecification in simulation-based inference: from local distortions to global model checks | Code | 2
ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing | Code | 2
Next Patch Prediction for Autoregressive Visual Generation | Code | 2
Learning charges and long-range interactions from energies and forces | Code | 2
FlowAR: Scale-wise Autoregressive Image Generation Meets Flow Matching | Code | 2
AutoTrust: Benchmarking Trustworthiness in Large Vision Language Models for Autonomous Driving | Code | 2
Agent-SafetyBench: Evaluating the Safety of LLM Agents | Code | 2
Fietje: An open, efficient LLM for Dutch | Code | 2
DCTdiff: Intriguing Properties of Image Generative Modeling in the DCT Space | Code | 2
Mesoscopic Insights: Orchestrating Multi-scale & Hybrid Architecture for Image Manipulation Localization | Code | 2
Joint Perception and Prediction for Autonomous Driving: A Survey | Code | 2
ChinaTravel: A Real-World Benchmark for Language Agents in Chinese Travel Planning | Code | 2
Nullu: Mitigating Object Hallucinations in Large Vision-Language Models via HalluSpace Projection | Code | 2
Open Universal Arabic ASR Leaderboard | Code | 2
Large Language Model Enhanced Recommender Systems: A Survey | Code | 2
InstructSeg: Unifying Instructed Visual Segmentation with Multi-modal Large Language Models | Code | 2
Alignment faking in large language models | Code | 2
RelationField: Relate Anything in Radiance Fields | Code | 2
A Survey on LLM Inference-Time Self-Improvement | Code | 2
Learnable Prompting SAM-induced Knowledge Distillation for Semi-supervised Medical Image Segmentation | Code | 2
Modality-Independent Graph Neural Networks with Global Transformers for Multimodal Recommendation | Code | 2
AnySat: One Earth Observation Model for Many Resolutions, Scales, and Modalities | Code | 2
ArchesWeather & ArchesWeatherGen: a deterministic and generative model for efficient ML weather forecasting | Code | 2
SimGRAG: Leveraging Similar Subgraphs for Knowledge Graphs Driven Retrieval-Augmented Generation | Code | 2
Guiding Generative Protein Language Models with Reinforcement Learning | Code | 2
SafeAgentBench: A Benchmark for Safe Task Planning of Embodied LLM Agents | Code | 2
OmniEval: An Omnidirectional and Automatic RAG Evaluation Benchmark in Financial Domain | Code | 2
Streaming Keyword Spotting Boosted by Cross-layer Discrimination Consistency | Code | 2
CoMPaSS: Enhancing Spatial Understanding in Text-to-Image Diffusion Models | Code | 2
Page 131 of 13,232