SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers, 248,326 code links, 4,818 tasks

Papers

Showing 10951-11000 of 661,570 papers

Title | Status | Hype
GRID: A Platform for General Robot Intelligence Development | Code | 2
PatchMixer: A Patch-Mixing Architecture for Long-Term Time Series Forecasting | Code | 2
RoleLLM: Benchmarking, Eliciting, and Enhancing Role-Playing Abilities of Large Language Models | Code | 2
Consistency Trajectory Models: Learning Probability Flow ODE Trajectory of Diffusion | Code | 2
Reformulating Vision-Language Foundation Models and Datasets Towards Universal Multimodal Assistants | Code | 2
InstructCV: Instruction-Tuned Text-to-Image Diffusion Models as Vision Generalists | Code | 2
Scalable Multi-Temporal Remote Sensing Change Data Generation via Simulating Stochastic Change Process | Code | 2
Alphazero-like Tree-Search can Guide Large Language Model Decoding and Training | Code | 2
GAIA-1: A Generative World Model for Autonomous Driving | Code | 2
Graph-based Neural Weather Prediction for Limited Area Modeling | Code | 2
nnSAM: Plug-and-play Segment Anything Model Improves nnUNet Performance | Code | 2
Fine-grained Late-interaction Multi-modal Retrieval for Retrieval Augmented Visual Question Answering | Code | 2
Directly Fine-Tuning Diffusion Models on Differentiable Rewards | Code | 2
One for All: Towards Training One Graph Model for All Classification Tasks | Code | 2
UXsim: An open source macroscopic and mesoscopic traffic simulator in Python -- a technical overview | Code | 2
CRAFT: Customizing LLMs by Creating and Retrieving from Specialized Toolsets | Code | 2
Denoising Diffusion Bridge Models | Code | 2
Transformer-VQ: Linear-Time Transformers via Vector Quantization | Code | 2
LawBench: Benchmarking Legal Knowledge of Large Language Models | Code | 2
ModuLoRA: Finetuning 2-Bit LLMs on Consumer GPUs by Integrating with Modular Quantizers | Code | 2
DiLu: A Knowledge-Driven Approach to Autonomous Driving with Large Language Models | Code | 2
MEM: Multi-Modal Elevation Mapping for Robotics and Learning | Code | 2
GPT-Fathom: Benchmarking Large Language Models to Decipher the Evolutionary Path towards GPT-4 and Beyond | Code | 2
Text-to-3D using Gaussian Splatting | Code | 2
RLLTE: Long-Term Evolution Project of Reinforcement Learning | Code | 2
Cross-Prediction-Powered Inference | Code | 2
MHG-GNN: Combination of Molecular Hypergraph Grammar with Graph Neural Network | Code | 2
Deep Geometrized Cartoon Line Inbetweening | Code | 2
OrthoPlanes: A Novel Representation for Better 3D-Awareness of GANs | Code | 2
GeoCLIP: Clip-Inspired Alignment between Locations and Images for Effective Worldwide Geo-localization | Code | 2
Navigate through Enigmatic Labyrinth A Survey of Chain of Thought Reasoning: Advances, Frontiers and Future | Code | 2
NeuRBF: A Neural Fields Representation with Adaptive Radial Basis Functions | Code | 2
Effective Long-Context Scaling of Foundation Models | Code | 2
A Content-Driven Micro-Video Recommendation Dataset at Scale | Code | 2
A Toolkit for Reliable Benchmarking and Research in Multi-Objective Reinforcement Learning | Code | 2
RankVicuna: Zero-Shot Listwise Document Reranking with Open-Source Large Language Models | Code | 2
Event Stream-based Visual Object Tracking: A High-Resolution Benchmark Dataset and A Novel Baseline | Code | 2
ProteinInvBench: Benchmarking Protein Inverse Folding on Diverse Tasks, Models, and Metrics | Code | 2
M^4: A Unified XAI Benchmark for Faithfulness Evaluation of Feature Attribution Methods across Metrics, Modalities and Models | Code | 2
PIXIU: A Comprehensive Benchmark, Instruction Dataset and Large Language Model for Finance | Code | 2
ICML 2023 Topological Deep Learning Challenge : Design and Results | Code | 2
ProteinGym: Large-Scale Benchmarks for Protein Fitness Prediction and Design | Code | 2
Joint Audio and Speech Understanding | Code | 2
Detecting and Grounding Multi-Modal Media Manipulation and Beyond | Code | 2
OmniEvent: A Comprehensive, Fair, and Easy-to-Use Toolkit for Event Understanding | Code | 2
Traj-LO: In Defense of LiDAR-Only Odometry Using an Effective Continuous-Time Trajectory | Code | 2
Q-Bench: A Benchmark for General-Purpose Foundation Models on Low-level Vision | Code | 2
VidChapters-7M: Video Chapters at Scale | Code | 2
MentaLLaMA: Interpretable Mental Health Analysis on Social Media with Large Language Models | Code | 2
P-Flow: A Fast and Data-Efficient Zero-Shot TTS through Speech Prompting | Code | 2
Page 220 of 13232