SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers · 248,326 code links · 4,818 tasks

Papers

Showing 11451-11500 of 661570 papers

Title | Status | Hype
WizMap: Scalable Interactive Visualization for Exploring Large Machine Learning Embeddings | Code | 2
LVLM-eHub: A Comprehensive Evaluation Benchmark for Large Vision-Language Models | Code | 2
QuadSwarm: A Modular Multi-Quadrotor Simulator for Deep Reinforcement Learning with Direct Thrust Control | Code | 2
CMMLU: Measuring massive multitask language understanding in Chinese | Code | 2
PINNacle: A Comprehensive Benchmark of Physics-Informed Neural Networks for Solving PDEs | Code | 2
Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis | Code | 2
Datasets and Benchmarks for Offline Safe Reinforcement Learning | Code | 2
SSCBench: A Large-Scale 3D Semantic Scene Completion Benchmark for Autonomous Driving | Code | 2
Segment Any Point Cloud Sequences by Distilling Vision Foundation Models | Code | 2
2nd Place Winning Solution for the CVPR2023 Visual Anomaly and Novelty Detection Challenge: Multimodal Prompting for Data-centric Anomaly Detection | Code | 2
DreamSim: Learning New Dimensions of Human Visual Similarity using Synthetic Data | Code | 2
Fast Training of Diffusion Models with Masked Transformers | Code | 2
LargeST: A Benchmark Dataset for Large-Scale Traffic Forecasting | Code | 2
TSMixer: Lightweight MLP-Mixer Model for Multivariate Time Series Forecasting | Code | 2
TryOnDiffusion: A Tale of Two UNets | Code | 2
NodeFormer: A Scalable Graph Structure Learning Transformer for Node Classification | Code | 2
MiniLLM: Knowledge Distillation of Large Language Models | Code | 2
Hidden Biases of End-to-End Driving Models | Code | 2
Parting with Misconceptions about Learning-based Vehicle Motion Planning | Code | 2
One-for-All: Generalized LoRA for Parameter-Efficient Fine-tuning | Code | 2
XrayGPT: Chest Radiographs Summarization using Medical Vision-Language Models | Code | 2
Efficient 3D Semantic Segmentation with Superpoint Transformer | Code | 2
Mol-Instructions: A Large-Scale Biomolecular Instruction Dataset for Large Language Models | Code | 2
Controlling Text-to-Image Diffusion by Orthogonal Finetuning | Code | 2
Scalable 3D Captioning with Pretrained Models | Code | 2
Valley: Video Assistant with Large Language model Enhanced abilitY | Code | 2
The Devil is in the Details: On the Pitfalls of Event Extraction Evaluation | Code | 2
Unlocking Feature Visualization for Deeper Networks with MAgnitude Constrained Optimization | Code | 2
Aria Digital Twin: A New Benchmark Dataset for Egocentric 3D Machine Perception | Code | 2
TensorNet: Cartesian Tensor Representations for Efficient Learning of Molecular Potentials | Code | 2
Mind2Web: Towards a Generalist Agent for the Web | Code | 2
DetZero: Rethinking Offboard 3D Object Detection with Long-term Sequential Point Clouds | Code | 2
SegViTv2: Exploring Efficient and Continual Semantic Segmentation with Plain Vision Transformers | Code | 2
FasterViT: Fast Vision Transformers with Hierarchical Attention | Code | 2
Prodigy: An Expeditiously Adaptive Parameter-Free Learner | Code | 2
InvPT++: Inverted Pyramid Multi-Task Transformer for Visual Scene Understanding | Code | 2
ToolAlpaca: Generalized Tool Learning for Language Models with 3000 Simulated Cases | Code | 2
Matting Anything | Code | 2
PIXIU: A Large Language Model, Instruction Data and Evaluation Benchmark for Finance | Code | 2
Prompt Injection attack against LLM-integrated Applications | Code | 2
PandaLM: An Automatic Evaluation Benchmark for LLM Instruction Tuning Optimization | Code | 2
StreetSurf: Extending Multi-view Implicit Surface Reconstruction to Street Views | Code | 2
Does Image Anonymization Impact Computer Vision Training? | Code | 2
RETA-LLM: A Retrieval-Augmented Large Language Model Toolkit | Code | 2
K2: A Foundation Language Model for Geoscience Knowledge Understanding and Utilization | Code | 2
ReliableSwap: Boosting General Face Swapping Via Reliable Supervision | Code | 2
UCTB: An Urban Computing Tool Box for Building Spatiotemporal Prediction Services | Code | 2
On the Reliability of Watermarks for Large Language Models | Code | 2
Exposing flaws of generative model evaluation metrics and their unfair treatment of diffusion models | Code | 2
ModuleFormer: Modularity Emerges from Mixture-of-Experts | Code | 2