SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers · 248,326 code links · 4,818 tasks

Papers

Showing 18151-18200 of 474278 papers

Title | Status | Hype
Fantastic Targets for Concept Erasure in Diffusion Models and Where To Find Them | Code | 1
Simulation Streams: A Programming Paradigm for Controlling Large Language Models and Building Complex Systems with Generative AI | Code | 1
Large Language Models for Cryptocurrency Transaction Analysis: A Bitcoin Case Study | Code | 1
A Benchmark and Evaluation for Real-World Out-of-Distribution Detection Using Vision-Language Models | Code | 1
A Cartesian Encoding Graph Neural Network for Crystal Structures Property Prediction: Application to Thermal Ellipsoid Estimation | Code | 1
Beyond Message Passing: Neural Graph Pattern Machine | Code | 1
Distillation-Driven Diffusion Model for Multi-Scale MRI Super-Resolution: Make 1.5T MRI Great Again | Code | 1
HSRMamba: Contextual Spatial-Spectral State Space Model for Single Image Hyperspectral Super-Resolution | Code | 1
Panacea: Mitigating Harmful Fine-tuning for Large Language Models via Post-fine-tuning Perturbation | Code | 1
o3-mini vs DeepSeek-R1: Which One is Safer? | Code | 1
Efficient Neural Theorem Proving via Fine-grained Proof Structure Analysis | Code | 1
How to Select Datapoints for Efficient Human Evaluation of NLG Models? | Code | 1
RbFT: Robust Fine-tuning for Retrieval-Augmented Generation against Retrieval Defects | Code | 1
Accuracy and Robustness of Weight-Balancing Methods for Training PINNs | Code | 1
MatIR: A Hybrid Mamba-Transformer Image Restoration Model | Code | 1
Wearanize+: A Multimodal Dataset for Evaluating Wearable Technologies in Sleep Research | Code | 1
WILDCHAT-50M: A Deep Dive Into the Role of Synthetic Data in Post-Training | Code | 1
Towards Making Flowchart Images Machine Interpretable | Code | 1
2SSP: A Two-Stage Framework for Structured Pruning of LLMs | Code | 1
Yin-Yang: Developing Motifs With Long-Term Structure And Controllability | Code | 1
TransRAD: Retentive Vision Transformer for Enhanced Radar Object Detection | Code | 1
Improving Your Model Ranking on Chatbot Arena by Vote Rigging | Code | 1
Image, Text, and Speech Data Augmentation using Multimodal LLMs for Deep Learning: A Survey | Code | 1
Langevin Soft Actor-Critic: Efficient Exploration through Uncertainty-Driven Critic Learning | Code | 1
ContourFormer: Real-Time Contour-Based End-to-End Instance Segmentation Transformer | Code | 1
acoupi: An Open-Source Python Framework for Deploying Bioacoustic AI Models on Edge Devices | Code | 1
Efficient Redundancy Reduction for Open-Vocabulary Semantic Segmentation | Code | 1
RadioLLM: Introducing Large Language Model into Cognitive Radio via Hybrid Prompt and Token Reprogrammings | Code | 1
HateBench: Benchmarking Hate Speech Detectors on LLM-Generated Content and Hate Campaigns | Code | 1
Polyp-Gen: Realistic and Diverse Polyp Image Generation for Endoscopic Dataset Expansion | Code | 1
Can Transformers Learn Full Bayesian Inference in Context? | Code | 1
xJailbreak: Representation Space Guided Reinforcement Learning for Interpretable LLM Jailbreaking | Code | 1
Bayesian Analyses of Structural Vector Autoregressions with Sign, Zero, and Narrative Restrictions Using the R Package bsvarSIGNs | Code | 1
RG-Attn: Radian Glue Attention for Multi-modality Multi-agent Cooperative Perception | Code | 1
SliceOcc: Indoor 3D Semantic Occupancy Prediction with Vertical Slice Representation | Code | 1
Dream to Drive with Predictive Individual World Model | Code | 1
Growing the Efficient Frontier on Panel Trees | Code | 1
Ultra-high resolution multimodal MRI densely labelled holistic structural brain atlas | Code | 1
FactCG: Enhancing Fact Checkers with Graph-Based Multi-Hop Data | Code | 1
CascadeV: An Implementation of Wurstchen Architecture for Video Generation | Code | 1
VeriFact: Verifying Facts in LLM-Generated Clinical Text with Electronic Health Records | Code | 1
SWIFT: Mapping Sub-series with Wavelet Decomposition Improves Time Series Forecasting | Code | 1
Multi-Objective Reinforcement Learning for Power Grid Topology Control | Code | 1
Membership Inference Attacks Against Vision-Language Models | Code | 1
Return of the Encoder: Maximizing Parameter Efficiency for SLMs | Code | 1
SPECIAL: Zero-shot Hyperspectral Image Classification With CLIP | Code | 1
Harnessing Diverse Perspectives: A Multi-Agent Framework for Enhanced Error Detection in Knowledge Graphs | Code | 1
Atla Selene Mini: A General Purpose Evaluation Model | Code | 1
CLISC: Bridging clip and sam by enhanced cam for unsupervised brain tumor segmentation | Code | 1
SeqSeg: Learning Local Segments for Automatic Vascular Model Construction | Code | 1