SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers | 248,326 code links | 4,818 tasks

Papers

Showing 9951-10000 of 661570 papers

Title | Status | Hype
SafeDecoding: Defending against Jailbreak Attacks via Safety-Aware Decoding | Code | 2
YOLOv8-AM: YOLOv8 Based on Effective Attention Mechanisms for Pediatric Wrist Fracture Detection | Code | 2
Open-Vocabulary Segmentation with Unpaired Mask-Text Supervision | Code | 2
Leveraging Pre-Trained Autoencoders for Interpretable Prototype Learning of Music Audio | Code | 2
Attacks, Defenses and Evaluations for LLM Conversation Safety: A Survey | Code | 2
Generalized Portrait Quality Assessment | Code | 2
Extreme Video Compression with Pre-trained Diffusion Models | Code | 2
Tell Me More! Towards Implicit User Intention Understanding of Language Model Driven Agents | Code | 2
Instruction Tuning for Secure Code Generation | Code | 2
LLM-Enhanced User-Item Interactions: Leveraging Edge Information for Optimized Recommendations | Code | 2
BEFUnet: A Hybrid CNN-Transformer Architecture for Precise Medical Image Segmentation | Code | 2
Learning to Produce Semi-dense Correspondences for Visual Localization | Code | 2
DNABERT-S: Pioneering Species Differentiation with Species-Aware DNA Embeddings | Code | 2
Learning Emergent Gaits with Decentralized Phase Oscillators: on the role of Observations, Rewards, and Feedback | Code | 2
RBF-PINN: Non-Fourier Positional Embedding in Physics-Informed Neural Networks | Code | 2
An Embarrassingly Simple Approach for LLM with Strong ASR Capacity | Code | 2
Can LLMs Learn New Concepts Incrementally without Forgetting? | Code | 2
Test-Time Backdoor Attacks on Multimodal Large Language Models | Code | 2
Learning Continuous 3D Words for Text-to-Image Generation | Code | 2
Transductive Active Learning: Theory and Applications | Code | 2
LLaGA: Large Language and Graph Assistant | Code | 2
Translating Images to Road Network: A Sequence-to-Sequence Perspective | Code | 2
A Survey of Generative AI for de novo Drug Design: New Frontiers in Molecule and Protein Generation | Code | 2
InstructGraph: Boosting Large Language Models via Graph-centric Instruction Tuning and Preference Alignment | Code | 2
THE COLOSSEUM: A Benchmark for Evaluating Generalization for Robotic Manipulation | Code | 2
LoTa-Bench: Benchmarking Language-oriented Task Planners for Embodied Agents | Code | 2
ChatCell: Facilitating Single-Cell Analysis with Natural Language | Code | 2
Higher Layers Need More LoRA Experts | Code | 2
Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast | Code | 2
COLD-Attack: Jailbreaking LLMs with Stealthiness and Controllability | Code | 2
eCeLLM: Generalizing Large Language Models for E-commerce from Large-scale, High-quality Instruction Data | Code | 2
One Train for Two Tasks: An Encrypted Traffic Classification Framework Using Supervised Contrastive Learning | Code | 2
Mercury: A Code Efficiency Benchmark for Code Large Language Models | Code | 2
Fairness Evaluation for Uplift Modeling in the Absence of Ground Truth | Code | 2
Do Membership Inference Attacks Work on Large Language Models? | Code | 2
CyberMetric: A Benchmark Dataset based on Retrieval-Augmented Generation for Evaluating LLMs in Cybersecurity Knowledge | Code | 2
Customizable Perturbation Synthesis for Robust SLAM Benchmarking | Code | 2
AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension | Code | 2
Cartesian atomic cluster expansion for machine learning interatomic potentials | Code | 2
Autonomous Data Selection with Zero-shot Generative Classifiers for Mathematical Texts | Code | 2
GraphTranslator: Aligning Graph Model to Large Language Model for Open-ended Tasks | Code | 2
KVQ: Kwai Video Quality Assessment for Short-form Videos | Code | 2
ITINERA: Integrating Spatial Optimization with Large Language Models for Open-domain Urban Itinerary Planning | Code | 2
Feature Mapping in Physics-Informed Neural Networks (PINNs) | Code | 2
A Change Detection Reality Check | Code | 2
GenTranslate: Large Language Models are Generative Multilingual Speech and Machine Translators | Code | 2
UrbanKGent: A Unified Large Language Model Agent Framework for Urban Knowledge Graph Construction | Code | 2
Neural SPH: Improved Neural Modeling of Lagrangian Fluid Dynamics | Code | 2
Video Annotator: A framework for efficiently building video classifiers using vision-language models and active learning | Code | 2
Diffusion-ES: Gradient-free Planning with Diffusion for Autonomous Driving and Zero-Shot Instruction Following | Code | 2
Page 200 of 13232