
Adversarial Attack

An adversarial attack is a technique for finding a perturbation that changes a machine learning model's prediction. The perturbation can be very small and imperceptible to the human eye.

Source: Recurrent Attention Model with Log-Polar Mapping is Robust against Adversarial Attacks
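As a minimal concrete example of the idea, the sketch below implements the Fast Gradient Sign Method (FGSM), one of the simplest adversarial attacks: it nudges every input pixel by a small budget `epsilon` in the direction that increases the model's loss. The classifier, the `epsilon` value, and the image range are illustrative assumptions, not taken from any specific paper listed here.

```python
# Minimal FGSM sketch (illustrative; the model and epsilon are assumptions).
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon=8 / 255):
    """Return an adversarial copy of image batch `x` (values in [0, 1])."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)  # loss w.r.t. the true labels
    loss.backward()
    # One signed-gradient step: a tiny per-pixel change that is often
    # imperceptible yet enough to flip the model's prediction.
    x_adv = x + epsilon * x.grad.sign()
    return x_adv.clamp(0, 1).detach()
```

Stronger attacks such as PGD and AutoAttack, which appear in the benchmark tables below, iterate this kind of signed-gradient step many times.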

Papers

Showing 1–50 of 1808 papers

Title | Status | Hype
Robust Multivariate Time-Series Forecasting: Adversarial Attacks and Defense Mechanisms | Code | 5
Universal and Transferable Adversarial Attacks on Aligned Language Models | Code | 4
Adversarial Attacks against Closed-Source MLLMs via Feature Optimal Alignment | Code | 2
SAeUron: Interpretable Concept Unlearning in Diffusion Models with Sparse Autoencoders | Code | 2
Adversarial Attacks and Defenses on Text-to-Image Diffusion Models: A Survey | Code | 2
On Discrete Prompt Optimization for Diffusion Models | Code | 2
RWKU: Benchmarking Real-World Knowledge Unlearning for Large Language Models | Code | 2
DiffAM: Diffusion-based Adversarial Makeup Transfer for Facial Privacy Protection | Code | 2
Humanizing Machine-Generated Content: Evading AI-Text Detection through Adversarial Attack | Code | 2
Physical 3D Adversarial Attacks against Monocular Depth Estimation in Autonomous Driving | Code | 2
One Prompt Word is Enough to Boost Adversarial Robustness for Pre-trained Vision-Language Models | Code | 2
Fast Adversarial Attacks on Language Models In One GPU Minute | Code | 2
L-AutoDA: Leveraging Large Language Models for Automated Decision-based Adversarial Attacks | Code | 2
Diffusion Models for Imperceptible and Transferable Adversarial Attack | Code | 2
Ignore Previous Prompt: Attack Techniques For Language Models | Code | 2
Efficient Neural Network Analysis with Sum-of-Infeasibilities | Code | 2
Fast Minimum-norm Adversarial Attacks through Adaptive Norm Constraints | Code | 2
Attacking and Defending Machine Learning Applications of Public Cloud | Code | 2
Backdoor Learning: A Survey | Code | 2
TextAttack: A Framework for Adversarial Attacks, Data Augmentation, and Adversarial Training in NLP | Code | 2
BAE: BERT-based Adversarial Examples for Text Classification | Code | 2
Adversarial Attacks and Defenses on Graphs: A Review, A Tool and Empirical Studies | Code | 2
A Little Fog for a Large Turn | Code | 2
Adversarial Attacks and Defenses in Images, Graphs and Text: A Review | Code | 2
Foolbox: A Python toolbox to benchmark the robustness of machine learning models | Code | 2
ScoreAdv: Score-based Targeted Generation of Natural Adversarial Examples via Diffusion Models | Code | 1
Adversarial Attacks and Detection in Visual Place Recognition for Safer Robot Navigation | Code | 1
Learning Safety Constraints for Large Language Models | Code | 1
3D Gaussian Splat Vulnerabilities | Code | 1
SafeScientist: Toward Risk-Aware Scientific Discoveries by LLM Agents | Code | 1
Audio Jailbreak Attacks: Exposing Vulnerabilities in SpeechGPT in a White-Box Framework | Code | 1
GenoArmory: A Unified Evaluation Framework for Adversarial Attacks on Genomic Foundation Models | Code | 1
Fast and Low-Cost Genomic Foundation Models via Outlier Removal | Code | 1
sudo rm -rf agentic_security | Code | 1
CyberLLMInstruct: A New Dataset for Analysing Safety of Fine-Tuned LLMs Using Cyber Security Data | Code | 1
Data-free Universal Adversarial Perturbation with Pseudo-semantic Prior | Code | 1
Iron Sharpens Iron: Defending Against Attacks in Machine-Generated Text Detection with Adversarial Training | Code | 1
To Think or Not to Think: Exploring the Unthinking Vulnerability in Large Reasoning Models | Code | 1
HateBench: Benchmarking Hate Speech Detectors on LLM-Generated Content and Hate Campaigns | Code | 1
Physics-Based Adversarial Attack on Near-Infrared Human Detector for Nighttime Surveillance Camera Systems | Code | 1
Human-in-the-Loop Generation of Adversarial Texts: A Case Study on Tibetan Script | Code | 1
A2RNet: Adversarial Attack Resilient Network for Robust Infrared and Visible Image Fusion | Code | 1
Adversarial Vulnerabilities in Large Language Models for Time Series Forecasting | Code | 1
Exploiting the Index Gradients for Optimization-Based Jailbreaking on Large Language Models | Code | 1
Hiding Faces in Plain Sight: Defending DeepFakes by Disrupting Face Detection | Code | 1
Semantic-Aligned Adversarial Evolution Triangle for High-Transferability Vision-Language Attack | Code | 1
Transferable Adversarial Attacks on SAM and Its Downstream Models | Code | 1
Malacopula: adversarial automatic speaker verification attacks using a neural-based generalised Hammerstein model | Code | 1
Ensemble everything everywhere: Multi-scale aggregation for adversarial robustness | Code | 1
Guardians of Image Quality: Benchmarking Defenses Against Adversarial Attacks on Image Quality Metrics | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Xu et al. | Attack: PGD20 | 78.68 | - | Unverified
2 | 3-ensemble of multi-resolution self-ensembles | Attack: AutoAttack | 78.13 | - | Unverified
3 | TRADES-ANCRA/ResNet18 | Attack: AutoAttack | 59.7 | - | Unverified
4 | AdvTraining [madry2018] | Attack: PGD20 | 48.44 | - | Unverified
5 | TRADES [zhang2019b] | Attack: PGD20 | 45.9 | - | Unverified
6 | XU-Net | Robust Accuracy | 1 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | 3-ensemble of multi-resolution self-ensembles | Attack: AutoAttack | 51.28 | - | Unverified
2 | multi-resolution self-ensembles | Attack: AutoAttack | 47.85 | - | Unverified
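For context, the "Attack: PGD20" metric reports robust accuracy under a 20-step Projected Gradient Descent attack. The sketch below shows how such an evaluation is typically run; the model, `eps`, and `alpha` values are common defaults assumed for illustration and do not reproduce the settings behind the numbers above.

```python
# Illustrative robust-accuracy evaluation under 20-step PGD (L-infinity).
# eps and alpha are typical CIFAR-style defaults, assumed for illustration.
import torch
import torch.nn.functional as F

def pgd20(model, x, y, eps=8 / 255, alpha=2 / 255, steps=20):
    # Random start inside the L-infinity ball, then iterate signed steps.
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        F.cross_entropy(model(x_adv), y).backward()
        x_adv = x_adv + alpha * x_adv.grad.sign()
        # Project back into the eps-ball around the clean input.
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()

def robust_accuracy(model, loader):
    """Percentage of test examples still classified correctly under PGD20."""
    correct = total = 0
    for x, y in loader:
        preds = model(pgd20(model, x, y)).argmax(dim=1)
        correct += (preds == y).sum().item()
        total += y.numel()
    return 100.0 * correct / total  # same units as the Claimed column
```

AutoAttack, the other metric above, is a stronger parameter-free ensemble of attacks and typically yields lower (more pessimistic) robust-accuracy numbers than PGD20.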