| A Red Teaming Framework for Securing AI in Maritime Autonomous Systems | Dec 8, 2023 | Red Teaming | Unverified | 0 |
| Seamless: Multilingual Expressive and Streaming Speech Translation | Dec 8, 2023 | automatic-speech-translation, Machine Translation | Code Available | 6 |
| DeceptPrompt: Exploiting LLM-driven Code Generation via Adversarial Natural Language Instructions | Dec 7, 2023 | Code Generation, Red Teaming | Unverified | 0 |
| InfoPattern: Unveiling Information Propagation Patterns in Social Media | Nov 27, 2023 | Red Teaming, Stance Detection | Code Available | 0 |
| JAB: Joint Adversarial Prompting and Belief Augmentation | Nov 16, 2023 | Red Teaming | Unverified | 0 |
| RLHFPoison: Reward Poisoning Attack for Reinforcement Learning with Human Feedback in Large Language Models | Nov 16, 2023 | Backdoor Attack, Data Poisoning | Unverified | 0 |
| Stealthy and Persistent Unalignment on Large Language Models via Backdoor Injections | Nov 15, 2023 | Red Teaming | Code Available | 0 |
| Towards Publicly Accountable Frontier LLMs: Building an External Scrutiny Ecosystem under the ASPIRE Framework | Nov 15, 2023 | Red Teaming | Unverified | 0 |
| Trojan Activation Attack: Red-Teaming Large Language Models using Activation Steering for Safety-Alignment | Nov 15, 2023 | Red Teaming, Safety Alignment | Code Available | 1 |
| Jailbreaking GPT-4V via Self-Adversarial Attacks with System Prompts | Nov 15, 2023 | Adversarial Attack, Red Teaming | Unverified | 0 |