Red Teaming

Papers

Showing 201–210 of 251 papers

Title	Date	Tasks	Status	Hype
Text-Diffusion Red-Teaming of Large Language Models: Unveiling Harmful Behaviors with Proximity Constraints	Jan 14, 2025	Large Language Model, Red Teaming	Unverified	0
The Human Factor in AI Red Teaming: Perspectives from Social and Collaborative Computing	Jul 10, 2024	Fairness, Red Teaming	Unverified	0
The Multilingual Alignment Prism: Aligning Global and Local Preferences to Reduce Harm	Jun 26, 2024	Cross-Lingual Transfer, Red Teaming	Unverified	0
The Promise and Peril of Artificial Intelligence -- Violet Teaming Offers a Balanced Path Forward	Aug 28, 2023	Ethics, Philosophy	Unverified	0
Tiny Refinements Elicit Resilience: Toward Efficient Prefix-Model Against LLM Red-Teaming	May 21, 2024	Red Teaming	Unverified	0
Towards medical AI misalignment: a preliminary study	May 22, 2025	Red Teaming	Unverified	0
Towards Publicly Accountable Frontier LLMs: Building an External Scrutiny Ecosystem under the ASPIRE Framework	Nov 15, 2023	Red Teaming	Unverified	0
Towards Red Teaming in Multimodal and Multilingual Translation	Jan 29, 2024	Machine Translation, Red Teaming	Unverified	0
Towards Secure MLOps: Surveying Attacks, Mitigation Strategies, and Research Challenges	May 30, 2025	Red Teaming	Unverified	0
Understanding and Mitigating Risks of Generative AI in Financial Services	Apr 25, 2025	Fairness, Red Teaming	Unverified	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	SUDO	Attack Success Rate	41	—	Unverified