| AgentSims: An Open-Source Sandbox for Large Language Model Evaluation | Aug 8, 2023 | Language Model Evaluation, Language Modeling | Code Available | 2 | 5 |
| Reusing Embeddings: Reproducible Reward Model Research in Large Language Model Alignment without GPUs | Feb 4, 2025 | Code Generation, Language Modeling | Code Available | 2 | 5 |
| Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast | Feb 13, 2024 | Language Modelling, Large Language Model | Code Available | 2 | 5 |
| 500xCompressor: Generalized Prompt Compression for Large Language Models | Aug 6, 2024 | Language Modelling, Large Language Model | Code Available | 2 | 5 |
| AgentSociety Challenge: Designing LLM Agents for User Modeling and Recommendation on Web Platforms | Feb 26, 2025 | Language Modeling, Language Modelling | Code Available | 2 | 5 |
| Jailbreak Vision Language Models via Bi-Modal Adversarial Prompt | Jun 6, 2024 | Language Modelling, Large Language Model | Code Available | 2 | 5 |
| KICGPT: Large Language Model with Knowledge in Context for Knowledge Graph Completion | Feb 4, 2024 | In-Context Learning, Knowledge Graph Completion | Code Available | 2 | 5 |
| Introducing Visual Perception Token into Multimodal Large Language Model | Feb 24, 2025 | Language Modeling, Language Modelling | Code Available | 2 | 5 |
| DOCBENCH: A Benchmark for Evaluating LLM-based Document Reading Systems | Jul 15, 2024 | Language Modelling, Large Language Model | Code Available | 2 | 5 |
| Large Language Model Instruction Following: A Survey of Progresses and Challenges | Mar 18, 2023 | Instruction Following, Language Modeling | Code Available | 2 | 5 |