| Auto-Intent: Automated Intent Discovery and Self-Exploration for Large Language Model Web Agents | Oct 29, 2024 | Decision Making, Intent Discovery | Unverified | 0 |
| Online Detecting LLM-Generated Texts via Sequential Hypothesis Testing by Betting | Oct 29, 2024 | Language Modeling, Language Modelling | Code Available | 0 |
| Protecting Privacy in Multimodal Large Language Models with MLLMU-Bench | Oct 29, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| MARCO: Multi-Agent Real-time Chat Orchestration | Oct 29, 2024 | Hallucination, Language Modeling | Unverified | 0 |
| Learning and Unlearning of Fabricated Knowledge in Language Models | Oct 29, 2024 | Data Poisoning, Language Modeling | Unverified | 0 |
| SG-Bench: Evaluating LLM Safety Generalization Across Diverse Tasks and Prompt Types | Oct 29, 2024 | Language Modeling, Language Modelling | Code Available | 1 |
| An Actor-Critic Approach to Boosting Text-to-SQL Large Language Model | Oct 28, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| Can Machines Think Like Humans? A Behavioral Evaluation of LLM-Agents in Dictator Games | Oct 28, 2024 | Decision Making, Language Modeling | Unverified | 0 |
| LLMCBench: Benchmarking Large Language Model Compression for Efficient Deployment | Oct 28, 2024 | Benchmarking, Language Modeling | Code Available | 1 |
| Large Language Model Benchmarks in Medical Tasks | Oct 28, 2024 | Image Captioning, Language Modeling | Unverified | 0 |