| DateLogicQA: Benchmarking Temporal Biases in Large Language Models | Dec 17, 2024 | Benchmarking | Code Available | 0 |
| ShiftedBronzes: Benchmarking and Analysis of Domain Fine-Grained Classification in Open-World Settings | Dec 17, 2024 | Benchmarking | Unverified | 0 |
| Benchmarking and Understanding Compositional Relational Reasoning of LLMs | Dec 17, 2024 | Benchmarking, Relational Reasoning | Code Available | 0 |
| C-FedRAG: A Confidential Federated Retrieval-Augmented Generation System | Dec 17, 2024 | Benchmarking, RAG | Unverified | 0 |
| Multi-Dimensional Insights: Benchmarking Real-World Personalization in Large Multimodal Models | Dec 17, 2024 | Benchmarking | Unverified | 0 |
| F-Bench: Rethinking Human Preference Evaluation Metrics for Benchmarking Face Generation, Customization, and Restoration | Dec 17, 2024 | Benchmarking, Face Generation | Unverified | 0 |
| AI PERSONA: Towards Life-long Personalization of LLMs | Dec 17, 2024 | Benchmarking | Unverified | 0 |
| A Scalable Approach to Benchmarking the In-Conversation Differential Diagnostic Accuracy of a Health AI | Dec 17, 2024 | Benchmarking, Chatbot | Unverified | 0 |
| Selective Shot Learning for Code Explanation | Dec 17, 2024 | Benchmarking | Unverified | 0 |
| How Different AI Chatbots Behave? Benchmarking Large Language Models in Behavioral Economics Games | Dec 16, 2024 | Benchmarking, Chatbot | Unverified | 0 |