| LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models | Jul 17, 2024 | Benchmarking, Language Modelling | Unverified | 0 |
| ClaimCompare: A Data Pipeline for Evaluation of Novelty Destroying Patent Pairs | Jul 16, 2024 | Information Retrieval, Navigate | Code Available | 0 |
| PRET: Planning with Directed Fidelity Trajectory for Vision and Language Navigation | Jul 16, 2024 | Navigate, Vision and Language Navigation | Code Available | 1 |
| Thorns and Algorithms: Navigating Generative AI Challenges Inspired by Giraffes and Acacias | Jul 16, 2024 | Misinformation, Navigate | Unverified | 0 |
| Reliable Reasoning Beyond Natural Language | Jul 16, 2024 | GSM8K, Mathematical Reasoning | Unverified | 0 |
| Deep Learning Evidence for Global Optimality of Gerver's Sofa | Jul 15, 2024 | Computational Efficiency, Deep Learning | Code Available | 0 |
| Synergistic Multi-Agent Framework with Trajectory Learning for Knowledge-Intensive Tasks | Jul 13, 2024 | Hallucination, Navigate | Code Available | 1 |
| How Chinese are Chinese Language Models? The Puzzling Lack of Language Policy in China's LLMs | Jul 12, 2024 | Diversity, Language Modelling | Unverified | 0 |
| Automatic Pruning of Fine-tuning Datasets for Transformer-based Language Models | Jul 11, 2024 | Natural Language Understanding, Navigate | Code Available | 0 |
| Towards Explainable Evolution Strategies with Large Language Models | Jul 11, 2024 | Navigate | Unverified | 0 |