| DefAn: Definitive Answer Dataset for LLMs Hallucination Evaluation | Jun 13, 2024 | Benchmarking, Hallucination | Code Available | 0 |
| CubeSat-Enabled Free-Space Optics: Joint Data Communication and Fine Beam Tracking | Jun 13, 2024 | Benchmarking | Unverified | 0 |
| ResearchArena: Benchmarking LLMs' Ability to Collect and Organize Information as Research Agents | Jun 13, 2024 | Benchmarking, Survey | Unverified | 0 |
| ECBD: Evidence-Centered Benchmark Design for NLP | Jun 13, 2024 | Benchmarking | Code Available | 0 |
| LLAVIDAL: A Large LAnguage VIsion Model for Daily Activities of Living | Jun 13, 2024 | Benchmarking, Human-Object Interaction Detection | Unverified | 0 |
| Decoding the Diversity: A Review of the Indic AI Research Landscape | Jun 13, 2024 | Benchmarking, Diversity | Unverified | 0 |
| Are we making progress in unlearning? Findings from the first NeurIPS unlearning competition | Jun 13, 2024 | Benchmarking | Unverified | 0 |
| A Review of 315 Benchmark and Test Functions for Machine Learning Optimization Algorithms and Metaheuristics with Mathematical and Visual Descriptions | Jun 13, 2024 | Benchmarking | Unverified | 0 |
| MobileAgentBench: An Efficient and User-Friendly Benchmark for Mobile LLM Agents | Jun 12, 2024 | Benchmarking, Language Modeling | Unverified | 0 |
| How well it works: Benchmarking performance of GPT models on medical natural language processing tasks | Jun 12, 2024 | Benchmarking | Unverified | 0 |