| Eliciting Informative Text Evaluations with Large Language Models | May 23, 2024 | Multiple-choicePrediction | CodeCode Available | 0 |
| Imagery as Inquiry: Exploring A Multimodal Dataset for Conversational Recommendation | May 23, 2024 | Conversational RecommendationMultiple-choice | —Unverified | 0 |
| Automated Evaluation of Retrieval-Augmented Language Models with Task-Specific Exam Generation | May 22, 2024 | InformativenessLanguage Modeling | CodeCode Available | 2 |
| Embedding Trajectory for Out-of-Distribution Detection in Mathematical Reasoning | May 22, 2024 | Mathematical ReasoningMultiple-choice | CodeCode Available | 1 |
| Robust portfolio optimization model for electronic coupon allocation | May 21, 2024 | Multiple-choicePortfolio Optimization | —Unverified | 0 |
| Multiple-Choice Questions are Efficient and Robust LLM Evaluators | May 20, 2024 | GSM8KHumanEval | CodeCode Available | 1 |
| Exploring the Capabilities of Prompted Large Language Models in Educational and Assessment Applications | May 19, 2024 | Multiple-choice | —Unverified | 0 |
| From Generalist to Specialist: Improving Large Language Models for Medical Physics Using ARCoT | May 17, 2024 | BenchmarkingMultiple-choice | —Unverified | 0 |
| Benchmarking Large Language Models on CFLUE -- A Chinese Financial Language Understanding Evaluation Dataset | May 17, 2024 | 16kBenchmarking | CodeCode Available | 3 |
| COGNET-MD, an evaluation framework and dataset for Large Language Model benchmarks in the medical domain | May 17, 2024 | Language ModelingLanguage Modelling | —Unverified | 0 |