| Toward Robust Hyper-Detailed Image Captioning: A Multiagent Approach and Dual Evaluation Metrics for Factuality and Coverage | Dec 20, 2024 | Attribute, Benchmarking | Unverified | 0 |
| AI-generated Image Quality Assessment in Visual Communication | Dec 20, 2024 | Benchmarking, Image Quality Assessment | Code Available | 0 |
| Enriching Social Science Research via Survey Item Linking | Dec 20, 2024 | Benchmarking, Entity Disambiguation | Code Available | 0 |
| Benchmarking LLMs and SLMs for patient reported outcomes | Dec 20, 2024 | Benchmarking, Privacy Preserving | Unverified | 0 |
| A Classification Benchmark for Artificial Intelligence Detection of Laryngeal Cancer from Patient Voice | Dec 20, 2024 | Benchmarking, Diagnostic | Code Available | 0 |
| TelcoLM: collecting data, adapting, and benchmarking language models for the telecommunication domain | Dec 20, 2024 | Benchmarking | Unverified | 0 |
| Pitfalls of topology-aware image segmentation | Dec 19, 2024 | Benchmarking, Image Segmentation | Unverified | 0 |
| AntiLeak-Bench: Preventing Data Contamination by Automatically Constructing Benchmarks with Updated Real-World Knowledge | Dec 18, 2024 | Benchmarking, World Knowledge | Code Available | 0 |
| Generation of Large District Heating System Models Using Open-Source Data and Tools: An Exemplary Workflow | Dec 18, 2024 | Benchmarking | Unverified | 0 |
| Mind Your Theory: Theory of Mind Goes Deeper Than Reasoning | Dec 18, 2024 | Benchmarking, Position | Unverified | 0 |