| Beyond Optimism: Exploration With Partially Observable Rewards | Jun 20, 2024 | BenchmarkingReinforcement Learning (RL) | CodeCode Available | 0 |
| FairX: A comprehensive benchmarking tool for model analysis using fairness, utility, and explainability | Jun 20, 2024 | BenchmarkingFairness | CodeCode Available | 0 |
| CEBench: A Benchmarking Toolkit for the Cost-Effectiveness of LLM Pipelines | Jun 20, 2024 | BenchmarkingDecision Making | CodeCode Available | 0 |
| PoseBench: Benchmarking the Robustness of Pose Estimation Models under Corruptions | Jun 20, 2024 | Animal Pose EstimationAutonomous Driving | —Unverified | 0 |
| DASB -- Discrete Audio and Speech Benchmark | Jun 20, 2024 | BenchmarkingEmotion Recognition | —Unverified | 0 |
| Selected Languages are All You Need for Cross-lingual Truthfulness Transfer | Jun 20, 2024 | AllBenchmarking | CodeCode Available | 0 |
| Improving Expert Radiology Report Summarization by Prompting Large Language Models with a Layperson Summary | Jun 20, 2024 | BenchmarkingIn-Context Learning | —Unverified | 0 |
| Benchmarking Monocular 3D Dog Pose Estimation Using In-The-Wild Motion Capture Data | Jun 20, 2024 | Animal Pose EstimationBenchmarking | —Unverified | 0 |
| Resource-efficient Medical Image Analysis with Self-adapting Forward-Forward Networks | Jun 20, 2024 | BenchmarkingMedical Image Analysis | —Unverified | 0 |
| QeMFi: A Multifidelity Dataset of Quantum Chemical Properties of Diverse Molecules | Jun 20, 2024 | Benchmarking | CodeCode Available | 0 |