| Leveraging Foundation Models for Content-Based Medical Image Retrieval in Radiology | Mar 11, 2024 | BenchmarkingContent-Based Image Retrieval | CodeCode Available | 1 |
| Addressing Shortcomings in Fair Graph Learning Datasets: Towards a New Benchmark | Mar 9, 2024 | BenchmarkingFairness | CodeCode Available | 1 |
| Benchmarking Micro-action Recognition: Dataset, Methods, and Applications | Mar 8, 2024 | Action RecognitionBenchmarking | CodeCode Available | 1 |
| Tapilot-Crossing: Benchmarking and Evolving LLMs Towards Interactive Data Analysis Agents | Mar 8, 2024 | BenchmarkingDecision Making | CodeCode Available | 1 |
| R^2-Bench: Benchmarking the Robustness of Referring Perception Models under Perturbations | Mar 7, 2024 | Benchmarking | CodeCode Available | 1 |
| Ducho 2.0: Towards a More Up-to-Date Unified Framework for the Extraction of Multimodal Features in Recommendation | Mar 7, 2024 | BenchmarkingMultimodal Recommendation | CodeCode Available | 1 |
| Benchmarking Segmentation Models with Mask-Preserved Attribute Editing | Mar 2, 2024 | AttributeBenchmarking | CodeCode Available | 1 |
| TRUCE: Private Benchmarking to Prevent Contamination and Improve Comparative Evaluation of LLMs | Mar 1, 2024 | Benchmarking | CodeCode Available | 1 |
| Efficient Lifelong Model Evaluation in an Era of Rapid Progress | Feb 29, 2024 | BenchmarkingGPU | CodeCode Available | 1 |
| Benchmarking Large Language Models on Answering and Explaining Challenging Medical Questions | Feb 28, 2024 | BenchmarkingMultiple-choice | CodeCode Available | 1 |