| Machine Translation Meta Evaluation through Translation Accuracy Challenge Sets | Jan 29, 2024 | Benchmarking, Machine Translation | Code Available | 1 |
| SciMMIR: Benchmarking Scientific Multi-modal Information Retrieval | Jan 24, 2024 | Benchmarking, Image Captioning | Code Available | 1 |
| Dataset and Benchmark: Novel Sensors for Autonomous Vehicle Perception | Jan 24, 2024 | Benchmarking | Code Available | 1 |
| Benchmarking Large Multimodal Models against Common Corruptions | Jan 22, 2024 | Benchmarking, Image to text | Code Available | 1 |
| CheX-GPT: Harnessing Large Language Models for Enhanced Chest X-ray Report Labeling | Jan 21, 2024 | Benchmarking | Code Available | 1 |
| RSUD20K: A Dataset for Road Scene Understanding In Autonomous Driving | Jan 14, 2024 | Autonomous Driving, Benchmarking | Code Available | 1 |
| CAVIAR: Co-simulation of 6G Communications, 3D Scenarios and AI for Digital Twins | Jan 6, 2024 | Autonomous Vehicles, Benchmarking | Code Available | 1 |
| German Text Embedding Clustering Benchmark | Jan 5, 2024 | Benchmarking, Clustering | Code Available | 1 |
| FinDABench: Benchmarking Financial Data Analysis Ability of Large Language Models | Jan 1, 2024 | Benchmarking | Code Available | 1 |
| Benchmarking Large Language Models on Controllable Generation under Diversified Instructions | Jan 1, 2024 | Benchmarking, Instruction Following | Code Available | 1 |