| African or European Swallow? Benchmarking Large Vision-Language Models for Fine-Grained Object Classification | Jun 20, 2024 | BenchmarkingClassification | CodeCode Available | 1 |
| BeHonest: Benchmarking Honesty in Large Language Models | Jun 19, 2024 | BenchmarkingMisinformation | CodeCode Available | 1 |
| Job-SDF: A Multi-Granularity Dataset for Job Skill Demand Forecasting and Benchmarking | Jun 17, 2024 | BenchmarkingDemand Forecasting | CodeCode Available | 1 |
| MFC-Bench: Benchmarking Multimodal Fact-Checking with Large Vision-Language Models | Jun 17, 2024 | BenchmarkingFact Checking | CodeCode Available | 1 |
| A GPU-accelerated Large-scale Simulator for Transportation System Optimization Benchmarking | Jun 15, 2024 | BenchmarkingGPU | CodeCode Available | 1 |
| VANE-Bench: Video Anomaly Evaluation Benchmark for Conversational LMMs | Jun 14, 2024 | Anomaly DetectionBenchmarking | CodeCode Available | 1 |
| Benchmarking Spectral Graph Neural Networks: A Comprehensive Study on Effectiveness and Efficiency | Jun 14, 2024 | Benchmarking | CodeCode Available | 1 |
| LUMA: A Benchmark Dataset for Learning from Uncertain and Multimodal Data | Jun 14, 2024 | BenchmarkingDecision Making | CodeCode Available | 1 |
| SR-CACO-2: A Dataset for Confocal Fluorescence Microscopy Image Super-Resolution | Jun 13, 2024 | BenchmarkingImage Super-Resolution | CodeCode Available | 1 |
| SciKnowEval: Evaluating Multi-level Scientific Knowledge of Large Language Models | Jun 13, 2024 | Benchmarking | CodeCode Available | 1 |