| Time Awareness in Large Language Models: Benchmarking Fact Recall Across Time | Sep 20, 2024 | Benchmarking, World Knowledge | Unverified | 0 |
| Robust Salient Object Detection on Compressed Images Using Convolutional Neural Networks | Sep 20, 2024 | Benchmarking, object-detection | Unverified | 0 |
| YesBut: A High-Quality Annotated Multimodal Dataset for evaluating Satire Comprehension capability of Vision-Language Models | Sep 20, 2024 | Benchmarking, Image Captioning | Code Available | 1 |
| STOP! Benchmarking Large Language Models with Sensitivity Testing on Offensive Progressions | Sep 20, 2024 | Benchmarking, Sensitivity | Code Available | 0 |
| CI-Bench: Benchmarking Contextual Integrity of AI Assistants on Synthetic Data | Sep 20, 2024 | Benchmarking, Language Modeling | Unverified | 0 |
| Efficient Performance Tracking: Leveraging Large Language Models for Automated Construction of Scientific Leaderboards | Sep 19, 2024 | Benchmarking | Code Available | 0 |
| MMSearch: Benchmarking the Potential of Large Models as Multi-modal Search Engines | Sep 19, 2024 | Benchmarking | Unverified | 0 |
| Arena 4.0: A Comprehensive ROS2 Development and Benchmarking Platform for Human-centric Navigation Using Generative-Model-based Environment Generation | Sep 19, 2024 | Benchmarking, Social Navigation | Unverified | 0 |
| Hard-Label Cryptanalytic Extraction of Neural Network Models | Sep 18, 2024 | Benchmarking | Code Available | 0 |
| PARAPHRASUS : A Comprehensive Benchmark for Evaluating Paraphrase Detection Models | Sep 18, 2024 | Benchmarking, Model Selection | Code Available | 0 |