| Paper | Date | Tasks | Code | |
|---|---|---|---|---|
| Underwater Image Restoration Through a Prior Guided Hybrid Sense Approach and Extensive Benchmark Analysis | Jan 6, 2025 | Benchmarking, Image Enhancement | Code Available | 1 |
| Tougher Text, Smarter Models: Raising the Bar for Adversarial Defence Benchmarks | Jan 5, 2025 | Adversarial Robustness, Benchmarking | Code Available | 0 |
| ANTHROPOS-V: benchmarking the novel task of Crowd Volume Estimation | Jan 3, 2025 | Benchmarking, Crowd Counting | Code Available | 0 |
| AI-Powered Cow Detection in Complex Farm Environments | Jan 3, 2025 | Benchmarking | Unverified | 0 |
| QuArch: A Question-Answering Dataset for AI Agents in Computer Architecture | Jan 3, 2025 | Benchmarking, Question Answering | Unverified | 0 |
| PSYCHE: A Multi-faceted Patient Simulation Framework for Evaluation of Psychiatric Assessment Conversational Agents | Jan 3, 2025 | Benchmarking | Unverified | 0 |
| Benchmarking Constraint-Based Bayesian Structure Learning Algorithms: Role of Network Topology | Jan 2, 2025 | Benchmarking, Sensitivity | Unverified | 0 |
| MSC-Bench: Benchmarking and Analyzing Multi-Sensor Corruption for Driving Perception | Jan 2, 2025 | 3D Object Detection, Autonomous Driving | Unverified | 0 |
| CySecBench: Generative AI-based CyberSecurity-focused Prompt Dataset for Benchmarking Large Language Models | Jan 2, 2025 | Benchmarking, Computer Security | Code Available | 1 |
| CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings | Jan 2, 2025 | Benchmarking, Code Generation | Unverified | 0 |