| Benchmarking Dynamic SLO Compliance in Distributed Computing Continuum Systems | Mar 5, 2025 | Benchmarking, CPU | Code Available | 0 |
| Technical report of a DMD-based Characterization Method for Vision Sensors | Mar 4, 2025 | Benchmarking, Dataset Generation | Unverified | 0 |
| Optimizing open-domain question answering with graph-based retrieval augmented generation | Mar 4, 2025 | Benchmarking, Language Modeling | Unverified | 0 |
| A2Perf: Real-World Autonomous Agents Benchmark | Mar 4, 2025 | Benchmarking, Combinatorial Optimization | Unverified | 0 |
| Evaluation of Architectural Synthesis Using Generative AI | Mar 4, 2025 | Benchmarking | Unverified | 0 |
| One ruler to measure them all: Benchmarking multilingual long-context language models | Mar 3, 2025 | 8k, All | Code Available | 1 |
| Talking Turns: Benchmarking Audio Foundation Models on Turn-Taking Dynamics | Mar 3, 2025 | Benchmarking, Spoken Dialogue Systems | Unverified | 0 |
| AutoAdvExBench: Benchmarking autonomous exploitation of adversarial example defenses | Mar 3, 2025 | Benchmarking | Code Available | 1 |
| Retrieval Models Aren't Tool-Savvy: Benchmarking Tool Retrieval for Large Language Models | Mar 3, 2025 | Benchmarking, Information Retrieval | Unverified | 0 |
| From Claims to Evidence: A Unified Framework and Critical Analysis of CNN vs. Transformer vs. Mamba in Medical Image Segmentation | Mar 3, 2025 | Benchmarking, Computational Efficiency | Code Available | 1 |