| Towards responsible AI for education: Hybrid human-AI to confront the Elephant in the room | Apr 22, 2025 | Benchmarking, Fairness | Unverified | 0 |
| WASP: Benchmarking Web Agent Security Against Prompt Injection Attacks | Apr 22, 2025 | Benchmarking | Code Available | 2 |
| Benchmarking Large Vision-Language Models on Fine-Grained Image Tasks: A Comprehensive Evaluation | Apr 21, 2025 | Benchmarking | Code Available | 0 |
| Audio-Visual Class-Incremental Learning for Fish Feeding intensity Assessment in Aquaculture | Apr 21, 2025 | Benchmarking, class-incremental learning | Unverified | 0 |
| Establishing Reliability Metrics for Reward Models in Large Language Models | Apr 21, 2025 | Benchmarking | Unverified | 0 |
| Speaker Fuzzy Fingerprints: Benchmarking Text-Based Identification in Multiparty Dialogues | Apr 21, 2025 | Benchmarking, Speaker Identification | Unverified | 0 |
| IXGS-Intraoperative 3D Reconstruction from Sparse, Arbitrarily Posed Real X-rays | Apr 20, 2025 | 3D Reconstruction, Anatomy | Unverified | 0 |
| A Framework for Benchmarking and Aligning Task-Planning Safety in LLM-Based Embodied Agents | Apr 20, 2025 | Benchmarking, Task Planning | Unverified | 0 |
| Any Image Restoration via Efficient Spatial-Frequency Degradation Adaptation | Apr 19, 2025 | Benchmarking, Image Restoration | Unverified | 0 |
| LOOPE: Learnable Optimal Patch Order in Positional Embeddings for Vision Transformers | Apr 19, 2025 | Benchmarking, Diagnostic | Unverified | 0 |
| Know Me, Respond to Me: Benchmarking LLMs for Dynamic User Profiling and Personalized Responses at Scale | Apr 19, 2025 | Benchmarking | Code Available | 2 |
| Unreal Robotics Lab: A High-Fidelity Robotics Simulator with Advanced Physics and Rendering | Apr 19, 2025 | Benchmarking, Dataset Generation | Unverified | 0 |
| AI Idea Bench 2025: AI Research Idea Generation Benchmark | Apr 19, 2025 | Benchmarking, scientific discovery | Unverified | 0 |
| CodeCrash: Stress Testing LLM Reasoning under Structural and Semantic Perturbations | Apr 19, 2025 | Benchmarking | Unverified | 0 |
| Integrated Super-resolution Sensing and Symbiotic Communication with 3D Sparse MIMO for Low-Altitude UAV Swarm | Apr 18, 2025 | Benchmarking, Super-Resolution | Unverified | 0 |
| OpenDeception: Benchmarking and Investigating AI Deceptive Behaviors via Open-ended Interaction Simulation | Apr 18, 2025 | Benchmarking | Unverified | 0 |
| THOUGHTTERMINATOR: Benchmarking, Calibrating, and Mitigating Overthinking in Reasoning Models | Apr 17, 2025 | Benchmarking, Math | Unverified | 0 |
| Enhancing Explainability and Reliable Decision-Making in Particle Swarm Optimization through Communication Topologies | Apr 17, 2025 | Benchmarking, Decision Making | Unverified | 0 |
| Benchmarking LLM-based Relevance Judgment Methods | Apr 17, 2025 | Benchmarking, Open-Domain Question Answering | Code Available | 0 |
| Benchmarking Multi-National Value Alignment for Large Language Models | Apr 17, 2025 | Benchmarking | Unverified | 0 |
| ALT: A Python Package for Lightweight Feature Representation in Time Series Classification | Apr 17, 2025 | Benchmarking, Time Series | Unverified | 0 |
| Local Data Quantity-Aware Weighted Averaging for Federated Learning with Dishonest Clients | Apr 17, 2025 | Benchmarking, Federated Learning | Unverified | 0 |
| Featuremetric benchmarking: Quantum computer benchmarks based on circuit features | Apr 17, 2025 | Benchmarking | Unverified | 0 |
| pix2pockets: Shot Suggestions in 8-Ball Pool from a Single Image in the Wild | Apr 16, 2025 | Benchmarking, object-detection | Unverified | 0 |
| Continual Learning Strategies for 3D Engineering Regression Problems: A Benchmarking Study | Apr 16, 2025 | Benchmarking, Continual Learning | Code Available | 0 |