| CORE: Benchmarking LLMs Code Reasoning Capabilities through Static Analysis Tasks | Jul 3, 2025 | Benchmarking, Code Generation | Unverified | 0 |
| LANTERN: A Machine Learning Framework for Lipid Nanoparticle Transfection Efficiency Prediction | Jul 3, 2025 | Benchmarking | Code Available | 0 |
| TransLaw: Benchmarking Large Language Models in Multi-Agent Simulation of the Collaborative Translation | Jul 1, 2025 | Benchmarking, Machine Translation | Unverified | 0 |
| State and Memory is All You Need for Robust and Reliable AI Agents | Jun 30, 2025 | All, Benchmarking | Unverified | 0 |
| Point Cloud Compression and Objective Quality Assessment: A Survey | Jun 28, 2025 | Autonomous Driving, Benchmarking | Unverified | 0 |
| Benchmarking Deep Learning and Vision Foundation Models for Atypical vs. Normal Mitosis Classification with Cross-Dataset Evaluation | Jun 26, 2025 | Benchmarking, Transfer Learning | Code Available | 0 |
| mTSBench: Benchmarking Multivariate Time Series Anomaly Detection and Model Selection at Scale | Jun 26, 2025 | Anomaly Detection, Benchmarking | Code Available | 0 |
| FeDa4Fair: Client-Level Federated Datasets for Fairness Evaluation | Jun 26, 2025 | Attribute, Benchmarking | Unverified | 0 |
| Mind2Web 2: Evaluating Agentic Search with Agent-as-a-Judge | Jun 26, 2025 | Benchmarking | Unverified | 0 |
| FixCLR: Negative-Class Contrastive Learning for Semi-Supervised Domain Generalization | Jun 25, 2025 | Benchmarking, Contrastive Learning | Unverified | 0 |