| Hyperspectral Anomaly Detection Methods: A Survey and Comparative Study | Jul 8, 2025 | Anomaly Detection, Benchmarking | Unverified | 0 |
| SenseShift6D: Multimodal RGB-D Benchmarking for Robust 6D Pose Estimation across Environment and Sensor Variations | Jul 8, 2025 | 6D Pose Estimation, 6D Pose Estimation using RGB | Code Available | 0 |
| Inaugural MOASEI Competition at AAMAS'2025: A Technical Report | Jul 7, 2025 | Benchmarking, Decision Making | Unverified | 0 |
| LLMThinkBench: Towards Basic Math Reasoning and Overthinking in Large Language Models | Jul 5, 2025 | Benchmarking, GPU | Code Available | 1 |
| GDGB: A Benchmark for Generative Dynamic Text-Attributed Graph Learning | Jul 4, 2025 | Benchmarking, Graph Generation | Code Available | 2 |
| STRUCTSENSE: A Task-Agnostic Agentic Framework for Structured Information Extraction with Human-In-The-Loop Evaluation and Benchmarking | Jul 4, 2025 | Benchmarking, Navigate | Code Available | 0 |
| LANTERN: A Machine Learning Framework for Lipid Nanoparticle Transfection Efficiency Prediction | Jul 3, 2025 | Benchmarking | Code Available | 0 |
| CORE: Benchmarking LLMs Code Reasoning Capabilities through Static Analysis Tasks | Jul 3, 2025 | Benchmarking, Code Generation | Unverified | 0 |
| Latent Thermodynamic Flows: Unified Representation Learning and Generative Modeling of Temperature-Dependent Behaviors from Limited Data | Jul 3, 2025 | Benchmarking, Representation Learning | Code Available | 1 |
| TransLaw: Benchmarking Large Language Models in Multi-Agent Simulation of the Collaborative Translation | Jul 1, 2025 | Benchmarking, Machine Translation | Unverified | 0 |