| RL2Grid: Benchmarking Reinforcement Learning in Power Grid Operations | Mar 29, 2025 | Benchmarking, reinforcement-learning | Unverified | 0 |
| Unsupervised Anomaly Detection in Multivariate Time Series across Heterogeneous Domains | Mar 29, 2025 | Anomaly Detection, Benchmarking | Code Available | 0 |
| CodeARC: Benchmarking Reasoning Capabilities of LLM Agents for Inductive Program Synthesis | Mar 29, 2025 | Benchmarking, Large Language Model | Unverified | 0 |
| MHTS: Multi-Hop Tree Structure Framework for Generating Difficulty-Controllable QA Datasets for RAG Evaluation | Mar 29, 2025 | Answer Generation, Benchmarking | Unverified | 0 |
| Generalization Bias in Large Language Model Summarization of Scientific Research | Mar 28, 2025 | Benchmarking, Language Modeling | Unverified | 0 |
| An Advanced Ensemble Deep Learning Framework for Stock Price Prediction Using VAE, Transformer, and LSTM Model | Mar 28, 2025 | Algorithmic Trading, Benchmarking | Unverified | 0 |
| LIM: Large Interpolator Model for Dynamic Reconstruction | Mar 28, 2025 | 4D reconstruction, Benchmarking | Unverified | 0 |
| Benchmarking Ultra-Low-Power μNPUs | Mar 28, 2025 | Benchmarking | Unverified | 0 |
| Assessing Foundation Models for Sea Ice Type Segmentation in Sentinel-1 SAR Imagery | Mar 28, 2025 | Benchmarking, Segmentation | Unverified | 0 |
| Why Stop at One Error? Benchmarking LLMs as Data Science Code Debuggers for Multi-Hop and Multi-Bug Errors | Mar 28, 2025 | Benchmarking, Code Generation | Code Available | 0 |