| Unraveling the Capabilities of Language Models in News Summarization | Jan 30, 2025 | Benchmarking, Few-Shot Learning | Code Available | 0 |
| Fine-tuning LLaMA 2 interference: a comparative study of language implementations for optimal efficiency | Jan 30, 2025 | Benchmarking, Language Modeling | Unverified | 0 |
| MedXpertQA: Benchmarking Expert-Level Medical Reasoning and Understanding | Jan 30, 2025 | Benchmarking, Decision Making | Unverified | 0 |
| Solving Urban Network Security Games: Learning Platform, Benchmark, and Challenge for AI Research | Jan 29, 2025 | Benchmarking | Unverified | 0 |
| Benchmarking Quantum Convolutional Neural Networks for Signal Classification in Simulated Gamma-Ray Burst Detection | Jan 28, 2025 | Benchmarking | Unverified | 0 |
| Making Sense of Data in the Wild: Data Analysis Automation at Scale | Jan 27, 2025 | Benchmarking, Diversity | Unverified | 0 |
| Transfer of Knowledge through Reverse Annealing: A Preliminary Analysis of the Benefits and What to Share | Jan 27, 2025 | Benchmarking, Transfer Learning | Unverified | 0 |
| PhysBench: Benchmarking and Enhancing Vision-Language Models for Physical World Understanding | Jan 27, 2025 | Benchmarking, Common Sense Reasoning | Unverified | 0 |
| A Benchmarking Environment for Worker Flexibility in Flexible Job Shop Scheduling Problems | Jan 27, 2025 | Benchmarking, Evolutionary Algorithms | Unverified | 0 |
| IndicMMLU-Pro: Benchmarking Indic Large Language Models on Multi-Task Language Understanding | Jan 27, 2025 | Benchmarking, Diversity | Unverified | 0 |
| Skeleton-Guided-Translation: A Benchmarking Framework for Code Repository Translation with Fine-Grained Quality Evaluation | Jan 27, 2025 | Benchmarking, C++ code | Unverified | 0 |
| Benchmarking Quantum Reinforcement Learning | Jan 27, 2025 | Benchmarking, reinforcement-learning | Code Available | 0 |
| CISOL: An Open and Extensible Dataset for Table Structure Recognition in the Construction Industry | Jan 26, 2025 | Benchmarking, Object Detection | Unverified | 0 |
| Beyond Benchmarks: On The False Promise of AI Regulation | Jan 26, 2025 | Benchmarking | Unverified | 0 |
| GiantHunter: Accurate detection of giant virus in metagenomic data using reinforcement-learning and Monte Carlo tree search | Jan 26, 2025 | Benchmarking, Diversity | Code Available | 0 |
| Self-supervised Benchmark Lottery on ImageNet: Do Marginal Improvements Translate to Improvements on Similar Datasets? | Jan 26, 2025 | Benchmarking, Self-Supervised Learning | Unverified | 0 |
| Prompting ChatGPT for Chinese Learning as L2: A CEFR and EBCL Level Study | Jan 25, 2025 | Benchmarking | Unverified | 0 |
| Benchmarking global optimization techniques for unmanned aerial vehicle path planning | Jan 24, 2025 | Benchmarking, global-optimization | Unverified | 0 |
| The Karp Dataset | Jan 24, 2025 | Benchmarking, Mathematical Reasoning | Unverified | 0 |
| Feature-based Evolutionary Diversity Optimization of Discriminating Instances for Chance-constrained Optimization Problems | Jan 24, 2025 | Benchmarking, Diversity | Unverified | 0 |
| AEON: Adaptive Estimation of Instance-Dependent In-Distribution and Out-of-Distribution Label Noise for Robust Learning | Jan 23, 2025 | Benchmarking, image-classification | Unverified | 0 |
| DI-BENCH: Benchmarking Large Language Models on Dependency Inference with Testable Repositories at Scale | Jan 23, 2025 | Benchmarking | Unverified | 0 |
| You Only Crash Once v2: Perceptually Consistent Strong Features for One-Stage Domain Adaptive Detection of Space Terrain | Jan 23, 2025 | Benchmarking, Domain Adaptation | Unverified | 0 |
| CHaRNet: Conditioned Heatmap Regression for Robust Dental Landmark Localization | Jan 22, 2025 | Benchmarking, regression | Unverified | 0 |
| RAG-Reward: Optimizing RAG with Reward Modeling and RLHF | Jan 22, 2025 | Benchmarking, Hallucination | Unverified | 0 |