| Fine-tuning LLaMA 2 interference: a comparative study of language implementations for optimal efficiency | Jan 30, 2025 | Benchmarking, Language Modeling | Unverified | 0 |
| Unraveling the Capabilities of Language Models in News Summarization | Jan 30, 2025 | Benchmarking, Few-Shot Learning | Code Available | 0 |
| Evolving Hard Maximum Cut Instances for Quantum Approximate Optimization Algorithms | Jan 30, 2025 | Benchmarking, Combinatorial Optimization | Unverified | 0 |
| Solving Urban Network Security Games: Learning Platform, Benchmark, and Challenge for AI Research | Jan 29, 2025 | Benchmarking | Unverified | 0 |
| Benchmarking Quantum Convolutional Neural Networks for Signal Classification in Simulated Gamma-Ray Burst Detection | Jan 28, 2025 | Benchmarking | Unverified | 0 |
| Skeleton-Guided-Translation: A Benchmarking Framework for Code Repository Translation with Fine-Grained Quality Evaluation | Jan 27, 2025 | Benchmarking, C++ code | Unverified | 0 |
| A Benchmarking Environment for Worker Flexibility in Flexible Job Shop Scheduling Problems | Jan 27, 2025 | Benchmarking, Evolutionary Algorithms | Unverified | 0 |
| PhysBench: Benchmarking and Enhancing Vision-Language Models for Physical World Understanding | Jan 27, 2025 | Benchmarking, Common Sense Reasoning | Unverified | 0 |
| Benchmarking Quantum Reinforcement Learning | Jan 27, 2025 | Benchmarking, reinforcement-learning | Code Available | 0 |
| IndicMMLU-Pro: Benchmarking Indic Large Language Models on Multi-Task Language Understanding | Jan 27, 2025 | Benchmarking, Diversity | Unverified | 0 |
| Transfer of Knowledge through Reverse Annealing: A Preliminary Analysis of the Benefits and What to Share | Jan 27, 2025 | Benchmarking, Transfer Learning | Unverified | 0 |
| Making Sense of Data in the Wild: Data Analysis Automation at Scale | Jan 27, 2025 | Benchmarking, Diversity | Unverified | 0 |
| Self-supervised Benchmark Lottery on ImageNet: Do Marginal Improvements Translate to Improvements on Similar Datasets? | Jan 26, 2025 | Benchmarking, Self-Supervised Learning | Unverified | 0 |
| CISOL: An Open and Extensible Dataset for Table Structure Recognition in the Construction Industry | Jan 26, 2025 | Benchmarking, Object Detection | Unverified | 0 |
| Beyond Benchmarks: On The False Promise of AI Regulation | Jan 26, 2025 | Benchmarking | Unverified | 0 |
| GiantHunter: Accurate detection of giant virus in metagenomic data using reinforcement-learning and Monte Carlo tree search | Jan 26, 2025 | Benchmarking, Diversity | Code Available | 0 |
| Prompting ChatGPT for Chinese Learning as L2: A CEFR and EBCL Level Study | Jan 25, 2025 | Benchmarking | Unverified | 0 |
| Benchmarking global optimization techniques for unmanned aerial vehicle path planning | Jan 24, 2025 | Benchmarking, global-optimization | Unverified | 0 |
| Feature-based Evolutionary Diversity Optimization of Discriminating Instances for Chance-constrained Optimization Problems | Jan 24, 2025 | Benchmarking, Diversity | Unverified | 0 |
| The Karp Dataset | Jan 24, 2025 | Benchmarking, Mathematical Reasoning | Unverified | 0 |
| AEON: Adaptive Estimation of Instance-Dependent In-Distribution and Out-of-Distribution Label Noise for Robust Learning | Jan 23, 2025 | Benchmarking, image-classification | Unverified | 0 |
| You Only Crash Once v2: Perceptually Consistent Strong Features for One-Stage Domain Adaptive Detection of Space Terrain | Jan 23, 2025 | Benchmarking, Domain Adaptation | Unverified | 0 |
| DI-BENCH: Benchmarking Large Language Models on Dependency Inference with Testable Repositories at Scale | Jan 23, 2025 | Benchmarking | Unverified | 0 |
| CHaRNet: Conditioned Heatmap Regression for Robust Dental Landmark Localization | Jan 22, 2025 | Benchmarking, regression | Unverified | 0 |
| Implicit Causality-biases in humans and LLMs as a tool for benchmarking LLM discourse capabilities | Jan 22, 2025 | Benchmarking, Referring Expression | Unverified | 0 |
| Leveraging LLMs to Create a Haptic Devices' Recommendation System | Jan 22, 2025 | Benchmarking | Unverified | 0 |
| Does Table Source Matter? Benchmarking and Improving Multimodal Scientific Table Understanding and Reasoning | Jan 22, 2025 | Benchmarking | Code Available | 0 |
| RAG-Reward: Optimizing RAG with Reward Modeling and RLHF | Jan 22, 2025 | Benchmarking, Hallucination | Unverified | 0 |
| Benchmarking Generative AI for Scoring Medical Student Interviews in Objective Structured Clinical Examinations (OSCEs) | Jan 21, 2025 | Benchmarking | Unverified | 0 |
| Benchmarking Randomized Optimization Algorithms on Binary, Permutation, and Combinatorial Problem Landscapes | Jan 21, 2025 | Benchmarking | Unverified | 0 |
| Optimally-Weighted Maximum Mean Discrepancy Framework for Continual Learning | Jan 21, 2025 | Benchmarking, Continual Learning | Unverified | 0 |
| Benchmarking Image Perturbations for Testing Automated Driving Assistance Systems | Jan 21, 2025 | Autonomous Vehicles, Benchmarking | Code Available | 0 |
| Beyond the Hype: Benchmarking LLM-Evolved Heuristics for Bin Packing | Jan 20, 2025 | Benchmarking, Evolutionary Algorithms | Unverified | 0 |
| Algorithm Selection with Probing Trajectories: Benchmarking the Choice of Classifier Model | Jan 20, 2025 | Benchmarking | Unverified | 0 |
| Benchmarking Large Language Models via Random Variables | Jan 20, 2025 | Benchmarking, Mathematical Reasoning | Unverified | 0 |
| An Interpretable Measure for Quantifying Predictive Dependence between Continuous Random Variables -- Extended Version | Jan 18, 2025 | Benchmarking | Unverified | 0 |
| FORLAPS: An Innovative Data-Driven Reinforcement Learning Approach for Prescriptive Process Monitoring | Jan 17, 2025 | Benchmarking, Data Augmentation | Unverified | 0 |
| ColorGrid: A Multi-Agent Non-Stationary Environment for Goal Inference and Assistance | Jan 17, 2025 | Benchmarking, Multi-agent Reinforcement Learning | Code Available | 0 |
| Village-Net Clustering: A Rapid approach to Non-linear Unsupervised Clustering of High-Dimensional Data | Jan 16, 2025 | Benchmarking, Clustering | Unverified | 0 |
| PixelBrax: Learning Continuous Control from Pixels End-to-End on the GPU | Jan 16, 2025 | Benchmarking, continuous-control | Code Available | 0 |
| Similarity-Quantized Relative Difference Learning for Improved Molecular Activity Prediction | Jan 15, 2025 | Activity Prediction, Benchmarking | Unverified | 0 |
| Cancer-Net PCa-Seg: Benchmarking Deep Learning Models for Prostate Cancer Segmentation Using Synthetic Correlated Diffusion Imaging | Jan 15, 2025 | Benchmarking, Computational Efficiency | Unverified | 0 |
| MMDocIR: Benchmarking Multi-Modal Retrieval for Long Documents | Jan 15, 2025 | Benchmarking, Optical Character Recognition (OCR) | Unverified | 0 |
| Evaluating SAT and SMT Solvers on Large-Scale Sudoku Puzzles | Jan 15, 2025 | Benchmarking | Code Available | 0 |
| Off-policy Evaluation for Payments at Adyen | Jan 15, 2025 | Benchmarking, Decision Making | Unverified | 0 |
| Benchmarking Robustness of Contrastive Learning Models for Medical Image-Report Retrieval | Jan 15, 2025 | Benchmarking, Contrastive Learning | Unverified | 0 |
| Data-driven inventory management for new products: An adjusted Dyna-Q approach with transfer learning | Jan 14, 2025 | Benchmarking, Management | Unverified | 0 |
| Keras Sig: Efficient Path Signature Computation on GPU in Keras 3 | Jan 14, 2025 | Benchmarking, C++ code | Unverified | 0 |
| Benchmarking Classical, Deep, and Generative Models for Human Activity Recognition | Jan 14, 2025 | Activity Recognition, Benchmarking | Unverified | 0 |
| Benchmarking Vision Foundation Models for Input Monitoring in Autonomous Driving | Jan 14, 2025 | Autonomous Driving, Benchmarking | Unverified | 0 |