| The Pitfalls of Benchmarking in Algorithm Selection: What We Are Getting Wrong | May 12, 2025 | Benchmarking | Unverified | 0 |
| Benchmarking Retrieval-Augmented Generation for Chemistry | May 12, 2025 | Benchmarking, RAG | Unverified | 0 |
| Benchmarking of CPU-intensive Stream Data Processing in The Edge Computing Systems | May 12, 2025 | Benchmarking, Computational Efficiency | Unverified | 0 |
| Multi-Modal Explainable Medical AI Assistant for Trustworthy Human-AI Collaboration | May 11, 2025 | Benchmarking, Descriptive | Unverified | 0 |
| Optimizing Recommendations using Fine-Tuned LLMs | May 11, 2025 | Benchmarking, Recommendation Systems | Unverified | 0 |
| From Knowledge to Reasoning: Evaluating LLMs for Ionic Liquids Research in Chemical and Biological Engineering | May 11, 2025 | Benchmarking, General Knowledge | Code Available | 0 |
| JaxRobotarium: Training and Deploying Multi-Robot Policies in 10 Minutes | May 10, 2025 | Benchmarking, GPU | Code Available | 1 |
| FNBench: Benchmarking Robust Federated Learning against Noisy Labels | May 10, 2025 | Benchmarking, Federated Learning | Code Available | 1 |
| Contributions of the Petabyte Scale Sequence Search Codeathon toward efforts to scale sequence-based searches on SRA | May 9, 2025 | Benchmarking, scientific discovery | Unverified | 0 |
| Evaluating Financial Sentiment Analysis with Annotators Instruction Assisted Prompting: Enhancing Contextual Interpretation and Stock Prediction Accuracy | May 9, 2025 | Benchmarking, Sentiment Analysis | Unverified | 0 |
| The ML.ENERGY Benchmark: Toward Automated Inference Energy Measurement and Optimization | May 9, 2025 | Benchmarking | Code Available | 3 |
| Healthy LLMs? Benchmarking LLM Knowledge of UK Government Public Health Information | May 9, 2025 | Benchmarking, Form | Unverified | 0 |
| Federated Deconfounding and Debiasing Learning for Out-of-Distribution Generalization | May 8, 2025 | Attribute, Benchmarking | Unverified | 0 |
| Autoregressive Stochastic Clock Jitter Compensation in Analog-to-Digital Converters | May 8, 2025 | Benchmarking | Unverified | 0 |
| clem:todd: A Framework for the Systematic Benchmarking of LLM-Based Task-Oriented Dialogue System Realisations | May 8, 2025 | Benchmarking, Task-Oriented Dialogue Systems | Unverified | 0 |
| Enhancing Treatment Effect Estimation via Active Learning: A Counterfactual Covering Perspective | May 8, 2025 | Active Learning, Benchmarking | Code Available | 0 |
| Benchmarking Vision, Language, & Action Models in Procedurally Generated, Open Ended Action Environments | May 8, 2025 | Benchmarking, Prompt Engineering | Code Available | 1 |
| PyTDC: A multimodal machine learning training, evaluation, and inference platform for biomedical foundation models | May 8, 2025 | Benchmarking, Graph Representation Learning | Code Available | 1 |
| scDrugMap: Benchmarking Large Foundation Models for Drug Response Prediction | May 8, 2025 | Benchmarking, Drug Discovery | Code Available | 1 |
| QualBench: Benchmarking Chinese LLMs with Localized Professional Qualifications for Vertical Domain Evaluation | May 8, 2025 | Benchmarking, Federated Learning | Unverified | 0 |
| A Neuro-Symbolic Framework for Sequence Classification with Relational and Temporal Knowledge | May 8, 2025 | Benchmarking | Code Available | 0 |
| Benchmarking Ophthalmology Foundation Models for Clinically Significant Age Macular Degeneration Detection | May 8, 2025 | Benchmarking, Out-of-Distribution Generalization | Unverified | 0 |
| Software Development Life Cycle Perspective: A Survey of Benchmarks for Code Large Language Models and Agents | May 8, 2025 | Benchmarking | Unverified | 0 |
| DispBench: Benchmarking Disparity Estimation to Synthetic Corruptions | May 8, 2025 | Autonomous Navigation, Benchmarking | Code Available | 0 |
| Benchmarking Traditional Machine Learning and Deep Learning Models for Fault Detection in Power Transformers | May 7, 2025 | Benchmarking, Fault Detection | Code Available | 0 |