| Benchmarking Causal Study to Interpret Large Language Models for Source Code | Aug 23, 2023 | Benchmarking, Causal Inference | Unverified | 0 |
| Finding the Perfect Fit: Applying Regression Models to ClimateBench v1.0 | Aug 23, 2023 | Benchmarking, regression | Code Available | 0 |
| LLMRec: Benchmarking Large Language Models on Recommendation Task | Aug 23, 2023 | Benchmarking, Explanation Generation | Code Available | 1 |
| Efficient Benchmarking of Language Models | Aug 22, 2023 | Benchmarking, GPU | Unverified | 0 |
| Expecting The Unexpected: Towards Broad Out-Of-Distribution Detection | Aug 22, 2023 | Benchmarking, Out-of-Distribution Detection | Code Available | 0 |
| Benchmarking Domain Adaptation for Chemical Processes on the Tennessee Eastman Process | Aug 22, 2023 | Benchmarking, Domain Adaptation | Code Available | 0 |
| Beyond MD17: the reactive xxMD dataset | Aug 22, 2023 | Benchmarking, Computational chemistry | Code Available | 0 |
| Measuring the Effect of Causal Disentanglement on the Adversarial Robustness of Neural Network Models | Aug 21, 2023 | Adversarial Robustness, Benchmarking | Unverified | 0 |
| UGSL: A Unified Framework for Benchmarking Graph Structure Learning | Aug 21, 2023 | Benchmarking, Graph structure learning | Unverified | 0 |
| VI-Net: Boosting Category-level 6D Object Pose Estimation via Learning Decoupled Rotations on the Spherical Representations | Aug 19, 2023 | 6D Pose Estimation using RGB, Benchmarking | Code Available | 1 |
| Neurological Prognostication of Post-Cardiac-Arrest Coma Patients Using EEG Data: A Dynamic Survival Analysis Framework with Competing Risks | Aug 17, 2023 | Benchmarking, EEG | Code Available | 0 |
| Benchmarking Neural Network Generalization for Grammar Induction | Aug 16, 2023 | Benchmarking | Code Available | 1 |
| Benchmarking Adversarial Robustness of Compressed Deep Learning Models | Aug 16, 2023 | Adversarial Robustness, Benchmarking | Unverified | 0 |
| IoT Data Trust Evaluation via Machine Learning | Aug 15, 2023 | Benchmarking, Time Series | Code Available | 0 |
| Deep Neural Operator Driven Real Time Inference for Nuclear Systems to Enable Digital Twin Solutions | Aug 15, 2023 | Benchmarking, Computational Efficiency | Unverified | 0 |
| A Survey on Model Compression for Large Language Models | Aug 15, 2023 | Benchmarking, Knowledge Distillation | Unverified | 0 |
| Benchmarking Scalable Epistemic Uncertainty Quantification in Organ Segmentation | Aug 15, 2023 | Benchmarking, Medical Image Analysis | Code Available | 0 |
| Benchmarking Generated Poses: How Rational is Structure-based Drug Design with Generative Models? | Aug 14, 2023 | Benchmarking, Drug Design | Code Available | 1 |
| BOLAA: Benchmarking and Orchestrating LLM-augmented Autonomous Agents | Aug 11, 2023 | Benchmarking, Decision Making | Code Available | 2 |
| Does AI for science need another ImageNet Or totally different benchmarks? A case study of machine learning force fields | Aug 11, 2023 | Benchmarking | Unverified | 0 |
| DIG In: Evaluating Disparities in Image Generations with Indicators for Geographic Diversity | Aug 11, 2023 | Benchmarking, Diversity | Code Available | 1 |
| A Comparative Visual Analytics Framework for Evaluating Evolutionary Processes in Multi-objective Optimization | Aug 10, 2023 | Benchmarking, Decision Making | Code Available | 1 |
| Spintronics for image recognition: performance benchmarking via ultrafast data-driven simulations | Aug 10, 2023 | Benchmarking, Classification | Unverified | 0 |
| Benchmarking Algorithmic Bias in Face Recognition: An Experimental Approach Using Synthetic Faces and Human Evaluation | Aug 10, 2023 | Attribute, Benchmarking | Unverified | 0 |
| Enhancing Architecture Frameworks by Including Modern Stakeholders and their Views/Viewpoints | Aug 9, 2023 | Benchmarking | Unverified | 0 |