| The Butterfly Effect of Model Editing: Few Edits Can Trigger Large Language Models Collapse | Feb 15, 2024 | Benchmarking, Model Editing | Code Available | 0 |
| Large-scale Benchmarking of Metaphor-based Optimization Heuristics | Feb 15, 2024 | Benchmarking, Experimental Design | Unverified | 0 |
| AI Hospital: Benchmarking Large Language Models in a Multi-agent Medical Interaction Simulator | Feb 15, 2024 | Benchmarking, Diagnostic | Code Available | 2 |
| Multi-Fidelity Methods for Optimization: A Survey | Feb 15, 2024 | Benchmarking, Computational Efficiency | Unverified | 0 |
| Recommendations for Baselines and Benchmarking Approximate Gaussian Processes | Feb 15, 2024 | Benchmarking, Gaussian Processes | Unverified | 0 |
| Evaluation of simulation methods for tumor subclonal reconstruction | Feb 14, 2024 | Benchmarking | Unverified | 0 |
| Massively Multi-Cultural Knowledge Acquisition & LM Benchmarking | Feb 14, 2024 | Benchmarking, Language Modelling | Code Available | 1 |
| MultiMedEval: A Benchmark and a Toolkit for Evaluating Medical Vision-Language Models | Feb 14, 2024 | Benchmarking, Diversity | Code Available | 2 |
| Design and Realization of a Benchmarking Testbed for Evaluating Autonomous Platooning Algorithms | Feb 14, 2024 | Autonomous Driving, Benchmarking | Unverified | 0 |
| Benchmarking multi-component signal processing methods in the time-frequency plane | Feb 13, 2024 | Benchmarking, Denoising | Code Available | 0 |
| LoTa-Bench: Benchmarking Language-oriented Task Planners for Embodied Agents | Feb 13, 2024 | Benchmarking, Model Selection | Code Available | 2 |
| Privacy-Preserving Language Model Inference with Instance Obfuscation | Feb 13, 2024 | Benchmarking, Language Modeling | Unverified | 0 |
| BdSLW60: A Word-Level Bangla Sign Language Dataset | Feb 13, 2024 | Benchmarking, Gesture Recognition | Code Available | 0 |
| EvoGPT-f: An Evolutionary GPT Framework for Benchmarking Formal Math Languages | Feb 12, 2024 | Automated Theorem Proving, Benchmarking | Unverified | 0 |
| Customizable Perturbation Synthesis for Robust SLAM Benchmarking | Feb 12, 2024 | Benchmarking, Simultaneous Localization and Mapping | Code Available | 2 |
| Impact of spatial transformations on landscape features of CEC2022 basic benchmark problems | Feb 12, 2024 | Benchmarking | Unverified | 0 |
| Benchmarking and Building Long-Context Retrieval Models with LoCo and M2-BERT | Feb 12, 2024 | Benchmarking, Chunking | Unverified | 0 |
| AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension | Feb 12, 2024 | 2k, Automatic Speech Recognition | Code Available | 2 |
| Can Tree Based Approaches Surpass Deep Learning in Anomaly Detection? A Benchmarking Study | Feb 11, 2024 | Anomaly Detection, Benchmarking | Code Available | 0 |
| Explainable Global Wildfire Prediction Models using Graph Neural Networks | Feb 11, 2024 | Benchmarking, Community Detection | Code Available | 1 |
| ProtIR: Iterative Refinement between Retrievers and Predictors for Protein Function Annotation | Feb 10, 2024 | Benchmarking, Language Modeling | Unverified | 0 |
| Estimating the Effect of Crosstalk Error on Circuit Fidelity Using Noisy Intermediate-Scale Quantum Devices | Feb 10, 2024 | Benchmarking | Unverified | 0 |
| Improving 2D-3D Dense Correspondences with Diffusion Models for 6D Object Pose Estimation | Feb 9, 2024 | 6D Pose Estimation using RGB, Benchmarking | Unverified | 0 |
| Retrieve, Merge, Predict: Augmenting Tables with Data Lakes | Feb 9, 2024 | AutoML, Benchmarking | Code Available | 1 |
| LLaVA-Docent: Instruction Tuning with Multimodal Large Language Model to Support Art Appreciation Education | Feb 9, 2024 | Benchmarking, Chatbot | Unverified | 0 |