| SAWEC: Sensing-Assisted Wireless Edge Computing | Feb 15, 2024 | Benchmarking, Edge-computing | Code Available | 0 |
| AI Hospital: Benchmarking Large Language Models in a Multi-agent Medical Interaction Simulator | Feb 15, 2024 | Benchmarking, Diagnostic | Code Available | 2 |
| From Variability to Stability: Advancing RecSys Benchmarking Practices | Feb 15, 2024 | Benchmarking, Collaborative Filtering | Code Available | 0 |
| Multi-Fidelity Methods for Optimization: A Survey | Feb 15, 2024 | Benchmarking, Computational Efficiency | Unverified | 0 |
| The Butterfly Effect of Model Editing: Few Edits Can Trigger Large Language Models Collapse | Feb 15, 2024 | Benchmarking, Model Editing | Code Available | 0 |
| Evaluation of simulation methods for tumor subclonal reconstruction | Feb 14, 2024 | Benchmarking | Unverified | 0 |
| Massively Multi-Cultural Knowledge Acquisition & LM Benchmarking | Feb 14, 2024 | Benchmarking, Language Modelling | Code Available | 1 |
| MultiMedEval: A Benchmark and a Toolkit for Evaluating Medical Vision-Language Models | Feb 14, 2024 | Benchmarking, Diversity | Code Available | 2 |
| Design and Realization of a Benchmarking Testbed for Evaluating Autonomous Platooning Algorithms | Feb 14, 2024 | Autonomous Driving, Benchmarking | Unverified | 0 |
| Benchmarking multi-component signal processing methods in the time-frequency plane | Feb 13, 2024 | Benchmarking, Denoising | Code Available | 0 |
| BdSLW60: A Word-Level Bangla Sign Language Dataset | Feb 13, 2024 | Benchmarking, Gesture Recognition | Code Available | 0 |
| LoTa-Bench: Benchmarking Language-oriented Task Planners for Embodied Agents | Feb 13, 2024 | Benchmarking, Model Selection | Code Available | 2 |
| Privacy-Preserving Language Model Inference with Instance Obfuscation | Feb 13, 2024 | Benchmarking, Language Modeling | Unverified | 0 |
| EvoGPT-f: An Evolutionary GPT Framework for Benchmarking Formal Math Languages | Feb 12, 2024 | Automated Theorem Proving, Benchmarking | Unverified | 0 |
| Customizable Perturbation Synthesis for Robust SLAM Benchmarking | Feb 12, 2024 | Benchmarking, Simultaneous Localization and Mapping | Code Available | 2 |
| Impact of spatial transformations on landscape features of CEC2022 basic benchmark problems | Feb 12, 2024 | Benchmarking | Unverified | 0 |
| Benchmarking and Building Long-Context Retrieval Models with LoCo and M2-BERT | Feb 12, 2024 | Benchmarking, Chunking | Unverified | 0 |
| AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension | Feb 12, 2024 | 2k, Automatic Speech Recognition | Code Available | 2 |
| Can Tree Based Approaches Surpass Deep Learning in Anomaly Detection? A Benchmarking Study | Feb 11, 2024 | Anomaly Detection, Benchmarking | Code Available | 0 |
| Explainable Global Wildfire Prediction Models using Graph Neural Networks | Feb 11, 2024 | Benchmarking, Community Detection | Code Available | 1 |
| ProtIR: Iterative Refinement between Retrievers and Predictors for Protein Function Annotation | Feb 10, 2024 | Benchmarking, Language Modeling | Unverified | 0 |
| Estimating the Effect of Crosstalk Error on Circuit Fidelity Using Noisy Intermediate-Scale Quantum Devices | Feb 10, 2024 | Benchmarking | Unverified | 0 |
| LLaVA-Docent: Instruction Tuning with Multimodal Large Language Model to Support Art Appreciation Education | Feb 9, 2024 | Benchmarking, Chatbot | Unverified | 0 |
| Improving 2D-3D Dense Correspondences with Diffusion Models for 6D Object Pose Estimation | Feb 9, 2024 | 6D Pose Estimation using RGB, Benchmarking | Unverified | 0 |
| Retrieve, Merge, Predict: Augmenting Tables with Data Lakes | Feb 9, 2024 | AutoML, Benchmarking | Code Available | 1 |
| A Functional Analysis Approach to Symbolic Regression | Feb 9, 2024 | Benchmarking, regression | Unverified | 0 |
| Transparent and Scrutable Recommendations Using Natural Language User Profiles | Feb 8, 2024 | Benchmarking, Descriptive | Code Available | 0 |
| Efficient Expression Neutrality Estimation with Application to Face Recognition Utility Prediction | Feb 8, 2024 | Benchmarking, Face Image Quality | Unverified | 0 |
| Benchmarking Large Language Models on Communicative Medical Coaching: a Novel System and Dataset | Feb 8, 2024 | Benchmarking | Code Available | 0 |
| SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models | Feb 8, 2024 | Benchmarking, Diversity | Code Available | 7 |
| Improved off-policy training of diffusion samplers | Feb 7, 2024 | Benchmarking | Code Available | 1 |
| BRI3L: A Brightness Illusion Image Dataset for Identification and Localization of Regions of Illusory Perception | Feb 7, 2024 | Benchmarking | Code Available | 0 |
| InstructScene: Instruction-Driven 3D Indoor Scene Synthesis with Semantic Graph Prior | Feb 7, 2024 | Benchmarking, Decoder | Code Available | 2 |
| Towards Biologically Plausible and Private Gene Expression Data Generation | Feb 7, 2024 | Benchmarking | Code Available | 0 |
| LtU-ILI: An All-in-One Framework for Implicit Inference in Astrophysics and Cosmology | Feb 6, 2024 | All, Benchmarking | Code Available | 2 |
| LV-Eval: A Balanced Long-Context Benchmark with 5 Length Levels Up to 256K | Feb 6, 2024 | 16k, Benchmarking | Code Available | 2 |
| Quantitative Metrics for Benchmarking Medical Image Harmonization | Feb 6, 2024 | Anatomy, Benchmarking | Unverified | 0 |
| Are Machines Better at Complex Reasoning? Unveiling Human-Machine Inference Gaps in Entailment Verification | Feb 6, 2024 | Benchmarking, Multiple-choice | Unverified | 0 |
| AttackNet: Enhancing Biometric Security via Tailored Convolutional Neural Network Architectures for Liveness Detection | Feb 6, 2024 | Benchmarking | Code Available | 0 |
| Architecture Analysis and Benchmarking of 3D U-shaped Deep Learning Models for Thoracic Anatomical Segmentation | Feb 5, 2024 | Benchmarking, Image Segmentation | Code Available | 0 |
| PowerGraph: A power grid benchmark dataset for graph neural networks | Feb 5, 2024 | Articles, Benchmarking | Unverified | 0 |
| JOBSKAPE: A Framework for Generating Synthetic Job Postings to Enhance Skill Matching | Feb 5, 2024 | Benchmarking, Sentence | Code Available | 1 |
| Vi(E)va LLM! A Conceptual Stack for Evaluating and Interpreting Generative AI-based Visualizations | Feb 3, 2024 | Benchmarking | Code Available | 0 |
| EffiBench: Benchmarking the Efficiency of Automatically Generated Code | Feb 3, 2024 | Benchmarking, Code Completion | Code Available | 2 |
| Probing Critical Learning Dynamics of PLMs for Hate Speech Detection | Feb 3, 2024 | Benchmarking, Hate Speech Detection | Code Available | 0 |
| GenFace: A Large-Scale Fine-Grained Face Forgery Benchmark and Cross Appearance-Edge Learning | Feb 3, 2024 | Benchmarking, DeepFake Detection | Code Available | 1 |
| Can LLMs perform structured graph reasoning? | Feb 2, 2024 | Benchmarking, Navigate | Code Available | 0 |
| Variational Quantum Circuits Enhanced Generative Adversarial Network | Feb 2, 2024 | Benchmarking, Generative Adversarial Network | Unverified | 0 |
| Benchmarking Spiking Neural Network Learning Methods with Varying Locality | Feb 1, 2024 | Benchmarking | Unverified | 0 |
| MRAnnotator: multi-Anatomy and many-Sequence MRI segmentation of 44 structures | Feb 1, 2024 | Anatomy, Benchmarking | Unverified | 0 |