| Decoding Intelligence: A Framework for Certifying Knowledge Comprehension in LLMs | Feb 24, 2024 | Benchmarking, Knowledge Graphs | Unverified | 0 |
| Benchmarking Observational Studies with Experimental Data under Right-Censoring | Feb 23, 2024 | Benchmarking | Unverified | 0 |
| Benchmarking the Robustness of Panoptic Segmentation for Automated Driving | Feb 23, 2024 | Benchmarking, Decision Making | Unverified | 0 |
| GenCeption: Evaluate Multimodal LLMs with Unlabeled Unimodal Data | Feb 22, 2024 | Benchmarking | Code Available | 0 |
| PQA: Zero-shot Protein Question Answering for Free-form Scientific Enquiry with Large Language Models | Feb 21, 2024 | Benchmarking, Form | Code Available | 0 |
| A Unified Framework and Dataset for Assessing Societal Bias in Vision-Language Models | Feb 21, 2024 | Benchmarking, Image to text | Unverified | 0 |
| CODIS: Benchmarking Context-Dependent Visual Comprehension for Multimodal Large Language Models | Feb 21, 2024 | Benchmarking | Unverified | 0 |
| MM-Soc: Benchmarking Multimodal Large Language Models in Social Media Platforms | Feb 21, 2024 | Benchmarking, Hate Speech Detection | Code Available | 0 |
| KetGPT -- Dataset Augmentation of Quantum Circuits using Transformers | Feb 20, 2024 | Benchmarking | Unverified | 0 |
| Synthetic location trajectory generation using categorical diffusion models | Feb 19, 2024 | Benchmarking, Decision Making | Code Available | 0 |
| FeB4RAG: Evaluating Federated Search in the Context of Retrieval Augmented Generation | Feb 19, 2024 | Benchmarking, Chatbot | Unverified | 0 |
| AnaloBench: Benchmarking the Identification of Abstract and Long-context Analogies | Feb 19, 2024 | Benchmarking | Code Available | 0 |
| Learning Disentangled Audio Representations through Controlled Synthesis | Feb 16, 2024 | Benchmarking, Disentanglement | Unverified | 0 |
| VATr++: Choose Your Words Wisely for Handwritten Text Generation | Feb 16, 2024 | Benchmarking, Text Generation | Unverified | 0 |
| The Butterfly Effect of Model Editing: Few Edits Can Trigger Large Language Models Collapse | Feb 15, 2024 | Benchmarking, Model Editing | Code Available | 0 |
| Recommendations for Baselines and Benchmarking Approximate Gaussian Processes | Feb 15, 2024 | Benchmarking, Gaussian Processes | Unverified | 0 |
| Multi-Fidelity Methods for Optimization: A Survey | Feb 15, 2024 | Benchmarking, Computational Efficiency | Unverified | 0 |
| Large-scale Benchmarking of Metaphor-based Optimization Heuristics | Feb 15, 2024 | Benchmarking, Experimental Design | Unverified | 0 |
| SAWEC: Sensing-Assisted Wireless Edge Computing | Feb 15, 2024 | Benchmarking, Edge-computing | Code Available | 0 |
| Benchmarking federated strategies in Peer-to-Peer Federated learning for biomedical data | Feb 15, 2024 | Benchmarking, Federated Learning | Unverified | 0 |
| From Variability to Stability: Advancing RecSys Benchmarking Practices | Feb 15, 2024 | Benchmarking, Collaborative Filtering | Code Available | 0 |
| Evaluation of simulation methods for tumor subclonal reconstruction | Feb 14, 2024 | Benchmarking | Unverified | 0 |
| Design and Realization of a Benchmarking Testbed for Evaluating Autonomous Platooning Algorithms | Feb 14, 2024 | Autonomous Driving, Benchmarking | Unverified | 0 |
| Benchmarking multi-component signal processing methods in the time-frequency plane | Feb 13, 2024 | Benchmarking, Denoising | Code Available | 0 |
| Privacy-Preserving Language Model Inference with Instance Obfuscation | Feb 13, 2024 | Benchmarking, Language Modeling | Unverified | 0 |