| Benchmarking and Validation of Sub-mW 30GHz VG-LNAs in 22nm FDSOI CMOS for 5G/6G Phased-Array Receivers | Sep 11, 2024 | Benchmarking | Unverified | 0 |
| Mahalanobis k-NN: A Statistical Lens for Robust Point-Cloud Registrations | Sep 10, 2024 | Benchmarking, Point Cloud Registration | Code Available | 0 |
| VoiceWukong: Benchmarking Deepfake Voice Detection | Sep 10, 2024 | Benchmarking, Face Swapping | Unverified | 0 |
| Benchmarking Sub-Genre Classification For Mainstage Dance Music | Sep 10, 2024 | Benchmarking, Classification | Unverified | 0 |
| Ransomware Detection Using Machine Learning in the Linux Kernel | Sep 10, 2024 | Benchmarking | Unverified | 0 |
| MIP-GAF: A MLLM-annotated Benchmark for Most Important Person Localization and Group Context Understanding | Sep 10, 2024 | Benchmarking, Language Modeling | Code Available | 0 |
| CKnowEdit: A New Chinese Knowledge Editing Dataset for Linguistics, Facts, and Logic Error Correction in LLMs | Sep 9, 2024 | Benchmarking, knowledge editing | Unverified | 0 |
| Selecting Differential Splicing Methods: Practical Considerations | Sep 9, 2024 | Benchmarking | Unverified | 0 |
| Benchmarking and Building Zero-Shot Hindi Retrieval Model with Hindi-BEIR and NLLB-E5 | Sep 9, 2024 | Benchmarking, Information Retrieval | Unverified | 0 |
| RBoard: A Unified Platform for Reproducible and Reusable Recommender System Benchmarks | Sep 9, 2024 | Benchmarking, Click-Through Rate Prediction | Unverified | 0 |
| NeIn: Telling What You Don't Want | Sep 9, 2024 | Benchmarking, Negation | Unverified | 0 |
| DetoxBench: Benchmarking Large Language Models for Multitask Fraud & Abuse Detection | Sep 9, 2024 | Abuse Detection, Abusive Language | Unverified | 0 |
| A Framework for Evaluating PM2.5 Forecasts from the Perspective of Individual Decision Making | Sep 9, 2024 | Benchmarking, Decision Making | Code Available | 0 |
| Quantum Kernel Methods under Scrutiny: A Benchmarking Study | Sep 6, 2024 | Benchmarking, Quantum Machine Learning | Unverified | 0 |
| Absolute Ranking: An Essential Normalization for Benchmarking Optimization Algorithms | Sep 6, 2024 | Bayesian Inference, Benchmarking | Unverified | 0 |
| Benchmarking Estimators for Natural Experiments: A Novel Dataset and a Doubly Robust Algorithm | Sep 6, 2024 | Benchmarking, regression | Unverified | 0 |
| Question-Answering Dense Video Events | Sep 6, 2024 | Benchmarking, Question Answering | Code Available | 0 |
| Shuffle Vision Transformer: Lightweight, Fast and Efficient Recognition of Driver Facial Expression | Sep 5, 2024 | Benchmarking, Computational Efficiency | Unverified | 0 |
| LLM Detectors Still Fall Short of Real World: Case of LLM-Generated Short News-Like Posts | Sep 5, 2024 | Benchmarking | Code Available | 0 |
| InfraLib: Enabling Reinforcement Learning and Decision-Making for Large-Scale Infrastructure Management | Sep 5, 2024 | Benchmarking, Computational Efficiency | Unverified | 0 |
| Prediction Accuracy & Reliability: Classification and Object Localization under Distribution Shift | Sep 5, 2024 | Autonomous Driving, Benchmarking | Unverified | 0 |
| Benchmarking Spurious Bias in Few-Shot Image Classifiers | Sep 4, 2024 | Attribute, Benchmarking | Code Available | 0 |
| PUB: Plot Understanding Benchmark and Dataset for Evaluating Large Language Models on Synthetic Visual Data Interpretation | Sep 4, 2024 | Benchmarking | Unverified | 0 |
| NUMOSIM: A Synthetic Mobility Dataset with Anomaly Detection Benchmarks | Sep 4, 2024 | Anomaly Detection, Benchmarking | Unverified | 0 |
| EgoPressure: A Dataset for Hand Pressure and Pose Estimation in Egocentric Vision | Sep 3, 2024 | Benchmarking, Mixed Reality | Unverified | 0 |
| Multi-Source Knowledge Pruning for Retrieval-Augmented Generation: A Benchmark and Empirical Study | Sep 3, 2024 | Benchmarking, Hallucination | Code Available | 0 |
| Benchmarking Cognitive Domains for LLMs: Insights from Taiwanese Hakka Culture | Sep 3, 2024 | Benchmarking, RAG | Unverified | 0 |
| From Grounding to Planning: Benchmarking Bottlenecks in Web Agents | Sep 3, 2024 | Benchmarking | Unverified | 0 |
| Revisiting Safe Exploration in Safe Reinforcement learning | Sep 2, 2024 | Benchmarking, reinforcement-learning | Unverified | 0 |
| Landscape-Aware Automated Algorithm Configuration using Multi-output Mixed Regression and Classification | Sep 2, 2024 | Benchmarking | Unverified | 0 |
| A practical generalization metric for deep networks benchmarking | Sep 2, 2024 | Benchmarking, Diversity | Unverified | 0 |
| Benchmarking LLM Code Generation for Audio Programming with Visual Dataflow Languages | Sep 1, 2024 | Benchmarking, Code Generation | Unverified | 0 |
| Accelerating the discovery of steady-states of planetary interior dynamics with machine learning | Aug 30, 2024 | Benchmarking | Unverified | 0 |
| SYNTHEVAL: Hybrid Behavioral Testing of NLP Models with Synthetic CheckLists | Aug 30, 2024 | Benchmarking, Sentiment Analysis | Code Available | 0 |
| Understanding the User: An Intent-Based Ranking Dataset | Aug 30, 2024 | Benchmarking, Information Retrieval | Unverified | 0 |
| Benchmarking Japanese Speech Recognition on ASR-LLM Setups with Multi-Pass Augmented Generative Error Correction | Aug 29, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Illuminating the Diversity-Fitness Trade-Off in Black-Box Optimization | Aug 29, 2024 | Benchmarking, Diversity | Code Available | 0 |
| Benchmarking foundation models as feature extractors for weakly-supervised computational pathology | Aug 28, 2024 | Benchmarking, Diversity | Unverified | 0 |
| Atari-GPT: Benchmarking Multimodal Large Language Models as Low-Level Policies in Atari Games | Aug 28, 2024 | Atari Games, Benchmarking | Unverified | 0 |
| VHAKG: A Multi-modal Knowledge Graph Based on Synchronized Multi-view Videos of Daily Activities | Aug 27, 2024 | Benchmarking, Knowledge Graphs | Code Available | 0 |
| Applications in CityLearn Gym Environment for Multi-Objective Control Benchmarking in Grid-Interactive Buildings and Districts | Aug 27, 2024 | Benchmarking, Model Predictive Control | Unverified | 0 |
| Cross-subject Brain Functional Connectivity Analysis for Multi-task Cognitive State Evaluation | Aug 27, 2024 | Benchmarking, Decision Making | Unverified | 0 |
| Zero-Shot Visual Reasoning by Vision-Language Models: Benchmarking and Analysis | Aug 27, 2024 | Benchmarking, Large Language Model | Unverified | 0 |
| Benchmarking Reinforcement Learning Methods for Dexterous Robotic Manipulation with a Three-Fingered Gripper | Aug 27, 2024 | Benchmarking, Reinforcement Learning (RL) | Unverified | 0 |
| BOX3D: Lightweight Camera-LiDAR Fusion for 3D Object Detection and Localization | Aug 27, 2024 | 3D Object Detection, Benchmarking | Unverified | 0 |
| FastTextSpotter: A High-Efficiency Transformer for Multilingual Scene Text Spotting | Aug 27, 2024 | Benchmarking, Decoder | Code Available | 0 |
| K-Sort Arena: Efficient and Reliable Benchmarking for Generative Models via K-wise Human Preferences | Aug 26, 2024 | Benchmarking | Unverified | 0 |
| Evaluating Large Language Models on Spatial Tasks: A Multi-Task Benchmarking Study | Aug 26, 2024 | 8k, Benchmarking | Unverified | 0 |
| Comparative Analysis: Violence Recognition from Videos using Transfer Learning | Aug 26, 2024 | Action Recognition, Benchmarking | Code Available | 0 |
| DHP Benchmark: Are LLMs Good NLG Evaluators? | Aug 25, 2024 | Benchmarking, nlg evaluation | Unverified | 0 |