| Benchmarking VLMs' Reasoning About Persuasive Atypical Images | Sep 16, 2024 | Benchmarking, Object Recognition | Unverified | 0 |
| Benchmarking Large Language Model Uncertainty for Prompt Optimization | Sep 16, 2024 | Benchmarking, Diversity | Code Available | 0 |
| Benchmarking LLMs in Political Content Text-Annotation: Proof-of-Concept with Toxicity and Incivility Data | Sep 15, 2024 | Benchmarking, text annotation | Unverified | 0 |
| Byzantine-Robust and Communication-Efficient Distributed Learning via Compressed Momentum Filtering | Sep 13, 2024 | Benchmarking, Binary Classification | Unverified | 0 |
| LLM-Powered Grapheme-to-Phoneme Conversion: Benchmark and Case Study | Sep 13, 2024 | Benchmarking, Grapheme-to-Phoneme Conversion | Unverified | 0 |
| Text-To-Speech Synthesis In The Wild | Sep 13, 2024 | Benchmarking, Speaker Recognition | Unverified | 0 |
| ODAQ: Open Dataset of Audio Quality - Benchmark on GitHub | Sep 13, 2024 | Audio Quality Assessment, Benchmarking | Code Available | 1 |
| Introducing CausalBench: A Flexible Benchmark Framework for Causal Analysis and Machine Learning | Sep 12, 2024 | Benchmarking, Fairness | Unverified | 0 |
| Linear energy storage and flexibility model with ramp rate, ramping, deadline and capacity constraints | Sep 12, 2024 | Benchmarking | Code Available | 0 |
| Efficient Sparse Coding with the Adaptive Locally Competitive Algorithm for Speech Classification | Sep 12, 2024 | Benchmarking, Classification | Unverified | 0 |
| The CLC-UKET Dataset: Benchmarking Case Outcome Prediction for the UK Employment Tribunal | Sep 12, 2024 | Benchmarking, Language Modeling | Unverified | 0 |
| The JPEG Pleno Learning-based Point Cloud Coding Standard: Serving Man and Machine | Sep 12, 2024 | Autonomous Driving, Benchmarking | Unverified | 0 |
| Improve Machine Learning carbon footprint using Nvidia GPU and Mixed Precision training for classification models -- Part I | Sep 12, 2024 | Benchmarking, CPU | Code Available | 0 |
| Enhancing Q&A Text Retrieval with Ranking Models: Benchmarking, fine-tuning and deploying Rerankers for RAG | Sep 12, 2024 | Benchmarking, Question Answering | Unverified | 0 |
| Online vs Offline: A Comparative Study of First-Party and Third-Party Evaluations of Social Chatbots | Sep 12, 2024 | Benchmarking, Chatbot | Unverified | 0 |
| Benchmarking and Validation of Sub-mW 30GHz VG-LNAs in 22nm FDSOI CMOS for 5G/6G Phased-Array Receivers | Sep 11, 2024 | Benchmarking | Unverified | 0 |
| Understanding Foundation Models: Are We Back in 1924? | Sep 11, 2024 | Benchmarking | Unverified | 0 |
| Unsupervised Novelty Detection Methods Benchmarking with Wavelet Decomposition | Sep 11, 2024 | Benchmarking, Novelty Detection | Code Available | 0 |
| Benchmarking 2D Egocentric Hand Pose Datasets | Sep 11, 2024 | Activity Recognition, Benchmarking | Unverified | 0 |
| MIP-GAF: A MLLM-annotated Benchmark for Most Important Person Localization and Group Context Understanding | Sep 10, 2024 | Benchmarking, Language Modeling | Code Available | 0 |
| Ransomware Detection Using Machine Learning in the Linux Kernel | Sep 10, 2024 | Benchmarking | Unverified | 0 |
| Benchmarking Sub-Genre Classification For Mainstage Dance Music | Sep 10, 2024 | Benchmarking, Classification | Unverified | 0 |
| Mahalanobis k-NN: A Statistical Lens for Robust Point-Cloud Registrations | Sep 10, 2024 | Benchmarking, Point Cloud Registration | Code Available | 0 |
| VoiceWukong: Benchmarking Deepfake Voice Detection | Sep 10, 2024 | Benchmarking, Face Swapping | Unverified | 0 |
| Selecting Differential Splicing Methods: Practical Considerations | Sep 9, 2024 | Benchmarking | Unverified | 0 |
| RBoard: A Unified Platform for Reproducible and Reusable Recommender System Benchmarks | Sep 9, 2024 | Benchmarking, Click-Through Rate Prediction | Unverified | 0 |
| NeIn: Telling What You Don't Want | Sep 9, 2024 | Benchmarking, Negation | Unverified | 0 |
| Benchmarking and Building Zero-Shot Hindi Retrieval Model with Hindi-BEIR and NLLB-E5 | Sep 9, 2024 | Benchmarking, Information Retrieval | Unverified | 0 |
| Assessing SPARQL capabilities of Large Language Models | Sep 9, 2024 | Benchmarking, Knowledge Graphs | Code Available | 2 |
| DetoxBench: Benchmarking Large Language Models for Multitask Fraud & Abuse Detection | Sep 9, 2024 | Abuse Detection, Abusive Language | Unverified | 0 |
| CKnowEdit: A New Chinese Knowledge Editing Dataset for Linguistics, Facts, and Logic Error Correction in LLMs | Sep 9, 2024 | Benchmarking, knowledge editing | Unverified | 0 |
| A Framework for Evaluating PM2.5 Forecasts from the Perspective of Individual Decision Making | Sep 9, 2024 | Benchmarking, Decision Making | Code Available | 0 |
| Insights from Benchmarking Frontier Language Models on Web App Code Generation | Sep 8, 2024 | Benchmarking, Code Generation | Code Available | 1 |
| Benchmarking Estimators for Natural Experiments: A Novel Dataset and a Doubly Robust Algorithm | Sep 6, 2024 | Benchmarking, regression | Unverified | 0 |
| Absolute Ranking: An Essential Normalization for Benchmarking Optimization Algorithms | Sep 6, 2024 | Bayesian Inference, Benchmarking | Unverified | 0 |
| PlantSeg: A Large-Scale In-the-wild Dataset for Plant Disease Segmentation | Sep 6, 2024 | Benchmarking, image-classification | Code Available | 2 |
| Quantum Kernel Methods under Scrutiny: A Benchmarking Study | Sep 6, 2024 | Benchmarking, Quantum Machine Learning | Unverified | 0 |
| Question-Answering Dense Video Events | Sep 6, 2024 | Benchmarking, Question Answering | Code Available | 0 |
| Shuffle Vision Transformer: Lightweight, Fast and Efficient Recognition of Driver Facial Expression | Sep 5, 2024 | Benchmarking, Computational Efficiency | Unverified | 0 |
| Prediction Accuracy & Reliability: Classification and Object Localization under Distribution Shift | Sep 5, 2024 | Autonomous Driving, Benchmarking | Unverified | 0 |
| LLM Detectors Still Fall Short of Real World: Case of LLM-Generated Short News-Like Posts | Sep 5, 2024 | Benchmarking | Code Available | 0 |
| InfraLib: Enabling Reinforcement Learning and Decision-Making for Large-Scale Infrastructure Management | Sep 5, 2024 | Benchmarking, Computational Efficiency | Unverified | 0 |
| RTLRewriter: Methodologies for Large Models aided RTL Code Optimization | Sep 4, 2024 | Benchmarking | Code Available | 1 |
| PUB: Plot Understanding Benchmark and Dataset for Evaluating Large Language Models on Synthetic Visual Data Interpretation | Sep 4, 2024 | Benchmarking | Unverified | 0 |
| NUMOSIM: A Synthetic Mobility Dataset with Anomaly Detection Benchmarks | Sep 4, 2024 | Anomaly Detection, Benchmarking | Unverified | 0 |
| Benchmarking Spurious Bias in Few-Shot Image Classifiers | Sep 4, 2024 | Attribute, Benchmarking | Code Available | 0 |
| Multi-Source Knowledge Pruning for Retrieval-Augmented Generation: A Benchmark and Empirical Study | Sep 3, 2024 | Benchmarking, Hallucination | Code Available | 0 |
| LongGenBench: Benchmarking Long-Form Generation in Long Context LLMs | Sep 3, 2024 | 16k, Benchmarking | Code Available | 1 |
| EgoPressure: A Dataset for Hand Pressure and Pose Estimation in Egocentric Vision | Sep 3, 2024 | Benchmarking, Mixed Reality | Unverified | 0 |
| Benchmarking Cognitive Domains for LLMs: Insights from Taiwanese Hakka Culture | Sep 3, 2024 | Benchmarking, RAG | Unverified | 0 |