| Experimental Benchmarking of Energy-saving Sub-Optimal Sliding Mode Control | Jul 14, 2024 | Benchmarking | —Unverified | 0 |
| NativQA: Multilingual Culturally-Aligned Natural Query for LLMs | Jul 13, 2024 | Benchmarking, Question Answering | —Unverified | 0 |
| Automated detection of gibbon calls from passive acoustic monitoring data using convolutional neural networks in the "torch for R" ecosystem | Jul 13, 2024 | Benchmarking, Deep Learning | —Unverified | 0 |
| Deep Attention Driven Reinforcement Learning (DAD-RL) for Autonomous Decision-Making in Dynamic Environment | Jul 12, 2024 | Benchmarking, Decision Making | Code Available | 0 |
| Evaluating Nuanced Bias in Large Language Model Free Response Answers | Jul 11, 2024 | Benchmarking, Language Modeling | —Unverified | 0 |
| A Comprehensive Survey on Retrieval Methods in Recommender Systems | Jul 11, 2024 | Benchmarking, Recommendation Systems | —Unverified | 0 |
| Beyond Benchmarking: A New Paradigm for Evaluation and Assessment of Large Language Models | Jul 10, 2024 | Benchmarking | —Unverified | 0 |
| How Aligned are Different Alignment Metrics? | Jul 10, 2024 | Benchmarking | —Unverified | 0 |
| HERMES: Holographic Equivariant neuRal network model for Mutational Effect and Stability prediction | Jul 9, 2024 | Benchmarking | Code Available | 0 |
| Analyzing the Effectiveness of Listwise Reranking with Positional Invariance on Temporal Generalizability | Jul 9, 2024 | Benchmarking, Decoder | —Unverified | 0 |
| SPINEX-Clustering: Similarity-based Predictions with Explainable Neighbors Exploration for Clustering Problems | Jul 9, 2024 | Benchmarking, Clustering | —Unverified | 0 |
| GTP-4o: Modality-prompted Heterogeneous Graph Learning for Omni-modal Biomedical Representation | Jul 8, 2024 | Benchmarking, Graph Embedding | —Unverified | 0 |
| Simulation-based Benchmarking for Causal Structure Learning in Gene Perturbation Experiments | Jul 8, 2024 | Benchmarking, Decision Making | Code Available | 0 |
| TARGO: Benchmarking Target-driven Object Grasping under Occlusions | Jul 8, 2024 | Benchmarking, Object | —Unverified | 0 |
| MERGE -- A Bimodal Audio-Lyrics Dataset for Static Music Emotion Recognition | Jul 8, 2024 | Benchmarking, Deep Learning | —Unverified | 0 |
| A Benchmark for Multi-speaker Anonymization | Jul 8, 2024 | Benchmarking, Disentanglement | —Unverified | 0 |
| Rethinking the Effectiveness of Graph Classification Datasets in Benchmarks for Assessing GNNs | Jul 6, 2024 | Benchmarking, Dataset Generation | Code Available | 0 |
| From Audio Encoders to Piano Judges: Benchmarking Performance Understanding for Solo Piano | Jul 5, 2024 | Attribute, Benchmarking | —Unverified | 0 |
| Benchmarking GNNs Using Lightning Network Data | Jul 5, 2024 | Benchmarking | —Unverified | 0 |
| Towards Stable 3D Object Detection | Jul 5, 2024 | 3D Object Detection, Autonomous Driving | —Unverified | 0 |
| On the Benchmarking of LLMs for Open-Domain Dialogue Evaluation | Jul 4, 2024 | Benchmarking, Chatbot | —Unverified | 0 |
| Social Bias in Large Language Models For Bangla: An Empirical Study on Gender and Religious Bias | Jul 3, 2024 | Benchmarking, Bias Detection | Code Available | 0 |
| Benchmarking End-To-End Performance of AI-Based Chip Placement Algorithms | Jul 3, 2024 | Benchmarking, CPU | —Unverified | 0 |
| TTSlow: Slow Down Text-to-Speech with Efficiency Robustness Evaluations | Jul 2, 2024 | Benchmarking, text-to-speech | —Unverified | 0 |
| Evaluating the Ability of LLMs to Solve Semantics-Aware Process Mining Tasks | Jul 2, 2024 | Activity Prediction, Anomaly Detection | Code Available | 0 |
| Open foundation models for Azerbaijani language | Jul 2, 2024 | Benchmarking | —Unverified | 0 |
| ProductAgent: Benchmarking Conversational Product Search Agent with Asking Clarification Questions | Jul 1, 2024 | Benchmarking, Question Generation | —Unverified | 0 |
| EndoSparse: Real-Time Sparse View Synthesis of Endoscopic Scenes using Gaussian Splatting | Jul 1, 2024 | 3D Reconstruction, Benchmarking | —Unverified | 0 |
| Reinvestigating the R2 Indicator: Achieving Pareto Compliance by Integration | Jul 1, 2024 | Benchmarking | Code Available | 0 |
| Modified CMA-ES Algorithm for Multi-Modal Optimization: Incorporating Niching Strategies and Dynamic Adaptation Mechanism | Jul 1, 2024 | Benchmarking, Diversity | —Unverified | 0 |
| MIRAI: Evaluating LLM Agents for Event Forecasting | Jul 1, 2024 | Articles, Benchmarking | —Unverified | 0 |
| Task-oriented Over-the-air Computation for Edge-device Co-inference with Balanced Classification Accuracy | Jul 1, 2024 | Benchmarking | —Unverified | 0 |
| GenderBias-VL: Benchmarking Gender Bias in Vision Language Models via Counterfactual Probing | Jun 30, 2024 | Benchmarking, counterfactual | —Unverified | 0 |
| Commute Graph Neural Networks | Jun 30, 2024 | Benchmarking | —Unverified | 0 |
| PerSEval: Assessing Personalization in Text Summarizers | Jun 29, 2024 | Benchmarking, Human Judgment Correlation | —Unverified | 0 |
| Benchmarking M6 Competitors: An Analysis of Financial Metrics and Discussion of Incentives | Jun 27, 2024 | Benchmarking | —Unverified | 0 |
| Generative AI for Synthetic Data Across Multiple Medical Modalities: A Systematic Review of Recent Developments and Challenges | Jun 27, 2024 | Benchmarking, Clinical Knowledge | —Unverified | 0 |
| Evaluating and Benchmarking Foundation Models for Earth Observation and Geospatial AI | Jun 26, 2024 | Benchmarking, Crop Type Mapping | —Unverified | 0 |
| Quantum-tunnelling deep neural network for optical illusion recognition | Jun 26, 2024 | Autonomous Vehicles, Benchmarking | —Unverified | 0 |
| XLD: A Cross-Lane Dataset for Benchmarking Novel Driving View Synthesis | Jun 26, 2024 | Autonomous Driving, Benchmarking | —Unverified | 0 |
| Evaluating the Efficacy of Foundational Models: Advancing Benchmarking Practices to Enhance Fine-Tuning Decision-Making | Jun 25, 2024 | Benchmarking, Decision Making | —Unverified | 0 |
| VarBench: Robust Language Model Benchmarking Through Dynamic Variable Perturbation | Jun 25, 2024 | ARC, Benchmarking | Code Available | 0 |
| RAGBench: Explainable Benchmark for Retrieval-Augmented Generation Systems | Jun 25, 2024 | Benchmarking, RAG | —Unverified | 0 |
| Brittle Minds, Fixable Activations: Understanding Belief Representations in Language Models | Jun 25, 2024 | Benchmarking | —Unverified | 0 |
| Measuring and Benchmarking Large Language Models' Capabilities to Generate Persuasive Language | Jun 25, 2024 | Benchmarking | —Unverified | 0 |
| Benchmarking Deep Learning Models on NVIDIA Jetson Nano for Real-Time Systems: An Empirical Investigation | Jun 25, 2024 | Action Detection, Benchmarking | Code Available | 0 |
| NerfBaselines: Consistent and Reproducible Evaluation of Novel View Synthesis Methods | Jun 25, 2024 | 3DGS, Benchmarking | —Unverified | 0 |
| Towards Efficient and Scalable Training of Differentially Private Deep Learning | Jun 25, 2024 | Benchmarking, Deep Learning | Code Available | 0 |
| A Thorough Performance Benchmarking on Lightweight Embedding-based Recommender Systems | Jun 25, 2024 | Benchmarking, Collaborative Filtering | Code Available | 0 |
| MedBench: A Comprehensive, Standardized, and Reliable Benchmarking System for Evaluating Chinese Medical Large Language Models | Jun 24, 2024 | Benchmarking | —Unverified | 0 |