| SpokeN-100: A Cross-Lingual Benchmarking Dataset for The Classification of Spoken Numbers in Different Languages | Mar 14, 2024 | Benchmarking, Dimensionality Reduction | Code Available | 0 |
| Attention-based Class-Conditioned Alignment for Multi-Source Domain Adaptation of Object Detectors | Mar 14, 2024 | Benchmarking, Domain Adaptation | Code Available | 0 |
| Recurrent Drafter for Fast Speculative Decoding in Large Language Models | Mar 14, 2024 | Benchmarking, Knowledge Distillation | Code Available | 3 |
| Semi-Supervised Learning for Anomaly Traffic Detection via Bidirectional Normalizing Flows | Mar 13, 2024 | Anomaly Detection, Benchmarking | Code Available | 0 |
| StableToolBench: Towards Stable Large-Scale Benchmarking on Tool Learning of Large Language Models | Mar 12, 2024 | Benchmarking | Code Available | 9 |
| IndicSTR12: A Dataset for Indic Scene Text Recognition | Mar 12, 2024 | Benchmarking, Scene Text Recognition | —Unverified | 0 |
| An Approach to Evaluate Modeling Adequacy for Small-Signal Stability Analysis of IBR-related SSOs in Multimachine Systems | Mar 12, 2024 | Benchmarking | —Unverified | 0 |
| A tutorial on multi-view autoencoders using the multi-view-AE library | Mar 12, 2024 | Benchmarking | —Unverified | 0 |
| Better than classical? The subtle art of benchmarking quantum machine learning models | Mar 11, 2024 | Benchmarking, Binary Classification | Code Available | 7 |
| (N,K)-Puzzle: A Cost-Efficient Testbed for Benchmarking Reinforcement Learning Algorithms in Generative Language Model | Mar 11, 2024 | Benchmarking, Language Modeling | —Unverified | 0 |
| Class Imbalance in Object Detection: An Experimental Diagnosis and Study of Mitigation Strategies | Mar 11, 2024 | Benchmarking, Data Augmentation | Code Available | 0 |
| Amharic LLaMA and LLaVA: Multimodal LLMs for Low Resource Languages | Mar 11, 2024 | Benchmarking, Data Augmentation | Code Available | 1 |
| Leveraging Foundation Models for Content-Based Medical Image Retrieval in Radiology | Mar 11, 2024 | Benchmarking, Content-Based Image Retrieval | Code Available | 1 |
| A Holistic Framework Towards Vision-based Traffic Signal Control with Microscopic Simulation | Mar 11, 2024 | Benchmarking, Traffic Signal Control | —Unverified | 0 |
| Addressing Shortcomings in Fair Graph Learning Datasets: Towards a New Benchmark | Mar 9, 2024 | Benchmarking, Fairness | Code Available | 1 |
| Multi-GPU-Enabled Hybrid Quantum-Classical Workflow in Quantum-HPC Middleware: Applications in Quantum Simulations | Mar 9, 2024 | Benchmarking, CPU | Code Available | 0 |
| Synth4bench: a framework for generating synthetic genomics data for the evaluation of tumor-only somatic variant calling algorithms | Mar 8, 2024 | Benchmarking, Synthetic Data Generation | Code Available | 0 |
| Benchmarking Micro-action Recognition: Dataset, Methods, and Applications | Mar 8, 2024 | Action Recognition, Benchmarking | Code Available | 1 |
| Benchmarking Large Language Models for Molecule Prediction Tasks | Mar 8, 2024 | Benchmarking, Prediction | Code Available | 0 |
| Tapilot-Crossing: Benchmarking and Evolving LLMs Towards Interactive Data Analysis Agents | Mar 8, 2024 | Benchmarking, Decision Making | Code Available | 1 |
| Exploring the Adversarial Frontier: Quantifying Robustness via Adversarial Hypervolume | Mar 8, 2024 | Adversarial Robustness, Benchmarking | —Unverified | 0 |
| R^2-Bench: Benchmarking the Robustness of Referring Perception Models under Perturbations | Mar 7, 2024 | Benchmarking | Code Available | 1 |
| NLPre: a revised approach towards language-centric benchmarking of Natural Language Preprocessing systems | Mar 7, 2024 | Benchmarking, Dependency Parsing | —Unverified | 0 |
| Benchmarking News Recommendation in the Era of Green AI | Mar 7, 2024 | Benchmarking, GPU | —Unverified | 0 |
| Improvements & Evaluations on the MLCommons CloudMask Benchmark | Mar 7, 2024 | Benchmarking | Code Available | 0 |
| Ducho 2.0: Towards a More Up-to-Date Unified Framework for the Extraction of Multimodal Features in Recommendation | Mar 7, 2024 | Benchmarking, Multimodal Recommendation | Code Available | 1 |
| Dissecting Sample Hardness: A Fine-Grained Analysis of Hardness Characterization Methods for Data-Centric AI | Mar 7, 2024 | Benchmarking | Code Available | 0 |
| Three Revisits to Node-Level Graph Anomaly Detection: Outliers, Message Passing and Hyperbolic Neural Networks | Mar 6, 2024 | Anomaly Detection, Benchmarking | Code Available | 0 |
| Comparison Performance of Spectrogram and Scalogram as Input of Acoustic Recognition Task | Mar 6, 2024 | Benchmarking | Code Available | 0 |
| A Density-Guided Temporal Attention Transformer for Indiscernible Object Counting in Underwater Video | Mar 6, 2024 | Benchmarking, Crowd Counting | —Unverified | 0 |
| BAIT: Benchmarking (Embedding) Architectures for Interactive Theorem-Proving | Mar 6, 2024 | Automated Theorem Proving, Benchmarking | —Unverified | 0 |
| Benchmarking Hallucination in Large Language Models based on Unanswerable Math Word Problem | Mar 6, 2024 | Benchmarking, Hallucination | Code Available | 0 |
| InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated Large Language Model Agents | Mar 5, 2024 | Benchmarking, Language Modeling | Code Available | 2 |
| Benchmarking the Text-to-SQL Capability of Large Language Models: A Comprehensive Evaluation | Mar 5, 2024 | Benchmarking, In-Context Learning | —Unverified | 0 |
| Design2Code: Benchmarking Multimodal Code Generation for Automated Front-End Engineering | Mar 5, 2024 | Benchmarking, Code Generation | —Unverified | 0 |
| Views Are My Own, but Also Yours: Benchmarking Theory of Mind Using Common Ground | Mar 4, 2024 | Benchmarking | —Unverified | 0 |
| SciAssess: Benchmarking LLM Proficiency in Scientific Literature Analysis | Mar 4, 2024 | Benchmarking, Drug Discovery | Code Available | 2 |
| REAL-Colon: A dataset for developing real-world AI applications in colonoscopy | Mar 4, 2024 | Benchmarking | Code Available | 2 |
| Classification of the Fashion-MNIST Dataset on a Quantum Computer | Mar 4, 2024 | Benchmarking, Quantum Machine Learning | —Unverified | 0 |
| Model Lakes | Mar 4, 2024 | Benchmarking, Management | —Unverified | 0 |
| Fast Benchmarking of Asynchronous Multi-Fidelity Optimization on Zero-Cost Benchmarks | Mar 4, 2024 | Benchmarking | Code Available | 0 |
| a-DCF: an architecture agnostic metric with application to spoofing-robust speaker verification | Mar 3, 2024 | Benchmarking, Speaker Verification | Code Available | 0 |
| A Bayesian Committee Machine Potential for Oxygen-containing Organic Compounds | Mar 2, 2024 | Benchmarking, Position | —Unverified | 0 |
| Benchmarking Segmentation Models with Mask-Preserved Attribute Editing | Mar 2, 2024 | Attribute, Benchmarking | Code Available | 1 |
| SINDy vs Hard Nonlinearities and Hidden Dynamics: a Benchmarking Study | Mar 1, 2024 | Benchmarking | —Unverified | 0 |
| Beyond Single-Model Views for Deep Learning: Optimization versus Generalizability of Stochastic Optimization Algorithms | Mar 1, 2024 | Benchmarking, Stochastic Optimization | —Unverified | 0 |
| Benchmarking zero-shot stance detection with FlanT5-XXL: Insights from training data, prompting, and decoding strategies into its near-SoTA performance | Mar 1, 2024 | Benchmarking, Stance Detection | —Unverified | 0 |
| Multimodal ArXiv: A Dataset for Improving Scientific Comprehension of Large Vision-Language Models | Mar 1, 2024 | Benchmarking, Mathematical Reasoning | —Unverified | 0 |
| TRUCE: Private Benchmarking to Prevent Contamination and Improve Comparative Evaluation of LLMs | Mar 1, 2024 | Benchmarking | Code Available | 1 |
| Imitation Learning Datasets: A Toolkit For Creating Datasets, Training Agents and Benchmarking | Mar 1, 2024 | Benchmarking, Imitation Learning | —Unverified | 0 |