| Design2Code: Benchmarking Multimodal Code Generation for Automated Front-End Engineering | Mar 5, 2024 | Benchmarking, Code Generation | Unverified | 0 |
| Views Are My Own, but Also Yours: Benchmarking Theory of Mind Using Common Ground | Mar 4, 2024 | Benchmarking | Unverified | 0 |
| Fast Benchmarking of Asynchronous Multi-Fidelity Optimization on Zero-Cost Benchmarks | Mar 4, 2024 | Benchmarking | Code Available | 0 |
| Classification of the Fashion-MNIST Dataset on a Quantum Computer | Mar 4, 2024 | Benchmarking, Quantum Machine Learning | Unverified | 0 |
| Model Lakes | Mar 4, 2024 | Benchmarking, Management | Unverified | 0 |
| a-DCF: an architecture agnostic metric with application to spoofing-robust speaker verification | Mar 3, 2024 | Benchmarking, Speaker Verification | Code Available | 0 |
| A Bayesian Committee Machine Potential for Oxygen-containing Organic Compounds | Mar 2, 2024 | Benchmarking, Position | Unverified | 0 |
| SINDy vs Hard Nonlinearities and Hidden Dynamics: a Benchmarking Study | Mar 1, 2024 | Benchmarking | Unverified | 0 |
| Beyond Single-Model Views for Deep Learning: Optimization versus Generalizability of Stochastic Optimization Algorithms | Mar 1, 2024 | Benchmarking, Stochastic Optimization | Unverified | 0 |
| Imitation Learning Datasets: A Toolkit For Creating Datasets, Training Agents and Benchmarking | Mar 1, 2024 | Benchmarking, Imitation Learning | Unverified | 0 |
| Multimodal ArXiv: A Dataset for Improving Scientific Comprehension of Large Vision-Language Models | Mar 1, 2024 | Benchmarking, Mathematical Reasoning | Unverified | 0 |
| Benchmarking zero-shot stance detection with FlanT5-XXL: Insights from training data, prompting, and decoding strategies into its near-SoTA performance | Mar 1, 2024 | Benchmarking, Stance Detection | Unverified | 0 |
| The 6th Affective Behavior Analysis in-the-wild (ABAW) Competition | Feb 29, 2024 | Action Unit Detection, Arousal Estimation | Unverified | 0 |
| FlowCyt: A Comparative Study of Deep Learning Approaches for Multi-Class Classification in Flow Cytometry Benchmarking | Feb 28, 2024 | Benchmarking, Inductive Learning | Code Available | 0 |
| Editing Factual Knowledge and Explanatory Ability of Medical Large Language Models | Feb 28, 2024 | Benchmarking, Hallucination | Code Available | 0 |
| Benchmarking GPT-4 on Algorithmic Problems: A Systematic Evaluation of Prompting Strategies | Feb 27, 2024 | Benchmarking, Systematic Generalization | Unverified | 0 |
| The KANDY Benchmark: Incremental Neuro-Symbolic Learning and Reasoning with Kandinsky Patterns | Feb 27, 2024 | Benchmarking, Binary Classification | Code Available | 0 |
| A Large-scale Evaluation of Pretraining Paradigms for the Detection of Defects in Electroluminescence Solar Cell Images | Feb 27, 2024 | Benchmarking, Defect Detection | Unverified | 0 |
| The Seeker's Dilemma: Realistic Formulation and Benchmarking for Hardware Trojan Detection | Feb 27, 2024 | Benchmarking | Unverified | 0 |
| Performance Comparison of Surrogate-Assisted Evolutionary Algorithms on Computational Fluid Dynamics Problems | Feb 26, 2024 | Benchmarking, Evolutionary Algorithms | Unverified | 0 |
| Towards Explainability and Fairness in Swiss Judgement Prediction: Benchmarking on a Multilingual Dataset | Feb 26, 2024 | Benchmarking, Cross-Lingual Transfer | Unverified | 0 |
| Benchmarking LLMs on the Semantic Overlap Summarization Task | Feb 26, 2024 | Benchmarking, Document Summarization | Unverified | 0 |
| Partial Rankings of Optimizers | Feb 26, 2024 | Benchmarking | Code Available | 0 |
| HypoTermQA: Hypothetical Terms Dataset for Benchmarking Hallucination Tendency of LLMs | Feb 25, 2024 | Benchmarking, Chatbot | Code Available | 0 |
| E(3)-equivariant models cannot learn chirality: Field-based molecular generation | Feb 24, 2024 | Benchmarking, Graph Neural Network | Unverified | 0 |