| Who Evaluates the Evaluations? Objectively Scoring Text-to-Image Prompt Coherence Metrics with T2IScoreScore (TS2) | Apr 5, 2024 | Benchmarking | Code Available | 0 |
| Dynamic Risk Assessment Methodology with an LDM-based System for Parking Scenarios | Apr 5, 2024 | Benchmarking | Unverified | 0 |
| No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance | Apr 4, 2024 | Benchmarking, Image Generation | Code Available | 2 |
| Outlier-Efficient Hopfield Layers for Large Transformer-Based Models | Apr 4, 2024 | Benchmarking, Quantization | Code Available | 1 |
| PARIS3D: Reasoning-based 3D Part Segmentation Using Large Multimodal Model | Apr 4, 2024 | 3D Part Segmentation, Benchmarking | Code Available | 1 |
| Benchmarking ChatGPT on Algorithmic Reasoning | Apr 4, 2024 | Benchmarking | Code Available | 0 |
| Benchmarking Parameter Control Methods in Differential Evolution for Mixed-Integer Black-Box Optimization | Apr 4, 2024 | Benchmarking | Code Available | 0 |
| Schroedinger's Threshold: When the AUC doesn't predict Accuracy | Apr 4, 2024 | Benchmarking | Code Available | 0 |
| A Comparative Analysis of Word-Level Metric Differential Privacy: Benchmarking The Privacy-Utility Trade-off | Apr 4, 2024 | Benchmarking | Code Available | 0 |
| DiffBody: Human Body Restoration by Imagining with Generative Diffusion Prior | Apr 4, 2024 | Benchmarking, Image Restoration | Unverified | 0 |
| NL2KQL: From Natural Language to Kusto Query | Apr 3, 2024 | Benchmarking, Natural Language Queries | Unverified | 0 |
| Benchmarking Large Language Models for Persian: A Preliminary Study Focusing on ChatGPT | Apr 3, 2024 | Benchmarking, General Knowledge | Code Available | 1 |
| Atom-Level Optical Chemical Structure Recognition with Limited Supervision | Apr 2, 2024 | Benchmarking | Code Available | 1 |
| On the reduction of Linear Parameter-Varying State-Space models | Apr 2, 2024 | Benchmarking, Dimensionality Reduction | Unverified | 0 |
| PATCH! Psychometrics-AssisTed BenCHmarking of Large Language Models against Human Populations: A Case Study of Proficiency in 8th Grade Mathematics | Apr 2, 2024 | Benchmarking | Code Available | 0 |
| PREGO: online mistake detection in PRocedural EGOcentric videos | Apr 2, 2024 | Action Recognition, Benchmarking | Code Available | 1 |
| Advancing LLM Reasoning Generalists with Preference Trees | Apr 2, 2024 | Benchmarking, Code Generation | Code Available | 3 |
| EV2Gym: A Flexible V2G Simulator for EV Smart Charging Research and Benchmarking | Apr 2, 2024 | Benchmarking, Reinforcement Learning (RL) | Code Available | 2 |
| Stereotype Detection in LLMs: A Multiclass, Explainable, and Benchmark-Driven Approach | Apr 2, 2024 | Benchmarking, Common Sense Reasoning | Unverified | 0 |
| Diffusion-Driven Domain Adaptation for Generating 3D Molecules | Apr 1, 2024 | Benchmarking, Decoder | Unverified | 0 |
| IsoBench: Benchmarking Multimodal Foundation Models on Isomorphic Representations | Apr 1, 2024 | Benchmarking, Math | Unverified | 0 |
| Are large language models superhuman chemists? | Apr 1, 2024 | Benchmarking | Code Available | 2 |
| SpiralMLP: A Lightweight Vision MLP Architecture | Mar 31, 2024 | Benchmarking | Unverified | 0 |
| Comparing Hyper-optimized Machine Learning Models for Predicting Efficiency Degradation in Organic Solar Cells | Mar 29, 2024 | Benchmarking | Unverified | 0 |
| IndiBias: A Benchmark Dataset to Measure Social Biases in Language Models for Indian Context | Mar 29, 2024 | Benchmarking, Sentence | Code Available | 0 |