| Categorization of 33 computational methods to detect spatially variable genes from spatially resolved transcriptomics data | May 29, 2024 | Benchmarking, Specificity | Unverified | 0 |
| MDIW-13: a New Multi-Lingual and Multi-Script Database and Benchmark for Script Identification | May 29, 2024 | Benchmarking | Unverified | 0 |
| Benchmarking and Improving Detail Image Caption | May 29, 2024 | Benchmarking, Image Captioning | Code Available | 2 |
| MathChat: Benchmarking Mathematical Reasoning and Instruction Following in Multi-Turn Interactions | May 29, 2024 | Benchmarking, Dialogue Understanding | Code Available | 1 |
| Quantitative Certification of Bias in Large Language Models | May 29, 2024 | Benchmarking | Code Available | 1 |
| Exploring Thermography Technology: A Comprehensive Facial Dataset for Face Detection, Recognition, and Emotion | May 28, 2024 | Benchmarking, Emotion Recognition | Unverified | 0 |
| Risk-Neutral Generative Networks | May 28, 2024 | Benchmarking | Unverified | 0 |
| DTR-Bench: An in silico Environment and Benchmark Platform for Reinforcement Learning Based Dynamic Treatment Regime | May 28, 2024 | Benchmarking, Reinforcement Learning (RL) | Code Available | 1 |
| Benchmarking Skeleton-based Motion Encoder Models for Clinical Applications: Estimating Parkinson's Disease Severity in Walking Sequences | May 28, 2024 | Benchmarking, Feature Engineering | Code Available | 1 |
| LoRA-XS: Low-Rank Adaptation with Extremely Small Number of Parameters | May 27, 2024 | Benchmarking, GSM8K | Code Available | 2 |
| Benchmarking and Improving Bird's Eye View Perception Robustness in Autonomous Driving | May 27, 2024 | Autonomous Driving, Benchmarking | Code Available | 3 |
| A Correlation- and Mean-Aware Loss Function and Benchmarking Framework to Improve GAN-based Tabular Data Synthesis | May 27, 2024 | Benchmarking | Unverified | 0 |
| Benchmarking General-Purpose In-Context Learning | May 27, 2024 | Benchmarking, Decision Making | Unverified | 0 |
| BOLD: Boolean Logic Deep Learning | May 25, 2024 | Benchmarking, Deep Learning | Unverified | 0 |
| GeneAgent: Self-verification Language Agent for Gene Set Knowledge Discovery using Domain Databases | May 25, 2024 | Benchmarking, Hallucination | Unverified | 0 |
| Application based Evaluation of an Efficient Spike-Encoder, "Spiketrum" | May 24, 2024 | Benchmarking, Classification | Unverified | 0 |
| Free Performance Gain from Mixing Multiple Partially Labeled Samples in Multi-label Image Classification | May 24, 2024 | Benchmarking, Data Augmentation | Unverified | 0 |
| NuwaTS: a Foundation Model Mending Every Incomplete Time Series | May 24, 2024 | Benchmarking, Contrastive Learning | Unverified | 0 |
| Benchmarking Hierarchical Image Pyramid Transformer for the classification of colon biopsies and polyps in histopathology images | May 24, 2024 | Benchmarking, Classification | Unverified | 0 |
| Full-stack evaluation of Machine Learning inference workloads for RISC-V systems | May 24, 2024 | Benchmarking, Deep Learning | Unverified | 0 |
| MCDFN: Supply Chain Demand Forecasting via an Explainable Multi-Channel Data Fusion Network Model | May 24, 2024 | Benchmarking, Demand Forecasting | Unverified | 0 |
| Harnessing Large Language Models for Software Vulnerability Detection: A Comprehensive Benchmarking Study | May 24, 2024 | Benchmarking, Vulnerability Detection | Unverified | 0 |
| Benchmarking the Performance of Pre-trained LLMs across Urdu NLP Tasks | May 24, 2024 | Benchmarking, Decoder | Unverified | 0 |
| Analog or Digital In-memory Computing? Benchmarking through Quantitative Modeling | May 23, 2024 | Benchmarking | Code Available | 1 |
| S-Eval: Towards Automated and Comprehensive Safety Evaluation for Large Language Models | May 23, 2024 | Benchmarking | Code Available | 2 |