| IndiBias: A Benchmark Dataset to Measure Social Biases in Language Models for Indian Context | Mar 29, 2024 | Benchmarking, Sentence | Code Available | 0 |
| Benchmarking the Robustness of Temporal Action Detection Models Against Temporal Corruptions | Mar 29, 2024 | Action Detection, Benchmarking | Code Available | 1 |
| TFB: Towards Comprehensive and Fair Benchmarking of Time Series Forecasting Methods | Mar 29, 2024 | Benchmarking, Multivariate Time Series Forecasting | Code Available | 5 |
| Are Large Language Models Good at Utility Judgments? | Mar 28, 2024 | Answer Generation, Benchmarking | Code Available | 0 |
| Benchmarking Implicit Neural Representation and Geometric Rendering in Real-Time RGB-D SLAM | Mar 28, 2024 | Benchmarking | Code Available | 1 |
| ImageNet-D: Benchmarking Neural Network Robustness on Diffusion Synthetic Object | Mar 27, 2024 | Benchmarking | Code Available | 1 |
| Towards Image Ambient Lighting Normalization | Mar 27, 2024 | Benchmarking, Image Restoration | Code Available | 1 |
| RankMamba: Benchmarking Mamba's Document Ranking Performance in the Era of Transformers | Mar 27, 2024 | Benchmarking, Document Ranking | Code Available | 1 |
| Benchmarking Object Detectors with COCO: A New Path Forward | Mar 27, 2024 | Benchmarking, Object | Code Available | 1 |
| Benchmarking Image Transformers for Prostate Cancer Detection from Ultrasound Data | Mar 27, 2024 | Benchmarking, Cancer Classification | Unverified | 0 |
| GPTs and Language Barrier: A Cross-Lingual Legal QA Examination | Mar 26, 2024 | Articles, Benchmarking | Unverified | 0 |
| ArabicaQA: A Comprehensive Dataset for Arabic Question Answering | Mar 26, 2024 | Benchmarking, Machine Reading Comprehension | Code Available | 1 |
| Benchmarking Video Frame Interpolation | Mar 25, 2024 | Benchmarking, Computational Efficiency | Unverified | 0 |
| DISL: Fueling Research with A Large Dataset of Solidity Smart Contracts | Mar 25, 2024 | Benchmarking | Unverified | 0 |
| NSINA: A News Corpus for Sinhala | Mar 25, 2024 | Articles, Benchmarking | Code Available | 0 |
| CodeS: Natural Language to Code Repository via Multi-Layer Sketch | Mar 25, 2024 | Benchmarking | Code Available | 1 |
| Addressing the generalization of 3D registration methods with a featureless baseline and an unbiased benchmark | Mar 23, 2024 | Benchmarking, Image to Point Cloud Registration | Code Available | 1 |
| TrustSQL: Benchmarking Text-to-SQL Reliability with Penalty-Based Scoring | Mar 23, 2024 | Benchmarking, Text to SQL | Code Available | 0 |
| On the Fragility of Active Learners for Text Classification | Mar 23, 2024 | Active Learning, Benchmarking | Code Available | 0 |
| Transactive Local Energy Markets Enable Community-Level Resource Coordination Using Individual Rewards | Mar 22, 2024 | Benchmarking, energy management | Unverified | 0 |
| Unifying Large Language Model and Deep Reinforcement Learning for Human-in-Loop Interactive Socially-aware Navigation | Mar 22, 2024 | Benchmarking, Deep Reinforcement Learning | Unverified | 0 |
| Broadening the Scope of Neural Network Potentials through Direct Inclusion of Additional Molecular Attributes | Mar 22, 2024 | Benchmarking | Unverified | 0 |
| Subjective Quality Assessment of Compressed Tone-Mapped High Dynamic Range Videos | Mar 22, 2024 | Benchmarking, Tone Mapping | Unverified | 0 |
| Can 3D Vision-Language Models Truly Understand Natural Language? | Mar 21, 2024 | Benchmarking, Diversity | Code Available | 1 |
| Benchmarking Chinese Commonsense Reasoning of LLMs: From Chinese-Specifics to Reasoning-Memorization Correlations | Mar 21, 2024 | Benchmarking, Memorization | Code Available | 1 |