| Benchmarking the Robustness of Temporal Action Detection Models Against Temporal Corruptions | Mar 29, 2024 | Action Detection, Benchmarking | Code Available | 1 |
| IndiBias: A Benchmark Dataset to Measure Social Biases in Language Models for Indian Context | Mar 29, 2024 | Benchmarking, Sentence | Code Available | 0 |
| TFB: Towards Comprehensive and Fair Benchmarking of Time Series Forecasting Methods | Mar 29, 2024 | Benchmarking, Multivariate Time Series Forecasting | Code Available | 5 |
| Are Large Language Models Good at Utility Judgments? | Mar 28, 2024 | Answer Generation, Benchmarking | Code Available | 0 |
| Benchmarking Implicit Neural Representation and Geometric Rendering in Real-Time RGB-D SLAM | Mar 28, 2024 | Benchmarking | Code Available | 1 |
| RankMamba: Benchmarking Mamba's Document Ranking Performance in the Era of Transformers | Mar 27, 2024 | Benchmarking, Document Ranking | Code Available | 1 |
| ImageNet-D: Benchmarking Neural Network Robustness on Diffusion Synthetic Object | Mar 27, 2024 | Benchmarking | Code Available | 1 |
| Benchmarking Object Detectors with COCO: A New Path Forward | Mar 27, 2024 | Benchmarking, Object | Code Available | 1 |
| Towards Image Ambient Lighting Normalization | Mar 27, 2024 | Benchmarking, Image Restoration | Code Available | 1 |
| Benchmarking Image Transformers for Prostate Cancer Detection from Ultrasound Data | Mar 27, 2024 | Benchmarking, Cancer Classification | Unverified | 0 |
| GPTs and Language Barrier: A Cross-Lingual Legal QA Examination | Mar 26, 2024 | Articles, Benchmarking | Unverified | 0 |
| ArabicaQA: A Comprehensive Dataset for Arabic Question Answering | Mar 26, 2024 | Benchmarking, Machine Reading Comprehension | Code Available | 1 |
| Benchmarking Video Frame Interpolation | Mar 25, 2024 | Benchmarking, Computational Efficiency | Unverified | 0 |
| DISL: Fueling Research with A Large Dataset of Solidity Smart Contracts | Mar 25, 2024 | Benchmarking | Unverified | 0 |
| NSINA: A News Corpus for Sinhala | Mar 25, 2024 | Articles, Benchmarking | Code Available | 0 |
| CodeS: Natural Language to Code Repository via Multi-Layer Sketch | Mar 25, 2024 | Benchmarking | Code Available | 1 |
| Addressing the generalization of 3D registration methods with a featureless baseline and an unbiased benchmark | Mar 23, 2024 | Benchmarking, Image to Point Cloud Registration | Code Available | 1 |
| On the Fragility of Active Learners for Text Classification | Mar 23, 2024 | Active Learning, Benchmarking | Code Available | 0 |
| TrustSQL: Benchmarking Text-to-SQL Reliability with Penalty-Based Scoring | Mar 23, 2024 | Benchmarking, Text to SQL | Code Available | 0 |
| Unifying Large Language Model and Deep Reinforcement Learning for Human-in-Loop Interactive Socially-aware Navigation | Mar 22, 2024 | Benchmarking, Deep Reinforcement Learning | Unverified | 0 |
| Transactive Local Energy Markets Enable Community-Level Resource Coordination Using Individual Rewards | Mar 22, 2024 | Benchmarking, energy management | Unverified | 0 |
| Broadening the Scope of Neural Network Potentials through Direct Inclusion of Additional Molecular Attributes | Mar 22, 2024 | Benchmarking | Unverified | 0 |
| Subjective Quality Assessment of Compressed Tone-Mapped High Dynamic Range Videos | Mar 22, 2024 | Benchmarking, Tone Mapping | Unverified | 0 |
| Can 3D Vision-Language Models Truly Understand Natural Language? | Mar 21, 2024 | Benchmarking, Diversity | Code Available | 1 |
| RoDLA: Benchmarking the Robustness of Document Layout Analysis Models | Mar 21, 2024 | Benchmarking, Document Layout Analysis | Code Available | 1 |
| Benchmarking Chinese Commonsense Reasoning of LLMs: From Chinese-Specifics to Reasoning-Memorization Correlations | Mar 21, 2024 | Benchmarking, Memorization | Code Available | 1 |
| ChatGPT Alternative Solutions: Large Language Models Survey | Mar 21, 2024 | Benchmarking, Chatbot | Unverified | 0 |
| DomainLab: A modular Python package for domain generalization in deep learning | Mar 21, 2024 | Benchmarking, Domain Generalization | Code Available | 1 |
| Practical End-to-End Optical Music Recognition for Pianoform Music | Mar 20, 2024 | Benchmarking | Code Available | 1 |
| MARTA: a model for the automatic phonemic grouping of the parkinsonian speech | Mar 19, 2024 | Benchmarking, Classification | Code Available | 0 |
| VL-ICL Bench: The Devil in the Details of Multimodal In-Context Learning | Mar 19, 2024 | Benchmarking, Image Captioning | Code Available | 2 |
| Real-IAD: A Real-World Multi-View Dataset for Benchmarking Versatile Industrial Anomaly Detection | Mar 19, 2024 | Anomaly Detection, Benchmarking | Code Available | 3 |
| MELTing point: Mobile Evaluation of Language Transformers | Mar 19, 2024 | Benchmarking, Quantization | Code Available | 1 |
| AlphaFin: Benchmarking Financial Analysis with Retrieval-Augmented Stock-Chain Framework | Mar 19, 2024 | Benchmarking, Financial Analysis | Code Available | 3 |
| ERASE: Benchmarking Feature Selection Methods for Deep Recommender Systems | Mar 19, 2024 | Benchmarking, feature selection | Code Available | 1 |
| Embarrassingly Simple Scribble Supervision for 3D Medical Segmentation | Mar 19, 2024 | Benchmarking, Segmentation | Unverified | 0 |
| Benchmarking Badminton Action Recognition with a New Fine-Grained Dataset | Mar 19, 2024 | Action Recognition, Benchmarking | Unverified | 0 |
| OpenEval: Benchmarking Chinese LLMs across Capability, Alignment and Safety | Mar 18, 2024 | Benchmarking, Mathematical Reasoning | Unverified | 0 |
| NovelQA: Benchmarking Question Answering on Documents Exceeding 200K Tokens | Mar 18, 2024 | Benchmarking, Question Answering | Code Available | 1 |
| Align and Distill: Unifying and Improving Domain Adaptive Object Detection | Mar 18, 2024 | Benchmarking, object-detection | Code Available | 1 |
| Leveraging Spatial and Semantic Feature Extraction for Skin Cancer Diagnosis with Capsule Networks and Graph Neural Networks | Mar 18, 2024 | Benchmarking, Classification | Unverified | 0 |
| Benchmarking the Robustness of UAV Tracking Against Common Corruptions | Mar 18, 2024 | Benchmarking | Code Available | 0 |
| A Sober Look at the Robustness of CLIPs to Spurious Features | Mar 18, 2024 | Benchmarking | Unverified | 0 |
| FlowMind: Automatic Workflow Generation with LLMs | Mar 17, 2024 | Benchmarking, Question Answering | Unverified | 0 |
| Granular Change Accuracy: A More Accurate Performance Metric for Dialogue State Tracking | Mar 17, 2024 | Benchmarking, Dialogue State Tracking | Unverified | 0 |
| Depression Detection on Social Media with Large Language Models | Mar 16, 2024 | Benchmarking, Depression Detection | Unverified | 0 |
| An Improved Metric and Benchmark for Assessing the Performance of Virtual Screening Models | Mar 15, 2024 | Benchmarking, Drug Discovery | Code Available | 1 |
| Benchmarking Adversarial Robustness of Image Shadow Removal with Shadow-adaptive Attacks | Mar 15, 2024 | Adversarial Attack, Adversarial Robustness | Unverified | 0 |
| Histo-Genomic Knowledge Distillation For Cancer Prognosis From Histopathology Whole Slide Images | Mar 15, 2024 | Benchmarking, Knowledge Distillation | Code Available | 1 |
| Benchmarking Zero-Shot Robustness of Multimodal Foundation Models: A Pilot Study | Mar 15, 2024 | Benchmarking | Code Available | 0 |