| Benchmarking Chinese Commonsense Reasoning of LLMs: From Chinese-Specifics to Reasoning-Memorization Correlations | Mar 21, 2024 | Benchmarking, Memorization | Code Available | 1 |
| ChatGPT Alternative Solutions: Large Language Models Survey | Mar 21, 2024 | Benchmarking, Chatbot | Unverified | 0 |
| DomainLab: A modular Python package for domain generalization in deep learning | Mar 21, 2024 | Benchmarking, Domain Generalization | Code Available | 1 |
| Practical End-to-End Optical Music Recognition for Pianoform Music | Mar 20, 2024 | Benchmarking | Code Available | 1 |
| MARTA: a model for the automatic phonemic grouping of the parkinsonian speech | Mar 19, 2024 | Benchmarking, Classification | Code Available | 0 |
| VL-ICL Bench: The Devil in the Details of Multimodal In-Context Learning | Mar 19, 2024 | Benchmarking, Image Captioning | Code Available | 2 |
| Real-IAD: A Real-World Multi-View Dataset for Benchmarking Versatile Industrial Anomaly Detection | Mar 19, 2024 | Anomaly Detection, Benchmarking | Code Available | 3 |
| MELTing point: Mobile Evaluation of Language Transformers | Mar 19, 2024 | Benchmarking, Quantization | Code Available | 1 |
| AlphaFin: Benchmarking Financial Analysis with Retrieval-Augmented Stock-Chain Framework | Mar 19, 2024 | Benchmarking, Financial Analysis | Code Available | 3 |
| ERASE: Benchmarking Feature Selection Methods for Deep Recommender Systems | Mar 19, 2024 | Benchmarking, feature selection | Code Available | 1 |
| Embarrassingly Simple Scribble Supervision for 3D Medical Segmentation | Mar 19, 2024 | Benchmarking, Segmentation | Unverified | 0 |
| Benchmarking Badminton Action Recognition with a New Fine-Grained Dataset | Mar 19, 2024 | Action Recognition, Benchmarking | Unverified | 0 |
| OpenEval: Benchmarking Chinese LLMs across Capability, Alignment and Safety | Mar 18, 2024 | Benchmarking, Mathematical Reasoning | Unverified | 0 |
| NovelQA: Benchmarking Question Answering on Documents Exceeding 200K Tokens | Mar 18, 2024 | Benchmarking, Question Answering | Code Available | 1 |
| Align and Distill: Unifying and Improving Domain Adaptive Object Detection | Mar 18, 2024 | Benchmarking, object-detection | Code Available | 1 |
| Leveraging Spatial and Semantic Feature Extraction for Skin Cancer Diagnosis with Capsule Networks and Graph Neural Networks | Mar 18, 2024 | Benchmarking, Classification | Unverified | 0 |
| Benchmarking the Robustness of UAV Tracking Against Common Corruptions | Mar 18, 2024 | Benchmarking | Code Available | 0 |
| A Sober Look at the Robustness of CLIPs to Spurious Features | Mar 18, 2024 | Benchmarking | Unverified | 0 |
| FlowMind: Automatic Workflow Generation with LLMs | Mar 17, 2024 | Benchmarking, Question Answering | Unverified | 0 |
| Granular Change Accuracy: A More Accurate Performance Metric for Dialogue State Tracking | Mar 17, 2024 | Benchmarking, Dialogue State Tracking | Unverified | 0 |
| Depression Detection on Social Media with Large Language Models | Mar 16, 2024 | Benchmarking, Depression Detection | Unverified | 0 |
| An Improved Metric and Benchmark for Assessing the Performance of Virtual Screening Models | Mar 15, 2024 | Benchmarking, Drug Discovery | Code Available | 1 |
| Benchmarking Adversarial Robustness of Image Shadow Removal with Shadow-adaptive Attacks | Mar 15, 2024 | Adversarial Attack, Adversarial Robustness | Unverified | 0 |
| Histo-Genomic Knowledge Distillation For Cancer Prognosis From Histopathology Whole Slide Images | Mar 15, 2024 | Benchmarking, Knowledge Distillation | Code Available | 1 |
| Benchmarking Zero-Shot Robustness of Multimodal Foundation Models: A Pilot Study | Mar 15, 2024 | Benchmarking | Code Available | 0 |