| Domain-Expanded ASTE: Rethinking Generalization in Aspect Sentiment Triplet Extraction | May 23, 2023 | Aspect-Based Sentiment Analysis, Aspect-Based Sentiment Analysis (ABSA) | Code Available | 0 | 5 |
| Arena-Rosnav 2.0: A Development and Benchmarking Platform for Robot Navigation in Highly Dynamic Environments | Feb 20, 2023 | Benchmarking, Robot Navigation | Code Available | 0 | 5 |
| Learned Bayesian Cramér-Rao Bound for Unknown Measurement Models Using Score Neural Networks | Feb 2, 2025 | Benchmarking | Code Available | 0 | 5 |
| Learn How to Query from Unlabeled Data Streams in Federated Learning | Dec 11, 2024 | Benchmarking, Decision Making | Code Available | 0 | 5 |
| Learning Adaptive Discriminative Correlation Filters via Temporal Consistency Preserving Spatial Feature Selection for Robust Visual Tracking | Jul 30, 2018 | Benchmarking, feature selection | Code Available | 0 | 5 |
| Geological Inference from Textual Data using Word Embeddings | Apr 10, 2025 | Benchmarking, Word Embeddings | Code Available | 0 | 5 |
| GiantHunter: Accurate detection of giant virus in metagenomic data using reinforcement-learning and Monte Carlo tree search | Jan 26, 2025 | Benchmarking, Diversity | Code Available | 0 | 5 |
| Domain2Vec: Domain Embedding for Unsupervised Domain Adaptation | Jul 17, 2020 | Benchmarking, Disentanglement | Code Available | 0 | 5 |
| Do Localization Methods Actually Localize Memorized Data in LLMs? A Tale of Two Benchmarks | Nov 15, 2023 | Benchmarking, Network Pruning | Code Available | 0 | 5 |
| Do LLMs Memorize Recommendation Datasets? A Preliminary Study on MovieLens-1M | May 15, 2025 | Benchmarking, Memorization | Code Available | 0 | 5 |
| Do LLM Evaluators Prefer Themselves for a Reason? | Apr 4, 2025 | Benchmarking, Code Generation | Code Available | 0 | 5 |
| Does Table Source Matter? Benchmarking and Improving Multimodal Scientific Table Understanding and Reasoning | Jan 22, 2025 | Benchmarking | Code Available | 0 | 5 |
| Flexible Generation of Preference Data for Recommendation Analysis | Jul 23, 2024 | Benchmarking, Recommendation Systems | Code Available | 0 | 5 |
| Benchmarking Large Language Models on Communicative Medical Coaching: a Novel System and Dataset | Feb 8, 2024 | Benchmarking | Code Available | 0 | 5 |
| Graph Convolutional Networks Meet with High Dimensionality Reduction | Nov 7, 2019 | Benchmarking, Dimensionality Reduction | Code Available | 0 | 5 |
| Hierarchical Neural Networks for Sequential Sentence Classification in Medical Scientific Abstracts | Aug 19, 2018 | Benchmarking, Classification | Code Available | 0 | 5 |
| Strong and Simple Baselines for Multimodal Utterance Embeddings | May 14, 2019 | Benchmarking | Code Available | 0 | 5 |
| Generative Models for Fast Simulation of Cherenkov Detectors at the Electron-Ion Collider | Apr 26, 2025 | Benchmarking, GPU | Code Available | 0 | 5 |
| Are Large Language Models True Healthcare Jacks-of-All-Trades? Benchmarking Across Health Professions Beyond Physician Exams | Jun 17, 2024 | All, Benchmarking | Code Available | 0 | 5 |
| DLAMA: A Framework for Curating Culturally Diverse Facts for Probing the Knowledge of Pretrained Language Models | Jun 8, 2023 | Benchmarking, Fairness | Code Available | 0 | 5 |
| Benchmarking Large Language Models for Math Reasoning Tasks | Aug 20, 2024 | Benchmarking, In-Context Learning | Code Available | 0 | 5 |
| Benchmarking Large Language Models for Image Classification of Marine Mammals | Oct 22, 2024 | Benchmarking, image-classification | Code Available | 0 | 5 |
| Divergent Creativity in Humans and Large Language Models | May 13, 2024 | Benchmarking | Code Available | 0 | 5 |
| Generalization and Regularization in DQN | Sep 29, 2018 | Atari Games, Benchmarking | Code Available | 0 | 5 |
| GenCeption: Evaluate Multimodal LLMs with Unlabeled Unimodal Data | Feb 22, 2024 | Benchmarking | Code Available | 0 | 5 |