| The ObjectFolder Benchmark: Multisensory Learning with Neural and Real Objects | Jun 1, 2023 | BenchmarkingObject | —Unverified | 0 |
| HySpecNet-11k: A Large-Scale Hyperspectral Dataset for Benchmarking Learning-Based Hyperspectral Image Compression Methods | Jun 1, 2023 | BenchmarkingHyperspectral image analysis | —Unverified | 0 |
| Dynamic Neighborhood Construction for Structured Large Discrete Action Spaces | May 31, 2023 | BenchmarkingRecommendation Systems | CodeCode Available | 0 |
| Accurate and Efficient Structural Ensemble Generation of Macrocyclic Peptides using Internal Coordinate Diffusion | May 30, 2023 | BenchmarkingDiversity | CodeCode Available | 1 |
| SheetCopilot: Bringing Software Productivity to the Next Level through Large Language Models | May 30, 2023 | BenchmarkingCode Generation | CodeCode Available | 1 |
| ScoNe: Benchmarking Negation Reasoning in Language Models With Fine-Tuning and In-Context Learning | May 30, 2023 | BenchmarkingIn-Context Learning | CodeCode Available | 0 |
| Design and implementation of intelligent packet filtering in IoT microcontroller-based devices | May 30, 2023 | Benchmarking | CodeCode Available | 0 |
| ShuffleMix: Improving Representations via Channel-Wise Shuffle of Interpolated Hidden States | May 30, 2023 | BenchmarkingData Augmentation | CodeCode Available | 0 |
| Large-scale Ridesharing DARP Instances Based on Real Travel Demand | May 30, 2023 | Benchmarking | CodeCode Available | 0 |
| IDToolkit: A Toolkit for Benchmarking and Developing Inverse Design Algorithms in Nanophotonics | May 30, 2023 | Benchmarking | CodeCode Available | 1 |
| Human Body Shape Classification Based on a Single Image | May 29, 2023 | BenchmarkingClassification | —Unverified | 0 |
| Decoding the Underlying Meaning of Multimodal Hateful Memes | May 28, 2023 | BenchmarkingHateful Meme Classification | CodeCode Available | 1 |
| InDL: A New Dataset and Benchmark for In-Diagram Logic Interpretation based on Visual Illusion | May 28, 2023 | BenchmarkingDecision Making | CodeCode Available | 0 |
| Exploring the Practicality of Generative Retrieval on Dynamic Corpora | May 27, 2023 | BenchmarkingInformation Retrieval | —Unverified | 0 |
| BASED: Benchmarking, Analysis, and Structural Estimation of Deblurring | May 27, 2023 | BenchmarkingDeblurring | CodeCode Available | 0 |
| Learning from Integral Losses in Physics Informed Neural Networks | May 27, 2023 | Benchmarking | CodeCode Available | 0 |
| Benchmarking Diverse-Modal Entity Linking with Generative Models | May 27, 2023 | BenchmarkingDecoder | —Unverified | 0 |
| The Brain Tumor Segmentation (BraTS) Challenge 2023: Focus on Pediatrics (CBTN-CONNECT-DIPGR-ASNR-MICCAI BraTS-PEDs) | May 26, 2023 | BenchmarkingBrain Tumor Segmentation | CodeCode Available | 2 |
| Zero is Not Hero Yet: Benchmarking Zero-Shot Performance of LLMs for Financial Tasks | May 26, 2023 | Benchmarking | CodeCode Available | 1 |
| Benchmarking state-of-the-art gradient boosting algorithms for classification | May 26, 2023 | Bayesian OptimizationBenchmarking | —Unverified | 0 |
| Investigation of UAV Detection in Images with Complex Backgrounds and Rainy Artifacts | May 25, 2023 | Benchmarkingobject-detection | CodeCode Available | 0 |
| CSS: A Large-scale Cross-schema Chinese Text-to-SQL Medical Dataset | May 25, 2023 | BenchmarkingText to SQL | CodeCode Available | 0 |
| KeyPosS: Plug-and-Play Facial Landmark Detection through GPS-Inspired True-Range Multilateration | May 25, 2023 | BenchmarkingFace Recognition | CodeCode Available | 1 |
| Analysis of modular CMA-ES on strict box-constrained problems in the SBOX-COST benchmarking suite | May 24, 2023 | Benchmarking | —Unverified | 0 |
| Barkour: Benchmarking Animal-level Agility with Quadruped Robots | May 24, 2023 | BenchmarkingNavigate | —Unverified | 0 |
| GPT4Graph: Can Large Language Models Understand Graph Structured Data ? An Empirical Evaluation and Benchmarking | May 24, 2023 | BenchmarkingGraph Mining | CodeCode Available | 0 |
| BUFFET: Benchmarking Large Language Models for Few-shot Cross-lingual Transfer | May 24, 2023 | BenchmarkingCross-Lingual Transfer | —Unverified | 0 |
| LAraBench: Benchmarking Arabic AI with Large Language Models | May 24, 2023 | BenchmarkingFew-Shot Learning | —Unverified | 0 |
| Domain-Expanded ASTE: Rethinking Generalization in Aspect Sentiment Triplet Extraction | May 23, 2023 | Aspect-Based Sentiment AnalysisAspect-Based Sentiment Analysis (ABSA) | CodeCode Available | 0 |
| ReadMe++: Benchmarking Multilingual Language Models for Multi-Domain Readability Assessment | May 23, 2023 | BenchmarkingCross-Lingual Transfer | CodeCode Available | 1 |
| Chakra: Advancing Performance Benchmarking and Co-design using Standardized Execution Traces | May 23, 2023 | Benchmarking | CodeCode Available | 1 |
| R2H: Building Multimodal Navigation Helpers that Respond to Help Requests | May 23, 2023 | BenchmarkingLanguage Modeling | —Unverified | 0 |
| When the Music Stops: Tip-of-the-Tongue Retrieval for Music | May 23, 2023 | BenchmarkingLanguage Modeling | CodeCode Available | 0 |
| Benchmarking Machine Translation with Cultural Awareness | May 23, 2023 | BenchmarkingIn-Context Learning | CodeCode Available | 0 |
| Robust Model-Based Optimization for Challenging Fitness Landscapes | May 23, 2023 | Benchmarkingmodel | CodeCode Available | 0 |
| Exploring Large Language Models for Classical Philology | May 23, 2023 | BenchmarkingDecoder | CodeCode Available | 1 |
| Multilingual Large Language Models Are Not (Yet) Code-Switchers | May 23, 2023 | BenchmarkingLanguage Identification | —Unverified | 0 |
| How Fragile is Relation Extraction under Entity Replacements? | May 22, 2023 | BenchmarkingCausal Inference | CodeCode Available | 0 |
| Element-aware Summarization with Large Language Models: Expert-aligned Evaluation and Chain-of-Thought Method | May 22, 2023 | BenchmarkingHallucination | CodeCode Available | 1 |
| A Benchmark on Extremely Weakly Supervised Text Classification: Reconcile Seed Matching and Prompting Approaches | May 22, 2023 | BenchmarkingClassification | CodeCode Available | 0 |
| Can ChatGPT Defend its Belief in Truth? Evaluating LLM Reasoning via Debate | May 22, 2023 | BenchmarkingMath | —Unverified | 0 |
| Towards Benchmarking and Assessing Visual Naturalness of Physical World Adversarial Attacks | May 22, 2023 | Adversarial AttackAutonomous Driving | CodeCode Available | 1 |
| Value-at-Risk-Based Portfolio Insurance: Performance Evaluation and Benchmarking Against CPPI in a Markov-Modulated Regime-Switching Market | May 21, 2023 | BenchmarkingFinancial Analysis | —Unverified | 0 |
| Patterns of Convergence and Bound Constraint Violation in Differential Evolution on SBOX-COST Benchmarking Suite | May 20, 2023 | Benchmarking | —Unverified | 0 |
| Visualizing Linguistic Diversity of Text Datasets Synthesized by Large Language Models | May 19, 2023 | BenchmarkingDiversity | CodeCode Available | 2 |
| Separating form and meaning: Using self-consistency to quantify task understanding across multiple senses | May 19, 2023 | BenchmarkingForm | CodeCode Available | 0 |
| TELeR: A General Taxonomy of LLM Prompts for Benchmarking Complex Tasks | May 19, 2023 | Benchmarking | —Unverified | 0 |
| Ahead-of-Time P-Tuning | May 18, 2023 | Benchmarkingparameter-efficient fine-tuning | —Unverified | 0 |
| Benchmarking Deep Learning Frameworks for Automated Diagnosis of Ocular Toxoplasmosis: A Comprehensive Approach to Classification and Segmentation | May 18, 2023 | BenchmarkingDiagnostic | —Unverified | 0 |
| Boost Vision Transformer with GPU-Friendly Sparsity and Quantization | May 18, 2023 | BenchmarkingGPU | —Unverified | 0 |