| Chakra: Advancing Performance Benchmarking and Co-design using Standardized Execution Traces | May 23, 2023 | Benchmarking | Code Available | 1 |
| Towards Benchmarking and Assessing Visual Naturalness of Physical World Adversarial Attacks | May 22, 2023 | Adversarial Attack, Autonomous Driving | Code Available | 1 |
| Element-aware Summarization with Large Language Models: Expert-aligned Evaluation and Chain-of-Thought Method | May 22, 2023 | Benchmarking, Hallucination | Code Available | 1 |
| X-IQE: eXplainable Image Quality Evaluation for Text-to-Image Generation with Visual Large Language Models | May 18, 2023 | Benchmarking, Image Generation | Code Available | 1 |
| PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering | May 17, 2023 | Benchmarking, Diagnostic | Code Available | 1 |
| An Empirical Study on Google Research Football Multi-agent Scenarios | May 16, 2023 | Benchmarking, Multi-agent Reinforcement Learning | Code Available | 1 |
| A Platform for the Biomedical Application of Large Language Models | May 10, 2023 | Benchmarking, Privacy Preserving | Code Available | 1 |
| Benchmarking large language models for biomedical natural language processing applications and recommendations | May 10, 2023 | Benchmarking, Document Classification | Code Available | 1 |
| InfoMetIC: An Informative Metric for Reference-free Image Caption Evaluation | May 10, 2023 | Benchmarking, Image Captioning | Code Available | 1 |
| DexArt: Benchmarking Generalizable Dexterous Manipulation with Articulated Objects | May 9, 2023 | Benchmarking, Decision Making | Code Available | 1 |
| Working Memory Capacity of ChatGPT: An Empirical Study | Apr 30, 2023 | Benchmarking, Language Modeling | Code Available | 1 |
| Event-Free Moving Object Segmentation from Moving Ego Vehicle | Apr 28, 2023 | Autonomous Driving, Benchmarking | Code Available | 1 |
| MF-NeRF: Memory Efficient NeRF with Mixed-Feature Hash Table | Apr 25, 2023 | Benchmarking, GPU | Code Available | 1 |
| IMUPoser: Full-Body Pose Estimation using IMUs in Phones, Watches, and Earbuds | Apr 25, 2023 | Benchmarking, Pose Estimation | Code Available | 1 |
| RGB-D Indiscernible Object Counting in Underwater Scenes | Apr 23, 2023 | Benchmarking, Depth Estimation | Code Available | 1 |
| Benchmarking Low-Shot Robustness to Natural Distribution Shifts | Apr 21, 2023 | Benchmarking | Code Available | 1 |
| SCoDA: Domain Adaptive Shape Completion for Real Scans | Apr 20, 2023 | Benchmarking, Domain Adaptation | Code Available | 1 |
| Graph Neural Network-Based Anomaly Detection for River Network Systems | Apr 19, 2023 | Anomaly Detection, Benchmarking | Code Available | 1 |
| Benchmarking Actor-Critic Deep Reinforcement Learning Algorithms for Robotics Control with Action Constraints | Apr 18, 2023 | Benchmarking, Deep Reinforcement Learning | Code Available | 1 |
| A Comparison of Image Denoising Methods | Apr 18, 2023 | Benchmarking, Denoising | Code Available | 1 |
| NeuroBench: A Framework for Benchmarking Neuromorphic Computing Algorithms and Systems | Apr 10, 2023 | Benchmarking | Code Available | 1 |
| Interpretable statistical representations of neural population dynamics and geometry | Apr 6, 2023 | Benchmarking, Decision Making | Code Available | 1 |
| MMVC: Learned Multi-Mode Video Compression with Block-based Prediction Mode Selection and Density-Adaptive Entropy Coding | Apr 5, 2023 | Benchmarking, MS-SSIM | Code Available | 1 |
| SLPerf: a Unified Framework for Benchmarking Split Learning | Apr 4, 2023 | Benchmarking, Diversity | Code Available | 1 |
| Spam-T5: Benchmarking Large Language Models for Few-Shot Email Spam Detection | Apr 3, 2023 | Benchmarking, Sentence | Code Available | 1 |
| ScandEval: A Benchmark for Scandinavian Natural Language Processing | Apr 3, 2023 | Benchmarking, Cross-Lingual Transfer | Code Available | 1 |
| ENRICH: Multi-purposE dataset for beNchmaRking In Computer vision and pHotogrammetry | Apr 1, 2023 | 3D Reconstruction, 3D Scene Reconstruction | Code Available | 1 |
| What Makes for Effective Few-shot Point Cloud Classification? | Mar 31, 2023 | Benchmarking, Classification | Code Available | 1 |
| A Scale-Invariant Sorting Criterion to Find a Causal Order in Additive Noise Models | Mar 31, 2023 | Benchmarking, Causal Discovery | Code Available | 1 |
| ImageNet-E: Benchmarking Neural Network Robustness via Attribute Editing | Mar 30, 2023 | Attribute, Benchmarking | Code Available | 1 |
| MGTBench: Benchmarking Machine-Generated Text Detection | Mar 26, 2023 | Benchmarking, Question Answering | Code Available | 1 |
| MEGA: Multilingual Evaluation of Generative AI | Mar 22, 2023 | Benchmarking | Code Available | 1 |
| DeID-GPT: Zero-shot Medical Text De-Identification by GPT-4 | Mar 20, 2023 | Benchmarking, De-identification | Code Available | 1 |
| Revisiting Realistic Test-Time Training: Sequential Inference and Adaptation by Anchored Clustering Regularized Self-Training | Mar 20, 2023 | Benchmarking, Clustering | Code Available | 1 |
| CCTV-Gun: Benchmarking Handgun Detection in CCTV Images | Mar 19, 2023 | Benchmarking, object-detection | Code Available | 1 |
| COVID-19 event extraction from Twitter via extractive question answering with continuous prompts | Mar 19, 2023 | Benchmarking, Event Extraction | Code Available | 1 |
| TransNetR: Transformer-based Residual Network for Polyp Segmentation with Multi-Center Out-of-Distribution Testing | Mar 13, 2023 | Benchmarking, Decoder | Code Available | 1 |
| What Can We Learn From The Selective Prediction And Uncertainty Estimation Performance Of 523 Imagenet Classifiers | Feb 23, 2023 | Benchmarking, Out-of-Distribution Detection | Code Available | 1 |
| Revisiting the Gumbel-Softmax in MADDPG | Feb 23, 2023 | Benchmarking, Multi-agent Reinforcement Learning | Code Available | 1 |
| A framework for benchmarking class-out-of-distribution detection and its application to ImageNet | Feb 23, 2023 | Benchmarking, Knowledge Distillation | Code Available | 1 |
| A SWAT-based Reinforcement Learning Framework for Crop Management | Feb 10, 2023 | Benchmarking, Decision Making | Code Available | 1 |
| SurgT challenge: Benchmark of Soft-Tissue Trackers for Robotic Surgery | Feb 6, 2023 | Benchmarking, Camera Calibration | Code Available | 1 |
| CosPGD: an efficient white-box adversarial attack for pixel-wise prediction tasks | Feb 4, 2023 | Adversarial Attack, Adversarial Robustness | Code Available | 1 |
| Benchmarking Algorithms for Submodular Optimization Problems Using IOHProfiler | Feb 2, 2023 | Benchmarking, Evolutionary Algorithms | Code Available | 1 |
| Rethinking low-cost microscopy workflow: Image enhancement using deep based Extended Depth of Field methods | Feb 1, 2023 | Benchmarking, Image Deblurring | Code Available | 1 |
| Benchmarking Large Language Models for News Summarization | Jan 31, 2023 | Benchmarking, News Summarization | Code Available | 1 |
| Benchmarking Robustness to Adversarial Image Obfuscations | Jan 30, 2023 | Benchmarking | Code Available | 1 |
| TemporAI: Facilitating Machine Learning Innovation in Time Domain Tasks for Medicine | Jan 28, 2023 | Benchmarking, Causal Inference | Code Available | 1 |
| BiBench: Benchmarking and Analyzing Network Binarization | Jan 26, 2023 | Benchmarking, Binarization | Code Available | 1 |
| Young Labeled Faces in the Wild (YLFW): A Dataset for Children Faces Recognition | Jan 13, 2023 | Benchmarking, Face Recognition | Code Available | 1 |