| FedAIoT: A Federated Learning Benchmark for Artificial Intelligence of Things | Sep 29, 2023 | Benchmarking, Federated Learning | Code Available | 1 |
| Benchmarking Cognitive Biases in Large Language Models as Evaluators | Sep 29, 2023 | Benchmarking, In-Context Learning | Code Available | 1 |
| MuSe-GNN: Learning Unified Gene Representation From Multimodal Biological Graph Data | Sep 29, 2023 | Benchmarking, Contrastive Learning | Code Available | 1 |
| Benchmarking the Abilities of Large Language Models for RDF Knowledge Graph Creation and Comprehension: How Well Do LLMs Speak Turtle? | Sep 29, 2023 | Benchmarking, Knowledge Graph Completion | Code Available | 1 |
| Revisiting Neural Program Smoothing for Fuzzing | Sep 28, 2023 | Benchmarking, CPU | Code Available | 1 |
| The Trickle-down Impact of Reward (In-)consistency on RLHF | Sep 28, 2023 | Benchmarking | Code Available | 1 |
| LagrangeBench: A Lagrangian Fluid Mechanics Benchmarking Suite | Sep 28, 2023 | Benchmarking | Code Available | 1 |
| FORB: A Flat Object Retrieval Benchmark for Universal Image Embedding | Sep 28, 2023 | Benchmarking, Image Retrieval | Code Available | 1 |
| OceanBench: The Sea Surface Height Edition | Sep 27, 2023 | Benchmarking, Sensor Fusion | Code Available | 1 |
| Unified Long-Term Time-Series Forecasting Benchmark | Sep 27, 2023 | Benchmarking, Time Series | Code Available | 1 |
| NLPBench: Evaluating Large Language Models on Solving NLP Problems | Sep 27, 2023 | Benchmarking, Math | Code Available | 1 |
| Node-Aligned Graph-to-Graph (NAG2G): Elevating Template-Free Deep Learning Approaches in Single-Step Retrosynthesis | Sep 27, 2023 | Benchmarking, Graph Generation | Code Available | 1 |
| Benchmarking Local Robustness of High-Accuracy Binary Neural Networks for Enhanced Traffic Sign Recognition | Sep 25, 2023 | Autonomous Driving, Benchmarking | Code Available | 1 |
| Benchmarking Encoder-Decoder Architectures for Biplanar X-ray to 3D Shape Reconstruction | Sep 24, 2023 | 3D Shape Reconstruction, Anatomy | Code Available | 1 |
| Grad DFT: a software library for machine learning enhanced density functional theory | Sep 23, 2023 | Benchmarking | Code Available | 1 |
| Prompt Tuned Embedding Classification for Multi-Label Industry Sector Allocation | Sep 21, 2023 | Benchmarking, Classification | Code Available | 1 |
| An Image Dataset for Benchmarking Recommender Systems with Raw Pixels | Sep 13, 2023 | Benchmarking, Recommendation Systems | Code Available | 1 |
| Formalizing Multimedia Recommendation through Multimodal Deep Learning | Sep 11, 2023 | Benchmarking, Deep Learning | Code Available | 1 |
| FreeMan: Towards Benchmarking 3D Human Pose Estimation under Real-World Conditions | Sep 10, 2023 | 3D Human Pose Estimation, 3D Pose Estimation | Code Available | 1 |
| RecAD: Towards A Unified Library for Recommender Attack and Defense | Sep 9, 2023 | Benchmarking, Recommendation Systems | Code Available | 1 |
| Evaluation of large language models for discovery of gene set function | Sep 7, 2023 | Benchmarking, Language Modelling | Code Available | 1 |
| A skeletonization algorithm for gradient-based optimization | Sep 5, 2023 | Benchmarking, Deep Learning | Code Available | 1 |
| Benchmarking Autoregressive Conditional Diffusion Models for Turbulent Flow Simulation | Sep 4, 2023 | Benchmarking | Code Available | 1 |
| Developing a Scalable Benchmark for Assessing Large Language Models in Knowledge Graph Engineering | Aug 31, 2023 | Benchmarking, Dataset Generation | Code Available | 1 |
| Benchmarking the Generation of Fact Checking Explanations | Aug 29, 2023 | Abstractive Text Summarization, Articles | Code Available | 1 |
| Towards quantitative precision for ECG analysis: Leveraging state space models, self-supervision and patient metadata | Aug 29, 2023 | Benchmarking, Diagnostic | Code Available | 1 |
| MLLM-DataEngine: An Iterative Refinement Approach for MLLM | Aug 25, 2023 | Benchmarking | Code Available | 1 |
| LLMRec: Benchmarking Large Language Models on Recommendation Task | Aug 23, 2023 | Benchmarking, Explanation Generation | Code Available | 1 |
| VI-Net: Boosting Category-level 6D Object Pose Estimation via Learning Decoupled Rotations on the Spherical Representations | Aug 19, 2023 | 6D Pose Estimation using RGB, Benchmarking | Code Available | 1 |
| Benchmarking Neural Network Generalization for Grammar Induction | Aug 16, 2023 | Benchmarking | Code Available | 1 |
| Benchmarking Generated Poses: How Rational is Structure-based Drug Design with Generative Models? | Aug 14, 2023 | Benchmarking, Drug Design | Code Available | 1 |
| DIG In: Evaluating Disparities in Image Generations with Indicators for Geographic Diversity | Aug 11, 2023 | Benchmarking, Diversity | Code Available | 1 |
| A Comparative Visual Analytics Framework for Evaluating Evolutionary Processes in Multi-objective Optimization | Aug 10, 2023 | Benchmarking, Decision Making | Code Available | 1 |
| LLMeBench: A Flexible Framework for Accelerating LLMs Benchmarking | Aug 9, 2023 | Benchmarking, Few-Shot Learning | Code Available | 1 |
| Application-Oriented Benchmarking of Quantum Generative Learning Using QUARK | Aug 8, 2023 | Benchmarking, GPU | Code Available | 1 |
| XFlow: Benchmarking Flow Behaviors over Graphs | Aug 7, 2023 | Benchmarking | Code Available | 1 |
| qgym: A Gym for Training and Benchmarking RL-Based Quantum Compilation | Aug 1, 2023 | Benchmarking, OpenAI Gym | Code Available | 1 |
| Benchmarking and Analyzing Robust Point Cloud Recognition: Bag of Tricks for Defending Adversarial Examples | Jul 31, 2023 | Adversarial Robustness, Benchmarking | Code Available | 1 |
| VG-SSL: Benchmarking Self-supervised Representation Learning Approaches for Visual Geo-localization | Jul 31, 2023 | Autonomous Navigation, Autonomous Vehicles | Code Available | 1 |
| Rethinking Uncertainly Missing and Ambiguous Visual Modality in Multi-Modal Entity Alignment | Jul 30, 2023 | Benchmarking, Entity Alignment | Code Available | 1 |
| Benchmarking Offline Reinforcement Learning on Real-Robot Hardware | Jul 28, 2023 | Benchmarking, reinforcement-learning | Code Available | 1 |
| PLANTAIN: Diffusion-inspired Pose Score Minimization for Fast and Accurate Molecular Docking | Jul 22, 2023 | Benchmarking, Molecular Docking | Code Available | 1 |
| JoinGym: An Efficient Query Optimization Environment for Reinforcement Learning | Jul 21, 2023 | Benchmarking, Combinatorial Optimization | Code Available | 1 |
| SciBench: Evaluating College-Level Scientific Problem-Solving Abilities of Large Language Models | Jul 20, 2023 | Benchmarking, Language Modeling | Code Available | 1 |
| Decoding the Enigma: Benchmarking Humans and AIs on the Many Facets of Working Memory | Jul 20, 2023 | Benchmarking, Decision Making | Code Available | 1 |
| Examining the Effects of Degree Distribution and Homophily in Graph Learning Models | Jul 17, 2023 | Benchmarking, Graph Clustering | Code Available | 1 |
| Efficient Prediction of Peptide Self-assembly through Sequential and Graphical Encoding | Jul 17, 2023 | Benchmarking, Deep Learning | Code Available | 1 |
| Towards Heterogeneous Long-tailed Learning: Benchmarking, Metrics, and Toolbox | Jul 17, 2023 | Benchmarking | Code Available | 1 |
| GastroVision: A Multi-class Endoscopy Image Dataset for Computer Aided Gastrointestinal Disease Detection | Jul 16, 2023 | Benchmarking | Code Available | 1 |
| IntelliGraphs: Datasets for Benchmarking Knowledge Graph Generation | Jul 13, 2023 | Benchmarking, Graph Embedding | Code Available | 1 |