| Benchmarking Deep Learning Models on NVIDIA Jetson Nano for Real-Time Systems: An Empirical Investigation | Jun 25, 2024 | Action DetectionBenchmarking | CodeCode Available | 0 |
| Leave No Document Behind: Benchmarking Long-Context LLMs with Extended Multi-Doc QA | Jun 25, 2024 | BenchmarkingLong-Context Understanding | CodeCode Available | 2 |
| Measuring and Benchmarking Large Language Models' Capabilities to Generate Persuasive Language | Jun 25, 2024 | Benchmarking | —Unverified | 0 |
| MatText: Do Language Models Need More than Text & Scale for Materials Modeling? | Jun 25, 2024 | Benchmarking | CodeCode Available | 1 |
| VarBench: Robust Language Model Benchmarking Through Dynamic Variable Perturbation | Jun 25, 2024 | ARCBenchmarking | CodeCode Available | 0 |
| A Thorough Performance Benchmarking on Lightweight Embedding-based Recommender Systems | Jun 25, 2024 | BenchmarkingCollaborative Filtering | CodeCode Available | 0 |
| Towards Efficient and Scalable Training of Differentially Private Deep Learning | Jun 25, 2024 | BenchmarkingDeep Learning | CodeCode Available | 0 |
| NerfBaselines: Consistent and Reproducible Evaluation of Novel View Synthesis Methods | Jun 25, 2024 | 3DGSBenchmarking | —Unverified | 0 |
| Brittle Minds, Fixable Activations: Understanding Belief Representations in Language Models | Jun 25, 2024 | Benchmarking | —Unverified | 0 |
| MedBench: A Comprehensive, Standardized, and Reliable Benchmarking System for Evaluating Chinese Medical Large Language Models | Jun 24, 2024 | Benchmarking | —Unverified | 0 |
| CATBench: A Compiler Autotuning Benchmarking Suite for Black-box Optimization | Jun 24, 2024 | Bayesian OptimizationBenchmarking | —Unverified | 0 |
| FaceScore: Benchmarking and Enhancing Face Quality in Human Generation | Jun 24, 2024 | BenchmarkingDenoising | CodeCode Available | 2 |
| A Closer Look at Mortality Risk Prediction from Electrocardiograms | Jun 24, 2024 | BenchmarkingPrediction | CodeCode Available | 1 |
| From Perfect to Noisy World Simulation: Customizable Embodied Multi-modal Perturbations for SLAM Robustness Benchmarking | Jun 24, 2024 | BenchmarkingNeRF | CodeCode Available | 2 |
| AutoDetect: Towards a Unified Framework for Automated Weakness Detection in Large Language Models | Jun 24, 2024 | BenchmarkingData Augmentation | CodeCode Available | 1 |
| Enabling more efficient and cost-effective AI/ML systems with Collective Mind, virtualized MLOps, MLPerf, Collective Knowledge Playground and reproducible optimization tournaments | Jun 24, 2024 | Benchmarking | CodeCode Available | 4 |
| DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation | Jun 24, 2024 | BenchmarkingImage Generation | CodeCode Available | 2 |
| Ragnarök: A Reusable RAG Framework and Baselines for TREC 2024 Retrieval-Augmented Generation Track | Jun 24, 2024 | BenchmarkingRAG | CodeCode Available | 1 |
| General Binding Affinity Guidance for Diffusion Models in Structure-Based Drug Design | Jun 24, 2024 | BenchmarkingDrug Design | CodeCode Available | 1 |
| PISTOL: Dataset Compilation Pipeline for Structural Unlearning of LLMs | Jun 24, 2024 | BenchmarkingMachine Unlearning | —Unverified | 0 |
| GraphEval2000: Benchmarking and Improving Large Language Models on Graph Datasets | Jun 23, 2024 | Benchmarking | —Unverified | 0 |
| HEST-1k: A Dataset for Spatial Transcriptomics and Histology Image Analysis | Jun 23, 2024 | BenchmarkingRepresentation Learning | CodeCode Available | 3 |
| Towards Open Respiratory Acoustic Foundation Models: Pretraining and Benchmarking | Jun 23, 2024 | Benchmarking | CodeCode Available | 2 |
| Position: Benchmarking is Limited in Reinforcement Learning Research | Jun 23, 2024 | BenchmarkingPosition | —Unverified | 0 |
| MetaGreen: Meta-Learning Inspired Transformer Selection for Green Semantic Communication | Jun 22, 2024 | BenchmarkingMeta-Learning | CodeCode Available | 0 |