| Evaluating the Efficacy of Foundational Models: Advancing Benchmarking Practices to Enhance Fine-Tuning Decision-Making | Jun 25, 2024 | Benchmarking, Decision Making | Unverified | 0 |
| VarBench: Robust Language Model Benchmarking Through Dynamic Variable Perturbation | Jun 25, 2024 | ARC, Benchmarking | Code Available | 0 |
| RAGBench: Explainable Benchmark for Retrieval-Augmented Generation Systems | Jun 25, 2024 | Benchmarking, RAG | Unverified | 0 |
| Brittle Minds, Fixable Activations: Understanding Belief Representations in Language Models | Jun 25, 2024 | Benchmarking | Unverified | 0 |
| Measuring and Benchmarking Large Language Models' Capabilities to Generate Persuasive Language | Jun 25, 2024 | Benchmarking | Unverified | 0 |
| Benchmarking Deep Learning Models on NVIDIA Jetson Nano for Real-Time Systems: An Empirical Investigation | Jun 25, 2024 | Action Detection, Benchmarking | Code Available | 0 |
| NerfBaselines: Consistent and Reproducible Evaluation of Novel View Synthesis Methods | Jun 25, 2024 | 3DGS, Benchmarking | Unverified | 0 |
| Towards Efficient and Scalable Training of Differentially Private Deep Learning | Jun 25, 2024 | Benchmarking, Deep Learning | Code Available | 0 |
| A Thorough Performance Benchmarking on Lightweight Embedding-based Recommender Systems | Jun 25, 2024 | Benchmarking, Collaborative Filtering | Code Available | 0 |
| MedBench: A Comprehensive, Standardized, and Reliable Benchmarking System for Evaluating Chinese Medical Large Language Models | Jun 24, 2024 | Benchmarking | Unverified | 0 |