| TUBench: Benchmarking Large Vision-Language Models on Trustworthiness with Unanswerable Questions | Oct 5, 2024 | Benchmarking, Hallucination | Code Available | 0 |
| Implicit to Explicit Entropy Regularization: Benchmarking ViT Fine-tuning under Noisy Labels | Oct 5, 2024 | Benchmarking | Unverified | 0 |
| How Do Large Language Models Understand Graph Patterns? A Benchmark for Graph Pattern Comprehension | Oct 4, 2024 | Benchmarking, Computational chemistry | Unverified | 0 |
| PersoBench: Benchmarking Personalized Response Generation in Large Language Models | Oct 4, 2024 | Benchmarking, Dialogue Generation | Code Available | 0 |
| ActPlan-1K: Benchmarking the Procedural Planning Ability of Visual Language Models in Household Activities | Oct 4, 2024 | Benchmarking, counterfactual | Unverified | 0 |
| Towards a Benchmark for Large Language Models for Business Process Management Tasks | Oct 4, 2024 | Benchmarking, Management | Code Available | 0 |
| Benchmarking the Fidelity and Utility of Synthetic Relational Data | Oct 4, 2024 | Benchmarking, Feature Importance | Unverified | 0 |
| Lightning UQ Box: A Comprehensive Framework for Uncertainty Quantification in Deep Learning | Oct 4, 2024 | Benchmarking, Uncertainty Quantification | Unverified | 0 |
| Ward: Provable RAG Dataset Inference via LLM Watermarks | Oct 4, 2024 | Benchmarking, RAG | Unverified | 0 |
| Understanding Large Language Models in Your Pockets: Performance Study on COTS Mobile Devices | Oct 4, 2024 | Benchmarking, Language Modeling | Unverified | 0 |
| IoT-LLM: Enhancing Real-World IoT Task Reasoning with Large Language Models | Oct 3, 2024 | Benchmarking, In-Context Learning | Unverified | 0 |
| MANTRA: The Manifold Triangulations Assemblage | Oct 3, 2024 | Benchmarking | Code Available | 0 |
| Large Language Model for Multi-Domain Translation: Benchmarking and Domain CoT Fine-tuning | Oct 3, 2024 | Benchmarking, Language Modeling | Unverified | 0 |
| Repurposing Foundation Model for Generalizable Medical Time Series Classification | Oct 3, 2024 | Benchmarking, Diagnostic | Unverified | 0 |
| Deep learning for action spotting in association football videos | Oct 2, 2024 | Action Spotting, Benchmarking | Unverified | 0 |
| ConServe: Harvesting GPUs for Low-Latency and High-Throughput Large Language Model Serving | Oct 2, 2024 | Benchmarking, Document Summarization | Unverified | 0 |
| CALF: Benchmarking Evaluation of LFQA Using Chinese Examinations | Oct 2, 2024 | Benchmarking, Long Form Question Answering | Unverified | 0 |
| The Labyrinth of Links: Navigating the Associative Maze of Multi-modal LLMs | Oct 2, 2024 | Benchmarking, Hallucination | Unverified | 0 |
| Emo3D: Metric and Benchmarking Dataset for 3D Facial Expression Generation from Emotion Description | Oct 2, 2024 | Benchmarking, Facial expression generation | Unverified | 0 |
| A Real Benchmark Swell Noise Dataset for Performing Seismic Data Denoising via Deep Learning | Oct 2, 2024 | Benchmarking, Denoising | Unverified | 0 |
| Deep Unlearn: Benchmarking Machine Unlearning | Oct 2, 2024 | Benchmarking, Machine Unlearning | Unverified | 0 |
| CXPMRG-Bench: Pre-training and Benchmarking for X-ray Medical Report Generation on CheXpert Plus Dataset | Oct 1, 2024 | Benchmarking, Contrastive Learning | Unverified | 0 |
| FMBench: Benchmarking Fairness in Multimodal Large Language Models on Medical Tasks | Oct 1, 2024 | Benchmarking, Fairness | Unverified | 0 |
| Benchmarking Large Language Models for Conversational Question Answering in Multi-instructional Documents | Oct 1, 2024 | Benchmarking, Conversational Question Answering | Unverified | 0 |
| Match Stereo Videos via Bidirectional Alignment | Sep 30, 2024 | Benchmarking, Stereo Matching | Unverified | 0 |