| PalmBench: A Comprehensive Benchmark of Compressed Large Language Models on Mobile Platforms | Oct 5, 2024 | Benchmarking, GPU | —Unverified | 0 |
| Implicit to Explicit Entropy Regularization: Benchmarking ViT Fine-tuning under Noisy Labels | Oct 5, 2024 | Benchmarking | —Unverified | 0 |
| PersoBench: Benchmarking Personalized Response Generation in Large Language Models | Oct 4, 2024 | Benchmarking, Dialogue Generation | Code Available | 0 |
| How Do Large Language Models Understand Graph Patterns? A Benchmark for Graph Pattern Comprehension | Oct 4, 2024 | Benchmarking, Computational chemistry | —Unverified | 0 |
| Ward: Provable RAG Dataset Inference via LLM Watermarks | Oct 4, 2024 | Benchmarking, RAG | —Unverified | 0 |
| ActPlan-1K: Benchmarking the Procedural Planning Ability of Visual Language Models in Household Activities | Oct 4, 2024 | Benchmarking, counterfactual | —Unverified | 0 |
| Towards a Benchmark for Large Language Models for Business Process Management Tasks | Oct 4, 2024 | Benchmarking, Management | Code Available | 0 |
| Benchmarking the Fidelity and Utility of Synthetic Relational Data | Oct 4, 2024 | Benchmarking, Feature Importance | —Unverified | 0 |
| Lightning UQ Box: A Comprehensive Framework for Uncertainty Quantification in Deep Learning | Oct 4, 2024 | Benchmarking, Uncertainty Quantification | —Unverified | 0 |
| Understanding Large Language Models in Your Pockets: Performance Study on COTS Mobile Devices | Oct 4, 2024 | Benchmarking, Language Modeling | —Unverified | 0 |
| IoT-LLM: Enhancing Real-World IoT Task Reasoning with Large Language Models | Oct 3, 2024 | Benchmarking, In-Context Learning | —Unverified | 0 |
| MANTRA: The Manifold Triangulations Assemblage | Oct 3, 2024 | Benchmarking | Code Available | 0 |
| Repurposing Foundation Model for Generalizable Medical Time Series Classification | Oct 3, 2024 | Benchmarking, Diagnostic | —Unverified | 0 |
| Large Language Model for Multi-Domain Translation: Benchmarking and Domain CoT Fine-tuning | Oct 3, 2024 | Benchmarking, Language Modeling | —Unverified | 0 |
| Deep learning for action spotting in association football videos | Oct 2, 2024 | Action Spotting, Benchmarking | —Unverified | 0 |
| ConServe: Harvesting GPUs for Low-Latency and High-Throughput Large Language Model Serving | Oct 2, 2024 | Benchmarking, Document Summarization | —Unverified | 0 |
| CALF: Benchmarking Evaluation of LFQA Using Chinese Examinations | Oct 2, 2024 | Benchmarking, Long Form Question Answering | —Unverified | 0 |
| The Labyrinth of Links: Navigating the Associative Maze of Multi-modal LLMs | Oct 2, 2024 | Benchmarking, Hallucination | —Unverified | 0 |
| Emo3D: Metric and Benchmarking Dataset for 3D Facial Expression Generation from Emotion Description | Oct 2, 2024 | Benchmarking, Facial expression generation | —Unverified | 0 |
| A Real Benchmark Swell Noise Dataset for Performing Seismic Data Denoising via Deep Learning | Oct 2, 2024 | Benchmarking, Denoising | —Unverified | 0 |
| Deep Unlearn: Benchmarking Machine Unlearning | Oct 2, 2024 | Benchmarking, Machine Unlearning | —Unverified | 0 |
| CXPMRG-Bench: Pre-training and Benchmarking for X-ray Medical Report Generation on CheXpert Plus Dataset | Oct 1, 2024 | Benchmarking, Contrastive Learning | —Unverified | 0 |
| FMBench: Benchmarking Fairness in Multimodal Large Language Models on Medical Tasks | Oct 1, 2024 | Benchmarking, Fairness | —Unverified | 0 |
| Benchmarking Large Language Models for Conversational Question Answering in Multi-instructional Documents | Oct 1, 2024 | Benchmarking, Conversational Question Answering | —Unverified | 0 |
| Match Stereo Videos via Bidirectional Alignment | Sep 30, 2024 | Benchmarking, Stereo Matching | —Unverified | 0 |
| Benchmarking Adaptive Intelligence and Computer Vision on Human-Robot Collaboration | Sep 30, 2024 | Benchmarking, Intent Detection | —Unverified | 0 |
| ImmersePro: End-to-End Stereo Video Synthesis Via Implicit Disparity Learning | Sep 30, 2024 | Benchmarking, Disparity Estimation | Code Available | 0 |
| Q-Bench-Video: Benchmarking the Video Quality Understanding of LMMs | Sep 30, 2024 | Benchmarking, Multiple-choice | —Unverified | 0 |
| Constrained Reinforcement Learning for Safe Heat Pump Control | Sep 29, 2024 | Benchmarking, reinforcement-learning | Code Available | 0 |
| Tracking Everything in Robotic-Assisted Surgery | Sep 29, 2024 | Benchmarking | —Unverified | 0 |
| GenTel-Safe: A Unified Benchmark and Shielding Framework for Defending Against Prompt Injection Attacks | Sep 29, 2024 | Benchmarking | —Unverified | 0 |
| AstroMLab 2: AstroLLaMA-2-70B Model and Benchmarking Specialised LLMs for Astronomy | Sep 29, 2024 | Astronomy, Benchmarking | —Unverified | 0 |
| SciDoc2Diagrammer-MAF: Towards Generation of Scientific Diagrams from Documents guided by Multi-Aspect Feedback Refinement | Sep 28, 2024 | Benchmarking, Code Generation | —Unverified | 0 |
| Data Analysis in the Era of Generative AI | Sep 27, 2024 | Benchmarking | —Unverified | 0 |
| Constructing Confidence Intervals for 'the' Generalization Error -- a Comprehensive Benchmark Study | Sep 27, 2024 | Benchmarking, tabular-regression | Code Available | 0 |
| CLLMate: A Multimodal Benchmark for Weather and Climate Events Forecasting | Sep 27, 2024 | Articles, Benchmarking | —Unverified | 0 |
| bnRep: A repository of Bayesian networks from the academic literature | Sep 27, 2024 | Benchmarking | —Unverified | 0 |
| MCUBench: A Benchmark of Tiny Object Detectors on MCUs | Sep 27, 2024 | Benchmarking, Model Selection | —Unverified | 0 |
| EarthquakeNPP: Benchmark Datasets for Earthquake Forecasting with Neural Point Processes | Sep 27, 2024 | Benchmarking, Dataset Generation | —Unverified | 0 |
| Conformal Prediction: A Theoretical Note and Benchmarking Transductive Node Classification in Graphs | Sep 26, 2024 | Benchmarking, Conformal Prediction | Code Available | 0 |
| Benchmarking Domain Generalization Algorithms in Computational Pathology | Sep 25, 2024 | Benchmarking, Data Augmentation | Code Available | 0 |
| Benchmarking Deep Learning Models for Object Detection on Edge Computing Devices | Sep 25, 2024 | Autonomous Vehicles, Benchmarking | —Unverified | 0 |
| Proof of Thought : Neurosymbolic Program Synthesis allows Robust and Interpretable Reasoning | Sep 25, 2024 | Benchmarking, Formal Logic | —Unverified | 0 |
| Omnibenchmark (alpha) for continuous and open benchmarking in bioinformatics | Sep 25, 2024 | Benchmarking | —Unverified | 0 |
| SEN12-WATER: A New Dataset for Hydrological Applications and its Benchmarking | Sep 25, 2024 | Benchmarking, Management | —Unverified | 0 |
| Controlling Risk of Retrieval-augmented Generation: A Counterfactual Prompting Framework | Sep 24, 2024 | Benchmarking, counterfactual | Code Available | 0 |
| HLB: Benchmarking LLMs' Humanlikeness in Language Use | Sep 24, 2024 | Benchmarking | —Unverified | 0 |
| Benchmarking Robustness of Endoscopic Depth Estimation with Synthetically Corrupted Data | Sep 24, 2024 | Benchmarking, Depth Estimation | Code Available | 0 |
| Qualitative Insights Tool (QualIT): LLM Enhanced Topic Modeling | Sep 24, 2024 | Articles, Benchmarking | —Unverified | 0 |
| Ducho meets Elliot: Large-scale Benchmarks for Multimodal Recommendation | Sep 24, 2024 | Benchmarking, Movie Recommendation | Code Available | 0 |