| Zero-shot Benchmarking: A Framework for Flexible and Scalable Automatic Evaluation of Language Models | Apr 1, 2025 | Benchmarking | —Unverified | 0 |
| Zero-Shot Visual Reasoning by Vision-Language Models: Benchmarking and Analysis | Aug 27, 2024 | Benchmarking, Large Language Model | —Unverified | 0 |
| λ: A Benchmark for Data-Efficiency in Long-Horizon Indoor Mobile Manipulation Robotics | Nov 28, 2024 | Benchmarking, Diversity | —Unverified | 0 |
| LabSafety Bench: Benchmarking LLMs on Safety Issues in Scientific Labs | Oct 18, 2024 | Benchmarking, Fairness | —Unverified | 0 |
| LAG-MMLU: Benchmarking Frontier LLM Understanding in Latvian and Giriama | Mar 14, 2025 | Benchmarking, MMLU | —Unverified | 0 |
| LAMBDA: Covering the Solution Set of Black-Box Inequality by Search Space Quantization | Mar 25, 2022 | Benchmarking, Quantization | —Unverified | 0 |
| Landscape-Aware Automated Algorithm Configuration using Multi-output Mixed Regression and Classification | Sep 2, 2024 | Benchmarking | —Unverified | 0 |
| LanEvil: Benchmarking the Robustness of Lane Detection to Environmental Illusions | Jun 3, 2024 | Autonomous Driving, Benchmarking | —Unverified | 0 |
| Language Complexity Measurement as a Noisy Zero-Shot Proxy for Evaluating LLM Performance | Feb 17, 2025 | Benchmarking, Dependency Parsing | —Unverified | 0 |
| Language-Driven 6-DoF Grasp Detection Using Negative Prompt Guidance | Jul 18, 2024 | Benchmarking | —Unverified | 0 |
| Language Models for Automated Classification of Brain MRI Reports and Growth Chart Generation | Mar 15, 2025 | Benchmarking | —Unverified | 0 |
| Can LLMs Capture Human Preferences? | May 4, 2023 | Benchmarking | —Unverified | 0 |
| Large Language Model for Multi-Domain Translation: Benchmarking and Domain CoT Fine-tuning | Oct 3, 2024 | Benchmarking, Language Modeling | —Unverified | 0 |
| Understanding Large Language Models in Your Pockets: Performance Study on COTS Mobile Devices | Oct 4, 2024 | Benchmarking, Language Modeling | —Unverified | 0 |
| Large Language Models are Null-Shot Learners | Jan 16, 2024 | Arithmetic Reasoning, Benchmarking | —Unverified | 0 |
| Large Language Models are Few-Shot Clinical Information Extractors | May 25, 2022 | Benchmarking, coreference-resolution | —Unverified | 0 |
| Large Language Models as Automated Aligners for benchmarking Vision-Language Models | Nov 24, 2023 | Benchmarking, World Knowledge | —Unverified | 0 |
| Large Language Models Have Intrinsic Meta-Cognition, but Need a Good Lens | Jun 10, 2025 | Benchmarking, Mathematical Reasoning | —Unverified | 0 |
| Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level | Nov 5, 2024 | Bayesian Optimisation, Benchmarking | —Unverified | 0 |
| Large Malaysian Language Model Based on Mistral for Enhanced Local Language Understanding | Jan 24, 2024 | Benchmarking, Language Modeling | —Unverified | 0 |
| Large Physics Models: Towards a collaborative approach with Large Language Models and Foundation Models | Jan 9, 2025 | Benchmarking, Philosophical Reflection | —Unverified | 0 |
| Large-scale Benchmarking of Metaphor-based Optimization Heuristics | Feb 15, 2024 | Benchmarking, Experimental Design | —Unverified | 0 |
| Large-Scale Quantum Separability Through a Reproducible Machine Learning Lens | Jun 15, 2023 | Benchmarking | —Unverified | 0 |
| Latency-aware Road Anomaly Segmentation in Videos: A Photorealistic Dataset and New Metrics | Jan 10, 2024 | Anomaly Segmentation, Autonomous Driving | —Unverified | 0 |
| Latent Variable Models for Visual Question Answering | Jan 16, 2021 | Benchmarking, Question Answering | —Unverified | 0 |