| Unleashing OpenTitan's Potential: a Silicon-Ready Embedded Secure Element for Root of Trust and Cryptographic Offloading | Jun 17, 2024 | Autonomous VehiclesBenchmarking | —Unverified | 0 |
| InternalInspector I^2: Robust Confidence Estimation in LLMs through Internal States | Jun 17, 2024 | BenchmarkingContrastive Learning | —Unverified | 0 |
| Job-SDF: A Multi-Granularity Dataset for Job Skill Demand Forecasting and Benchmarking | Jun 17, 2024 | BenchmarkingDemand Forecasting | CodeCode Available | 1 |
| A Systematic Survey of Text Summarization: From Statistical Methods to Large Language Models | Jun 17, 2024 | BenchmarkingSurvey | —Unverified | 0 |
| Are Large Language Models True Healthcare Jacks-of-All-Trades? Benchmarking Across Health Professions Beyond Physician Exams | Jun 17, 2024 | AllBenchmarking | CodeCode Available | 0 |
| The Liouville Generator for Producing Integrable Expressions | Jun 17, 2024 | Benchmarking | —Unverified | 0 |
| MFC-Bench: Benchmarking Multimodal Fact-Checking with Large Vision-Language Models | Jun 17, 2024 | BenchmarkingFact Checking | CodeCode Available | 1 |
| Standardizing Structural Causal Models | Jun 17, 2024 | BenchmarkingCausal Inference | CodeCode Available | 0 |
| RepLiQA: A Question-Answering Dataset for Benchmarking LLMs on Unseen Reference Content | Jun 17, 2024 | BenchmarkingGeneral Knowledge | CodeCode Available | 0 |
| Multimodal Needle in a Haystack: Benchmarking Long-Context Capability of Multimodal Large Language Models | Jun 17, 2024 | Benchmarking | CodeCode Available | 2 |