| DyKnow: Dynamically Verifying Time-Sensitive Factual Knowledge in LLMs | Apr 10, 2024 | Benchmarking, knowledge editing | Code Available | 0 |
| WebCode2M: A Real-World Dataset for Code Generation from Webpage Designs | Apr 9, 2024 | Benchmarking, Code Generation | Unverified | 0 |
| From Protoscience to Epistemic Monoculture: How Benchmarking Set the Stage for the Deep Learning Revolution | Apr 9, 2024 | Benchmarking | Unverified | 0 |
| Accel-NASBench: Sustainable Benchmarking for Accelerator-Aware NAS | Apr 9, 2024 | Benchmarking, Neural Architecture Search | Code Available | 0 |
| MedExpQA: Multilingual Benchmarking of Large Language Models for Medical Question Answering | Apr 8, 2024 | Benchmarking, Medical Question Answering | Unverified | 0 |
| Towards Objectively Benchmarking Social Intelligence for Language Agents at Action Level | Apr 8, 2024 | Benchmarking | Code Available | 0 |
| HOEG: A New Approach for Object-Centric Predictive Process Monitoring | Apr 8, 2024 | Benchmarking, Graph Neural Network | Code Available | 0 |
| EFSA: Towards Event-Level Financial Sentiment Analysis | Apr 8, 2024 | Articles, Benchmarking | Code Available | 0 |
| MLaKE: Multilingual Knowledge Editing Benchmark for Large Language Models | Apr 7, 2024 | Benchmarking, knowledge editing | Code Available | 0 |
| A Comparison of Cryptocurrency Volatility-benchmarking New and Mature Asset Classes | Apr 7, 2024 | Benchmarking | Unverified | 0 |