| Towards Emotionally Consistent Text-Based Speech Editing: Introducing EmoCorrector and The ECD-TSE Dataset | May 24, 2025 | Benchmarking, RAG | Code Available | 0 |
| Business as Rulesual: A Benchmark and Framework for Business Rule Flow Modeling with LLMs | May 24, 2025 | Benchmarking | Unverified | 0 |
| CRMArena-Pro: Holistic Assessment of LLM Agents Across Diverse Business Scenarios and Interactions | May 24, 2025 | Benchmarking | Code Available | 2 |
| Benchmarking and Rethinking Knowledge Editing for Large Language Models | May 24, 2025 | Benchmarking, Knowledge Editing | Code Available | 0 |
| From Generation to Detection: A Multimodal Multi-Task Dataset for Benchmarking Health Misinformation | May 24, 2025 | Articles, Benchmarking | Unverified | 0 |
| LogicCat: A Chain-of-Thought Text-to-SQL Benchmark for Multi-Domain Reasoning Challenges | May 24, 2025 | Benchmarking, Mathematical Reasoning | Code Available | 0 |
| SAMA: Towards Multi-Turn Referential Grounded Video Chat with Large Language Models | May 24, 2025 | Benchmarking, Video Grounding | Unverified | 0 |
| Benchmarking Poisoning Attacks against Retrieval-Augmented Generation | May 24, 2025 | Benchmarking, Question Answering | Unverified | 0 |
| ChartGalaxy: A Dataset for Infographic Chart Understanding and Generation | May 24, 2025 | Benchmarking, Chart Understanding | Code Available | 3 |
| SPDEBench: An Extensive Benchmark for Learning Regular and Singular Stochastic PDEs | May 24, 2025 | Benchmarking | Code Available | 0 |