| Building a Functional Machine Translation Corpus for Kpelle | May 24, 2025 | Data Augmentation, Language Modelling | Unverified | 0 |
| MedScore: Factuality Evaluation of Free-Form Medical Answers | May 24, 2025 | Form, Hallucination | Code Available | 0 |
| PD^3: A Project Duplication Detection Framework via Adapted Multi-Agent Debate | May 23, 2025 | Sentence | Unverified | 0 |
| Multi-Scale Probabilistic Generation Theory: A Hierarchical Framework for Interpreting Large Language Models | May 23, 2025 | Sentence | Unverified | 0 |
| A Japanese Language Model and Three New Evaluation Benchmarks for Pharmaceutical NLP | May 22, 2025 | Continual Pretraining, Diagnostic | Code Available | 0 |
| LLMs Are Not Scorers: Rethinking MT Evaluation with Generation-Based Methods | May 22, 2025 | Decoder, Machine Translation | Code Available | 0 |
| Memorization or Reasoning? Exploring the Idiom Understanding of LLMs | May 22, 2025 | Machine Translation, Memorization | Unverified | 0 |
| SafeKey: Amplifying Aha-Moment Insights for Safety Reasoning | May 22, 2025 | Sentence | Unverified | 0 |
| Chinese Toxic Language Mitigation via Sentiment Polarity Consistent Rewrites | May 21, 2025 | Sentence | Unverified | 0 |
| Are the confidence scores of reviewers consistent with the review content? Evidence from top conference proceedings in AI | May 21, 2025 | Deep Learning, Fairness | Code Available | 0 |