| LLMs in Coding and their Impact on the Commercial Software Engineering Landscape | Jun 19, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| LMR-BENCH: Evaluating LLM Agent's Ability on Reproducing Language Modeling Research | Jun 19, 2025 | Language Modeling, Language Modelling | Code Available | 1 |
| Watermarking Autoregressive Image Generation | Jun 19, 2025 | Image Generation, Language Modeling | Code Available | 2 |
| From RAG to Agentic: Validating Islamic-Medicine Responses with LLM Agents | Jun 18, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| Make Your AUV Adaptive: An Environment-Aware Reinforcement Learning Framework For Underwater Tasks | Jun 18, 2025 | Decision Making, Language Modeling | Unverified | 0 |
| RAS-Eval: A Comprehensive Benchmark for Security Evaluation of LLM Agents in Real-World Environments | Jun 18, 2025 | Language Modeling, Language Modelling | Code Available | 0 |
| Show-o2: Improved Native Unified Multimodal Models | Jun 18, 2025 | Language Modeling, Language Modelling | Code Available | 5 |
| Finance Language Model Evaluation (FLaME) | Jun 18, 2025 | Benchmarking, Language Model Evaluation | Unverified | 0 |
| BMFM-RNA: An Open Framework for Building and Evaluating Transcriptomic Foundation Models | Jun 17, 2025 | Benchmarking, Language Modeling | Code Available | 2 |
| Thinking in Directivity: Speech Large Language Model for Multi-Talker Directional Speech Recognition | Jun 17, 2025 | Data Augmentation, Language Modeling | Unverified | 0 |