| The Behavior Gap: Evaluating Zero-shot LLM Agents in Complex Task-Oriented Dialogs | Jun 13, 2025 | Large Language Model | Unverified | 0 |
| SEC-bench: Automated Benchmarking of LLM Agents on Real-World Software Security Tasks | Jun 13, 2025 | Benchmarking, Large Language Model | Code Available | 2 |
| Investigating the Potential of Large Language Model-Based Router Multi-Agent Architectures for Foundation Design Automation: A Task Classification and Expert Selection Study | Jun 13, 2025 | Language Modeling | Unverified | 0 |
| Intelligent Automation for FDI Facilitation: Optimizing Tariff Exemption Processes with OCR and Large Language Models | Jun 12, 2025 | Large Language Model, Optical Character Recognition | Unverified | 0 |
| LLM-as-a-Fuzzy-Judge: Fine-Tuning Large Language Models as a Clinical Evaluation Judge with Fuzzy Logic | Jun 12, 2025 | Large Language Model, Prompt Engineering | Code Available | 0 |
| Nowcasting the euro area with social media data | Jun 12, 2025 | Language Modeling | Unverified | 0 |
| MNN-LLM: A Generic Inference Engine for Fast Large Language Model Deployment on Mobile Devices | Jun 12, 2025 | CPU, GPU | Unverified | 0 |
| Grounded Vision-Language Navigation for UAVs with Open-Vocabulary Goal Understanding | Jun 12, 2025 | Language Modeling | Unverified | 0 |
| Mirage-1: Augmenting and Updating GUI Agent with Hierarchical Multimodal Skills | Jun 12, 2025 | Large Language Model, Task Planning | Unverified | 0 |
| Unsourced Adversarial CAPTCHA: A Bi-Phase Adversarial CAPTCHA Framework | Jun 12, 2025 | Adversarial Attack, Diversity | Unverified | 0 |