| MMLU-ProX: A Multilingual Benchmark for Advanced Large Language Model Evaluation | Mar 13, 2025 | Language Model Evaluation, Language Modeling | Unverified | 0 |
| Hybrid Agents for Image Restoration | Mar 13, 2025 | Image Restoration, In-Context Learning | Unverified | 0 |
| OR-LLM-Agent: Automating Modeling and Solving of Operations Research Optimization Problem with Reasoning Large Language Model | Mar 13, 2025 | AI Agent, Language Modeling | Code Available | 2 |
| PRISM: Preference Refinement via Implicit Scene Modeling for 3D Vision-Language Preference-Based Reinforcement Learning | Mar 13, 2025 | Autonomous Navigation, Decision Making | Unverified | 0 |
| Tempest: Autonomous Multi-Turn Jailbreaking of Large Language Models with Tree Search | Mar 13, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| SmartWay: Enhanced Waypoint Prediction and Backtracking for Zero-Shot Vision-and-Language Navigation | Mar 13, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| Toward a method for LLM-enabled Indoor Navigation | Mar 12, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| Leveraging Knowledge Graphs and LLMs for Context-Aware Messaging | Mar 12, 2025 | Entity Linking, Event Detection | Unverified | 0 |
| Medical Large Language Model Benchmarks Should Prioritize Construct Validity | Mar 12, 2025 | Clinical Knowledge, Language Modeling | Unverified | 0 |
| Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models | Mar 12, 2025 | Denoising, Language Modeling | Code Available | 4 |