| Paper | Date | Tasks | Code | Stars |
|---|---|---|---|---|
| CyberGym: Evaluating AI Agents' Cybersecurity Capabilities with Real-World Vulnerabilities at Scale | Jun 3, 2025 | Large Language Model | Code available | 2 |
| TestAgent: An Adaptive and Intelligent Expert for Human Assessment | Jun 3, 2025 | Large Language Model, Question Selection | Unverified | 0 |
| TalkingMachines: Real-Time Audio-Driven FaceTime-Style Video via Autoregressive Diffusion Models | Jun 3, 2025 | Decoder, Knowledge Distillation | Unverified | 0 |
| LAM SIMULATOR: Advancing Data Generation for Large Action Model Training via Online Exploration and Trajectory Feedback | Jun 2, 2025 | Large Language Model | Unverified | 0 |
| PGPO: Enhancing Agent Reasoning via Pseudocode-style Planning Guided Preference Optimization | Jun 2, 2025 | Language Modeling | Unverified | 0 |
| Hybrid AI for Responsive Multi-Turn Online Conversations with Novel Dynamic Routing and Feedback Adaptation | Jun 2, 2025 | Language Modeling | Unverified | 0 |
| Why Gradients Rapidly Increase Near the End of Training | Jun 2, 2025 | Language Modeling | Unverified | 0 |
| WebChoreArena: Evaluating Web Browsing Agents on Realistic Tedious Web Tasks | Jun 2, 2025 | Large Language Model, Mathematical Reasoning | Unverified | 0 |
| PointT2I: LLM-based text-to-image generation via keypoints | Jun 2, 2025 | Image Generation, Large Language Model | Unverified | 0 |
| ShapeLLM-Omni: A Native Multimodal LLM for 3D Generation and Understanding | Jun 2, 2025 | 3D Generation, Large Language Model | Code available | 4 |