| MedTVT-R1: A Multimodal LLM Empowering Medical Reasoning and Diagnosis | Jun 23, 2025 | Diagnostic, Large Language Model | Code Available | 1 |
| ShareGPT-4o-Image: Aligning Multimodal Models with GPT-4o-Level Image Generation | Jun 22, 2025 | GPU, Image Generation | Code Available | 3 |
| DreamJourney: Perpetual View Generation with Video Diffusion Models | Jun 21, 2025 | Image to 3D, Large Language Model | Unverified | 0 |
| The Condition Number as a Scale-Invariant Proxy for Information Encoding in Neural Units | Jun 19, 2025 | Large Language Model, Multimodal Large Language Model | Code Available | 1 |
| ASCD: Attention-Steerable Contrastive Decoding for Reducing Hallucination in MLLM | Jun 17, 2025 | Hallucination, Language Modeling | Unverified | 0 |
| CFBenchmark-MM: Chinese Financial Assistant Benchmark for Multimodal Large Language Model | Jun 16, 2025 | Decision Making, Financial Analysis | Unverified | 0 |
| VIS-Shepherd: Constructing Critic for LLM-based Data Visualization Generation | Jun 16, 2025 | Data Visualization, Language Modeling | Code Available | 0 |
| VGR: Visual Grounded Reasoning | Jun 13, 2025 | Large Language Model, Math | Unverified | 0 |
| PHRASED: Phrase Dictionary Biasing for Speech Translation | Jun 10, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| Parking, Perception, and Retail: Street-Level Determinants of Community Vitality in Harbin | Jun 5, 2025 | Large Language Model, Morphological Analysis | Unverified | 0 |