| Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos | Jan 7, 2025 | 2k, Language Modeling | Code Available | 5 |
| Toward Corpus Size Requirements for Training and Evaluating Depression Risk Models Using Spoken Language | Dec 31, 2024 | 2k | Unverified | 0 |
| Social-LLaVA: Enhancing Robot Navigation through Human-Language Reasoning in Social Spaces | Dec 30, 2024 | 2k, Robot Navigation | Unverified | 0 |
| Multimodal Preference Data Synthetic Alignment with Reward Model | Dec 23, 2024 | 2k, Caption Generation | Code Available | 0 |
| AnalogXpert: Automating Analog Topology Synthesis by Incorporating Circuit Design Expertise into Large Language Models | Dec 17, 2024 | 2k, Code Generation | Unverified | 0 |
| Block-Based Multi-Scale Image Rescaling | Dec 16, 2024 | 2k, 4k | Unverified | 0 |
| Do Large Language Models Show Biases in Causal Learning? | Dec 13, 2024 | 2k, Misinformation | Unverified | 0 |
| Elevating Flow-Guided Video Inpainting with Reference Generation | Dec 12, 2024 | 2k, Video Inpainting | Code Available | 2 |
| MANTA: A Large-Scale Multi-View and Visual-Text Anomaly Detection Dataset for Tiny Objects | Dec 6, 2024 | 2k, Anomaly Detection | Unverified | 0 |
| Lightweight Multiplane Images Network for Real-Time Stereoscopic Conversion from Planar Video | Dec 4, 2024 | 2k | Unverified | 0 |