| Are Language Models Puzzle Prodigies? Algorithmic Puzzles Unveil Serious Challenges in Multimodal Reasoning | Mar 6, 2024 | Multimodal Reasoning, Question Answering | Code Available | 2 | 5 |
| Image Textualization: An Automatic Framework for Creating Accurate and Detailed Image Descriptions | Jun 11, 2024 | Hallucination, Image Description | Code Available | 2 | 5 |
| Tetrahedron Splatting for 3D Generation | Jun 3, 2024 | 3D Generation, 3DGS | Code Available | 2 | 5 |
| Plane2Depth: Hierarchical Adaptive Plane Guidance for Monocular Depth Estimation | Sep 4, 2024 | Depth Estimation, Depth Prediction | Code Available | 2 | 5 |
| A Text-guided Protein Design Framework | Feb 9, 2023 | Decoder, Property Prediction | Code Available | 2 | 5 |
| OpenOmni: Large Language Models Pivot Zero-shot Omnimodal Alignment across Language with Real-time Self-Aware Emotional Speech Synthesis | Jan 8, 2025 | Decoder, Emotional Speech Synthesis | Code Available | 2 | 5 |
| Less is More: Masking Elements in Image Condition Features Avoids Content Leakages in Style Transfer Diffusion Models | Feb 11, 2025 | Style Transfer | Code Available | 2 | 5 |
| Point-to-Box Network for Accurate Object Detection via Single Point Supervision | Jul 14, 2022 | Attribute, Multiple Instance Learning | Code Available | 2 | 5 |
| FRNet: Frustum-Range Networks for Scalable LiDAR Segmentation | Dec 7, 2023 | 3D Semantic Segmentation, Autonomous Driving | Code Available | 2 | 5 |
| The GENEA Challenge 2023: A large scale evaluation of gesture generation models in monadic and dyadic settings | Aug 24, 2023 | Gesture Generation | Code Available | 2 | 5 |
| MaIR: A Locality- and Continuity-Preserving Mamba for Image Restoration | Dec 28, 2024 | Deblurring, Denoising | Code Available | 2 | 5 |
| FSTA-SNN: Frequency-based Spatial-Temporal Attention Module for Spiking Neural Networks | Dec 15, 2024 | | Code Available | 2 | 5 |
| Advancing Plain Vision Transformer Towards Remote Sensing Foundation Model | Aug 8, 2022 | Aerial Scene Classification, Few-Shot Learning | Code Available | 2 | 5 |
| Equivariant Ensembles and Regularization for Reinforcement Learning in Map-based Path Planning | Mar 19, 2024 | Inductive Bias, Reinforcement Learning (RL) | Code Available | 2 | 5 |
| Communication Learning in Multi-Agent Systems from Graph Modeling Perspective | Nov 1, 2024 | | Code Available | 2 | 5 |
| Reasoning-Enhanced Healthcare Predictions with Knowledge Graph Community Retrieval | Oct 6, 2024 | Community Detection, Information Retrieval | Code Available | 2 | 5 |
| Towards Natural Image Matting in the Wild via Real-Scenario Prior | Oct 9, 2024 | Decoder, Image Matting | Code Available | 2 | 5 |
| Generative Active Learning for Long-tailed Instance Segmentation | Jun 4, 2024 | Active Learning, Instance Segmentation | Code Available | 2 | 5 |
| ODRL: A Benchmark for Off-Dynamics Reinforcement Learning | Oct 28, 2024 | Benchmarking, reinforcement-learning | Code Available | 2 | 5 |
| Expressive Text-to-Image Generation with Rich Text | Apr 13, 2023 | Image Generation, Text Generation | Code Available | 2 | 5 |
| Roboflow 100: A Rich, Multi-Domain Object Detection Benchmark | Nov 24, 2022 | 2D Object Detection, Image Retrieval | Code Available | 2 | 5 |
| Look Gauss, No Pose: Novel View Synthesis using Gaussian Splatting without Accurate Pose Initialization | Oct 11, 2024 | Camera Pose Estimation, Novel View Synthesis | Code Available | 2 | 5 |
| InPars: Data Augmentation for Information Retrieval using Large Language Models | Feb 10, 2022 | Data Augmentation, Diversity | Code Available | 2 | 5 |
| Root Mean Square Layer Normalization | Oct 16, 2019 | | Code Available | 2 | 5 |
| WINA: Weight Informed Neuron Activation for Accelerating Large Language Model Inference | May 26, 2025 | Language Modeling, Language Modelling | Code Available | 2 | 5 |