| Open3DSG: Open-Vocabulary 3D Scene Graphs from Point Clouds with Queryable Objects and Open-Set Relationships | Feb 19, 2024 | 3d scene graph generation, Object | Code Available | 2 | 5 |
| GREEN: a lightweight architecture using learnable wavelets and Riemannian geometry for biomarker exploration | May 14, 2024 | EEG | Code Available | 2 | 5 |
| Revitalizing Multivariate Time Series Forecasting: Learnable Decomposition with Inter-Series Dependencies and Intra-Series Variations Modeling | Feb 20, 2024 | Multivariate Time Series Forecasting, Time Series | Code Available | 2 | 5 |
| Advancing Large Language Models to Capture Varied Speaking Styles and Respond Properly in Spoken Conversations | Feb 20, 2024 | Sentence | Code Available | 2 | 5 |
| Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning | Sep 5, 2023 | Decoder, Image Generation | Code Available | 2 | 5 |
| RoboEXP: Action-Conditioned Scene Graph via Interactive Exploration for Robotic Manipulation | Feb 23, 2024 | | Code Available | 2 | 5 |
| Gen4Gen: Generative Data Pipeline for Generative Multi-Concept Composition | Feb 23, 2024 | Image Generation, Personalized Image Generation | Code Available | 2 | 5 |
| Towards Multi-spatiotemporal-scale Generalized PDE Modeling | Sep 30, 2022 | PDE Surrogate Modeling | Code Available | 2 | 5 |
| ChunkAttention: Efficient Self-Attention with Prefix-Aware KV Cache and Two-Phase Partition | Feb 23, 2024 | | Code Available | 2 | 5 |
| An Automated End-to-End Open-Source Software for High-Quality Text-to-Speech Dataset Generation | Feb 26, 2024 | Dataset Generation, text-to-speech | Code Available | 2 | 5 |
| Deep Homography Estimation for Visual Place Recognition | Feb 25, 2024 | Homography Estimation, Re-Ranking | Code Available | 2 | 5 |
| Unsupervised Zero-Shot Reinforcement Learning via Functional Reward Encodings | Feb 27, 2024 | Diversity, Offline RL | Code Available | 2 | 5 |
| CARZero: Cross-Attention Alignment for Radiology Zero-Shot Classification | Feb 27, 2024 | Classification, Diagnostic | Code Available | 2 | 5 |
| UN-SAM: Universal Prompt-Free Segmentation for Generalized Nuclei Images | Feb 26, 2024 | Decoder, Segmentation | Code Available | 2 | 5 |
| Contextualized Diffusion Models for Text-Guided Image and Video Generation | Feb 26, 2024 | Image Generation, Text to Image Generation | Code Available | 2 | 5 |
| Retrieval is Accurate Generation | Feb 27, 2024 | Language Modeling, Language Modelling | Code Available | 2 | 5 |
| Arithmetic Control of LLMs for Diverse User Preferences: Directional Preference Alignment with Multi-Objective Rewards | Feb 28, 2024 | | Code Available | 2 | 5 |
| DecisionNCE: Embodied Multimodal Representations via Implicit Preference Learning | Feb 28, 2024 | Contrastive Learning, Decision Making | Code Available | 2 | 5 |
| A Survey on Remote Sensing Foundation Models: From Vision to Multimodality | Mar 28, 2025 | Change Detection, Land Cover Classification | Code Available | 2 | 5 |
| A Cognitive-Based Trajectory Prediction Approach for Autonomous Driving | Feb 29, 2024 | Autonomous Driving, Decision Making | Code Available | 2 | 5 |
| How do Large Language Models Handle Multilingualism? | Feb 29, 2024 | | Code Available | 2 | 5 |
| NARUTO: Neural Active Reconstruction from Uncertain Target Observations | Feb 29, 2024 | Surface Reconstruction | Code Available | 2 | 5 |
| Deep learning for 3D human pose estimation and mesh recovery: A survey | Feb 29, 2024 | 3D Human Pose Estimation, Autonomous Driving | Code Available | 2 | 5 |
| Global and Local Prompts Cooperation via Optimal Transport for Federated Learning | Feb 29, 2024 | Federated Learning, Prompt Learning | Code Available | 2 | 5 |
| VNLP: Turkish NLP Package | Mar 2, 2024 | Morphological Analysis, named-entity-recognition | Code Available | 2 | 5 |
| TempCompass: Do Video LLMs Really Understand Videos? | Mar 1, 2024 | Diversity | Code Available | 2 | 5 |
| DAMSDet: Dynamic Adaptive Multispectral Detection Transformer with Competitive Query Selection and Adaptive Feature Fusion | Mar 1, 2024 | Object, object-detection | Code Available | 2 | 5 |
| Efficient Episodic Memory Utilization of Cooperative Multi-Agent Reinforcement Learning | Mar 2, 2024 | Decoder, Multi-agent Reinforcement Learning | Code Available | 2 | 5 |
| VTG-GPT: Tuning-Free Zero-Shot Video Temporal Grounding with GPT | Mar 4, 2024 | Image Captioning, Zero-shot Moment Retrieval | Code Available | 2 | 5 |
| Leave No Document Behind: Benchmarking Long-Context LLMs with Extended Multi-Doc QA | Jun 25, 2024 | Benchmarking, Long-Context Understanding | Code Available | 2 | 5 |
| xT: Nested Tokenization for Larger Context in Large Images | Mar 4, 2024 | | Code Available | 2 | 5 |
| MiM-ISTD: Mamba-in-Mamba for Efficient Infrared Small Target Detection | Mar 4, 2024 | GPU, Mamba | Code Available | 2 | 5 |
| InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated Large Language Model Agents | Mar 5, 2024 | Benchmarking, Language Modeling | Code Available | 2 | 5 |
| HMT-UNet: A hybird Mamba-Transformer Vision UNet for Medical Image Segmentation | Aug 21, 2024 | Image Segmentation, Mamba | Code Available | 2 | 5 |
| Apollo: A Lightweight Multilingual Medical LLM towards Democratizing Medical AI to 6B People | Mar 6, 2024 | | Code Available | 2 | 5 |
| Learning to Decode Collaboratively with Multiple Language Models | Mar 6, 2024 | Instruction Following | Code Available | 2 | 5 |
| Q-DiT: Accurate Post-Training Quantization for Diffusion Transformers | Jun 25, 2024 | Image Generation, Model Compression | Code Available | 2 | 5 |
| QAQ: Quality Adaptive Quantization for LLM KV Cache | Mar 7, 2024 | Quantization, Question Answering | Code Available | 2 | 5 |
| VideoElevator: Elevating Video Generation Quality with Versatile Text-to-Image Diffusion Models | Mar 8, 2024 | Video Generation | Code Available | 2 | 5 |
| IsolateGPT: An Execution Isolation Architecture for LLM-Based Agentic Systems | Mar 8, 2024 | | Code Available | 2 | 5 |
| Tracking Meets LoRA: Faster Training, Larger Model, Stronger Performance | Mar 8, 2024 | GPU, parameter-efficient fine-tuning | Code Available | 2 | 5 |
| VidProM: A Million-scale Real Prompt-Gallery Dataset for Text-to-Video Diffusion Models | Mar 10, 2024 | Copy Detection, Image Generation | Code Available | 2 | 5 |
| Beyond Text: Frozen Large Language Models in Visual Signal Comprehension | Mar 12, 2024 | Deblurring, Decoder | Code Available | 2 | 5 |
| RSBuilding: Towards General Remote Sensing Image Building Extraction and Change Detection with Foundation Model | Mar 12, 2024 | Change Detection, Zero-shot Generalization | Code Available | 2 | 5 |
| The state-of-the-art in Cardiac MRI Reconstruction: Results of the CMRxRecon Challenge in MICCAI 2023 | Apr 1, 2024 | MRI Reconstruction | Code Available | 2 | 5 |
| MIM4D: Masked Modeling with Multi-View Video for Autonomous Driving Representation Learning | Mar 13, 2024 | 3D Object Detection, Autonomous Driving | Code Available | 2 | 5 |
| OpenGraph: Open-Vocabulary Hierarchical 3D Graph Representation in Large-Scale Outdoor Environments | Mar 14, 2024 | Zero-Shot Learning | Code Available | 2 | 5 |
| Generative Region-Language Pretraining for Open-Ended Object Detection | Mar 15, 2024 | Language Modeling, Language Modelling | Code Available | 2 | 5 |
| A Comprehensive Study of Multimodal Large Language Models for Image Quality Assessment | Mar 16, 2024 | Image Quality Assessment | Code Available | 2 | 5 |
| Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation | Mar 18, 2024 | Mixture-of-Experts, parameter-efficient fine-tuning | Code Available | 2 | 5 |