| AdaVLN: Towards Visual Language Navigation in Continuous Indoor Environments with Moving Humans | Nov 27, 2024 | Navigate | Code Available | 2 |
| GaussianSpeech: Audio-Driven Gaussian Avatars | Nov 27, 2024 | 3DGS | Code Available | 2 |
| TimeMarker: A Versatile Video-LLM for Long and Short Video Understanding with Superior Temporal Localization Ability | Nov 27, 2024 | Temporal Localization, Video Understanding | Code Available | 2 |
| Stochastic Taylor Derivative Estimator: Efficient amortization for arbitrary differential operators | Nov 27, 2024 | GPU | Code Available | 2 |
| PassionSR: Post-Training Quantization with Adaptive Scale in One-Step Diffusion based Image Super-Resolution | Nov 26, 2024 | Denoising, Image Super-Resolution | Code Available | 2 |
| HyperSeg: Towards Universal Visual Segmentation with Large Language Model | Nov 26, 2024 | Language Modeling, Large Language Model | Code Available | 2 |
| Grounding-IQA: Multimodal Language Grounding Model for Image Quality Assessment | Nov 26, 2024 | Image Quality Assessment, Question Answering | Code Available | 2 |
| vesselFM: A Foundation Model for Universal 3D Blood Vessel Segmentation | Nov 26, 2024 | Image Segmentation, Medical Image Analysis | Code Available | 2 |
| Omegance: A Single Parameter for Various Granularities in Diffusion-Based Synthesis | Nov 26, 2024 | Denoising | Code Available | 2 |
| Boost 3D Reconstruction using Diffusion-based Monocular Camera Calibration | Nov 26, 2024 | 3D Reconstruction, Camera Calibration | Code Available | 2 |
| DreamMix: Decoupling Object Attributes for Enhanced Editability in Customized Image Inpainting | Nov 26, 2024 | Attribute, Diversity | Code Available | 2 |
| MotionLLaMA: A Unified Framework for Motion Synthesis and Comprehension | Nov 26, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient | Nov 26, 2024 | GPU, Image Generation | Code Available | 2 |
| Pretrained LLM Adapted with LoRA as a Decision Transformer for Offline RL in Quantitative Trading | Nov 26, 2024 | Offline RL, parameter-efficient fine-tuning | Code Available | 2 |
| TinyViM: Frequency Decoupling for Tiny Hybrid Vision Mamba | Nov 26, 2024 | image-classification, Image Classification | Code Available | 2 |
| OpenAD: Open-World Autonomous Driving Benchmark for 3D Object Detection | Nov 26, 2024 | 3D Object Detection, Autonomous Driving | Code Available | 2 |
| MWFormer: Multi-Weather Image Restoration Using Degradation-Aware Transformers | Nov 26, 2024 | Contrastive Learning, Image Restoration | Code Available | 2 |
| Path-RAG: Knowledge-Guided Key Region Retrieval for Open-ended Pathology Visual Question Answering | Nov 26, 2024 | Prognosis, Question Answering | Code Available | 2 |
| Towards Stabilized and Efficient Diffusion Transformers through Long-Skip-Connections with Spectral Constraints | Nov 26, 2024 | Denoising, Image Generation | Code Available | 2 |
| Task Singular Vectors: Reducing Task Interference in Model Merging | Nov 26, 2024 | Classification, Image Classification | Code Available | 2 |
| Exploring Discrete Flow Matching for 3D De Novo Molecule Generation | Nov 25, 2024 | | Code Available | 2 |
| Open Vocabulary Monocular 3D Object Detection | Nov 25, 2024 | 3D Object Detection, Monocular 3D Object Detection | Code Available | 2 |
| Interpreting Object-level Foundation Models via Visual Precision Search | Nov 25, 2024 | Explainable Artificial Intelligence (XAI), Object | Code Available | 2 |
| Preference Optimization for Reasoning with Pseudo Feedback | Nov 25, 2024 | GSM8K, Math | Code Available | 2 |
| Efficient Video Face Enhancement with Enhanced Spatial-Temporal Consistency | Nov 25, 2024 | Quantization, Video Restoration | Code Available | 2 |