| Efficient Video Face Enhancement with Enhanced Spatial-Temporal Consistency | Nov 25, 2024 | Quantization, Video Restoration | Code Available | 2 |
| MVGenMaster: Scaling Multi-View Generation from Any Image via 3D Priors Enhanced Diffusion Model | Nov 25, 2024 | Novel View Synthesis | Code Available | 2 |
| Fancy123: One Image to High-Quality 3D Mesh Generation via Plug-and-Play Deformation | Nov 25, 2024 | Image to 3D | Code Available | 2 |
| SplatFlow: Multi-View Rectified Flow Model for 3D Gaussian Splatting Synthesis | Nov 25, 2024 | 3D Generation, 3DGS | Code Available | 2 |
| Edit Away and My Face Will not Stay: Personal Biometric Defense against Malicious Generative Editing | Nov 25, 2024 | Privacy Preserving | Code Available | 2 |
| Exploring Discrete Flow Matching for 3D De Novo Molecule Generation | Nov 25, 2024 | | Code Available | 2 |
| Open Vocabulary Monocular 3D Object Detection | Nov 25, 2024 | 3D Object Detection, Monocular 3D Object Detection | Code Available | 2 |
| UltraSam: A Foundation Model for Ultrasound using Large Open-Access Segmentation Datasets | Nov 25, 2024 | Segmentation | Code Available | 2 |
| Interpreting Object-level Foundation Models via Visual Precision Search | Nov 25, 2024 | Explainable Artificial Intelligence (XAI), Object | Code Available | 2 |
| Scaling Spike-driven Transformer with Efficient Spike Firing Approximation Training | Nov 25, 2024 | object-detection, Object Detection | Code Available | 2 |
| Ca2-VDM: Efficient Autoregressive Video Diffusion Model with Causal Generation and Cache Sharing | Nov 25, 2024 | Denoising, Video Generation | Code Available | 2 |
| Augmenting Multimodal LLMs with Self-Reflective Tokens for Knowledge-based Visual Question Answering | Nov 25, 2024 | Question Answering, Visual Question Answering | Code Available | 2 |
| Monocular Lane Detection Based on Deep Learning: A Survey | Nov 25, 2024 | 3D Lane Detection, Autonomous Driving | Code Available | 2 |
| ResCLIP: Residual Attention for Training-free Dense Vision-language Inference | Nov 24, 2024 | Attribute, Semantic Segmentation | Code Available | 2 |
| LLaMA-MoE v2: Exploring Sparsity of LLaMA from Perspective of Mixture-of-Experts with Post-Training | Nov 24, 2024 | Math, Mixture-of-Experts | Code Available | 2 |
| Self-Calibrated CLIP for Training-Free Open-Vocabulary Segmentation | Nov 24, 2024 | Semantic Segmentation | Code Available | 2 |
| Gotta Hear Them All: Sound Source Aware Vision to Audio Generation | Nov 23, 2024 | All, Audio Generation | Code Available | 2 |
| AeroGen: Enhancing Remote Sensing Object Detection with Diffusion-Driven Data Generation | Nov 23, 2024 | Data Augmentation, Diversity | Code Available | 2 |
| Steering Away from Harm: An Adaptive Approach to Defending Vision Language Model Against Jailbreaks | Nov 23, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| What Makes a Scene ? Scene Graph-based Evaluation and Feedback for Controllable Generation | Nov 23, 2024 | Image Generation, Scene Generation | Code Available | 2 |
| Large Language Model with Region-guided Referring and Grounding for CT Report Generation | Nov 23, 2024 | Computed Tomography (CT), Diagnostic | Code Available | 2 |
| A Survey on LLM-as-a-Judge | Nov 23, 2024 | Models Alignment, Survey | Code Available | 2 |
| Towards Satellite Image Road Graph Extraction: A Global-Scale Dataset and A Novel Method | Nov 23, 2024 | Autonomous Driving | Code Available | 2 |
| Devils in Middle Layers of Large Vision-Language Models: Interpreting, Detecting and Mitigating Object Hallucinations via Attention Lens | Nov 23, 2024 | Hallucination | Code Available | 2 |
| Multi-Reranker: Maximizing performance of retrieval-augmented generation in the FinanceRAG challenge | Nov 23, 2024 | RAG, Retrieval | Code Available | 2 |