| Scaling Diffusion Models to Real-World 3D LiDAR Scene Completion | Mar 20, 2024 | Autonomous Vehicles, Denoising | Code Available | 3 |
| Gaussian Splatting on the Move: Blur and Rolling Shutter Compensation for Natural Camera Motion | Mar 20, 2024 | 3DGS, Novel View Synthesis | Code Available | 3 |
| DBA-Fusion: Tightly Integrating Deep Dense Visual Bundle Adjustment with Multiple Sensors for Large-Scale Localization and Mapping | Mar 20, 2024 | Optical Flow Estimation, Sensor Fusion | Code Available | 3 |
| FaceXFormer: A Unified Transformer for Facial Analysis | Mar 19, 2024 | Age and Gender Estimation, Age Estimation | Code Available | 3 |
| WHAC: World-grounded Humans and Cameras | Mar 19, 2024 | Camera Pose Estimation, Pose Estimation | Code Available | 3 |
| Agent-FLAN: Designing Data and Methods of Effective Agent Tuning for Large Language Models | Mar 19, 2024 | Hallucination | Code Available | 3 |
| STG-Mamba: Spatial-Temporal Graph Learning via Selective State Space Model | Mar 19, 2024 | Computational Efficiency, Graph Learning | Code Available | 3 |
| AlphaFin: Benchmarking Financial Analysis with Retrieval-Augmented Stock-Chain Framework | Mar 19, 2024 | Benchmarking, Financial Analysis | Code Available | 3 |
| Adapting Visual-Language Models for Generalizable Anomaly Detection in Medical Images | Mar 19, 2024 | Anomaly Classification, Anomaly Detection | Code Available | 3 |
| Real-IAD: A Real-World Multi-View Dataset for Benchmarking Versatile Industrial Anomaly Detection | Mar 19, 2024 | Anomaly Detection, Benchmarking | Code Available | 3 |
| VmambaIR: Visual State Space Model for Image Restoration | Mar 18, 2024 | Denoising, Image Restoration | Code Available | 3 |
| SV3D: Novel Multi-view Synthesis and 3D Generation from a Single Image using Latent Video Diffusion | Mar 18, 2024 | 3D Generation, 3D Reconstruction | Code Available | 3 |
| LN3Diff: Scalable Latent Neural Fields Diffusion for Speedy 3D Generation | Mar 18, 2024 | 3D Generation, 3D Reconstruction | Code Available | 3 |
| BAD-Gaussians: Bundle Adjusted Deblur Gaussian Splatting | Mar 18, 2024 | 3D Scene Reconstruction, Deblurring | Code Available | 3 |
| CoCoCo: Improving Text-Guided Video Inpainting for Better Consistency, Controllability and Compatibility | Mar 18, 2024 | Image Inpainting, Video Alignment | Code Available | 3 |
| From Pixels to Insights: A Survey on Automatic Chart Understanding in the Era of Large Foundation Models | Mar 18, 2024 | Chart Understanding, Data Visualization | Code Available | 3 |
| Accelerating Scientific Discovery with Generative Knowledge Extraction, Graph-Based Representation, and Multimodal Intelligent Graph Reasoning | Mar 18, 2024 | Graph Sampling, Knowledge Graphs | Code Available | 3 |
| Boosting Continual Learning of Vision-Language Models via Mixture-of-Experts Adapters | Mar 18, 2024 | Continual Learning, Incremental Learning | Code Available | 3 |
| LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images | Mar 18, 2024 | Long-Context Understanding, TextVQA | Code Available | 3 |
| Generic 3D Diffusion Adapter Using Controlled Multi-View Editing | Mar 18, 2024 | 3D Generation, Image Generation | Code Available | 3 |
| Is Mamba Effective for Time Series Forecasting? | Mar 17, 2024 | Computational Efficiency, Mamba | Code Available | 3 |
| Controllable Text-to-3D Generation via Surface-Aligned Gaussian Splatting | Mar 15, 2024 | 3D Generation, Image to 3D | Code Available | 3 |
| EfficientVMamba: Atrous Selective Scan for Light Weight Visual Mamba | Mar 15, 2024 | Language Modeling, Language Modelling | Code Available | 3 |
| Generalizing Denoising to Non-Equilibrium Structures Improves Equivariant Force Fields | Mar 14, 2024 | Denoising | Code Available | 3 |
| LocalMamba: Visual State Space Model with Windowed Selective Scan | Mar 14, 2024 | Mamba, State Space Models | Code Available | 3 |
| Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding | Mar 14, 2024 | Mamba, Moment Retrieval | Code Available | 3 |
| Deep Limit Order Book Forecasting | Mar 14, 2024 | Deep Learning | Code Available | 3 |
| Relaxing Accurate Initialization Constraint for 3D Gaussian Splatting | Mar 14, 2024 | 3DGS, 3D Reconstruction | Code Available | 3 |
| Recurrent Drafter for Fast Speculative Decoding in Large Language Models | Mar 14, 2024 | Benchmarking, Knowledge Distillation | Code Available | 3 |
| Score-Guided Diffusion for 3D Human Recovery | Mar 14, 2024 | Denoising, Human Mesh Recovery | Code Available | 3 |
| GiT: Towards Generalist Vision Transformer through Universal Language Interface | Mar 14, 2024 | Language Modeling, Language Modelling | Code Available | 3 |
| TimeMachine: A Time Series is Worth 4 Mambas for Long-term Forecasting | Mar 14, 2024 | Computational Efficiency, Mamba | Code Available | 3 |
| OverleafCopilot: Empowering Academic Writing in Overleaf with Large Language Models | Mar 13, 2024 | | Code Available | 3 |
| From human experts to machines: An LLM supported approach to ontology and knowledge graph construction | Mar 13, 2024 | graph construction, Knowledge Graphs | Code Available | 3 |
| ManiGaussian: Dynamic Gaussian Splatting for Multi-task Robotic Manipulation | Mar 13, 2024 | Simulated Gaussian Manipulation | Code Available | 3 |
| ViT-CoMer: Vision Transformer with Convolutional Multi-scale Feature Interaction for Dense Predictions | Mar 13, 2024 | Instance Segmentation, Object Detection | Code Available | 3 |
| GaussianImage: 1000 FPS Image Representation and Compression by 2D Gaussian Splatting | Mar 13, 2024 | GPU, Quantization | Code Available | 3 |
| ViT-CoMer: Vision Transformer with Convolutional Multi-scale Feature Interaction for Dense Predictions | Mar 12, 2024 | Prediction | Code Available | 3 |
| SemCity: Semantic Scene Generation with Triplane Diffusion | Mar 12, 2024 | Scene Generation | Code Available | 3 |
| Stable-Makeup: When Real-World Makeup Transfer Meets Diffusion Model | Mar 12, 2024 | Image Generation, Text to Image Generation | Code Available | 3 |
| StyleGaussian: Instant 3D Style Transfer with Gaussian Splatting | Mar 12, 2024 | 3DGS, Decoder | Code Available | 3 |
| MoAI: Mixture of All Intelligence for Large Language and Vision Models | Mar 12, 2024 | All, Mixture-of-Experts | Code Available | 3 |
| SVD-LLM: Truncation-aware Singular Value Decomposition for Large Language Model Compression | Mar 12, 2024 | Language Modeling, Language Modelling | Code Available | 3 |
| DrivAerNet: A Parametric Car Dataset for Data-Driven Aerodynamic Design and Prediction | Mar 12, 2024 | | Code Available | 3 |
| Unified Source-Free Domain Adaptation | Mar 12, 2024 | Domain Adaptation, Language Modelling | Code Available | 3 |
| DEADiff: An Efficient Stylization Diffusion Model with Disentangled Representations | Mar 11, 2024 | Disentanglement | Code Available | 3 |
| DNGaussian: Optimizing Sparse-View 3D Gaussian Radiance Fields with Global-Local Depth Normalization | Mar 11, 2024 | Novel View Synthesis | Code Available | 3 |
| IndicLLMSuite: A Blueprint for Creating Pre-training and Fine-Tuning Datasets for Indian Languages | Mar 11, 2024 | Articles | Code Available | 3 |
| Toward Generalist Anomaly Detection via In-context Residual Learning with Few-shot Sample Prompts | Mar 11, 2024 | Anomaly Detection | Code Available | 3 |
| DriveDreamer-2: LLM-Enhanced World Models for Diverse Driving Video Generation | Mar 11, 2024 | Autonomous Driving, Language Modeling | Code Available | 3 |