| Graph Prompt Learning: A Comprehensive Survey and Beyond | Nov 28, 2023 | Prompt Learning, Survey | Code Available | 2 |
| TransNeXt: Robust Foveal Visual Perception for Vision Transformers | Nov 28, 2023 | Classification, Domain Generalization | Code Available | 2 |
| LightGaussian: Unbounded 3D Gaussian Compression with 15x Reduction and 200+ FPS | Nov 28, 2023 | Knowledge Distillation, NeRF | Code Available | 2 |
| Text-Driven Image Editing via Learnable Regions | Nov 28, 2023 | Image Generation | Code Available | 2 |
| SatCLIP: Global, General-Purpose Location Embeddings with Satellite Imagery | Nov 28, 2023 | Contrastive Learning | Code Available | 2 |
| SceneTex: High-Quality Texture Synthesis for Indoor Scenes via Diffusion Priors | Nov 28, 2023 | Decoder, Texture Synthesis | Code Available | 2 |
| SEED-Bench-2: Benchmarking Multimodal Large Language Models | Nov 28, 2023 | Benchmarking, Image Generation | Code Available | 2 |
| War and Peace (WarAgent): Large Language Model-based Multi-Agent Simulation of World Wars | Nov 28, 2023 | Language Modeling, Language Modelling | Code Available | 2 |
| Panacea: Panoramic and Controllable Video Generation for Autonomous Driving | Nov 28, 2023 | Autonomous Driving, Video Generation | Code Available | 2 |
| Photo-SLAM: Real-time Simultaneous Localization and Photorealistic Mapping for Monocular, Stereo, and RGB-D Cameras | Nov 28, 2023 | Neural Rendering | Code Available | 2 |
| Source-Free Domain Adaptation with Frozen Multimodal Foundation Model | Nov 27, 2023 | Domain Adaptation, Prompt Learning | Code Available | 2 |
| OccWorld: Learning a 3D Occupancy World Model for Autonomous Driving | Nov 27, 2023 | Autonomous Driving | Code Available | 2 |
| CoSeR: Bridging Image and Language for Cognitive Super-Resolution | Nov 27, 2023 | Super-Resolution | Code Available | 2 |
| LLMGA: Multimodal Large Language Model based Generation Assistant | Nov 27, 2023 | Image Generation, Language Modeling | Code Available | 2 |
| SAM-6D: Segment Anything Model Meets Zero-Shot 6D Object Pose Estimation | Nov 27, 2023 | 6D Pose Estimation using RGB, Instance Segmentation | Code Available | 2 |
| SiTH: Single-view Textured Human Reconstruction with Image-Conditioned Diffusion | Nov 27, 2023 | Lifelike 3D Human Generation | Code Available | 2 |
| Optimal Transport Aggregation for Visual Place Recognition | Nov 27, 2023 | Re-Ranking, Visual Place Recognition | Code Available | 2 |
| XLB: A differentiable massively parallel lattice Boltzmann library in Python | Nov 27, 2023 | CPU, GPU | Code Available | 2 |
| On Bringing Robots Home | Nov 27, 2023 | | Code Available | 2 |
| YUAN 2.0: A Large Language Model with Localized Filtering-based Attention | Nov 27, 2023 | Code Generation, Language Modeling | Code Available | 2 |
| GS-IR: 3D Gaussian Splatting for Inverse Rendering | Nov 26, 2023 | Inverse Rendering, NeRF | Code Available | 2 |
| Flow-Guided Diffusion for Video Inpainting | Nov 26, 2023 | Denoising, Image Generation | Code Available | 2 |
| Algorithm Evolution Using Large Language Model | Nov 26, 2023 | Language Modeling, Language Modelling | Code Available | 2 |
| Sketch Video Synthesis | Nov 26, 2023 | Video Editing | Code Available | 2 |
| NeuRAD: Neural Rendering for Autonomous Driving | Nov 26, 2023 | Autonomous Driving, Data Augmentation | Code Available | 2 |
| Adapter is All You Need for Tuning Visual Tasks | Nov 25, 2023 | All, image-classification | Code Available | 2 |
| MVControl: Adding Conditional Control to Multi-view Diffusion for Controllable Text-to-3D Generation | Nov 24, 2023 | 3D Generation, Image Generation | Code Available | 2 |
| OneFormer3D: One Transformer for Unified Point Cloud Segmentation | Nov 24, 2023 | 3D Instance Segmentation, 3D Object Detection | Code Available | 2 |
| Differentiable and accelerated spherical harmonic and Wigner transforms | Nov 24, 2023 | | Code Available | 2 |
| Controlled Text Generation via Language Model Arithmetic | Nov 24, 2023 | Language Modeling, Language Modelling | Code Available | 2 |
| GeoChat: Grounded Large Vision-Language Model for Remote Sensing | Nov 24, 2023 | Instruction Following, Language Modeling | Code Available | 2 |
| GigaPose: Fast and Robust Novel Object Pose Estimation via One Correspondence | Nov 23, 2023 | 3D Reconstruction, 6D Pose Estimation | Code Available | 2 |
| FinMem: A Performance-Enhanced LLM Trading Agent with Layered Memory and Character Design | Nov 23, 2023 | Decision Making, Language Modelling | Code Available | 2 |
| PyVRP: a high-performance VRP solver package | Nov 22, 2023 | | Code Available | 2 |
| SegVol: Universal and Interactive Volumetric Medical Image Segmentation | Nov 22, 2023 | Computed Tomography (CT), Image Segmentation | Code Available | 2 |
| Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model | Nov 22, 2023 | Denoising, GPU | Code Available | 2 |
| Learning to Fly in Seconds | Nov 22, 2023 | GPU, Reinforcement Learning (RL) | Code Available | 2 |
| PG-Video-LLaVA: Pixel Grounding Large Video-Language Models | Nov 22, 2023 | Benchmarking, Phrase Grounding | Code Available | 2 |
| ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs | Nov 22, 2023 | | Code Available | 2 |
| Compact 3D Gaussian Representation for Radiance Field | Nov 22, 2023 | 3DGS, Model Compression | Code Available | 2 |
| Diffusion360: Seamless 360 Degree Panoramic Image Generation based on Diffusion Models | Nov 22, 2023 | Denoising, Image Generation | Code Available | 2 |
| Depth-Regularized Optimization for 3D Gaussian Splatting in Few-Shot Images | Nov 22, 2023 | Depth Estimation, Monocular Depth Estimation | Code Available | 2 |
| Intrinsic Image Decomposition via Ordinal Shading | Nov 21, 2023 | Intrinsic Image Decomposition, Inverse Rendering | Code Available | 2 |
| A Survey of Graph Meets Large Language Model: Progress and Future Directions | Nov 21, 2023 | Language Modeling, Language Modelling | Code Available | 2 |
| Advancing Transformer Architecture in Long-Context Large Language Models: A Comprehensive Survey | Nov 21, 2023 | Navigate | Code Available | 2 |
| GAIA: a benchmark for General AI Assistants | Nov 21, 2023 | Philosophy | Code Available | 2 |
| SelfOcc: Self-Supervised Vision-Based 3D Occupancy Prediction | Nov 21, 2023 | Autonomous Driving, Depth Estimation | Code Available | 2 |
| Swift Parameter-free Attention Network for Efficient Super-Resolution | Nov 21, 2023 | Image Super-Resolution, Super-Resolution | Code Available | 2 |
| Diffusion Model Alignment Using Direct Preference Optimization | Nov 21, 2023 | model, Text-to-Image Generation | Code Available | 2 |
| AnimateAnything: Fine-Grained Open Domain Image Animation with Motion Guidance | Nov 21, 2023 | Image Animation, Image to Video Generation | Code Available | 2 |