| Universal Segmentation at Arbitrary Granularity with Language Instruction | Dec 4, 2023 | Referring Expression Segmentation, Segmentation | Code Available | 2 |
| Towards Learning a Generalist Model for Embodied Navigation | Dec 4, 2023 | 3D Question Answering (3D-QA), Embodied Question Answering | Code Available | 2 |
| Hulk: A Universal Knowledge Translator for Human-Centric Tasks | Dec 4, 2023 | 3D Human Pose Estimation, Action Recognition | Code Available | 2 |
| GaussianAvatar: Towards Realistic Human Avatar Modeling from a Single Video via Animatable 3D Gaussians | Dec 4, 2023 | Motion Estimation | Code Available | 2 |
| GaussianHead: High-fidelity Head Avatars with Learnable Gaussian Derivation | Dec 4, 2023 | Novel View Synthesis | Code Available | 2 |
| SchurVINS: Schur Complement-Based Lightweight Visual Inertial Navigation System | Dec 4, 2023 | Computational Efficiency | Code Available | 2 |
| Data Management For Training Large Language Models: A Survey | Dec 4, 2023 | Management, Survey | Code Available | 2 |
| D-Bot: Database Diagnosis System using Large Language Models | Dec 3, 2023 | | Code Available | 2 |
| ImageDream: Image-Prompt Multi-view Diffusion for 3D Generation | Dec 2, 2023 | 3D Generation, Object | Code Available | 2 |
| VEXIR2Vec: An Architecture-Neutral Embedding Framework for Binary Similarity | Dec 1, 2023 | Graph Embedding, Knowledge Graph Embedding | Code Available | 2 |
| Spatial-Temporal-Decoupled Masked Pre-training for Spatiotemporal Forecasting | Dec 1, 2023 | Time Series, Traffic Prediction | Code Available | 2 |
| DeepCache: Accelerating Diffusion Models for Free | Dec 1, 2023 | Denoising, Image Generation | Code Available | 2 |
| 3D Face Reconstruction with the Geometric Guidance of Facial Part Segmentation | Dec 1, 2023 | 3D Face Reconstruction, Face Reconstruction | Code Available | 2 |
| CoLLiE: Collaborative Training of Large Language Models in an Efficient Way | Dec 1, 2023 | GPU, parameter-efficient fine-tuning | Code Available | 2 |
| Dense Optical Tracking: Connecting the Dots | Dec 1, 2023 | Optical Flow Estimation, Point Tracking | Code Available | 2 |
| Gaussian Grouping: Segment and Edit Anything in 3D Scenes | Dec 1, 2023 | Colorization, NeRF | Code Available | 2 |
| StyleCrafter: Enhancing Stylized Text-to-Video Generation with Style Adapter | Dec 1, 2023 | Disentanglement, Text-to-Video Generation | Code Available | 2 |
| Segment and Caption Anything | Dec 1, 2023 | Caption Generation, object-detection | Code Available | 2 |
| FSGS: Real-Time Few-shot View Synthesis using Gaussian Splatting | Dec 1, 2023 | NeRF, Novel View Synthesis | Code Available | 2 |
| TrackDiffusion: Tracklet-Conditioned Video Generation via Diffusion Models | Dec 1, 2023 | Image Classification, Multi-Object Tracking | Code Available | 2 |
| CompGS: Smaller and Faster Gaussian Splatting with Vector Quantization | Nov 30, 2023 | 3DGS, NeRF | Code Available | 2 |
| Fast ODE-based Sampling for Diffusion Models in Around 5 Steps | Nov 30, 2023 | Image Generation | Code Available | 2 |
| Distributed Global Structure-from-Motion with a Deep Front-End | Nov 30, 2023 | | Code Available | 2 |
| VTimeLLM: Empower LLM to Grasp Video Moments | Nov 30, 2023 | Dense Video Captioning, Temporal Relation Extraction | Code Available | 2 |
| LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning | Nov 30, 2023 | 3D dense captioning, Dense Captioning | Code Available | 2 |
| CritiqueLLM: Towards an Informative Critique Generation Model for Evaluation of Large Language Model Generation | Nov 30, 2023 | Language Modeling, Language Modelling | Code Available | 2 |
| HOLD: Category-agnostic 3D Reconstruction of Interacting Hands and Objects from Video | Nov 30, 2023 | 3D Reconstruction, Object | Code Available | 2 |
| AlignBench: Benchmarking Chinese Alignment of Large Language Models | Nov 30, 2023 | Benchmarking | Code Available | 2 |
| Zero Bubble Pipeline Parallelism | Nov 30, 2023 | Scheduling | Code Available | 2 |
| Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives | Nov 30, 2023 | Video Understanding | Code Available | 2 |
| BioCLIP: A Vision Foundation Model for the Tree of Life | Nov 30, 2023 | | Code Available | 2 |
| Driving into the Future: Multiview Visual Forecasting and Planning with World Model for Autonomous Driving | Nov 29, 2023 | Autonomous Driving, Autonomous Vehicles | Code Available | 2 |
| HandRefiner: Refining Malformed Hands in Generated Images by Diffusion-based Conditional Inpainting | Nov 29, 2023 | | Code Available | 2 |
| HUGS: Human Gaussian Splats | Nov 29, 2023 | 3DGS, Neural Rendering | Code Available | 2 |
| Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications | Nov 29, 2023 | Autonomous Driving, Prediction | Code Available | 2 |
| A Graph-Based Approach for Category-Agnostic Pose Estimation | Nov 29, 2023 | 2D Pose Estimation, Animal Pose Estimation | Code Available | 2 |
| MMA-Diffusion: MultiModal Attack on Diffusion Models | Nov 29, 2023 | | Code Available | 2 |
| 4D-fy: Text-to-4D Generation Using Hybrid Score Distillation Sampling | Nov 29, 2023 | | Code Available | 2 |
| OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation | Nov 29, 2023 | Hallucination | Code Available | 2 |
| Biomedical knowledge graph-optimized prompt generation for large language models | Nov 29, 2023 | Benchmarking, Knowledge Graphs | Code Available | 2 |
| Neural Fields with Thermal Activations for Arbitrary-Scale Super-Resolution | Nov 29, 2023 | Image Super-Resolution, Super-Resolution | Code Available | 2 |
| FisherRF: Active View Selection and Uncertainty Quantification for Radiance Fields using Fisher Information | Nov 29, 2023 | NeRF, Uncertainty Quantification | Code Available | 2 |
| Zooming Out on Zooming In: Advancing Super-Resolution for Remote Sensing | Nov 29, 2023 | Super-Resolution | Code Available | 2 |
| GeoDream: Disentangling 2D and Geometric Priors for High-Fidelity and Consistent 3D Generation | Nov 29, 2023 | 3D Generation, Text to 3D | Code Available | 2 |
| Gaussian Shell Maps for Efficient 3D Human Generation | Nov 29, 2023 | | Code Available | 2 |
| TransNeXt: Robust Foveal Visual Perception for Vision Transformers | Nov 28, 2023 | Classification, Domain Generalization | Code Available | 2 |
| War and Peace (WarAgent): Large Language Model-based Multi-Agent Simulation of World Wars | Nov 28, 2023 | Language Modeling, Language Modelling | Code Available | 2 |
| Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding | Nov 28, 2023 | Hallucination, Object | Code Available | 2 |
| Graph Prompt Learning: A Comprehensive Survey and Beyond | Nov 28, 2023 | Prompt Learning, Survey | Code Available | 2 |
| SatCLIP: Global, General-Purpose Location Embeddings with Satellite Imagery | Nov 28, 2023 | Contrastive Learning | Code Available | 2 |