| Gaussian-SLAM: Photo-realistic Dense SLAM with Gaussian Splatting | Dec 6, 2023 | Simultaneous Localization and Mapping | Code Available | 3 |
| MatterGen: a generative model for inorganic materials design | Dec 6, 2023 | model | Code Available | 3 |
| Style Aligned Image Generation via Shared Attention | Dec 4, 2023 | Image Generation | Code Available | 3 |
| AGD: an Auto-switchable Optimizer using Stepwise Gradient Difference for Preconditioning Matrix | Dec 4, 2023 | Recommendation Systems | Code Available | 3 |
| Class Symbolic Regression: Gotta Fit 'Em All | Dec 4, 2023 | All, Deep Reinforcement Learning | Code Available | 3 |
| StableVITON: Learning Semantic Correspondence with Latent Diffusion Model for Virtual Try-On | Dec 4, 2023 | Semantic correspondence, Virtual Try-on | Code Available | 3 |
| UniGS: Unified Representation for Image Generation and Segmentation | Dec 4, 2023 | Image Generation, Segmentation | Code Available | 3 |
| PatchFusion: An End-to-End Tile-Based Framework for High-Resolution Monocular Metric Depth Estimation | Dec 4, 2023 | Depth Estimation | Code Available | 3 |
| Sequential Modeling Enables Scalable Learning for Large Vision Models | Dec 1, 2023 | Diversity | Code Available | 3 |
| Generalized Robot 3D Vision-Language Model with Fast Rendering and Pre-Training Vision-Language Alignment | Dec 1, 2023 | Contrastive Learning, Few-Shot Learning | Code Available | 3 |
| Scaffold-GS: Structured 3D Gaussians for View-Adaptive Rendering | Nov 30, 2023 | Neural Rendering | Code Available | 3 |
| One-step Diffusion with Distribution Matching Distillation | Nov 30, 2023 | | Code Available | 3 |
| Taiwan LLM: Bridging the Linguistic Divide with a Culturally Aligned Language Model | Nov 29, 2023 | Diversity, Language Modeling | Code Available | 3 |
| MoMask: Generative Masked Modeling of 3D Human Motions | Nov 29, 2023 | Human motion prediction, Motion Forecasting | Code Available | 3 |
| VBench: Comprehensive Benchmark Suite for Video Generative Models | Nov 29, 2023 | Image Generation, Video Generation | Code Available | 3 |
| SyncTalk: The Devil is in the Synchronization for Talking Head Synthesis | Nov 29, 2023 | NeRF, Talking Face Generation | Code Available | 3 |
| UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition | Nov 27, 2023 | Image Classification, Object Detection | Code Available | 3 |
| Animatable and Relightable Gaussians for High-fidelity Human Avatar Modeling | Nov 27, 2023 | NeRF | Code Available | 3 |
| MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model | Nov 27, 2023 | Image Animation | Code Available | 3 |
| Mip-Splatting: Alias-free 3D Gaussian Splatting | Nov 27, 2023 | Novel View Synthesis | Code Available | 3 |
| Robust Self-calibration of Focal Lengths from the Fundamental Matrix | Nov 27, 2023 | | Code Available | 3 |
| GaussianEditor: Swift and Controllable 3D Editing with Gaussian Splatting | Nov 24, 2023 | NeRF | Code Available | 3 |
| Language Model Inversion | Nov 22, 2023 | Language Modeling, Language Modelling | Code Available | 3 |
| White-Box Transformers via Sparse Rate Reduction: Compression Is All There Is? | Nov 22, 2023 | All, Data Compression | Code Available | 3 |
| HierSpeech++: Bridging the Gap between Semantic and Acoustic Representation of Speech by Hierarchical Variational Inference for Zero-shot Speech Synthesis | Nov 21, 2023 | Speech Synthesis, Super-Resolution | Code Available | 3 |
| Concept Sliders: LoRA Adaptors for Precise Control in Diffusion Models | Nov 20, 2023 | Image Generation | Code Available | 3 |
| SA-Med2D-20M Dataset: Segment Anything in 2D Medical Imaging with 20 Million masks | Nov 20, 2023 | Diversity, Image Segmentation | Code Available | 3 |
| FinanceBench: A New Benchmark for Financial Question Answering | Nov 20, 2023 | Question Answering | Code Available | 3 |
| Transcending Forgery Specificity with Latent Space Augmentation for Generalizable Deepfake Detection | Nov 19, 2023 | 2D Object Detection, DeepFake Detection | Code Available | 3 |
| MagicPose: Realistic Human Poses and Facial Expressions Retargeting with Identity-aware Diffusion | Nov 18, 2023 | Video Generation | Code Available | 3 |
| Mind the map! Accounting for existing map information when estimating online HDMaps from sensor | Nov 17, 2023 | Autonomous Driving | Code Available | 3 |
| Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models | Nov 14, 2023 | Acoustic Scene Classification, Audio captioning | Code Available | 3 |
| Detecting As Labeling: Rethinking LiDAR-camera Fusion in 3D Object Detection | Nov 13, 2023 | 3D Object Detection, object-detection | Code Available | 3 |
| Monkey: Image Resolution and Text Label Are Important Things for Large Multi-modal Models | Nov 11, 2023 | Image Captioning, MMR total | Code Available | 3 |
| A Survey of Large Language Models in Medicine: Progress, Application, and Challenge | Nov 9, 2023 | | Code Available | 3 |
| LRM: Large Reconstruction Model for Single Image to 3D | Nov 8, 2023 | Image to 3D, NeRF | Code Available | 3 |
| Large Language Model based Long-tail Query Rewriting in Taobao Search | Nov 7, 2023 | Contrastive Learning, Language Modeling | Code Available | 3 |
| Rethinking Evaluation Metrics of Open-Vocabulary Segmentaion | Nov 6, 2023 | Segmentation | Code Available | 3 |
| S-LoRA: Serving Thousands of Concurrent LoRA Adapters | Nov 6, 2023 | GPU, parameter-efficient fine-tuning | Code Available | 3 |
| LocoMuJoCo: A Comprehensive Imitation Learning Benchmark for Locomotion | Nov 4, 2023 | Benchmarking, Imitation Learning | Code Available | 3 |
| LLM4Drive: A Survey of Large Language Models for Autonomous Driving | Nov 2, 2023 | Autonomous Driving, Few-Shot Learning | Code Available | 3 |
| Skywork: A More Open Bilingual Foundation Model | Oct 30, 2023 | Language Modeling, Language Modelling | Code Available | 3 |
| Punica: Multi-Tenant LoRA Serving | Oct 28, 2023 | GPU | Code Available | 3 |
| Mosaic: An Architecture for Scalable & Interoperable Data Views | Oct 26, 2023 | | Code Available | 3 |
| SpikingJelly: An open-source machine learning infrastructure platform for spike-based intelligence | Oct 25, 2023 | Code Generation | Code Available | 3 |
| TorchSparse++: Efficient Training and Inference Framework for Sparse Convolution on GPUs | Oct 25, 2023 | Autonomous Driving, GPU | Code Available | 3 |
| SkyMath: Technical Report | Oct 25, 2023 | GSM8K, Language Modeling | Code Available | 3 |
| Leveraging Vision-Centric Multi-Modal Expertise for 3D Object Detection | Oct 24, 2023 | 3D Object Detection, object-detection | Code Available | 3 |
| SALMONN: Towards Generic Hearing Abilities for Large Language Models | Oct 20, 2023 | Audio captioning, Automatic Speech Recognition | Code Available | 3 |
| Safe RLHF: Safe Reinforcement Learning from Human Feedback | Oct 19, 2023 | reinforcement-learning, Reinforcement Learning | Code Available | 3 |