| Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization | Feb 5, 2024 | Science Question Answering, Text-to-Video Generation | Code Available | 4 |
| ActiveAnno3D -- An Active Learning Framework for Multi-Modal 3D Object Detection | Feb 5, 2024 | 3D Object Detection, Active Learning | Code Available | 4 |
| InstanceDiffusion: Instance-level Control for Image Generation | Feb 5, 2024 | Conditional Text-to-Image Synthesis, Image Generation | Code Available | 4 |
| Timer: Generative Pre-trained Transformers Are Large Time Series Models | Feb 4, 2024 | Anomaly Detection, Imputation | Code Available | 4 |
| VM-UNet: Vision Mamba UNet for Medical Image Segmentation | Feb 4, 2024 | Image Segmentation, Mamba | Code Available | 4 |
| DiffEditor: Boosting Accuracy and Flexibility on Diffusion-based Image Editing | Feb 4, 2024 | Image Generation | Code Available | 4 |
| LLM-Enhanced Data Management | Feb 4, 2024 | Hallucination, Management | Code Available | 4 |
| Image Fusion via Vision-Language Model | Feb 3, 2024 | Decoder, Language Modeling | Code Available | 4 |
| Parameter-Efficient Fine-Tuning for Pre-Trained Vision Models: A Survey | Feb 3, 2024 | parameter-efficient fine-tuning, Transfer Learning | Code Available | 4 |
| Boximator: Generating Rich and Controllable Motions for Video Synthesis | Feb 2, 2024 | | Code Available | 4 |
| Nomic Embed: Training a Reproducible Long Context Text Embedder | Feb 2, 2024 | | Code Available | 4 |
| KTO: Model Alignment as Prospect Theoretic Optimization | Feb 2, 2024 | Attribute, model | Code Available | 4 |
| Large Language Models for Time Series: A Survey | Feb 2, 2024 | Quantization, Survey | Code Available | 4 |
| A Comprehensive Survey on 3D Content Generation | Feb 2, 2024 | Survey | Code Available | 4 |
| Lightweight Pixel Difference Networks for Efficient Visual Representation Learning | Feb 1, 2024 | Edge Detection, Object Recognition | Code Available | 4 |
| Recurrent Partial Kernel Network for Efficient Optical Flow Estimation | Feb 1, 2024 | Optical Flow Estimation | Code Available | 4 |
| AnimateLCM: Computation-Efficient Personalized Style Video Generation without Personalized Video Data | Feb 1, 2024 | Conditional Image Generation, Denoising | Code Available | 4 |
| Agile But Safe: Learning Collision-Free High-Speed Legged Locomotion | Jan 31, 2024 | | Code Available | 4 |
| I Think, Therefore I am: Benchmarking Awareness of Large Language Models Using AwareBench | Jan 31, 2024 | Benchmarking, Multiple-choice | Code Available | 4 |
| Proactive Detection of Voice Cloning with Localized Watermarking | Jan 30, 2024 | Voice Cloning | Code Available | 4 |
| InstructIR: High-Quality Image Restoration Following Human Instructions | Jan 29, 2024 | Deblurring, Denoising | Code Available | 4 |
| Continual Learning with Pre-Trained Models: A Survey | Jan 29, 2024 | Continual Learning, Fairness | Code Available | 4 |
| ServerlessLLM: Low-Latency Serverless Inference for Large Language Models | Jan 25, 2024 | GPU, Scheduling | Code Available | 4 |
| SegMamba: Long-range Sequential Modeling Mamba For 3D Medical Image Segmentation | Jan 24, 2024 | Image Segmentation, Mamba | Code Available | 4 |
| OK-Robot: What Really Matters in Integrating Open-Knowledge Models for Robotics | Jan 22, 2024 | object-detection, Object Detection | Code Available | 4 |
| Orion-14B: Open-source Multilingual Large Language Models | Jan 20, 2024 | Scheduling | Code Available | 4 |
| Knowledge Fusion of Large Language Models | Jan 19, 2024 | Code Generation, Common Sense Reasoning | Code Available | 4 |
| GPAvatar: Generalizable and Precise Head Avatar from Image(s) | Jan 18, 2024 | Neural Rendering, Novel View Synthesis | Code Available | 4 |
| PIN-SLAM: LiDAR SLAM Using a Point-Based Implicit Neural Representation for Achieving Global Map Consistency | Jan 17, 2024 | GPU, Incremental Learning | Code Available | 4 |
| ReFT: Reasoning with Reinforced Fine-Tuning | Jan 17, 2024 | GSM8K, Math | Code Available | 4 |
| Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation | Jan 16, 2024 | Decoder, Machine Translation | Code Available | 4 |
| Transformer for Object Re-Identification: A Survey | Jan 13, 2024 | Object, Survey | Code Available | 4 |
| Scalable 3D Panoptic Segmentation As Superpoint Graph Clustering | Jan 12, 2024 | 3D Panoptic Segmentation, 3D Semantic Segmentation | Code Available | 4 |
| Efficient Deformable ConvNets: Rethinking Dynamic and Sparse Operator for Vision Applications | Jan 11, 2024 | image-classification, Image Classification | Code Available | 4 |
| TRIPS: Trilinear Point Splatting for Real-Time Radiance Field Rendering | Jan 11, 2024 | Computational Efficiency, Novel View Synthesis | Code Available | 4 |
| TOFU: A Task of Fictitious Unlearning for LLMs | Jan 11, 2024 | | Code Available | 4 |
| TrustLLM: Trustworthiness in Large Language Models | Jan 10, 2024 | Ethics, Fairness | Code Available | 4 |
| Tiny Time Mixers (TTMs): Fast Pre-trained Models for Enhanced Zero/Few-Shot Forecasting of Multivariate Time Series | Jan 8, 2024 | CPU, Few-Shot Learning | Code Available | 4 |
| Mixtral of Experts | Jan 8, 2024 | Code Generation, Common Sense Reasoning | Code Available | 4 |
| CRUXEval: A Benchmark for Code Reasoning, Understanding and Execution | Jan 5, 2024 | HumanEval, Prediction | Code Available | 4 |
| Efficient Parameter Optimisation for Quantum Kernel Alignment: A Sub-sampling Approach in Variational Training | Jan 5, 2024 | Quantum Machine Learning | Code Available | 4 |
| LLaMA Pro: Progressive LLaMA with Block Expansion | Jan 4, 2024 | Instruction Following, Math | Code Available | 4 |
| GPT-4V(ision) is a Generalist Web Agent, if Grounded | Jan 3, 2024 | Image Captioning, Question Answering | Code Available | 4 |
| LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning | Jan 2, 2024 | | Code Available | 4 |
| PAIR Diffusion: A Comprehensive Multimodal Object-Level Image Editor | Jan 1, 2024 | Object | Code Available | 4 |
| V*: Guided Visual Search as a Core Mechanism in Multimodal LLMs | Jan 1, 2024 | Visual Grounding, World Knowledge | Code Available | 4 |
| Video Understanding with Large Language Models: A Survey | Dec 29, 2023 | Survey, Video Understanding | Code Available | 4 |
| Fast Inference of Mixture-of-Experts Language Models with Offloading | Dec 28, 2023 | Mixture-of-Experts, Quantization | Code Available | 4 |
| LISA++: An Improved Baseline for Reasoning Segmentation with Large Language Model | Dec 28, 2023 | Instance Segmentation, Language Modeling | Code Available | 4 |
| G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model | Dec 18, 2023 | Language Modeling, Language Modelling | Code Available | 4 |