| A Survey on Large Language Model Acceleration based on KV Cache Management | Dec 27, 2024 | Language Modeling, Language Modelling | Code Available | 3 |
| DrivingWorld: Constructing World Model for Autonomous Driving via Video GPT | Dec 27, 2024 | Autonomous Driving, Video Generation | Code Available | 3 |
| Accelerating Diffusion Transformers with Dual Feature Caching | Dec 25, 2024 | Video Generation | Code Available | 3 |
| Orient Anything: Learning Robust Object Orientation Estimation from Rendering 3D Models | Dec 24, 2024 | Attribute | Code Available | 3 |
| Semi-supervised Credit Card Fraud Detection via Attribute-Driven Graph Representation | Dec 24, 2024 | Attribute, Fraud Detection | Code Available | 3 |
| DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation | Dec 24, 2024 | Video Editing, Video Generation | Code Available | 3 |
| MineStudio: A Streamlined Package for Minecraft AI Agent Development | Dec 24, 2024 | AI Agent, Decision Making | Code Available | 3 |
| YuLan-Mini: An Open Data-efficient Language Model | Dec 23, 2024 | Language Modeling, Language Modelling | Code Available | 3 |
| Automating the Search for Artificial Life with Foundation Models | Dec 23, 2024 | Artificial Life, Ingenuity | Code Available | 3 |
| VidTwin: Video VAE with Decoupled Structure and Dynamics | Dec 23, 2024 | Decoder, Video Generation | Code Available | 3 |
| ResearchTown: Simulator of Human Research Community | Dec 23, 2024 | | Code Available | 3 |
| DRT-o1: Optimized Deep Reasoning Translation via Long Chain-of-Thought | Dec 23, 2024 | Machine Translation, Math | Code Available | 3 |
| PC Agent: While You Sleep, AI Works -- A Cognitive Journey into Digital World | Dec 23, 2024 | AI Agent | Code Available | 3 |
| PromptDresser: Improving the Quality and Controllability of Virtual Try-On via Generative Textual Prompt and Prompt-aware Mask | Dec 22, 2024 | In-Context Learning, Virtual Try-on | Code Available | 3 |
| CLEAR: Conv-Like Linearization Revs Pre-Trained Diffusion Transformers Up | Dec 20, 2024 | 8k, GPU | Code Available | 3 |
| Aria-UI: Visual Grounding for GUI Instructions | Dec 20, 2024 | Natural Language Visual Grounding, Visual Grounding | Code Available | 3 |
| EnvGS: Modeling View-Dependent Appearance with Environment Gaussian | Dec 19, 2024 | Novel View Synthesis | Code Available | 3 |
| MegaPairs: Massive Data Synthesis For Universal Multimodal Retrieval | Dec 19, 2024 | Image Retrieval, Retrieval | Code Available | 3 |
| Video Prediction Policy: A Generalist Robot Policy with Predictive Visual Representations | Dec 19, 2024 | Contrastive Learning, Image Reconstruction | Code Available | 3 |
| Towards Generalist Robot Policies: What Matters in Building Vision-Language-Action Models | Dec 18, 2024 | Representation Learning, Robot Manipulation | Code Available | 3 |
| PsyDT: Using LLMs to Construct the Digital Twin of Psychological Counselor with Personalized Counseling Style for Psychological Counseling | Dec 18, 2024 | One-Shot Learning | Code Available | 3 |
| DarkIR: Robust Low-Light Image Restoration | Dec 18, 2024 | Deblurring, Image Enhancement | Code Available | 3 |
| A Survey on Inference Optimization Techniques for Mixture of Experts Models | Dec 18, 2024 | Computational Efficiency, Distributed Computing | Code Available | 3 |
| LLaVA-UHD v2: an MLLM Integrating High-Resolution Feature Pyramid via Hierarchical Window Transformer | Dec 18, 2024 | Attribute, Text Generation | Code Available | 3 |
| CAD-Recode: Reverse Engineering CAD Code from Point Clouds | Dec 18, 2024 | CAD Reconstruction, Decoder | Code Available | 3 |
| GaussTR: Foundation Model-Aligned Gaussian Transformer for Self-Supervised 3D Spatial Understanding | Dec 17, 2024 | 3D Semantic Occupancy Prediction, Autonomous Driving | Code Available | 3 |
| Attentive Eraser: Unleashing Diffusion Model's Object Removal Potential via Self-Attention Redirection Guidance | Dec 17, 2024 | Image Generation, Object | Code Available | 3 |
| VidTok: A Versatile and Open-Source Video Tokenizer | Dec 17, 2024 | Quantization, SSIM | Code Available | 3 |
| BlenderLLM: Training Large Language Models for Computer-Aided Design with Self-improvement | Dec 16, 2024 | Script Generation, Text to 3D | Code Available | 3 |
| DARWIN 1.5: Large Language Models as Materials Science Adapted Learners | Dec 16, 2024 | Large Language Model, Multi-Task Learning | Code Available | 3 |
| Embodied CoT Distillation From LLM To Off-the-shelf Agents | Dec 16, 2024 | Decision Making, In-Context Learning | Code Available | 3 |
| Findings of the WMT 2024 Shared Task on Discourse-Level Literary Translation | Dec 16, 2024 | Translation | Code Available | 3 |
| Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey | Dec 16, 2024 | Language Modeling, Language Modelling | Code Available | 3 |
| SegMAN: Omni-scale Context Modeling with State Space Models and Local Attention for Semantic Segmentation | Dec 16, 2024 | Decoder, Semantic Segmentation | Code Available | 3 |
| PanSplat: 4K Panorama Synthesis with Feed-Forward Gaussian Splatting | Dec 16, 2024 | 3D Reconstruction, 4k | Code Available | 3 |
| From Easy to Hard: Progressive Active Learning Framework for Infrared Small Target Detection with Single Point Supervision | Dec 15, 2024 | Active Learning | Code Available | 3 |
| SoftVQ-VAE: Efficient 1-Dimensional Continuous Tokenizer | Dec 14, 2024 | Denoising, Image Generation | Code Available | 3 |
| AuctionNet: A Novel Benchmark for Decision-Making in Large-Scale Games | Dec 14, 2024 | Decision Making | Code Available | 3 |
| FlowDock: Geometric Flow Matching for Generative Protein-Ligand Docking and Affinity Prediction | Dec 14, 2024 | Blind Docking, Drug Discovery | Code Available | 3 |
| DisPose: Disentangling Pose Guidance for Controllable Human Image Animation | Dec 12, 2024 | Image Animation | Code Available | 3 |
| HadaCore: Tensor Core Accelerated Hadamard Transform Kernel | Dec 12, 2024 | GPU, MMLU | Code Available | 3 |
| ATPrompt: Textual Prompt Learning with Embedded Attributes | Dec 12, 2024 | Attribute, Large Language Model | Code Available | 3 |
| Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition | Dec 12, 2024 | EgoSchema | Code Available | 3 |
| Memory Layers at Scale | Dec 12, 2024 | | Code Available | 3 |
| Olympus: A Universal Task Router for Computer Vision Tasks | Dec 12, 2024 | | Code Available | 3 |
| Reloc3r: Large-Scale Training of Relative Camera Pose Regression for Generalizable, Fast, and Accurate Visual Localization | Dec 11, 2024 | Pose Estimation, Visual Localization | Code Available | 3 |
| TapeAgents: a Holistic Framework for Agent Development and Optimization | Dec 11, 2024 | | Code Available | 3 |
| SINERGYM -- A virtual testbed for building energy optimization with Reinforcement Learning | Dec 11, 2024 | continuous-control, Continuous Control | Code Available | 3 |
| TryOffAnyone: Tiled Cloth Generation from a Dressed Person | Dec 11, 2024 | Image-to-Image Translation, Virtual Try-Off | Code Available | 3 |
| CBraMod: A Criss-Cross Brain Foundation Model for EEG Decoding | Dec 10, 2024 | EEG, Eeg Decoding | Code Available | 3 |