| MegActor: Harness the Power of Raw Video for Vivid Portrait Animation | May 31, 2024 | Portrait Animation, Style Transfer | Code Available | 4 |
| SparseDrive: End-to-End Autonomous Driving via Sparse Scene Representation | May 30, 2024 | Attribute, Autonomous Driving | Code Available | 4 |
| Grokfast: Accelerated Grokking by Amplifying Slow Gradients | May 30, 2024 | | Code Available | 4 |
| S^3Gaussian: Self-Supervised Street Gaussians for Autonomous Driving | May 30, 2024 | 3DGS, 3D Reconstruction | Code Available | 4 |
| PixelsDB: Serverless and NL-Aided Data Analytics with Flexible Service Levels and Prices | May 30, 2024 | Scheduling | Code Available | 4 |
| MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model | May 30, 2024 | Image Animation, Video Generation | Code Available | 4 |
| CLAY: A Controllable Large-scale Generative Model for Creating High-quality 3D Assets | May 30, 2024 | 2k, 3D geometry | Code Available | 4 |
| Large Brain Model for Learning Generic Representations with Tremendous EEG Data in BCI | May 29, 2024 | EEG, Electroencephalogram (EEG) | Code Available | 4 |
| MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series | May 29, 2024 | Language Modeling, Language Modelling | Code Available | 4 |
| LLMs Meet Multimodal Generation and Editing: A Survey | May 29, 2024 | multimodal generation, Survey | Code Available | 4 |
| ReChorus2.0: A Modular and Task-Flexible Recommendation Library | May 28, 2024 | Click-Through Rate Prediction, Recommendation Systems | Code Available | 4 |
| Phased Consistency Models | May 28, 2024 | Image Generation, Video Generation | Code Available | 4 |
| Universal and Extensible Language-Vision Models for Organ Segmentation and Tumor Detection from Abdominal Computed Tomography | May 28, 2024 | Computational Efficiency, Computed Tomography (CT) | Code Available | 4 |
| GaussianFormer: Scene as Gaussians for Vision-Based 3D Semantic Occupancy Prediction | May 27, 2024 | 3D Semantic Occupancy Prediction, Autonomous Driving | Code Available | 4 |
| PromptFix: You Prompt and We Fix the Photo | May 27, 2024 | Denoising, Image Generation | Code Available | 4 |
| The Importance of Directional Feedback for LLM-based Optimizers | May 26, 2024 | | Code Available | 4 |
| Trackastra: Transformer-based cell tracking for live-cell microscopy | May 24, 2024 | Cell Tracking, Multiple Object Tracking | Code Available | 4 |
| Looking Backward: Streaming Video-to-Video Translation with Feature Banks | May 24, 2024 | GPU, Translation | Code Available | 4 |
| Quality-aware Masked Diffusion Transformer for Enhanced Music Generation | May 24, 2024 | Diversity, Music Generation | Code Available | 4 |
| SimPO: Simple Preference Optimization with a Reference-Free Reward | May 23, 2024 | Chatbot, Instruction Following | Code Available | 4 |
| SEA-RAFT: Simple, Efficient, Accurate RAFT for Optical Flow | May 23, 2024 | Optical Flow Estimation | Code Available | 4 |
| CraftsMan3D: High-fidelity Mesh Generation with 3D Native Generation and Interactive Geometry Refiner | May 23, 2024 | 3D Generation, 3D geometry | Code Available | 4 |
| AutoCoder: Enhancing Code Large Language Model with AIEV-Instruct | May 23, 2024 | Class-level Code Generation, Code Completion | Code Available | 4 |
| A Survey on Vision-Language-Action Models for Embodied AI | May 23, 2024 | Image Captioning, Instruction Following | Code Available | 4 |
| AndroidWorld: A Dynamic Benchmarking Environment for Autonomous Agents | May 23, 2024 | Benchmarking | Code Available | 4 |
| Pytorch-Wildlife: A Collaborative Deep Learning Framework for Conservation | May 21, 2024 | Deep Learning | Code Available | 4 |
| OmniGlue: Generalizable Feature Matching with Foundation Model Guidance | May 21, 2024 | | Code Available | 4 |
| TinyLLaVA Factory: A Modularized Codebase for Small-scale Large Multimodal Models | May 20, 2024 | Philosophy | Code Available | 4 |
| MVSGaussian: Fast Generalizable Gaussian Splatting Reconstruction from Multi-View Stereo | May 20, 2024 | NeRF, Novel View Synthesis | Code Available | 4 |
| ViViD: Video Virtual Try-on using Diffusion Models | May 20, 2024 | Virtual Try-on | Code Available | 4 |
| Morphological Prototyping for Unsupervised Slide Representation Learning in Computational Pathology | May 19, 2024 | Multiple Instance Learning, Representation Learning | Code Available | 4 |
| Acoustic modeling for Overlapping Speech Recognition: JHU Chime-5 Challenge System | May 17, 2024 | Data Augmentation, Speech Dereverberation | Code Available | 4 |
| PRISM: A Multi-Modal Generative Foundation Model for Slide-Level Histopathology | May 16, 2024 | whole slide images | Code Available | 4 |
| MarkLLM: An Open-Source Toolkit for LLM Watermarking | May 16, 2024 | | Code Available | 4 |
| IBD-PSC: Input-level Backdoor Detection via Parameter-oriented Scaling Consistency | May 16, 2024 | | Code Available | 4 |
| Conformalized Physics-Informed Neural Networks | May 13, 2024 | Conformal Prediction | Code Available | 4 |
| PeRFlow: Piecewise Rectified Flow as Universal Plug-and-Play Accelerator | May 13, 2024 | | Code Available | 4 |
| The Platonic Representation Hypothesis | May 13, 2024 | | Code Available | 4 |
| Look Once to Hear: Target Speech Hearing with Noisy Examples | May 10, 2024 | CPU, Speech Extraction | Code Available | 4 |
| Exploring the Capabilities of Large Multimodal Models on Dense Text | May 9, 2024 | Prompt Engineering, Visual Question Answering (VQA) | Code Available | 4 |
| LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit | May 9, 2024 | Benchmarking, Computational Efficiency | Code Available | 4 |
| LangCell: Language-Cell Pre-training for Cell Identity Understanding | May 9, 2024 | | Code Available | 4 |
| Aequitas Flow: Streamlining Fair ML Experimentation | May 9, 2024 | Benchmarking, Fairness | Code Available | 4 |
| A Survey on Occupancy Perception for Autonomous Driving: The Information Fusion Perspective | May 8, 2024 | Autonomous Driving, Autonomous Vehicles | Code Available | 4 |
| Vidur: A Large-Scale Simulation Framework For LLM Inference | May 8, 2024 | CPU, GPU | Code Available | 4 |
| QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving | May 7, 2024 | GPU, Language Modelling | Code Available | 4 |
| DocRes: A Generalist Model Toward Unifying Document Image Restoration Tasks | May 7, 2024 | Binarization, Deblurring | Code Available | 4 |
| SEED-Data-Edit Technical Report: A Hybrid Dataset for Instructional Image Editing | May 7, 2024 | Image Manipulation, Language Modeling | Code Available | 4 |
| Direct Training High-Performance Deep Spiking Neural Networks: A Review of Theories and Methods | May 6, 2024 | | Code Available | 4 |
| Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond | May 6, 2024 | Autonomous Driving, Decision Making | Code Available | 4 |