| SDXS: Real-Time One-Step Latent Diffusion Models with Image Conditions | Mar 25, 2024 | Decoder, GPU | Code Available | 4 |
| Long-CLIP: Unlocking the Long-Text Capability of CLIP | Mar 22, 2024 | Image Generation, Image Retrieval | Code Available | 4 |
| KeyPoint Relative Position Encoding for Face Recognition | Mar 21, 2024 | Face Recognition, Gait Recognition | Code Available | 4 |
| AnyV2V: A Tuning-Free Framework For Any Video-to-Video Editing Tasks | Mar 21, 2024 | Image to Video Generation, Style Transfer | Code Available | 4 |
| GRM: Large Gaussian Reconstruction Model for Efficient 3D Reconstruction and Generation | Mar 21, 2024 | 3D Reconstruction, Image to 3D | Code Available | 4 |
| An Entropy-based Text Watermarking Detection Method | Mar 20, 2024 | | Code Available | 4 |
| RewardBench: Evaluating Reward Models for Language Modeling | Mar 20, 2024 | Instruction Following, Language Modeling | Code Available | 4 |
| DepthFM: Fast Monocular Depth Estimation with Flow Matching | Mar 20, 2024 | Depth Estimation, Monocular Depth Estimation | Code Available | 4 |
| The Model Openness Framework: Promoting Completeness and Openness for Reproducibility, Transparency, and Usability in Artificial Intelligence | Mar 20, 2024 | | Code Available | 4 |
| RGBD GS-ICP SLAM | Mar 19, 2024 | 3DGS, Simultaneous Localization and Mapping | Code Available | 4 |
| FRESCO: Spatial-Temporal Correspondence for Zero-Shot Video Translation | Mar 19, 2024 | Translation, valid | Code Available | 4 |
| Arc2Face: A Foundation Model for ID-Consistent Human Faces | Mar 18, 2024 | Diffusion Personalization, Diffusion Personalization Tuning Free | Code Available | 4 |
| LSKNet: A Foundation Lightweight Backbone for Remote Sensing | Mar 18, 2024 | Change Detection, object-detection | Code Available | 4 |
| EasyJailbreak: A Unified Framework for Jailbreaking Large Language Models | Mar 18, 2024 | | Code Available | 4 |
| GeoWizard: Unleashing the Diffusion Priors for 3D Geometry Estimation from a Single Image | Mar 18, 2024 | 3D geometry, 3D Reconstruction | Code Available | 4 |
| OMG: Occlusion-friendly Personalized Multi-concept Generation in Diffusion Models | Mar 16, 2024 | Denoising, Image Generation | Code Available | 4 |
| HumanoidBench: Simulated Humanoid Benchmark for Whole-Body Locomotion and Manipulation | Mar 15, 2024 | | Code Available | 4 |
| WavCraft: Audio Editing and Generation with Large Language Models | Mar 14, 2024 | In-Context Learning | Code Available | 4 |
| SemanticDraw: Towards Real-Time Interactive Content Creation from Image Diffusion Models | Mar 14, 2024 | Blocking, GPU | Code Available | 4 |
| depyf: Open the Opaque Box of PyTorch Compiler for Machine Learning Researchers | Mar 14, 2024 | | Code Available | 4 |
| Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking | Mar 14, 2024 | GSM8K, Language Modelling | Code Available | 4 |
| Follow-Your-Click: Open-domain Regional Image Animation via Short Prompts | Mar 13, 2024 | Image Animation, Image to Video Generation | Code Available | 4 |
| SpeechColab Leaderboard: An Open-Source Platform for Automatic Speech Recognition Evaluation | Mar 13, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 4 |
| Couler: Unified Machine Learning Workflow Optimization in Cloud | Mar 12, 2024 | CPU | Code Available | 4 |
| An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models | Mar 11, 2024 | Computational Efficiency, Video Understanding | Code Available | 4 |
| SARDet-100K: Towards Open-Source Benchmark and ToolKit for Large-Scale SAR Object Detection | Mar 11, 2024 | 2D Object Detection, 2k | Code Available | 4 |
| NeuPAN: Direct Point Robot Navigation with End-to-End Model-based Learning | Mar 11, 2024 | Collision Avoidance, Motion Generation | Code Available | 4 |
| V3D: Video Diffusion Models are Effective 3D Generators | Mar 11, 2024 | 3D Generation, Novel View Synthesis | Code Available | 4 |
| No Language is an Island: Unifying Chinese and English in Financial Large Language Models, Instruction Data, and Benchmarks | Mar 10, 2024 | Financial Analysis | Code Available | 4 |
| SplattingAvatar: Realistic Real-Time Human Avatars with Mesh-Embedded Gaussian Splatting | Mar 8, 2024 | GPU | Code Available | 4 |
| Faster Neighborhood Attention: Reducing the O(n^2) Cost of Self Attention at the Threadblock Level | Mar 7, 2024 | | Code Available | 4 |
| UniTable: Towards a Unified Framework for Table Recognition via Self-Supervised Pretraining | Mar 7, 2024 | Language Modeling, Language Modelling | Code Available | 4 |
| Radiative Gaussian Splatting for Efficient X-ray Novel View Synthesis | Mar 7, 2024 | CT Reconstruction, NeRF | Code Available | 4 |
| Efficient LoFTR: Semi-Dense Local Feature Matching with Sparse-Like Speed | Mar 7, 2024 | 3D Reconstruction, Image Retrieval | Code Available | 4 |
| MedMamba: Vision Mamba for Medical Image Classification | Mar 6, 2024 | Classification, image-classification | Code Available | 4 |
| The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning | Mar 5, 2024 | Multiple-choice | Code Available | 4 |
| Evolution Transformer: In-Context Evolutionary Optimization | Mar 5, 2024 | | Code Available | 4 |
| Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures | Mar 4, 2024 | image-classification, Image Classification | Code Available | 4 |
| 3DTopia: Large Text-to-3D Generation Model with Hybrid Diffusion Priors | Mar 4, 2024 | 3D Generation, Text to 3D | Code Available | 4 |
| ResAdapter: Domain Consistent Resolution Adapter for Diffusion Models | Mar 4, 2024 | Image Generation | Code Available | 4 |
| TUMTraf V2X Cooperative Perception Dataset | Mar 2, 2024 | 3D Object Detection, Autonomous Vehicles | Code Available | 4 |
| Rethinking Inductive Biases for Surface Normal Estimation | Mar 1, 2024 | Surface Normal Estimation | Code Available | 4 |
| UniTS: A Unified Multi-Task Time Series Model | Feb 29, 2024 | Anomaly Detection, Imputation | Code Available | 4 |
| The All-Seeing Project V2: Towards General Relation Comprehension of the Open World | Feb 29, 2024 | All, Hallucination | Code Available | 4 |
| Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers | Feb 29, 2024 | Retrieval, Text Retrieval | Code Available | 4 |
| DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models | Feb 29, 2024 | GPU | Code Available | 4 |
| Learning to Generate Instruction Tuning Datasets for Zero-Shot Task Adaptation | Feb 28, 2024 | Attribute, Extractive Question-Answering | Code Available | 4 |
| Diffusion Model-Based Image Editing: A Survey | Feb 27, 2024 | Denoising, Image Generation | Code Available | 4 |
| Tower: An Open Multilingual Large Language Model for Translation-Related Tasks | Feb 27, 2024 | Language Modeling, Language Modelling | Code Available | 4 |
| The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits | Feb 27, 2024 | All | Code Available | 4 |