| MuLan: Adapting Multilingual Diffusion Models for Hundreds of Languages with Negligible Cost | Dec 2, 2024 | Image Generation | Code Available | 3 |
| FoundIR: Unleashing Million-scale Training Data to Advance Foundation Models for Image Restoration | Dec 2, 2024 | Image Restoration, Incremental Learning | Code Available | 3 |
| Inspiring the Next Generation of Segment Anything Models: Comprehensively Evaluate SAM and SAM 2 with Diverse Prompts Towards Context-Dependent Concepts under Different Scenes | Dec 2, 2024 | In-Context Learning, Video Segmentation | Code Available | 3 |
| Towards Universal Soccer Video Understanding | Dec 2, 2024 | Action Classification, Sports Understanding | Code Available | 3 |
| Detection, Pose Estimation and Segmentation for Multiple Bodies: Closing the Virtuous Circle | Dec 2, 2024 | Human Instance Segmentation, Pose-Based Human Instance Segmentation | Code Available | 3 |
| HUGSIM: A Real-Time, Photo-Realistic and Closed-Loop Simulator for Autonomous Driving | Dec 2, 2024 | Autonomous Driving, Novel View Synthesis | Code Available | 3 |
| HackSynth: LLM Agent and Evaluation Framework for Autonomous Penetration Testing | Dec 2, 2024 | Language Modeling, Language Modelling | Code Available | 3 |
| XQ-GAN: An Open-source Image Tokenization Framework for Autoregressive Generation | Dec 2, 2024 | Image Reconstruction, Quantization | Code Available | 3 |
| emg2pose: A Large and Diverse Benchmark for Surface Electromyographic Hand Pose Estimation | Dec 2, 2024 | Anatomy, Hand Pose Estimation | Code Available | 3 |
| Advanced Video Inpainting Using Optical Flow-Guided Efficient Diffusion | Dec 1, 2024 | Denoising, Optical Flow Estimation | Code Available | 3 |
| Speedy-Splat: Fast 3D Gaussian Splatting with Sparse Pixels and Sparse Primitives | Nov 30, 2024 | 3D Scene Reconstruction, NeRF | Code Available | 3 |
| o1-Coder: an o1 Replication for Coding | Nov 29, 2024 | Reinforcement Learning (RL) | Code Available | 3 |
| Auto-RAG: Autonomous Retrieval-Augmented Generation for Large Language Models | Nov 29, 2024 | Decision Making, RAG | Code Available | 3 |
| Scaling Transformers for Low-Bitrate High-Quality Speech Coding | Nov 29, 2024 | Quantization | Code Available | 3 |
| Differentiable Voxel-based X-ray Rendering Improves Sparse-View 3D CBCT Reconstruction | Nov 28, 2024 | 3D Reconstruction, Diagnostic | Code Available | 3 |
| TSD-SR: One-Step Diffusion with Target Score Distillation for Real-World Image Super-Resolution | Nov 27, 2024 | Image Restoration, Image Super-Resolution | Code Available | 3 |
| HI-SLAM2: Geometry-Aware Gaussian SLAM for Fast Monocular Scene Reconstruction | Nov 27, 2024 | 3DGS | Code Available | 3 |
| Cyber-Attack Technique Classification Using Two-Stage Trained Large Language Models | Nov 27, 2024 | Classification, Sentence | Code Available | 3 |
| ChatRex: Taming Multimodal LLM for Joint Perception and Understanding | Nov 27, 2024 | | Code Available | 3 |
| Large Language Model-Brained GUI Agents: A Survey | Nov 27, 2024 | Code Generation, Language Modeling | Code Available | 3 |
| SVGDreamer++: Advancing Editability and Diversity in Text-Guided SVG Generation | Nov 26, 2024 | Diversity, Image Segmentation | Code Available | 3 |
| Star Attention: Efficient LLM Inference over Long Sequences | Nov 26, 2024 | Computational Efficiency | Code Available | 3 |
| OSDFace: One-Step Diffusion Model for Face Restoration | Nov 26, 2024 | Face Recognition, Generative Adversarial Network | Code Available | 3 |
| DROID-Splat: Combining end-to-end SLAM with 3D Gaussian Splatting | Nov 26, 2024 | Camera Calibration, Depth Estimation | Code Available | 3 |
| SAMWISE: Infusing Wisdom in SAM2 for Text-Driven Video Segmentation | Nov 26, 2024 | Natural Language Understanding, Referring Video Object Segmentation | Code Available | 3 |