| SiTH: Single-view Textured Human Reconstruction with Image-Conditioned Diffusion | Nov 27, 2023 | Lifelike 3D Human Generation | Code Available | 2 |
| TLOB: A Novel Transformer Model with Dual Attention for Price Trend Prediction with Limit Order Book Data | Feb 12, 2025 | | Code Available | 2 |
| SAM-6D: Segment Anything Model Meets Zero-Shot 6D Object Pose Estimation | Nov 27, 2023 | 6D Pose Estimation using RGB, Instance Segmentation | Code Available | 2 |
| Text-Driven Image Editing via Learnable Regions | Nov 28, 2023 | Image Generation | Code Available | 2 |
| LevelRAG: Enhancing Retrieval-Augmented Generation with Multi-hop Logic Planning over Rewriting Augmented Searchers | Feb 25, 2025 | Multi-hop Question Answering, Question Answering | Code Available | 2 |
| Uni-ControlNet: All-in-One Control to Text-to-Image Diffusion Models | May 25, 2023 | All | Code Available | 2 |
| RevColV2: Exploring Disentangled Representations in Masked Image Modeling | Sep 2, 2023 | Decoder, image-classification | Code Available | 2 |
| Audio Deepfake Detection with Self-Supervised XLS-R and SLS Classifier | Oct 28, 2024 | Audio Deepfake Detection, Audio Generation | Code Available | 2 |
| UniControl: A Unified Diffusion Model for Controllable Visual Generation In the Wild | May 18, 2023 | Image Generation | Code Available | 2 |
| SAM-Assisted Remote Sensing Imagery Semantic Segmentation with Object and Boundary Constraints | Dec 5, 2023 | Model Optimization, Novel Concepts | Code Available | 2 |
| Foundation Models for Weather and Climate Data Understanding: A Comprehensive Survey | Dec 5, 2023 | | Code Available | 2 |
| HouseDiffusion: Vector Floorplan Generation via a Diffusion Model with Discrete and Continuous Denoising | Nov 23, 2022 | Denoising, Vector Graphics | Code Available | 2 |
| Stronger, Fewer, & Superior: Harnessing Vision Foundation Models for Domain Generalized Semantic Segmentation | Dec 7, 2023 | Domain Generalization | Code Available | 2 |
| DreamSim: Learning New Dimensions of Human Visual Similarity using Synthetic Data | Jun 15, 2023 | | Code Available | 2 |
| Training-Free Text-Guided Image Editing with Visual Autoregressive Model | Mar 31, 2025 | text-guided-image-editing | Code Available | 2 |
| SuperFlow++: Enhanced Spatiotemporal Consistency for Cross-Modal Data Pretraining | Mar 25, 2025 | Autonomous Driving, Computational Efficiency | Code Available | 2 |
| Paint3D: Paint Anything 3D with Lighting-Less Texture Diffusion Models | Dec 21, 2023 | 2k | Code Available | 2 |
| HD-Painter: High-Resolution and Prompt-Faithful Text-Guided Image Inpainting with Diffusion Models | Dec 21, 2023 | 2k, Image Inpainting | Code Available | 2 |
| Visual Point Cloud Forecasting enables Scalable Autonomous Driving | Dec 29, 2023 | 3D geometry, Autonomous Driving | Code Available | 2 |
| PingPong: A Benchmark for Role-Playing Language Models with User Emulation and Multi-Model Evaluation | Sep 10, 2024 | | Code Available | 2 |
| Malla: Demystifying Real-world Large Language Model Integrated Malicious Services | Jan 6, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| Multi-Modal Representation Learning for Molecular Property Prediction: Sequence, Graph, Geometry | Jan 7, 2024 | Data Augmentation, Drug Discovery | Code Available | 2 |
| Galvatron: Efficient Transformer Training over Multiple GPUs Using Automatic Parallelism | Nov 25, 2022 | GPU | Code Available | 2 |
| DialogStudio: Towards Richest and Most Diverse Unified Dataset Collection for Conversational AI | Jul 19, 2023 | Conversational Recommendation, Diversity | Code Available | 2 |
| Mamba-YOLO-World: Marrying YOLO-World with Mamba for Open-Vocabulary Detection | Sep 13, 2024 | Mamba, Open Vocabulary Object Detection | Code Available | 2 |
| TTS-GAN: A Transformer-based Time-Series Generative Adversarial Network | Feb 6, 2022 | Data Augmentation, Dimensionality Reduction | Code Available | 2 |
| Taming Data and Transformers for Audio Generation | Jun 27, 2024 | Audio captioning, Audio Generation | Code Available | 2 |
| Continual Test-Time Domain Adaptation | Mar 25, 2022 | Domain Adaptation, Test-time Adaptation | Code Available | 2 |
| InstructCV: Instruction-Tuned Text-to-Image Diffusion Models as Vision Generalists | Sep 30, 2023 | Depth Estimation, Image Generation | Code Available | 2 |
| Test-Time Prompt Tuning for Zero-Shot Generalization in Vision-Language Models | Sep 15, 2022 | image-classification, Image Classification | Code Available | 2 |
| SimpleClick: Interactive Image Segmentation with Simple Vision Transformers | Oct 20, 2022 | Image Segmentation, Interactive Segmentation | Code Available | 2 |
| Golden Cudgel Network for Real-Time Semantic Segmentation | Mar 5, 2025 | Real-Time Semantic Segmentation, Semantic Segmentation | Code Available | 2 |
| InfoBatch: Lossless Training Speed Up by Unbiased Dynamic Data Pruning | Mar 8, 2023 | Semantic Segmentation | Code Available | 2 |
| SegVG: Transferring Object Bounding Box to Segmentation for Visual Grounding | Jul 3, 2024 | object-detection, Object Detection | Code Available | 2 |
| PALO: A Polyglot Large Multimodal Model for 5B People | Feb 22, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| APEBench: A Benchmark for Autoregressive Neural Emulators of PDEs | Oct 31, 2024 | | Code Available | 2 |
| Scaling Video-Language Models to 10K Frames via Hierarchical Differential Distillation | Apr 3, 2025 | Computational Efficiency, GPU | Code Available | 2 |
| RecDiffusion: Rectangling for Image Stitching with Diffusion Models | Mar 28, 2024 | Image Stitching | Code Available | 2 |
| KaLM-Embedding: Superior Training Data Brings A Stronger Embedding Model | Jan 2, 2025 | MTEB Benchmark, Retrieval-augmented Generation | Code Available | 2 |
| GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest | Jul 7, 2023 | Attribute, Common Sense Reasoning | Code Available | 2 |
| Evaluating RAG-Fusion with RAGElo: an Automated Elo-based Framework | Jun 20, 2024 | Hallucination, Question Answering | Code Available | 2 |
| NU-Wave 2: A General Neural Audio Upsampling Model for Various Sampling Rates | Jun 17, 2022 | Audio Super-Resolution, Super-Resolution | Code Available | 2 |
| MEMFOF: High-Resolution Training for Memory-Efficient Multi-Frame Optical Flow Estimation | Jun 29, 2025 | GPU, Optical Flow Estimation | Code Available | 2 |
| Deep PCB To COCO Convertor | May 1, 2022 | Classification, Data Augmentation | Code Available | 2 |
| Many Heads Are Better Than One: Improved Scientific Idea Generation by A LLM-Based Multi-Agent System | Oct 12, 2024 | Experimental Design, scientific discovery | Code Available | 2 |
| InstructDoc: A Dataset for Zero-Shot Generalization of Visual Document Understanding with Instructions | Jan 24, 2024 | document understanding, Question Answering | Code Available | 2 |
| AEM: Attention Entropy Maximization for Multiple Instance Learning based Whole Slide Image Classification | Jun 18, 2024 | Diversity, image-classification | Code Available | 2 |
| Blockwise Parallel Transformers for Large Context Models | Sep 21, 2023 | | Code Available | 2 |
| Diff9D: Diffusion-Based Domain-Generalized Category-Level 9-DoF Object Pose Estimation | Feb 4, 2025 | Denoising, Domain Generalization | Code Available | 2 |
| Is One GPU Enough? Pushing Image Generation at Higher-Resolutions with Foundation Models | Jun 11, 2024 | Diversity, GPU | Code Available | 2 |