| Mixed Diffusion for 3D Indoor Scene Synthesis | May 31, 2024 | Denoising, Indoor Scene Synthesis | Code Available | 2 | 5 |
| MaTVLM: Hybrid Mamba-Transformer for Efficient Vision-Language Modeling | Mar 17, 2025 | GPU, Language Modeling | Code Available | 2 | 5 |
| CoSeR: Bridging Image and Language for Cognitive Super-Resolution | Nov 27, 2023 | Super-Resolution | Code Available | 2 | 5 |
| FedFMS: Exploring Federated Foundation Models for Medical Image Segmentation | Mar 8, 2024 | Federated Learning, Image Segmentation | Code Available | 2 | 5 |
| Improved Canonicalization for Model Agnostic Equivariance | May 23, 2024 | Contrastive Learning, model | Code Available | 2 | 5 |
| PENCIL: Long Thoughts with Short Memory | Mar 18, 2025 | | Code Available | 2 | 5 |
| GenN2N: Generative NeRF2NeRF Translation | Apr 3, 2024 | Colorization, Contrastive Learning | Code Available | 2 | 5 |
| Translating Images to Road Network: A Sequence-to-Sequence Perspective | Feb 13, 2024 | | Code Available | 2 | 5 |
| BeNeRF: Neural Radiance Fields from a Single Blurry Image and Event Stream | Jul 2, 2024 | NeRF | Code Available | 2 | 5 |
| Unsupervised Semantic Segmentation by Distilling Feature Correspondences | Mar 16, 2022 | Form, Semantic Segmentation | Code Available | 2 | 5 |
| I2V-Adapter: A General Image-to-Video Adapter for Diffusion Models | Dec 27, 2023 | Video Generation | Code Available | 2 | 5 |
| Surgical-DINO: Adapter Learning of Foundation Models for Depth Estimation in Endoscopic Surgery | Jan 11, 2024 | 3D Reconstruction, Depth Estimation | Code Available | 2 | 5 |
| LSK3DNet: Towards Effective and Efficient 3D Perception with Large Sparse Kernels | Mar 22, 2024 | 3D Semantic Segmentation, LIDAR Semantic Segmentation | Code Available | 2 | 5 |
| PixelGaussian: Generalizable 3D Gaussian Reconstruction from Arbitrary Views | Oct 24, 2024 | | Code Available | 2 | 5 |
| How Well Can LLMs Negotiate? NegotiationArena Platform and Analysis | Feb 8, 2024 | | Code Available | 2 | 5 |
| Blockwise Parallel Transformers for Large Context Models | Sep 21, 2023 | | Code Available | 2 | 5 |
| Evaluating RAG-Fusion with RAGElo: an Automated Elo-based Framework | Jun 20, 2024 | Hallucination, Question Answering | Code Available | 2 | 5 |
| Test-Time Prompt Tuning for Zero-Shot Generalization in Vision-Language Models | Sep 15, 2022 | image-classification, Image Classification | Code Available | 2 | 5 |
| Galvatron: Efficient Transformer Training over Multiple GPUs Using Automatic Parallelism | Nov 25, 2022 | GPU | Code Available | 2 | 5 |
| Generate Any Scene: Evaluating and Improving Text-to-Vision Generation with Scene Graph Programming | Dec 11, 2024 | Text to 3D, Text-to-Image Generation | Code Available | 2 | 5 |
| ETSformer: Exponential Smoothing Transformers for Time-series Forecasting | Feb 3, 2022 | Time Series, Time Series Analysis | Code Available | 2 | 5 |
| YOLOMG: Vision-based Drone-to-Drone Detection with Appearance and Pixel-Level Motion Fusion | Mar 10, 2025 | | Code Available | 2 | 5 |
| CPED: A Large-Scale Chinese Personalized and Emotional Dialogue Dataset for Conversational AI | May 29, 2022 | Chinese Sentiment Analysis, Conversational Response Generation | Code Available | 2 | 5 |
| V2X-ViT: Vehicle-to-Everything Cooperative Perception with Vision Transformer | Mar 20, 2022 | 3D Object Detection, Autonomous Vehicles | Code Available | 2 | 5 |
| Preference Leakage: A Contamination Problem in LLM-as-a-judge | Feb 3, 2025 | | Code Available | 2 | 5 |
| Marrying Autoregressive Transformer and Diffusion with Multi-Reference Autoregression | Jun 11, 2025 | Image Generation | Code Available | 2 | 5 |
| Spatially-Adaptive Feature Modulation for Efficient Image Super-Resolution | Feb 27, 2023 | Image Super-Resolution, Super-Resolution | Code Available | 2 | 5 |
| Generalizing 6-DoF Grasp Detection via Domain Prior Knowledge | Apr 2, 2024 | Robotic Grasping | Code Available | 2 | 5 |
| Blended RAG: Improving RAG (Retriever-Augmented Generation) Accuracy with Semantic Search and Hybrid Query-Based Retrievers | Mar 22, 2024 | Information Retrieval | Code Available | 2 | 5 |
| ParCo: Part-Coordinating Text-to-Motion Synthesis | Mar 27, 2024 | Motion Synthesis | Code Available | 2 | 5 |
| A Comprehensive Survey on Self-Supervised Learning for Recommendation | Apr 4, 2024 | Contrastive Learning, Recommendation Systems | Code Available | 2 | 5 |
| Self-Supervised Visual Preference Alignment | Apr 16, 2024 | 8k, MM-Vet | Code Available | 2 | 5 |
| Kandinsky 3.0 Technical Report | Dec 6, 2023 | Image Generation, Image to Video Generation | Code Available | 2 | 5 |
| SparseLLM: Towards Global Pruning for Pre-trained Language Models | Feb 28, 2024 | Computational Efficiency, Problem Decomposition | Code Available | 2 | 5 |
| AI-powered virtual tissues from spatial proteomics for clinical diagnostics and biomedical discovery | Jan 10, 2025 | | Code Available | 2 | 5 |
| Diffusion Time-step Curriculum for One Image to 3D Generation | Apr 6, 2024 | 3D Generation, Image to 3D | Code Available | 2 | 5 |
| Real-time High-fidelity Gaussian Human Avatars with Position-based Interpolation of Spatially Distributed MLPs | Apr 17, 2025 | Position | Code Available | 2 | 5 |
| FORA: Fast-Forward Caching in Diffusion Transformer Acceleration | Jul 1, 2024 | Denoising | Code Available | 2 | 5 |
| Arabic-Nougat: Fine-Tuning Vision Transformers for Arabic OCR and Markdown Extraction | Nov 19, 2024 | document understanding, Optical Character Recognition (OCR) | Code Available | 2 | 5 |
| Combinatorial Optimization with Automated Graph Neural Networks | Jun 5, 2024 | Combinatorial Optimization, Graph Embedding | Code Available | 2 | 5 |
| PIGEON: Predicting Image Geolocations | Jul 11, 2023 | Photo geolocation estimation | Code Available | 2 | 5 |
| JailbreakRadar: Comprehensive Assessment of Jailbreak Attacks Against LLMs | Feb 8, 2024 | Ethics | Code Available | 2 | 5 |
| Powerful and Flexible: Personalized Text-to-Image Generation via Reinforcement Learning | Jul 9, 2024 | Image Generation, Sentence | Code Available | 2 | 5 |
| cAST: Enhancing Code Retrieval-Augmented Generation with Structural Chunking via Abstract Syntax Tree | Jun 18, 2025 | Chunking, Code Generation | Code Available | 2 | 5 |
| MASSIVE: A 1M-Example Multilingual Natural Language Understanding Dataset with 51 Typologically-Diverse Languages | Apr 18, 2022 | intent-classification, Intent Classification | Code Available | 2 | 5 |
| Exponentially Faster Language Modelling | Nov 15, 2023 | Benchmarking, CPU | Code Available | 2 | 5 |
| Multimodal Mamba: Decoder-only Multimodal State Space Model via Quadratic to Linear Distillation | Feb 18, 2025 | Decoder, GPU | Code Available | 2 | 5 |
| Q-Diffusion: Quantizing Diffusion Models | Feb 8, 2023 | Image Generation, Noise Estimation | Code Available | 2 | 5 |
| Open-RAG: Enhanced Retrieval-Augmented Reasoning with Open-Source Large Language Models | Oct 2, 2024 | Mixture-of-Experts, Navigate | Code Available | 2 | 5 |
| LinkBERT: Pretraining Language Models with Document Links | Mar 29, 2022 | Document Classification, Language Modeling | Code Available | 2 | 5 |