| Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens | Mar 3, 2025 | Attribute, text-to-speech | Code Available | 11 |
| Yi: Open Foundation Models by 01.AI | Mar 7, 2024 | Attribute, Chatbot | Code Available | 9 |
| aiXcoder-7B: A Lightweight and Effective Large Language Model for Code Processing | Oct 17, 2024 | Attribute, Code Completion | Code Available | 7 |
| Ranni: Taming Text-to-Image Diffusion for Accurate Instruction Following | Nov 28, 2023 | Attribute, Denoising | Code Available | 5 |
| Learning Flow Fields in Attention for Controllable Person Image Generation | Dec 11, 2024 | Attribute, Image Generation | Code Available | 5 |
| OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations | Dec 10, 2024 | Attribute, Benchmarking | Code Available | 5 |
| IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation | Oct 9, 2024 | Attribute, Image Generation | Code Available | 5 |
| Zero-shot forecasting of chaotic systems | Sep 24, 2024 | Attribute, In-Context Learning | Code Available | 4 |
| HelpSteer2: Open-source dataset for training top-performing reward models | Jun 12, 2024 | Attribute | Code Available | 4 |
| ChangeMamba: Remote Sensing Change Detection With Spatiotemporal State Space Model | Apr 4, 2024 | 2D Semantic Segmentation, Attribute | Code Available | 4 |
| SparseDrive: End-to-End Autonomous Driving via Sparse Scene Representation | May 30, 2024 | Attribute, Autonomous Driving | Code Available | 4 |
| Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free | May 10, 2025 | Attribute, Mixture-of-Experts | Code Available | 4 |
| MMScan: A Multi-Modal 3D Scene Dataset with Hierarchical Grounded Language Annotations | Jun 13, 2024 | 3D visual grounding, Attribute | Code Available | 4 |
| Learning to Generate Instruction Tuning Datasets for Zero-Shot Task Adaptation | Feb 28, 2024 | Attribute, Extractive Question-Answering | Code Available | 4 |
| Deepfake Generation and Detection: A Benchmark and Survey | Mar 26, 2024 | Attribute, Face Generation | Code Available | 4 |
| MIGC++: Advanced Multi-Instance Generation Controller for Image Synthesis | Jul 2, 2024 | Attribute, Image Generation | Code Available | 4 |
| Sparse Tensor-based Point Cloud Attribute Compression | Apr 3, 2022 | Attribute | Code Available | 4 |
| DeepFakes and Beyond: A Survey of Face Manipulation and Fake Detection | Jan 1, 2020 | Attribute, DeepFake Detection | Code Available | 4 |
| A Closer Look at Deep Learning Methods on Tabular Datasets | Jul 1, 2024 | Attribute, Deep Learning | Code Available | 4 |
| Region-Aware Text-to-Image Generation via Hard Binding and Soft Refinement | Nov 10, 2024 | Attribute, Image Generation | Code Available | 4 |
| MIGC: Multi-Instance Generation Controller for Text-to-Image Synthesis | Feb 8, 2024 | Attribute, Conditional Text-to-Image Synthesis | Code Available | 4 |
| KTO: Model Alignment as Prospect Theoretic Optimization | Feb 2, 2024 | Attribute, model | Code Available | 4 |
| XVerse: Consistent Multi-Subject Control of Identity and Semantic Attributes via DiT Modulation | Jun 26, 2025 | Attribute, Image Generation | Code Available | 4 |
| GaussianTalker: Real-Time High-Fidelity Talking Head Synthesis with Audio-Driven 3D Gaussian Splatting | Apr 24, 2024 | 3DGS, Attribute | Code Available | 3 |
| FaceXFormer: A Unified Transformer for Facial Analysis | Mar 19, 2024 | Age and Gender Estimation, Age Estimation | Code Available | 3 |
| FateZero: Fusing Attentions for Zero-shot Text-based Video Editing | Mar 16, 2023 | Attribute, Text-to-Video Editing | Code Available | 3 |
| Taming 3DGS: High-Quality Radiance Fields with Limited Resources | Jun 21, 2024 | 3DGS, Attribute | Code Available | 3 |
| Face Anonymization Made Simple | Nov 1, 2024 | Attribute, Face Anonymization | Code Available | 3 |
| X-LLM: Bootstrapping Advanced Large Language Models by Treating Multi-Modalities as Foreign Languages | May 7, 2023 | Attribute, Instruction Following | Code Available | 3 |
| An Actionable Framework for Assessing Bias and Fairness in Large Language Model Use Cases | Jul 15, 2024 | Attribute, counterfactual | Code Available | 3 |
| Scaling Diffusion Transformers to 16 Billion Parameters | Jul 16, 2024 | Attribute, Conditional Image Generation | Code Available | 3 |
| RemoteSAM: Towards Segment Anything for Earth Observation | May 23, 2025 | Attribute, Earth Observation | Code Available | 3 |
| Semi-supervised Credit Card Fraud Detection via Attribute-Driven Graph Representation | Dec 24, 2024 | Attribute, Fraud Detection | Code Available | 3 |
| Delay-penalized CTC implemented based on Finite State Transducer | May 19, 2023 | Attribute | Code Available | 3 |
| 3DIS: Depth-Driven Decoupled Instance Synthesis for Text-to-Image Generation | Oct 16, 2024 | Attribute, Image Generation | Code Available | 3 |
| Orient Anything: Learning Robust Object Orientation Estimation from Rendering 3D Models | Dec 24, 2024 | Attribute | Code Available | 3 |
| LLaVA-UHD v2: an MLLM Integrating High-Resolution Feature Pyramid via Hierarchical Window Transformer | Dec 18, 2024 | Attribute, Text Generation | Code Available | 3 |
| DisCo: Disentangled Control for Realistic Human Dance Generation | Jun 30, 2023 | Attribute | Code Available | 3 |
| Controllable Text Generation for Large Language Models: A Survey | Aug 22, 2024 | Attribute, Prompt Engineering | Code Available | 3 |
| A Comprehensive Survey on Composed Image Retrieval | Feb 19, 2025 | Attribute, Image Retrieval | Code Available | 3 |
| ChatTS: Aligning Time Series with LLMs via Synthetic Data for Enhanced Understanding and Reasoning | Dec 4, 2024 | Attribute, Time Series | Code Available | 3 |
| Bird-Eye Transformers for Text Generation Models | Oct 8, 2022 | Attribute, Inductive Bias | Code Available | 3 |
| Large Spatial Model: End-to-end Unposed Images to Semantic 3D | Oct 24, 2024 | 3D Reconstruction, Attribute | Code Available | 3 |
| ATPrompt: Textual Prompt Learning with Embedded Attributes | Dec 12, 2024 | Attribute, Large Language Model | Code Available | 3 |
| Harnessing the Universal Geometry of Embeddings | May 18, 2025 | Attribute | Code Available | 3 |
| HAC: Hash-grid Assisted Context for 3D Gaussian Splatting Compression | Mar 21, 2024 | 3DGS, Attribute | Code Available | 3 |
| HAC++: Towards 100X Compression of 3D Gaussian Splatting | Jan 21, 2025 | 3DGS, Attribute | Code Available | 3 |
| ID-Animator: Zero-Shot Identity-Preserving Human Video Generation | Apr 23, 2024 | Attribute, Video Generation | Code Available | 3 |
| Faceptor: A Generalist Model for Face Perception | Mar 14, 2024 | Age Estimation, Attribute | Code Available | 2 |
| GenEval: An Object-Focused Framework for Evaluating Text-to-Image Alignment | Oct 17, 2023 | Attribute, Object | Code Available | 2 |