| Measuring Taiwanese Mandarin Language Understanding | Mar 29, 2024 | | Code Available | 5 |
| Exploring GLU Expansion Ratios: A Study of Structured Pruning in LLaMA-3.2 Models | Dec 26, 2024 | Computational Efficiency, Network Pruning | Code Available | 5 |
| OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement | Feb 22, 2024 | Code Generation, HumanEval | Code Available | 5 |
| LAB: Large-Scale Alignment for ChatBots | Mar 2, 2024 | Instruction Following, Language Modeling | Code Available | 5 |
| ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning | Mar 25, 2025 | reinforcement-learning, Reinforcement Learning | Code Available | 5 |
| HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation | May 7, 2025 | Human-Domain Subject-to-Video, Single-Domain Subject-to-Video | Code Available | 5 |
| OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models | Oct 12, 2024 | Math, reinforcement-learning | Code Available | 5 |
| MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion | Oct 4, 2024 | 4D reconstruction, Camera Pose Estimation | Code Available | 5 |
| MegaSaM: Accurate, Fast, and Robust Structure and Motion from Casual Dynamic Videos | Dec 5, 2024 | Depth Estimation | Code Available | 5 |
| Phoneme-Level BERT for Enhanced Prosody of Text-to-Speech with Grapheme Predictions | Jan 20, 2023 | text-to-speech, Text to Speech | Code Available | 5 |
| IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models | Aug 13, 2023 | Diffusion Personalization Tuning Free, Image Generation | Code Available | 5 |
| Uni-Mol2: Exploring Molecular Pretraining Model at Scale | Jun 21, 2024 | model | Code Available | 5 |
| TabPFN: A Transformer That Solves Small Tabular Classification Problems in a Second | Jul 5, 2022 | AutoML, Bayesian Inference | Code Available | 5 |
| Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec Language Modeling | Mar 7, 2023 | In-Context Learning, Language Modeling | Code Available | 5 |
| Zero-shot Image Editing with Reference Imitation | Jun 11, 2024 | Semantic correspondence | Code Available | 5 |
| LM Transparency Tool: Interactive Tool for Analyzing Transformer Language Models | Apr 10, 2024 | Decision Making | Code Available | 5 |
| StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models | Jun 13, 2023 | Speech Synthesis, text-to-speech | Code Available | 5 |
| Focus Anywhere for Fine-grained Multi-page Document Understanding | May 23, 2024 | document understanding, Optical Character Recognition (OCR) | Code Available | 5 |
| Improving Text-To-Audio Models with Synthetic Captions | Jun 18, 2024 | AudioCaps, Audio captioning | Code Available | 5 |
| DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos | Sep 3, 2024 | Depth Estimation, Diversity | Code Available | 5 |
| DreamFusion: Text-to-3D using 2D Diffusion | Sep 29, 2022 | Denoising, Image Generation | Code Available | 5 |
| OmniV2V: Versatile Video Generation and Editing via Dynamic Content Manipulation | Jun 2, 2025 | Data Augmentation, Human Animation | Code Available | 5 |
| 4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities | Jun 13, 2024 | Instance Segmentation, multimodal generation | Code Available | 5 |
| BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval | Jul 16, 2024 | Question Answering, Retrieval | Code Available | 5 |
| EMMA: Your Text-to-Image Diffusion Model Can Secretly Accept Multi-Modal Prompts | Jun 13, 2024 | Conditional Image Generation, Image Generation | Code Available | 5 |
| StarCoder: may the source be with you! | May 9, 2023 | 8k, Code Generation | Code Available | 5 |
| Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters | Aug 6, 2024 | | Code Available | 5 |
| Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models | Nov 7, 2024 | Image Generation | Code Available | 5 |
| Jamba-1.5: Hybrid Transformer-Mamba Models at Scale | Aug 22, 2024 | Chatbot, Instruction Following | Code Available | 5 |
| XGrammar: Flexible and Efficient Structured Generation Engine for Large Language Models | Nov 22, 2024 | GPU | Code Available | 5 |
| SpinQuant: LLM quantization with learned rotations | May 26, 2024 | Quantization | Code Available | 5 |
| Image Vectorization: a Review | Jun 10, 2023 | Image Generation, Vector Graphics | Code Available | 5 |
| Zephyr: Direct Distillation of LM Alignment | Oct 25, 2023 | 2D Cyclist Detection, Few-Shot Learning | Code Available | 5 |
| ReLU Strikes Back: Exploiting Activation Sparsity in Large Language Models | Oct 6, 2023 | | Code Available | 5 |
| DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation | Aug 25, 2022 | Diffusion Personalization, Image Generation | Code Available | 5 |
| ThinkSound: Chain-of-Thought Reasoning in Multimodal Large Language Models for Audio Generation and Editing | Jun 26, 2025 | Audio Generation, Large Language Model | Code Available | 5 |
| 3D Reconstruction with Spatial Memory | Aug 28, 2024 | 3D Reconstruction | Code Available | 5 |
| RAG-R1 : Incentivize the Search and Reasoning Capabilities of LLMs through Multi-query Parallelism | Jun 30, 2025 | Question Answering, RAG | Code Available | 5 |
| Transformers without Normalization | Mar 13, 2025 | Self-Supervised Learning | Code Available | 5 |
| VoxBlink2: A 100K+ Speaker Recognition Corpus and the Open-Set Speaker-Identification Benchmark | Jul 16, 2024 | Diversity, Speaker Identification | Code Available | 5 |
| Democratizing Text-to-Image Masked Generative Models with Compact Text-Aware One-Dimensional Tokens | Jan 13, 2025 | | Code Available | 5 |
| Ranni: Taming Text-to-Image Diffusion for Accurate Instruction Following | Nov 28, 2023 | Attribute, Denoising | Code Available | 5 |
| Sample Design Engineering: An Empirical Study of What Makes Good Downstream Fine-Tuning Samples for LLMs | Apr 19, 2024 | Event Extraction, In-Context Learning | Code Available | 5 |
| Benchmarking the Myopic Trap: Positional Bias in Information Retrieval | May 20, 2025 | Benchmarking, Information Retrieval | Code Available | 5 |
| Randomized Autoregressive Visual Generation | Nov 1, 2024 | Image Generation, Language Modeling | Code Available | 5 |
| DeepSeek-Prover-V2: Advancing Formal Mathematical Reasoning via Reinforcement Learning for Subgoal Decomposition | Apr 30, 2025 | Automated Theorem Proving, Large Language Model | Code Available | 5 |
| FlowTok: Flowing Seamlessly Across Text and Image Tokens | Mar 13, 2025 | Denoising, Image to text | Code Available | 5 |
| Loki: An Open-Source Tool for Fact Verification | Oct 2, 2024 | Claim Verification, Fact Checking | Code Available | 5 |
| NeuralSVG: An Implicit Representation for Text-to-Vector Generation | Jan 7, 2025 | Vector Graphics | Code Available | 5 |
| The Dawn of GUI Agent: A Preliminary Case Study with Claude 3.5 Computer Use | Nov 15, 2024 | | Code Available | 5 |