| Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models | Apr 11, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 9 |
| Visually Descriptive Language Model for Vector Graphics Reasoning | Apr 9, 2024 | DescriptiveLanguage Modeling | CodeCode Available | 9 |
| LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning | Mar 26, 2024 | GPUGSM8K | CodeCode Available | 9 |
| VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild | Mar 25, 2024 | DecoderLanguage Modeling | CodeCode Available | 9 |
| Arcee's MergeKit: A Toolkit for Merging Large Language Models | Mar 20, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 9 |
| Yi: Open Foundation Models by 01.AI | Mar 7, 2024 | AttributeChatbot | CodeCode Available | 9 |
| Natural language guidance of high-fidelity text-to-speech with synthetic annotations | Feb 2, 2024 | In-Context LearningLanguage Modeling | CodeCode Available | 9 |
| OLMo: Accelerating the Science of Language Models | Feb 1, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 9 |
| YOLO-World: Real-Time Open-Vocabulary Object Detection | Jan 30, 2024 | Instance SegmentationLanguage Modeling | CodeCode Available | 9 |
| Perception Encoder: The best visual embeddings are not at the output of the network | Apr 17, 2025 | Depth EstimationLanguage Modeling | CodeCode Available | 8 |
| Adapting Large Language Model with Speech for Fully Formatted End-to-End Speech Recognition | Jul 17, 2023 | DecoderLanguage Modeling | CodeCode Available | 8 |
| From Bytes to Ideas: Language Modeling with Autoregressive U-Nets | Jun 17, 2025 | Language ModelingLanguage Modelling | CodeCode Available | 7 |
| Step-Audio-AQAA: a Fully End-to-End Expressive Large Audio Language Model | Jun 10, 2025 | Language ModelingLanguage Modelling | CodeCode Available | 7 |
| Chinese-Vicuna: A Chinese Instruction-following Llama-based Model | Apr 17, 2025 | Code GenerationCPU | CodeCode Available | 7 |
| Skywork R1V: Pioneering Multimodal Reasoning with Chain-of-Thought | Apr 8, 2025 | Language ModelingLanguage Modelling | CodeCode Available | 7 |
| Large Language Model Agent: A Survey on Methodology, Applications and Challenges | Mar 27, 2025 | Language ModelingLanguage Modelling | CodeCode Available | 7 |
| PIKE-RAG: sPecIalized KnowledgE and Rationale Augmented Generation | Jan 20, 2025 | Language ModelingLanguage Modelling | CodeCode Available | 7 |
| Simulating 500 million years of evolution with a language model | Dec 31, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 7 |
| Large Concept Models: Language Modeling in a Sentence Representation Space | Dec 11, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 7 |
| GLM-4-Voice: Towards Intelligent and Human-Like End-to-End Spoken Chatbot | Dec 3, 2024 | Automatic Speech RecognitionAutomatic Speech Recognition (ASR) | CodeCode Available | 7 |
| FastSwitch: Optimizing Context Switching Efficiency in Fairness-aware Large Language Model Serving | Nov 27, 2024 | FairnessGPU | CodeCode Available | 7 |
| Scaling Speech-Text Pre-training with Synthetic Interleaved Data | Nov 26, 2024 | Automatic Speech RecognitionAutomatic Speech Recognition (ASR) | CodeCode Available | 7 |
| Tulu 3: Pushing Frontiers in Open Language Model Post-Training | Nov 22, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 7 |
| MagicQuill: An Intelligent Interactive Image Editing System | Nov 14, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 7 |
| AutoTrain: No-code training for state-of-the-art models | Oct 21, 2024 | Classificationimage-classification | CodeCode Available | 7 |