| GOFA: A Generative One-For-All Model for Joint Graph Language Modeling | Jul 12, 2024 | All, Language Modeling | Code Available | 2 | 5 |
| Contrastive Decoding: Open-ended Text Generation as Optimization | Oct 27, 2022 | Language Modeling, Language Modelling | Code Available | 2 | 5 |
| Tell Me More! Towards Implicit User Intention Understanding of Language Model Driven Agents | Feb 14, 2024 | Language Modeling, Language Modelling | Code Available | 2 | 5 |
| Teola: Towards End-to-End Optimization of LLM-based Applications | Jun 29, 2024 | Language Modeling, Language Modelling | Code Available | 2 | 5 |
| Contrastive Search Is What You Need For Neural Text Generation | Oct 25, 2022 | Contrastive Learning, Language Modeling | Code Available | 2 | 5 |
| GoLLIE: Annotation Guidelines improve Zero-Shot Information-Extraction | Oct 5, 2023 | Event Argument Extraction, Event Extraction | Code Available | 2 | 5 |
| CogView2: Faster and Better Text-to-Image Generation via Hierarchical Transformers | Apr 28, 2022 | Image Generation, Language Modeling | Code Available | 2 | 5 |
| TextHawk2: A Large Vision-Language Model Excels in Bilingual OCR and Grounding with 16x Fewer Tokens | Oct 7, 2024 | Language Modeling, Language Modelling | Code Available | 2 | 5 |
| GPT-Driver: Learning to Drive with GPT | Oct 2, 2023 | Autonomous Driving, Autonomous Vehicles | Code Available | 2 | 5 |
| Continuous Diffusion Model for Language Modeling | Feb 17, 2025 | Language Modeling, Language Modelling | Code Available | 2 | 5 |
| DiffArtist: Towards Structure and Appearance Controllable Image Stylization | Jul 22, 2024 | Disentanglement, Image Stylization | Code Available | 2 | 5 |
| GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmentation | Apr 10, 2025 | Contrastive Learning, Language Modeling | Code Available | 2 | 5 |
| GIT: A Generative Image-to-text Transformer for Vision and Language | May 27, 2022 | Decoder, Image Captioning | Code Available | 2 | 5 |
| The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer | Apr 14, 2025 | Language Modeling, Language Modelling | Code Available | 2 | 5 |
| Enhancing Diagnostic Accuracy in Rare and Common Fundus Diseases with a Knowledge-Rich Vision-Language Model | Jun 13, 2024 | Diagnostic, Image Retrieval | Code Available | 2 | 5 |
| Contextual Semantic Embeddings for Ontology Subsumption Prediction | Feb 20, 2022 | Knowledge Graph Embeddings, Language Modeling | Code Available | 2 | 5 |
| GMAI-VL & GMAI-VL-5.5M: A Large Vision-Language Model and A Comprehensive Multimodal Dataset Towards General Medical AI | Nov 21, 2024 | Decision Making, Language Modeling | Code Available | 2 | 5 |
| GPT or BERT: why not both? | Oct 31, 2024 | Causal Language Modeling, Language Modeling | Code Available | 2 | 5 |
| How to Index Item IDs for Recommendation Foundation Models | May 11, 2023 | Language Modeling, Language Modelling | Code Available | 2 | 5 |
| GenSim: A General Social Simulation Platform with Large Language Model based Agents | Oct 6, 2024 | Language Modeling, Language Modelling | Code Available | 2 | 5 |
| TIPO: Text to Image with Text Presampling for Prompt Optimization | Nov 12, 2024 | Image Generation, Language Modeling | Code Available | 2 | 5 |
| Explore the Limits of Omni-modal Pretraining at Scale | Jun 13, 2024 | Language Modeling, Language Modelling | Code Available | 2 | 5 |
| Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet | Jan 28, 2021 | image-classification, Image Classification | Code Available | 2 | 5 |
| MLLM-Tool: A Multimodal Large Language Model For Tool Agent Learning | Jan 19, 2024 | Language Modeling, Language Modelling | Code Available | 2 | 5 |
| GeoChat: Grounded Large Vision-Language Model for Remote Sensing | Nov 24, 2023 | Instruction Following, Language Modeling | Code Available | 2 | 5 |
| Towards Interpreting Visual Information Processing in Vision-Language Models | Oct 9, 2024 | Language Modeling, Language Modelling | Code Available | 2 | 5 |
| Generative Region-Language Pretraining for Open-Ended Object Detection | Mar 15, 2024 | Language Modeling, Language Modelling | Code Available | 2 | 5 |
| A Generalist Agent | May 12, 2022 | Language Modeling, Language Modelling | Code Available | 2 | 5 |
| Generative Pretrained Structured Transformers: Unsupervised Syntactic Language Models at Scale | Mar 13, 2024 | Constituency Grammar Induction, Language Modeling | Code Available | 2 | 5 |
| GeoGround: A Unified Large Vision-Language Model for Remote Sensing Visual Grounding | Nov 16, 2024 | Instruction Following, Language Modeling | Code Available | 2 | 5 |
| Generating Benchmarks for Factuality Evaluation of Language Models | Jul 13, 2023 | Language Modeling, Language Modelling | Code Available | 2 | 5 |
| TrafficVLM: A Controllable Visual Language Model for Traffic Video Captioning | Apr 14, 2024 | Dense Video Captioning, Descriptive | Code Available | 2 | 5 |
| Generate rather than Retrieve: Large Language Models are Strong Context Generators | Sep 21, 2022 | Language Modeling, Language Modelling | Code Available | 2 | 5 |
| Generative Modeling for Mathematical Discovery | Mar 14, 2025 | Language Modeling, Language Modelling | Code Available | 2 | 5 |
| Generalized Interpolating Discrete Diffusion | Mar 6, 2025 | Language Modeling, Language Modelling | Code Available | 2 | 5 |
| Generalized Few-shot 3D Point Cloud Segmentation with Vision-Language Model | Mar 20, 2025 | Language Modeling, Language Modelling | Code Available | 2 | 5 |
| Generative Pre-trained Speech Language Model with Efficient Hierarchical Transformer | Jun 3, 2024 | Audio Generation, In-Context Learning | Code Available | 2 | 5 |
| GeoReasoner: Geo-localization with Reasoning in Street Views using a Large Vision-Language Model | Jun 3, 2024 | geo-localization, Language Modeling | Code Available | 2 | 5 |
| TrustRAG: Enhancing Robustness and Trustworthiness in RAG | Jan 1, 2025 | Language Modeling, Language Modelling | Code Available | 2 | 5 |
| Composed Image Retrieval for Remote Sensing | May 24, 2024 | Composed Image Retrieval (CoIR), Descriptive | Code Available | 2 | 5 |
| G1: Bootstrapping Perception and Reasoning Abilities of Vision-Language Model via Reinforcement Learning | May 19, 2025 | Language Modeling, Language Modelling | Code Available | 2 | 5 |
| Full-Duplex-Bench: A Benchmark to Evaluate Full-duplex Spoken Dialogue Models on Turn-taking Capabilities | Mar 6, 2025 | Language Modeling, Language Modelling | Code Available | 2 | 5 |
| Understanding the Potential of FPGA-Based Spatial Acceleration for Large Language Model Inference | Dec 23, 2023 | GPU, High-Level Synthesis | Code Available | 2 | 5 |
| GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities | Jun 17, 2024 | Audio Question Answering, Instruction Following | Code Available | 2 | 5 |
| ARAGOG: Advanced RAG Output Grading | Apr 1, 2024 | Document Embedding, Language Modeling | Code Available | 2 | 5 |
| From Words to Numbers: Your Large Language Model Is Secretly A Capable Regressor When Given In-Context Examples | Apr 11, 2024 | Language Modeling, Language Modelling | Code Available | 2 | 5 |
| Compression Represents Intelligence Linearly | Apr 15, 2024 | Language Modeling, Language Modelling | Code Available | 2 | 5 |
| Frontiers in Intelligent Colonoscopy | Oct 22, 2024 | Image Captioning | Code Available | 2 | 5 |
| FaithEval: Can Your Language Model Stay Faithful to Context, Even If "The Moon is Made of Marshmallows" | Sep 30, 2024 | counterfactual, Hallucination | Code Available | 2 | 5 |
| GeoVision Labeler: Zero-Shot Geospatial Classification with Vision and Language Models | May 30, 2025 | Classification, Disaster Response | Code Available | 2 | 5 |