| Towards Interpreting Visual Information Processing in Vision-Language Models | Oct 9, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 2 | 5 |
| Generative Region-Language Pretraining for Open-Ended Object Detection | Mar 15, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 2 | 5 |
| A Generalist Agent | May 12, 2022 | Language ModelingLanguage Modelling | CodeCode Available | 2 | 5 |
| Generative Pretrained Structured Transformers: Unsupervised Syntactic Language Models at Scale | Mar 13, 2024 | Constituency Grammar InductionLanguage Modeling | CodeCode Available | 2 | 5 |
| GeoGround: A Unified Large Vision-Language Model for Remote Sensing Visual Grounding | Nov 16, 2024 | Instruction FollowingLanguage Modeling | CodeCode Available | 2 | 5 |
| Generating Benchmarks for Factuality Evaluation of Language Models | Jul 13, 2023 | Language ModelingLanguage Modelling | CodeCode Available | 2 | 5 |
| TrafficVLM: A Controllable Visual Language Model for Traffic Video Captioning | Apr 14, 2024 | Dense Video CaptioningDescriptive | CodeCode Available | 2 | 5 |
| Generate rather than Retrieve: Large Language Models are Strong Context Generators | Sep 21, 2022 | Language ModelingLanguage Modelling | CodeCode Available | 2 | 5 |
| Generative Modeling for Mathematical Discovery | Mar 14, 2025 | Language ModelingLanguage Modelling | CodeCode Available | 2 | 5 |
| Generalized Interpolating Discrete Diffusion | Mar 6, 2025 | Language ModelingLanguage Modelling | CodeCode Available | 2 | 5 |
| Generalized Few-shot 3D Point Cloud Segmentation with Vision-Language Model | Mar 20, 2025 | Language ModelingLanguage Modelling | CodeCode Available | 2 | 5 |
| Generative Pre-trained Speech Language Model with Efficient Hierarchical Transformer | Jun 3, 2024 | Audio GenerationIn-Context Learning | CodeCode Available | 2 | 5 |
| GeoReasoner: Geo-localization with Reasoning in Street Views using a Large Vision-Language Model | Jun 3, 2024 | geo-localizationLanguage Modeling | CodeCode Available | 2 | 5 |
| TrustRAG: Enhancing Robustness and Trustworthiness in RAG | Jan 1, 2025 | Language ModelingLanguage Modelling | CodeCode Available | 2 | 5 |
| Composed Image Retrieval for Remote Sensing | May 24, 2024 | Composed Image Retrieval (CoIR)Descriptive | CodeCode Available | 2 | 5 |
| G1: Bootstrapping Perception and Reasoning Abilities of Vision-Language Model via Reinforcement Learning | May 19, 2025 | Language ModelingLanguage Modelling | CodeCode Available | 2 | 5 |
| Full-Duplex-Bench: A Benchmark to Evaluate Full-duplex Spoken Dialogue Models on Turn-taking Capabilities | Mar 6, 2025 | Language ModelingLanguage Modelling | CodeCode Available | 2 | 5 |
| Understanding the Potential of FPGA-Based Spatial Acceleration for Large Language Model Inference | Dec 23, 2023 | GPUHigh-Level Synthesis | CodeCode Available | 2 | 5 |
| GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities | Jun 17, 2024 | Audio Question AnsweringInstruction Following | CodeCode Available | 2 | 5 |
| ARAGOG: Advanced RAG Output Grading | Apr 1, 2024 | Document EmbeddingLanguage Modeling | CodeCode Available | 2 | 5 |
| From Words to Numbers: Your Large Language Model Is Secretly A Capable Regressor When Given In-Context Examples | Apr 11, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 2 | 5 |
| Compression Represents Intelligence Linearly | Apr 15, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 2 | 5 |
| Frontiers in Intelligent Colonoscopy | Oct 22, 2024 | Image Captioning | CodeCode Available | 2 | 5 |
| FaithEval: Can Your Language Model Stay Faithful to Context, Even If "The Moon is Made of Marshmallows" | Sep 30, 2024 | counterfactualHallucination | CodeCode Available | 2 | 5 |
| GeoVision Labeler: Zero-Shot Geospatial Classification with Vision and Language Models | May 30, 2025 | ClassificationDisaster Response | CodeCode Available | 2 | 5 |