| Large Language Model with Region-guided Referring and Grounding for CT Report Generation | Nov 23, 2024 | Computed Tomography (CT), Diagnostic | Code Available | 2 |
| Steering Away from Harm: An Adaptive Approach to Defending Vision Language Model Against Jailbreaks | Nov 23, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| ScribeAgent: Towards Specialized Web Agents Using Production-Scale Workflow Data | Nov 22, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| RE-Bench: Evaluating frontier AI R&D capabilities of language model agents against human experts | Nov 22, 2024 | AI Agent, Language Modeling | Code Available | 2 |
| GMAI-VL & GMAI-VL-5.5M: A Large Vision-Language Model and A Comprehensive Multimodal Dataset Towards General Medical AI | Nov 21, 2024 | Decision Making, Language Modeling | Code Available | 2 |
| MC-LLaVA: Multi-Concept Personalized Vision-Language Model | Nov 18, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| BianCang: A Traditional Chinese Medicine Large Language Model | Nov 17, 2024 | Diagnostic, Language Modeling | Code Available | 2 |
| GeoGround: A Unified Large Vision-Language Model for Remote Sensing Visual Grounding | Nov 16, 2024 | Instruction Following, Language Modeling | Code Available | 2 |
| SEAGULL: No-reference Image Quality Assessment for Regions of Interest via Vision-Language Instruction Tuning | Nov 15, 2024 | Image Quality Assessment, Language Modeling | Code Available | 2 |
| LHRS-Bot-Nova: Improved Multimodal Large Language Model for Remote Sensing Vision-Language Interpretation | Nov 14, 2024 | Earth Observation, Instruction Following | Code Available | 2 |
| Tucano: Advancing Neural Text Generation for Portuguese | Nov 12, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| TIPO: Text to Image with Text Presampling for Prompt Optimization | Nov 12, 2024 | Image Generation, Language Modeling | Code Available | 2 |
| The Super Weight in Large Language Models | Nov 11, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| Concept Bottleneck Language Models For protein design | Nov 9, 2024 | Decision Making, Drug Discovery | Code Available | 2 |
| End-to-End Navigation with Vision Language Models: Transforming Spatial Reasoning into Question-Answering | Nov 8, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| PhoneLM: an Efficient and Capable Small Language Model Family through Principled Pre-training | Nov 7, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| V-DPO: Mitigating Hallucination in Large Vision Language Models via Vision-Guided Direct Preference Optimization | Nov 5, 2024 | Hallucination, Language Modeling | Code Available | 2 |
| RAGViz: Diagnose and Visualize Retrieval-Augmented Generation | Nov 4, 2024 | Answer Generation, GPU | Code Available | 2 |
| GPT or BERT: why not both? | Oct 31, 2024 | Causal Language Modeling, Language Modeling | Code Available | 2 |
| Plan-on-Graph: Self-Correcting Adaptive Planning of Large Language Model on Knowledge Graphs | Oct 31, 2024 | Knowledge Graphs, Language Modeling | Code Available | 2 |
| What is Wrong with Perplexity for Long-context Language Modeling? | Oct 31, 2024 | Document Summarization, In-Context Learning | Code Available | 2 |
| Rare-to-Frequent: Unlocking Compositional Generation Power of Diffusion Models on Rare Concepts with LLM Guidance | Oct 29, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| Protecting Privacy in Multimodal Large Language Models with MLLMU-Bench | Oct 29, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| Retrieval-Enhanced Mutation Mastery: Augmenting Zero-Shot Prediction of Protein Language Model | Oct 28, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| MiniPLM: Knowledge Distillation for Pre-Training Language Models | Oct 22, 2024 | Diversity, Knowledge Distillation | Code Available | 2 |
| Frontiers in Intelligent Colonoscopy | Oct 22, 2024 | Image Captioning | Code Available | 2 |
| PAPILLON: Privacy Preservation from Internet-based and Local Language Model Ensembles | Oct 22, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| Improve Vision Language Model Chain-of-thought Reasoning | Oct 21, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| RM-Bench: Benchmarking Reward Models of Language Models with Subtlety and Style | Oct 21, 2024 | Benchmarking, Language Modeling | Code Available | 2 |
| A Systematic Study of Cross-Layer KV Sharing for Efficient LLM Inference | Oct 18, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| Montessori-Instruct: Generate Influential Training Data Tailored for Student Learning | Oct 18, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| On the Role of Attention Heads in Large Language Model Safety | Oct 17, 2024 | Attribute, Language Modeling | Code Available | 2 |
| MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation | Oct 15, 2024 | Hallucination, Language Modeling | Code Available | 2 |
| WeatherDG: LLM-assisted Diffusion Model for Procedural Weather Generation in Domain-Generalized Semantic Segmentation | Oct 15, 2024 | Autonomous Driving, Language Modeling | Code Available | 2 |
| Process Reward Model with Q-Value Rankings | Oct 15, 2024 | Decision Making, Language Modeling | Code Available | 2 |
| Enhancing Multi-Step Reasoning Abilities of Language Models through Direct Q-Function Optimization | Oct 11, 2024 | GSM8K, Language Modeling | Code Available | 2 |
| TurboRAG: Accelerating Retrieval-Augmented Generation with Precomputed KV Caches for Chunked Text | Oct 10, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs | Oct 10, 2024 | Active Learning, Language Modeling | Code Available | 2 |
| Q-VLM: Post-training Quantization for Large Vision-Language Models | Oct 10, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| OneRef: Unified One-tower Expression Grounding and Segmentation with Mask Referring Modeling | Oct 10, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| Sylber: Syllabic Embedding Representation of Speech from Raw Audio | Oct 9, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| Towards Interpreting Visual Information Processing in Vision-Language Models | Oct 9, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| BUMBLE: Unifying Reasoning and Acting with Vision-Language Models for Building-wide Mobile Manipulation | Oct 8, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| Think While You Generate: Discrete Diffusion with Planned Denoising | Oct 8, 2024 | Denoising, Image Generation | Code Available | 2 |
| PDF-WuKong: A Large Multimodal Model for Efficient Long PDF Reading with End-to-End Sparse Sampling | Oct 8, 2024 | document understanding, Language Modeling | Code Available | 2 |
| Differential Transformer | Oct 7, 2024 | Hallucination, In-Context Learning | Code Available | 2 |
| Mitigating Modality Prior-Induced Hallucinations in Multimodal Large Language Models via Deciphering Attention Causality | Oct 7, 2024 | Causal Inference, counterfactual | Code Available | 2 |
| TextHawk2: A Large Vision-Language Model Excels in Bilingual OCR and Grounding with 16x Fewer Tokens | Oct 7, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| GenSim: A General Social Simulation Platform with Large Language Model based Agents | Oct 6, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| SyllableLM: Learning Coarse Semantic Units for Speech Language Models | Oct 5, 2024 | Clustering, Language Modeling | Code Available | 2 |