| Frontiers in Intelligent Colonoscopy | Oct 22, 2024 | Image Captioning | CodeCode Available | 2 |
| PAPILLON: Privacy Preservation from Internet-based and Local Language Model Ensembles | Oct 22, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 2 |
| Improve Vision Language Model Chain-of-thought Reasoning | Oct 21, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 2 |
| RM-Bench: Benchmarking Reward Models of Language Models with Subtlety and Style | Oct 21, 2024 | BenchmarkingLanguage Modeling | CodeCode Available | 2 |
| A Systematic Study of Cross-Layer KV Sharing for Efficient LLM Inference | Oct 18, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 2 |
| Montessori-Instruct: Generate Influential Training Data Tailored for Student Learning | Oct 18, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 2 |
| On the Role of Attention Heads in Large Language Model Safety | Oct 17, 2024 | AttributeLanguage Modeling | CodeCode Available | 2 |
| MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation | Oct 15, 2024 | HallucinationLanguage Modeling | CodeCode Available | 2 |
| WeatherDG: LLM-assisted Diffusion Model for Procedural Weather Generation in Domain-Generalized Semantic Segmentation | Oct 15, 2024 | Autonomous DrivingLanguage Modeling | CodeCode Available | 2 |
| Process Reward Model with Q-Value Rankings | Oct 15, 2024 | Decision MakingLanguage Modeling | CodeCode Available | 2 |
| Enhancing Multi-Step Reasoning Abilities of Language Models through Direct Q-Function Optimization | Oct 11, 2024 | GSM8KLanguage Modeling | CodeCode Available | 2 |
| TurboRAG: Accelerating Retrieval-Augmented Generation with Precomputed KV Caches for Chunked Text | Oct 10, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 2 |
| Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs | Oct 10, 2024 | Active LearningLanguage Modeling | CodeCode Available | 2 |
| Q-VLM: Post-training Quantization for Large Vision-Language Models | Oct 10, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 2 |
| OneRef: Unified One-tower Expression Grounding and Segmentation with Mask Referring Modeling | Oct 10, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 2 |
| Sylber: Syllabic Embedding Representation of Speech from Raw Audio | Oct 9, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 2 |
| Towards Interpreting Visual Information Processing in Vision-Language Models | Oct 9, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 2 |
| BUMBLE: Unifying Reasoning and Acting with Vision-Language Models for Building-wide Mobile Manipulation | Oct 8, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 2 |
| Think While You Generate: Discrete Diffusion with Planned Denoising | Oct 8, 2024 | DenoisingImage Generation | CodeCode Available | 2 |
| PDF-WuKong: A Large Multimodal Model for Efficient Long PDF Reading with End-to-End Sparse Sampling | Oct 8, 2024 | document understandingLanguage Modeling | CodeCode Available | 2 |
| Differential Transformer | Oct 7, 2024 | HallucinationIn-Context Learning | CodeCode Available | 2 |
| Mitigating Modality Prior-Induced Hallucinations in Multimodal Large Language Models via Deciphering Attention Causality | Oct 7, 2024 | Causal Inferencecounterfactual | CodeCode Available | 2 |
| TextHawk2: A Large Vision-Language Model Excels in Bilingual OCR and Grounding with 16x Fewer Tokens | Oct 7, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 2 |
| GenSim: A General Social Simulation Platform with Large Language Model based Agents | Oct 6, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 2 |
| SyllableLM: Learning Coarse Semantic Units for Speech Language Models | Oct 5, 2024 | ClusteringLanguage Modeling | CodeCode Available | 2 |