| Virgo: A Preliminary Exploration on Reproducing o1-like MLLM | Jan 3, 2025 | Language Modeling, Language Modelling | Code Available | 2 |
| Time Series Language Model for Descriptive Caption Generation | Jan 3, 2025 | Caption Generation, Denoising | Unverified | 0 |
| Interpretable Face Anti-Spoofing: Enhancing Generalization with Multimodal Large Language Models | Jan 3, 2025 | Binary Classification, Face Anti-Spoofing | Unverified | 0 |
| Mitigating Hallucination for Large Vision Language Model by Inter-Modality Correlation Calibration Decoding | Jan 3, 2025 | Hallucination, Language Modeling | Code Available | 1 |
| Metadata Conditioning Accelerates Language Model Pre-training | Jan 3, 2025 | Language Modeling, Language Modelling | Code Available | 2 |
| Abstractive Text Summarization for Contemporary Sanskrit Prose: Issues and Challenges | Jan 3, 2025 | Abstractive Text Summarization, Language Modeling | Unverified | 0 |
| Turning Logic Against Itself : Probing Model Defenses Through Contrastive Questions | Jan 3, 2025 | Language Modeling, Language Modelling | Code Available | 0 |
| Reading Between the Lines: A dataset and a study on why some texts are tougher than others | Jan 3, 2025 | Language Modeling, Language Modelling | Code Available | 0 |
| MDSF: Context-Aware Multi-Dimensional Data Storytelling Framework based on Large language Model | Jan 2, 2025 | Descriptive, Language Modeling | Unverified | 0 |
| Risks of Cultural Erasure in Large Language Models | Jan 2, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| Large Language Model-Enhanced Symbolic Reasoning for Knowledge Base Completion | Jan 2, 2025 | Diversity, Hallucination | Unverified | 0 |
| Large Vision-Language Model Alignment and Misalignment: A Survey Through the Lens of Explainability | Jan 2, 2025 | Attribute, Language Modeling | Unverified | 0 |
| Does a Large Language Model Really Speak in Human-Like Language? | Jan 2, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| MSWA: Refining Local Attention with Multi-ScaleWindow Attention | Jan 2, 2025 | Common Sense Reasoning, Language Modeling | Unverified | 0 |
| NeutraSum: A Language Model can help a Balanced Media Diet by Neutralizing News Summaries | Jan 2, 2025 | Articles, Language Modeling | Unverified | 0 |
| Enhancing Video-LLM Reasoning via Agent-of-Thoughts Distillation | Jan 1, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| GroundingFace: Fine-grained Face Understanding via Pixel Grounding Multimodal Large Language Model | Jan 1, 2025 | Attribute, Language Modeling | Unverified | 0 |
| SkySense-O: Towards Open-World Remote Sensing Interpretation with Vision-Centric Visual-Language Modeling | Jan 1, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| Taxonomy-Aware Evaluation of Vision-Language Models | Jan 1, 2025 | Fine-Grained Image Classification, Language Modeling | Unverified | 0 |
| HOIGPT: Learning Long-Sequence Hand-Object Interaction with Language Models | Jan 1, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| Video Language Model Pretraining with Spatio-temporal Masking | Jan 1, 2025 | Decoder, Language Modeling | Unverified | 0 |
| Once-Tuning-Multiple-Variants: Tuning Once and Expanded as Multiple Vision-Language Model Variants | Jan 1, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale | Jan 1, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| DriveGPT4-V2: Harnessing Large Language Model Capabilities for Enhanced Closed-Loop Autonomous Driving | Jan 1, 2025 | Autonomous Driving, CARLA longest6 | Unverified | 0 |
| HSI-GPT: A General-Purpose Large Scene-Motion-Language Model for Human Scene Interaction | Jan 1, 2025 | Descriptive, Instruction Following | Unverified | 0 |
| Symbolic Representation for Any-to-Any Generative Tasks | Jan 1, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| Libra-Merging: Importance-redundancy and Pruning-merging Trade-off for Acceleration Plug-in in Large Vision-Language Model | Jan 1, 2025 | Language Modeling, Language Modelling | Code Available | 0 |
| MIMO: A Medical Vision Language Model with Visual Referring Multimodal Input and Pixel Grounding Multimodal Output | Jan 1, 2025 | Instruction Following, Language Modeling | Code Available | 0 |
| Flexible Frame Selection for Efficient Video Reasoning | Jan 1, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| Classifier-guided CLIP Distillation for Unsupervised Multi-label Classification | Jan 1, 2025 | Classification, Language Modeling | Code Available | 0 |
| S4-Driver: Scalable Self-Supervised Driving Multimodal Large Language Model with Spatio-Temporal Visual Representation | Jan 1, 2025 | Autonomous Driving, Autonomous Vehicles | Unverified | 0 |
| Labels Generated by Large Language Model Helps Measuring People's Empathy in Vitro | Jan 1, 2025 | Data Augmentation, Language Modeling | Code Available | 0 |
| Beyond Text: Implementing Multimodal Large Language Model-Powered Multi-Agent Systems Using a No-Code Platform | Jan 1, 2025 | Code Generation, Image Generation | Unverified | 0 |
| Reasoning-Oriented and Analogy-Based Methods for Locating and Editing in Zero-Shot Event-Relational Reasoning | Jan 1, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| Rethinking Addressing in Language Models via Contexualized Equivariant Positional Encoding | Jan 1, 2025 | Arithmetic Reasoning, Language Modeling | Code Available | 1 |
| SLIDE: Integrating Speech Language Model with LLM for Spontaneous Spoken Dialogue Generation | Jan 1, 2025 | Dialogue Generation, Language Modeling | Code Available | 0 |
| Dynamics of Adversarial Attacks on Large Language Model-Based Search Engines | Jan 1, 2025 | Information Retrieval, Language Modeling | Unverified | 0 |
| Navigating Nuance: In Quest for Political Truth | Jan 1, 2025 | Language Modeling, Language Modelling | Code Available | 0 |
| TrustRAG: Enhancing Robustness and Trustworthiness in RAG | Jan 1, 2025 | Language Modeling, Language Modelling | Code Available | 2 |
| Large Language Model Based Multi-Agent System Augmented Complex Event Processing Pipeline for Internet of Multimedia Things | Jan 1, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| Simulating 500 million years of evolution with a language model | Dec 31, 2024 | Language Modeling, Language Modelling | Code Available | 7 |
| Towards Sustainable Large Language Model Serving | Dec 31, 2024 | GPU, Language Modeling | Unverified | 0 |
| Generative Emergent Communication: Large Language Model is a Collective World Model | Dec 31, 2024 | Bayesian Inference, Language Modeling | Unverified | 0 |
| Efficient Standardization of Clinical Notes using Large Language Models | Dec 31, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| LLM-Rubric: A Multidimensional, Calibrated Approach to Automated Evaluation of Natural Language Texts | Dec 31, 2024 | Language Modeling, Language Modelling | Code Available | 1 |
| Setting Standards in Turkish NLP: TR-MMLU for Large Language Model Evaluation | Dec 31, 2024 | Language Model Evaluation, Language Modeling | Unverified | 0 |
| Dual Diffusion for Unified Image Generation and Understanding | Dec 31, 2024 | Image Generation, Language Modeling | Code Available | 2 |
| Chunk-Distilled Language Modeling | Dec 31, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| TinyHelen's First Curriculum: Training and Evaluating Tiny Language Models in a Simpler Language Environment | Dec 31, 2024 | Instruction Following, Language Modeling | Code Available | 1 |
| ICONS: Influence Consensus for Vision-Language Data Selection | Dec 31, 2024 | Language Modeling, Language Modelling | Unverified | 0 |