| Behind Maya: Building a Multilingual Vision Language Model | May 13, 2025 | Language ModelingLanguage Modelling | CodeCode Available | 2 |
| Large Language Model Psychometrics: A Systematic Review of Evaluation, Validation, and Enhancement | May 13, 2025 | BenchmarkingLanguage Modeling | CodeCode Available | 2 |
| DynamicRAG: Leveraging Outputs of Large Language Model as Feedback for Dynamic Reranking in Retrieval-Augmented Generation | May 12, 2025 | Language ModelingLanguage Modelling | CodeCode Available | 2 |
| GuidedQuant: Large Language Model Quantization via Exploiting End Loss Guidance | May 11, 2025 | Language ModelingLanguage Modelling | CodeCode Available | 2 |
| MemEngine: A Unified and Modular Library for Developing Advanced Memory of LLM-based Agents | May 4, 2025 | Language ModelingLanguage Modelling | CodeCode Available | 2 |
| RWKV-X: A Linear Complexity Hybrid Language Model | Apr 30, 2025 | Language ModelingLanguage Modelling | CodeCode Available | 2 |
| Towards Practical Second-Order Optimizers in Deep Learning: Insights from Fisher Information Analysis | Apr 26, 2025 | Computational Efficiencyimage-classification | CodeCode Available | 2 |
| The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer | Apr 14, 2025 | Language ModelingLanguage Modelling | CodeCode Available | 2 |
| ClinicalGPT-R1: Pushing reasoning capability of generalist disease diagnosis with large language model | Apr 13, 2025 | DiagnosticLanguage Modeling | CodeCode Available | 2 |
| SegEarth-R1: Geospatial Pixel Reasoning via Large Language Model | Apr 13, 2025 | Language ModelingLanguage Modelling | CodeCode Available | 2 |
| Vision-Language Model for Object Detection and Segmentation: A Review and Evaluation | Apr 13, 2025 | Domain AdaptationLanguage Modeling | CodeCode Available | 2 |
| PACT: Pruning and Clustering-Based Token Reduction for Faster Visual Language Models | Apr 11, 2025 | ClusteringLanguage Modeling | CodeCode Available | 2 |
| GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmentation | Apr 10, 2025 | Contrastive LearningLanguage Modeling | CodeCode Available | 2 |
| TASTE: Text-Aligned Speech Tokenization and Embedding for Spoken Language Modeling | Apr 9, 2025 | Language ModelingLanguage Modelling | CodeCode Available | 2 |
| Scaling Video-Language Models to 10K Frames via Hierarchical Differential Distillation | Apr 3, 2025 | Computational EfficiencyGPU | CodeCode Available | 2 |
| Unicorn: Text-Only Data Synthesis for Vision Language Model Training | Mar 28, 2025 | Language ModelingLanguage Modelling | CodeCode Available | 2 |
| Mobile-VideoGPT: Fast and Accurate Video Understanding Language Model | Mar 27, 2025 | EgoSchemaLanguage Modeling | CodeCode Available | 2 |
| Rethinking Vision-Language Model in Face Forensics: Multi-Modal Interpretable Forged Face Detector | Mar 26, 2025 | Binary ClassificationDeepFake Detection | CodeCode Available | 2 |
| Med3DVLM: An Efficient Vision-Language Model for 3D Medical Image Analysis | Mar 25, 2025 | Contrastive LearningImage-text Retrieval | CodeCode Available | 2 |
| MC-LLaVA: Multi-Concept Personalized Vision-Language Model | Mar 24, 2025 | Language ModelingLanguage Modelling | CodeCode Available | 2 |
| FastCuRL: Curriculum Reinforcement Learning with Progressive Context Extension for Efficient Training R1-like Reasoning Models | Mar 21, 2025 | Language ModelingLanguage Modelling | CodeCode Available | 2 |
| CVE-Bench: A Benchmark for AI Agents' Ability to Exploit Real-World Web Application Vulnerabilities | Mar 21, 2025 | Language ModelingLanguage Modelling | CodeCode Available | 2 |
| Modifying Large Language Model Post-Training for Diverse Creative Writing | Mar 21, 2025 | DiversityLanguage Modeling | CodeCode Available | 2 |
| Generalized Few-shot 3D Point Cloud Segmentation with Vision-Language Model | Mar 20, 2025 | Language ModelingLanguage Modelling | CodeCode Available | 2 |
| VenusFactory: A Unified Platform for Protein Engineering Data Retrieval and Language Model Fine-Tuning | Mar 19, 2025 | BenchmarkingLanguage Modeling | CodeCode Available | 2 |
| Tiled Flash Linear Attention: More Efficient Linear RNN and xLSTM Kernels | Mar 18, 2025 | GPULanguage Modeling | CodeCode Available | 2 |
| MaTVLM: Hybrid Mamba-Transformer for Efficient Vision-Language Modeling | Mar 17, 2025 | GPULanguage Modeling | CodeCode Available | 2 |
| Generative Modeling for Mathematical Discovery | Mar 14, 2025 | Language ModelingLanguage Modelling | CodeCode Available | 2 |
| OR-LLM-Agent: Automating Modeling and Solving of Operations Research Optimization Problem with Reasoning Large Language Model | Mar 13, 2025 | AI AgentLanguage Modeling | CodeCode Available | 2 |
| GroundingSuite: Measuring Complex Multi-Granular Pixel Grounding | Mar 13, 2025 | DiversityLanguage Modeling | CodeCode Available | 2 |
| LongProLIP: A Probabilistic Vision-Language Model with Long Context Text | Mar 11, 2025 | Language ModelingLanguage Modelling | CodeCode Available | 2 |
| Mellow: a small audio language model for reasoning | Mar 11, 2025 | Audio captioningLanguage Modeling | CodeCode Available | 2 |
| When Large Vision-Language Model Meets Large Remote Sensing Imagery: Coarse-to-Fine Text-Guided Token Pruning | Mar 10, 2025 | Language ModelingLanguage Modelling | CodeCode Available | 2 |
| DiffCLIP: Differential Attention Meets CLIP | Mar 9, 2025 | Language ModelingLanguage Modelling | CodeCode Available | 2 |
| Next Token Is Enough: Realistic Image Quality and Aesthetic Scoring with Multimodal Large Language Model | Mar 8, 2025 | Image Quality AssessmentLanguage Modeling | CodeCode Available | 2 |
| A Survey of Large Language Model Empowered Agents for Recommendation and Search: Towards Next-Generation Information Retrieval | Mar 7, 2025 | Information RetrievalLanguage Modeling | CodeCode Available | 2 |
| PromptPex: Automatic Test Generation for Language Model Prompts | Mar 7, 2025 | Language ModelingLanguage Modelling | CodeCode Available | 2 |
| Generalized Interpolating Discrete Diffusion | Mar 6, 2025 | Language ModelingLanguage Modelling | CodeCode Available | 2 |
| Keeping Yourself is Important in Downstream Tuning Multimodal Large Language Model | Mar 6, 2025 | General KnowledgeImage Captioning | CodeCode Available | 2 |
| AnyAnomaly: Zero-Shot Customizable Video Anomaly Detection with LVLM | Mar 6, 2025 | Anomaly DetectionLanguage Modeling | CodeCode Available | 2 |
| An Egocentric Vision-Language Model based Portable Real-time Smart Assistant | Mar 6, 2025 | Language ModelingLanguage Modelling | CodeCode Available | 2 |
| Full-Duplex-Bench: A Benchmark to Evaluate Full-duplex Spoken Dialogue Models on Turn-taking Capabilities | Mar 6, 2025 | Language ModelingLanguage Modelling | CodeCode Available | 2 |
| Scaling Rich Style-Prompted Text-to-Speech Datasets | Mar 6, 2025 | Language ModelingLanguage Modelling | CodeCode Available | 2 |
| Collaborative Expert LLMs Guided Multi-Objective Molecular Optimization | Mar 5, 2025 | Language ModelingLanguage Modelling | CodeCode Available | 2 |
| MM-OR: A Large Multimodal Operating Room Dataset for Semantic Understanding of High-Intensity Surgical Environments | Mar 4, 2025 | 2D Panoptic SegmentationGraph Generation | CodeCode Available | 2 |
| OptMetaOpenFOAM: Large Language Model Driven Chain of Thought for Sensitivity Analysis and Parameter Optimization based on CFD | Mar 3, 2025 | Language ModelingLanguage Modelling | CodeCode Available | 2 |
| Forgetting Transformer: Softmax Attention with a Forget Gate | Mar 3, 2025 | Language ModelingLanguage Modelling | CodeCode Available | 2 |
| LLaSE-G1: Incentivizing Generalization Capability for LLaMA-based Speech Enhancement | Mar 1, 2025 | Language ModelingLanguage Modelling | CodeCode Available | 2 |
| AgentSociety Challenge: Designing LLM Agents for User Modeling and Recommendation on Web Platforms | Feb 26, 2025 | Language ModelingLanguage Modelling | CodeCode Available | 2 |
| Citrus: Leveraging Expert Cognitive Pathways in a Medical Language Model for Advanced Medical Decision Support | Feb 25, 2025 | Decision MakingDiagnostic | CodeCode Available | 2 |