| AltGen: AI-Driven Alt Text Generation for Enhancing EPUB Accessibility | Dec 30, 2024 | Descriptive, Text Generation | Unverified | 0 |
| Is Your Text-to-Image Model Robust to Caption Noise? | Dec 27, 2024 | Descriptive, Hallucination | Unverified | 0 |
| Multi-Agent Norm Perception and Induction in Distributed Healthcare | Dec 24, 2024 | Descriptive | Unverified | 0 |
| Underutilization of Syntactic Processing by Chinese Learners of English in Comprehending English Sentences, Evidenced from Adapted Garden-Path Ambiguity Experiment | Dec 21, 2024 | Descriptive, Sentence | Unverified | 0 |
| TalkWithMachines: Enhancing Human-Robot Interaction for Interpretable Industrial Robotics Through Large/Vision Language Models | Dec 19, 2024 | Descriptive | Unverified | 0 |
| Descriptive Caption Enhancement with Visual Specialists for Multimodal Perception | Dec 18, 2024 | Descriptive, Human-Object Interaction Detection | Code Available | 0 |
| Real Classification by Description: Extending CLIP's Limits of Part Attributes Recognition | Dec 18, 2024 | Attribute, Descriptive | Code Available | 0 |
| JoVALE: Detecting Human Actions in Video Using Audiovisual and Language Contexts | Dec 18, 2024 | Action Detection, Descriptive | Code Available | 0 |
| SEKE: Specialised Experts for Keyword Extraction | Dec 18, 2024 | Descriptive, Keyword Extraction | Code Available | 0 |
| Digital Transformation in Switzerland: The Current State and Expectations | Dec 17, 2024 | Descriptive, Self-Learning | Unverified | 0 |
| Organizational culture and the usage of Industry 4.0 technologies: evidence from Swiss businesses | Dec 17, 2024 | Descriptive | Unverified | 0 |
| Is it the end of (generative) linguistics as we know it? | Dec 17, 2024 | Descriptive, POS | Unverified | 0 |
| Implicit Location-Caption Alignment via Complementary Masking for Weakly-Supervised Dense Video Captioning | Dec 17, 2024 | Dense Video Captioning, Descriptive | Code Available | 0 |
| CoinMath: Harnessing the Power of Coding Instruction for Math LLMs | Dec 16, 2024 | Descriptive, Math | Code Available | 0 |
| Semi-automated analysis of audio-recorded lessons: The case of teachers' engaging messages | Dec 16, 2024 | Descriptive | Unverified | 0 |
| Multilingual and Explainable Text Detoxification with Parallel Corpora | Dec 16, 2024 | Descriptive, Style Transfer | Code Available | 0 |
| Bridging Vision and Language: Modeling Causality and Temporality in Video Narratives | Dec 14, 2024 | Descriptive, Language Modeling | Unverified | 0 |
| Automated Image Captioning with CNNs and Transformers | Dec 13, 2024 | Descriptive, Hyperparameter Optimization | Code Available | 0 |
| Interpreting Graphic Notation with MusicLDM: An AI Improvisation of Cornelius Cardew's Treatise | Dec 12, 2024 | Descriptive, Music Generation | Unverified | 0 |
| MOPI-HFRS: A Multi-objective Personalized Health-aware Food Recommendation System with LLM-enhanced Interpretation | Dec 12, 2024 | Descriptive, Food recommendation | Code Available | 0 |
| Hallucination Elimination and Semantic Enhancement Framework for Vision-Language Models in Traffic Scenarios | Dec 10, 2024 | Autonomous Driving, Descriptive | Code Available | 0 |
| Cardiometabolic Risk Factors in South Asians: An Epidemiological and Anthropological Study in an Urban Populace of Eastern India | Dec 8, 2024 | Descriptive | Unverified | 0 |
| Language-Guided Image Tokenization for Generation | Dec 8, 2024 | Descriptive, Image Generation | Unverified | 0 |
| FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression | Dec 5, 2024 | Descriptive, Visual Question Answering | Code Available | 2 |
| ProtDAT: A Unified Framework for Protein Sequence Design from Any Protein Text Description | Dec 5, 2024 | Descriptive, Protein Design | Unverified | 0 |
| Scaling Inference-Time Search with Vision Value Model for Improved Visual Comprehension | Dec 4, 2024 | Descriptive, Language Modeling | Code Available | 1 |
| Remote Sensing Temporal Vision-Language Models: A Comprehensive Survey | Dec 3, 2024 | Change Detection, Descriptive | Code Available | 3 |
| Analyzing the Impact of AI Tools on Student Study Habits and Academic Performance | Dec 3, 2024 | Descriptive | Unverified | 0 |
| SelfPrompt: Autonomously Evaluating LLM Robustness via Domain-Constrained Knowledge Guidelines and Refined Adversarial Prompts | Dec 1, 2024 | Descriptive, Knowledge Graphs | Unverified | 0 |
| EventGPT: Event Stream Understanding with Multimodal Large Language Models | Dec 1, 2024 | Descriptive | Unverified | 0 |
| Enhancing Sketch Animation: Text-to-Video Diffusion Models with Temporal Consistency and Rigidity Constraints | Nov 28, 2024 | Descriptive | Unverified | 0 |
| TechCoach: Towards Technical-Point-Aware Descriptive Action Coaching | Nov 26, 2024 | Action Assessment, Descriptive | Unverified | 0 |
| What's in the Image? A Deep-Dive into the Vision of Vision Language Models | Nov 26, 2024 | Attribute, Descriptive | Unverified | 0 |
| SALOVA: Segment-Augmented Long Video Assistant for Targeted Retrieval and Routing in Long-Form Video Analysis | Nov 25, 2024 | Descriptive, Form | Unverified | 0 |
| Utilization and Profitability of Tractor Services for Maize Farming in Ejura-Sekyedumase Municipality, Ghana | Nov 24, 2024 | Descriptive | Unverified | 0 |
| From MTEB to MTOB: Retrieval-Augmented Classification for Descriptive Grammars | Nov 23, 2024 | Descriptive, In-Context Learning | Code Available | 0 |
| MolReFlect: Towards Fine-grained In-Context Alignment between Molecules and Texts | Nov 22, 2024 | Descriptive, Molecule Captioning | Unverified | 0 |
| The Explabox: Model-Agnostic Machine Learning Transparency & Analysis | Nov 22, 2024 | Descriptive, Fairness | Unverified | 0 |
| Proportional infinite-width infinite-depth limit for deep linear neural networks | Nov 22, 2024 | Descriptive | Unverified | 0 |
| Omni-IML: Towards Unified Image Manipulation Localization | Nov 22, 2024 | Decoder, Descriptive | Unverified | 0 |
| MolReFlect: Towards In-Context Fine-grained Alignments between Molecules and Texts | Nov 22, 2024 | Descriptive | Unverified | 0 |
| Uterine Ultrasound Image Captioning Using Deep Learning Techniques | Nov 21, 2024 | Deep Learning, Descriptive | Unverified | 0 |
| A Multimodal Approach Combining Structural and Cross-domain Textual Guidance for Weakly Supervised OCT Segmentation | Nov 19, 2024 | Descriptive, Diagnostic | Code Available | 0 |
| MMBind: Unleashing the Potential of Distributed and Heterogeneous Data for Multimodal Learning in IoT | Nov 18, 2024 | Contrastive Learning, Descriptive | Unverified | 0 |
| Motion-Grounded Video Reasoning: Understanding and Perceiving Motion at Pixel Level | Nov 15, 2024 | Benchmarking, counterfactual | Unverified | 0 |
| Visual-Linguistic Agent: Towards Collaborative Contextual Object Reasoning | Nov 15, 2024 | Descriptive, Object | Unverified | 0 |
| Bridging the Visual Gap: Fine-Tuning Multimodal Models with Knowledge-Adapted Captions | Nov 13, 2024 | Descriptive, Hallucination | Code Available | 0 |
| BLIP3-KALE: Knowledge Augmented Large-Scale Dense Captions | Nov 12, 2024 | Descriptive, Image Captioning | Unverified | 0 |
| Collaborative and Federated Black-box Optimization: A Bayesian Optimization Perspective | Nov 12, 2024 | Bayesian Optimization, Decision Making | Unverified | 0 |
| An Empirical Implementation of the Shadow Riskless Rate | Nov 11, 2024 | Descriptive | Unverified | 0 |