SOTAVerified

Descriptive

Papers

Showing 151–200 of 1477 papers

Title | Status | Hype
AltGen: AI-Driven Alt Text Generation for Enhancing EPUB Accessibility | | 0
Is Your Text-to-Image Model Robust to Caption Noise? | | 0
Multi-Agent Norm Perception and Induction in Distributed Healthcare | | 0
Underutilization of Syntactic Processing by Chinese Learners of English in Comprehending English Sentences, Evidenced from Adapted Garden-Path Ambiguity Experiment | | 0
TalkWithMachines: Enhancing Human-Robot Interaction for Interpretable Industrial Robotics Through Large/Vision Language Models | | 0
Descriptive Caption Enhancement with Visual Specialists for Multimodal Perception | Code | 0
Real Classification by Description: Extending CLIP's Limits of Part Attributes Recognition | Code | 0
JoVALE: Detecting Human Actions in Video Using Audiovisual and Language Contexts | Code | 0
SEKE: Specialised Experts for Keyword Extraction | Code | 0
Digital Transformation in Switzerland: The Current State and Expectations | | 0
Organizational culture and the usage of Industry 4.0 technologies: evidence from Swiss businesses | | 0
Is it the end of (generative) linguistics as we know it? | | 0
Implicit Location-Caption Alignment via Complementary Masking for Weakly-Supervised Dense Video Captioning | Code | 0
CoinMath: Harnessing the Power of Coding Instruction for Math LLMs | Code | 0
Semi-automated analysis of audio-recorded lessons: The case of teachers' engaging messages | | 0
Multilingual and Explainable Text Detoxification with Parallel Corpora | Code | 0
Bridging Vision and Language: Modeling Causality and Temporality in Video Narratives | | 0
Automated Image Captioning with CNNs and Transformers | Code | 0
Interpreting Graphic Notation with MusicLDM: An AI Improvisation of Cornelius Cardew's Treatise | | 0
MOPI-HFRS: A Multi-objective Personalized Health-aware Food Recommendation System with LLM-enhanced Interpretation | Code | 0
Hallucination Elimination and Semantic Enhancement Framework for Vision-Language Models in Traffic Scenarios | Code | 0
Cardiometabolic Risk Factors in South Asians: An Epidemiological and Anthropological Study in an Urban Populace of Eastern India | | 0
Language-Guided Image Tokenization for Generation | | 0
FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression | Code | 2
ProtDAT: A Unified Framework for Protein Sequence Design from Any Protein Text Description | | 0
Scaling Inference-Time Search with Vision Value Model for Improved Visual Comprehension | Code | 1
Remote Sensing Temporal Vision-Language Models: A Comprehensive Survey | Code | 3
Analyzing the Impact of AI Tools on Student Study Habits and Academic Performance | | 0
SelfPrompt: Autonomously Evaluating LLM Robustness via Domain-Constrained Knowledge Guidelines and Refined Adversarial Prompts | | 0
EventGPT: Event Stream Understanding with Multimodal Large Language Models | | 0
Enhancing Sketch Animation: Text-to-Video Diffusion Models with Temporal Consistency and Rigidity Constraints | | 0
TechCoach: Towards Technical-Point-Aware Descriptive Action Coaching | | 0
What's in the Image? A Deep-Dive into the Vision of Vision Language Models | | 0
SALOVA: Segment-Augmented Long Video Assistant for Targeted Retrieval and Routing in Long-Form Video Analysis | | 0
Utilization and Profitability of Tractor Services for Maize Farming in Ejura-Sekyedumase Municipality, Ghana | | 0
From MTEB to MTOB: Retrieval-Augmented Classification for Descriptive Grammars | Code | 0
MolReFlect: Towards Fine-grained In-Context Alignment between Molecules and Texts | | 0
The Explabox: Model-Agnostic Machine Learning Transparency & Analysis | | 0
Proportional infinite-width infinite-depth limit for deep linear neural networks | | 0
Omni-IML: Towards Unified Image Manipulation Localization | | 0
MolReFlect: Towards In-Context Fine-grained Alignments between Molecules and Texts | | 0
Uterine Ultrasound Image Captioning Using Deep Learning Techniques | | 0
A Multimodal Approach Combining Structural and Cross-domain Textual Guidance for Weakly Supervised OCT Segmentation | Code | 0
MMBind: Unleashing the Potential of Distributed and Heterogeneous Data for Multimodal Learning in IoT | | 0
Motion-Grounded Video Reasoning: Understanding and Perceiving Motion at Pixel Level | | 0
Visual-Linguistic Agent: Towards Collaborative Contextual Object Reasoning | | 0
Bridging the Visual Gap: Fine-Tuning Multimodal Models with Knowledge-Adapted Captions | Code | 0
BLIP3-KALE: Knowledge Augmented Large-Scale Dense Captions | | 0
Collaborative and Federated Black-box Optimization: A Bayesian Optimization Perspective | | 0
An Empirical Implementation of the Shadow Riskless Rate | | 0

No leaderboard results yet.