SOTAVerified

Descriptive

Papers

Showing 51–100 of 1477 papers

| Title | Status | Hype |
| --- | --- | --- |
| DRAMA-X: A Fine-grained Intent Prediction and Risk Reasoning Benchmark For Driving | Code | 1 |
| InstructTTSEval: Benchmarking Complex Natural-Language Instruction Following in Text-to-Speech Systems | Code | 1 |
| Datasheets Aren't Enough: DataRubrics for Automated Quality Metrics and Accountability | Code | 1 |
| Hallucination-Aware Multimodal Benchmark for Gastrointestinal Image Analysis with Large Vision-Language Models | Code | 1 |
| Emotion-Qwen: Training Hybrid Experts for Unified Emotion and General Vision-Language Understanding | Code | 1 |
| GOAL: Global-local Object Alignment Learning | Code | 1 |
| Controlling Latent Diffusion Using Latent CLIP | Code | 1 |
| Small but Mighty: Enhancing Time Series Forecasting with Lightweight LLMs | Code | 1 |
| Enhancing Monocular 3D Scene Completion with Diffusion Model | Code | 1 |
| Fairness through Difference Awareness: Measuring Desired Group Discrimination in LLMs | Code | 1 |
| Scaling Inference-Time Search with Vision Value Model for Improved Visual Comprehension | Code | 1 |
| GraphXAIN: Narratives to Explain Graph Neural Networks | Code | 1 |
| SpeakGer: A meta-data enriched speech corpus of German state and federal parliaments | Code | 1 |
| Scene Graph Generation with Role-Playing Large Language Models | Code | 1 |
| Enriching Music Descriptions with a Finetuned-LLM and Metadata for Text-to-Music Retrieval | Code | 1 |
| ReCLAP: Improving Zero Shot Audio Classification by Describing Sounds | Code | 1 |
| RSTeller: Scaling Up Visual Language Modeling in Remote Sensing with Rich Linguistic Semantics from Openly Available Data and Large Language Models | Code | 1 |
| Revisiting Image Captioning Training Paradigm via Direct CLIP-based Optimization | Code | 1 |
| Leveraging Large Language Models for Enhancing the Understandability of Generated Unit Tests | Code | 1 |
| FFAA: Multimodal Large Language Model based Explainable Open-World Face Forgery Analysis Assistant | Code | 1 |
| Variationist: Exploring Multifaceted Variation and Bias in Written Language Data | Code | 1 |
| The GPT-WritingPrompts Dataset: A Comparative Analysis of Character Portrayal in Short Stories | Code | 1 |
| Navigating Knowledge Management Implementation Success in Government Organizations: A type-2 fuzzy approach | Code | 1 |
| Neural Concept Binder | Code | 1 |
| LaMOT: Language-Guided Multi-Object Tracking | Code | 1 |
| A Fine-tuning Dataset and Benchmark for Large Language Models for Protein Understanding | Code | 1 |
| What Makes CLIP More Robust to Long-Tailed Pre-Training Data? A Controlled Study for Transferable Insights | Code | 1 |
| A Good Foundation is Worth Many Labels: Label-Efficient Panoptic Segmentation | Code | 1 |
| User-Friendly Customized Generation with Multi-Modal Prompts | Code | 1 |
| Mozart's Touch: A Lightweight Multi-modal Music Generation Framework Based on Pre-Trained Large Models | Code | 1 |
| Aligning LLM Agents by Learning Latent Preference from User Edits | Code | 1 |
| Mixture of Low-rank Experts for Transferable AI-Generated Image Detection | Code | 1 |
| A Linear Time and Space Local Point Cloud Geometry Encoder via Vectorized Kernel Mixture (VecKM) | Code | 1 |
| Textual Knowledge Matters: Cross-Modality Co-Teaching for Generalized Visual Class Discovery | Code | 1 |
| FontCLIP: A Semantic Typography Visual-Language Model for Multilingual Font Applications | Code | 1 |
| TV-SAM: Increasing Zero-Shot Segmentation Performance on Multimodal Medical Images Using GPT-4 Generated Descriptive Prompts Without Human Annotation | Code | 1 |
| Contrastive Learning and Mixture of Experts Enables Precise Vector Embeddings | Code | 1 |
| Mining Fine-Grained Image-Text Alignment for Zero-Shot Captioning via Text-Only Training | Code | 1 |
| VideoStudio: Generating Consistent-Content and Multi-Scene Videos | Code | 1 |
| SPU-PMD: Self-Supervised Point Cloud Upsampling via Progressive Mesh Deformation | Code | 1 |
| A Semantic Space is Worth 256 Language Descriptions: Make Stronger Segmentation Models with Descriptive Properties | Code | 1 |
| Ins-HOI: Instance Aware Human-Object Interactions Recovery | Code | 1 |
| Unveiling Parts Beyond Objects:Towards Finer-Granularity Referring Expression Segmentation | Code | 1 |
| NuScenes-MQA: Integrated Evaluation of Captions and QA for Autonomous Driving Datasets using Markup Annotations | Code | 1 |
| JAMMIN-GPT: Text-based Improvisation using LLMs in Ableton Live | Code | 1 |
| OST: Refining Text Knowledge with Optimal Spatio-Temporal Descriptor for General Video Recognition | Code | 1 |
| MMoE: Enhancing Multimodal Models with Mixtures of Multimodal Interaction Experts | Code | 1 |
| Zero-shot audio captioning with audio-language model guidance and audio context keywords | Code | 1 |
| FaithScore: Fine-grained Evaluations of Hallucinations in Large Vision-Language Models | Code | 1 |
| This is not a Dataset: A Large Negation Benchmark to Challenge Large Language Models | Code | 1 |