SOTAVerified

Descriptive

Papers

Showing 76-100 of 1477 papers

| Title | Status | Hype |
| --- | --- | --- |
| A Fine-tuning Dataset and Benchmark for Large Language Models for Protein Understanding | Code | 1 |
| What Makes CLIP More Robust to Long-Tailed Pre-Training Data? A Controlled Study for Transferable Insights | Code | 1 |
| A Good Foundation is Worth Many Labels: Label-Efficient Panoptic Segmentation | Code | 1 |
| User-Friendly Customized Generation with Multi-Modal Prompts | Code | 1 |
| Mozart's Touch: A Lightweight Multi-modal Music Generation Framework Based on Pre-Trained Large Models | Code | 1 |
| Aligning LLM Agents by Learning Latent Preference from User Edits | Code | 1 |
| Mixture of Low-rank Experts for Transferable AI-Generated Image Detection | Code | 1 |
| A Linear Time and Space Local Point Cloud Geometry Encoder via Vectorized Kernel Mixture (VecKM) | Code | 1 |
| Textual Knowledge Matters: Cross-Modality Co-Teaching for Generalized Visual Class Discovery | Code | 1 |
| FontCLIP: A Semantic Typography Visual-Language Model for Multilingual Font Applications | Code | 1 |
| TV-SAM: Increasing Zero-Shot Segmentation Performance on Multimodal Medical Images Using GPT-4 Generated Descriptive Prompts Without Human Annotation | Code | 1 |
| Contrastive Learning and Mixture of Experts Enables Precise Vector Embeddings | Code | 1 |
| Mining Fine-Grained Image-Text Alignment for Zero-Shot Captioning via Text-Only Training | Code | 1 |
| VideoStudio: Generating Consistent-Content and Multi-Scene Videos | Code | 1 |
| SPU-PMD: Self-Supervised Point Cloud Upsampling via Progressive Mesh Deformation | Code | 1 |
| A Semantic Space is Worth 256 Language Descriptions: Make Stronger Segmentation Models with Descriptive Properties | Code | 1 |
| Ins-HOI: Instance Aware Human-Object Interactions Recovery | Code | 1 |
| Unveiling Parts Beyond Objects: Towards Finer-Granularity Referring Expression Segmentation | Code | 1 |
| NuScenes-MQA: Integrated Evaluation of Captions and QA for Autonomous Driving Datasets using Markup Annotations | Code | 1 |
| JAMMIN-GPT: Text-based Improvisation using LLMs in Ableton Live | Code | 1 |
| OST: Refining Text Knowledge with Optimal Spatio-Temporal Descriptor for General Video Recognition | Code | 1 |
| MMoE: Enhancing Multimodal Models with Mixtures of Multimodal Interaction Experts | Code | 1 |
| Zero-shot audio captioning with audio-language model guidance and audio context keywords | Code | 1 |
| FaithScore: Fine-grained Evaluations of Hallucinations in Large Vision-Language Models | Code | 1 |
| This is not a Dataset: A Large Negation Benchmark to Challenge Large Language Models | Code | 1 |
Page 4 of 60

No leaderboard results yet.