Descriptive

Papers


Showing 1–50 of 1477 papers

Title	Date	Tasks	Status	Hype
Visually Descriptive Language Model for Vector Graphics Reasoning	Apr 9, 2024	Descriptive, Language Modeling	Code Available	9
T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy	Mar 21, 2024	Contrastive Learning, Descriptive	Code Available	7
AudioGen: Textually Guided Audio Generation	Sep 30, 2022	Audio Generation, Descriptive	Code Available	6
Fundamental Components of Deep Learning: A category-theoretic approach	Mar 13, 2024	Deep Learning, Descriptive	Code Available	5
Ultra-High-Resolution Image Synthesis: Data, Method and Evaluation	Jun 2, 2025	4k, Descriptive	Code Available	3
Remote Sensing Temporal Vision-Language Models: A Comprehensive Survey	Dec 3, 2024	Change Detection, Descriptive	Code Available	3
ReMEmbR: Building and Reasoning Over Long-Horizon Spatio-Temporal Memory for Robot Navigation	Sep 20, 2024	Descriptive, Question Answering	Code Available	3
Descriptive Image Quality Assessment in the Wild	May 29, 2024	Descriptive, Image Quality Assessment	Code Available	3
Tokenization, Fusion, and Augmentation: Towards Fine-grained Multi-modal Entity Representation	Apr 15, 2024	Contrastive Learning, Descriptive	Code Available	3
A Survey on Self-Supervised Learning for Non-Sequential Tabular Data	Feb 2, 2024	Contrastive Learning, Descriptive	Code Available	3
Fine-Tuning Language Models from Human Preferences	Sep 18, 2019	Descriptive, Language Modelling	Code Available	3
SonicVerse: Multi-Task Learning for Music Feature-Informed Captioning	Jun 18, 2025	Caption Generation, Descriptive	Code Available	2
ReID5o: Achieving Omni Multi-modal Person Re-identification in a Single Model	Jun 11, 2025	cross-modal alignment, Descriptive	Code Available	2
CausalVQA: A Physically Grounded Causal Reasoning Benchmark for Video Models	Jun 11, 2025	counterfactual, Descriptive	Code Available	2
VAU-R1: Advancing Video Anomaly Understanding via Reinforcement Fine-Tuning	May 29, 2025	Anomaly Detection, Descriptive	Code Available	2
RuleKit 2: Faster and simpler rule learning	Apr 29, 2025	Descriptive	Code Available	2
Q-Insight: Understanding Image Quality via Visual Reinforcement Learning	Mar 28, 2025	Descriptive, Image Quality Assessment	Code Available	2
Teaching LMMs for Image Quality Scoring and Interpreting	Mar 12, 2025	Descriptive, Image Quality Assessment	Code Available	2
ViLa-MIL: Dual-scale Vision-Language Multiple Instance Learning for Whole Slide Image Classification	Feb 12, 2025	Decoder, Descriptive	Code Available	2
FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression	Jan 1, 2025	Descriptive	Code Available	2
FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression	Dec 5, 2024	Descriptive, Visual Question Answering	Code Available	2
SensorLLM: Human-Intuitive Alignment of Multivariate Sensor Data with LLMs for Activity Recognition	Oct 14, 2024	Activity Recognition, Descriptive	Code Available	2
SimVG: A Simple Framework for Visual Grounding with Decoupled Multi-modal Fusion	Sep 26, 2024	Descriptive, Generalized Referring Expression Comprehension	Code Available	2
SpeechCraft: A Fine-grained Expressive Speech Dataset with Natural Language Description	Aug 24, 2024	Descriptive, Speech Synthesis	Code Available	2
Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision	Jul 8, 2024	Action Quality Assessment, Descriptive	Code Available	2
DGR-MIL: Exploring Diverse Global Representation in Multiple Instance Learning for Whole Slide Image Classification	Jul 4, 2024	Descriptive, Diversity	Code Available	2
MedCalc-Bench: Evaluating Large Language Models for Medical Calculations	Jun 17, 2024	Descriptive, Medical Diagnosis	Code Available	2
RS-Agent: Automating Remote Sensing Tasks through Intelligent Agent	Jun 11, 2024	AI Agent, Descriptive	Code Available	2
Composed Image Retrieval for Remote Sensing	May 24, 2024	Composed Image Retrieval (CoIR), Descriptive	Code Available	2
TrafficVLM: A Controllable Visual Language Model for Traffic Video Captioning	Apr 14, 2024	Dense Video Captioning, Descriptive	Code Available	2
Zero-Shot ECG Classification with Multimodal Learning and Test-time Clinical Knowledge Enhancement	Mar 11, 2024	Clinical Knowledge, Descriptive	Code Available	2
An Item is Worth a Prompt: Versatile Image Editing with Disentangled Control	Mar 7, 2024	Descriptive	Code Available	2
Unveiling Parts Beyond Objects: Towards Finer-Granularity Referring Expression Segmentation	Jan 1, 2024	Descriptive, Object	Code Available	2
Depicting Beyond Scores: Advancing Image Quality Assessment through Multi-modal Language Models	Dec 14, 2023	Descriptive, Image Quality Assessment	Code Available	2
Customization Assistant for Text-to-image Generation	Dec 5, 2023	Descriptive, Image Generation	Code Available	2
TeCH: Text-guided Reconstruction of Lifelike Clothed Humans	Aug 16, 2023	Descriptive, Question Answering	Code Available	2
Solving Data Quality Problems with Desbordante: a Demo	Jul 27, 2023	Anomaly Detection, Descriptive	Code Available	2
AmadeusGPT: a natural language interface for interactive animal behavioral analysis	Jul 10, 2023	Descriptive	Code Available	2
Towards Language Models That Can See: Computer Vision Through the LENS of Natural Language	Jun 28, 2023	Descriptive, Language Modeling	Code Available	2
Scalable 3D Captioning with Pretrained Models	Jun 12, 2023	Descriptive, Image Captioning	Code Available	2
GRiT: A Generative Region-to-text Transformer for Object Understanding	Dec 1, 2022	Decoder, Dense Captioning	Code Available	2
PointCLIP V2: Prompting CLIP and GPT for Powerful 3D Open-world Learning	Nov 21, 2022	3D Classification, 3D Object Detection	Code Available	2
What the DAAM: Interpreting Stable Diffusion Using Cross Attention	Oct 10, 2022	Denoising, Descriptive	Code Available	2
What does a platypus look like? Generating customized prompts for zero-shot image classification	Sep 7, 2022	Descriptive, image-classification	Code Available	2
SCAMPS: Synthetics for Camera Measurement of Physiological Signals	Jun 8, 2022	Descriptive, Diversity	Code Available	2
Fine-grained Image Captioning with CLIP Reward	May 26, 2022	Caption Generation, Descriptive	Code Available	2
K-LITE: Learning Transferable Visual Models with External Knowledge	Apr 20, 2022	Benchmarking, Descriptive	Code Available	2
Language-driven Semantic Segmentation	Jan 10, 2022	Descriptive, Few-Shot Semantic Segmentation	Code Available	2
Describe Anything Model for Visual Question Answering on Text-rich Images	Jul 16, 2025	Descriptive, Language Modeling	Code Available	1
Dataset Distillation via Vision-Language Category Prototype	Jun 30, 2025	Dataset Distillation, Descriptive	Code Available	1

