| DRAMA-X: A Fine-grained Intent Prediction and Risk Reasoning Benchmark For Driving | Jun 21, 2025 | Autonomous Driving, Descriptive | Code Available | 1 |
| InstructTTSEval: Benchmarking Complex Natural-Language Instruction Following in Text-to-Speech Systems | Jun 19, 2025 | Benchmarking, Descriptive | Code Available | 1 |
| Datasheets Aren't Enough: DataRubrics for Automated Quality Metrics and Accountability | Jun 2, 2025 | Descriptive, Synthetic Data Generation | Code Available | 1 |
| Hallucination-Aware Multimodal Benchmark for Gastrointestinal Image Analysis with Large Vision-Language Models | May 11, 2025 | Descriptive, Diagnostic | Code Available | 1 |
| Emotion-Qwen: Training Hybrid Experts for Unified Emotion and General Vision-Language Understanding | May 10, 2025 | Descriptive, Emotion Recognition | Code Available | 1 |
| GOAL: Global-local Object Alignment Learning | Mar 22, 2025 | Descriptive, Object | Code Available | 1 |
| Controlling Latent Diffusion Using Latent CLIP | Mar 11, 2025 | Denoising, Descriptive | Code Available | 1 |
| Small but Mighty: Enhancing Time Series Forecasting with Lightweight LLMs | Mar 5, 2025 | Computational Efficiency, Descriptive | Code Available | 1 |
| Enhancing Monocular 3D Scene Completion with Diffusion Model | Mar 2, 2025 | 3D Reconstruction, 3D Scene Reconstruction | Code Available | 1 |
| Fairness through Difference Awareness: Measuring Desired Group Discrimination in LLMs | Feb 4, 2025 | 16k, Descriptive | Code Available | 1 |
| Scaling Inference-Time Search with Vision Value Model for Improved Visual Comprehension | Dec 4, 2024 | Descriptive, Language Modeling | Code Available | 1 |
| GraphXAIN: Narratives to Explain Graph Neural Networks | Nov 4, 2024 | Descriptive, Feature Importance | Code Available | 1 |
| SpeakGer: A meta-data enriched speech corpus of German state and federal parliaments | Oct 23, 2024 | Descriptive, Sentiment Analysis | Code Available | 1 |
| Scene Graph Generation with Role-Playing Large Language Models | Oct 20, 2024 | Descriptive, Graph Generation | Code Available | 1 |
| Enriching Music Descriptions with a Finetuned-LLM and Metadata for Text-to-Music Retrieval | Oct 4, 2024 | Descriptive, Language Modeling | Code Available | 1 |
| ReCLAP: Improving Zero Shot Audio Classification by Describing Sounds | Sep 13, 2024 | Audio Classification, Descriptive | Code Available | 1 |
| RSTeller: Scaling Up Visual Language Modeling in Remote Sensing with Rich Linguistic Semantics from Openly Available Data and Large Language Models | Aug 27, 2024 | Descriptive, Language Modeling | Code Available | 1 |
| Revisiting Image Captioning Training Paradigm via Direct CLIP-based Optimization | Aug 26, 2024 | Descriptive, Image Captioning | Code Available | 1 |
| Leveraging Large Language Models for Enhancing the Understandability of Generated Unit Tests | Aug 21, 2024 | Bug fixing, Descriptive | Code Available | 1 |
| FFAA: Multimodal Large Language Model based Explainable Open-World Face Forgery Analysis Assistant | Aug 19, 2024 | Descriptive, Face Swapping | Code Available | 1 |
| Variationist: Exploring Multifaceted Variation and Bias in Written Language Data | Jun 25, 2024 | Descriptive, Diversity | Code Available | 1 |
| The GPT-WritingPrompts Dataset: A Comparative Analysis of Character Portrayal in Short Stories | Jun 24, 2024 | Descriptive | Code Available | 1 |
| Navigating Knowledge Management Implementation Success in Government Organizations: A type-2 fuzzy approach | Jun 18, 2024 | Descriptive, Management | Code Available | 1 |
| Neural Concept Binder | Jun 14, 2024 | Descriptive, Retrieval | Code Available | 1 |
| LaMOT: Language-Guided Multi-Object Tracking | Jun 12, 2024 | Descriptive, Multi-Object Tracking | Code Available | 1 |
| A Fine-tuning Dataset and Benchmark for Large Language Models for Protein Understanding | Jun 8, 2024 | Descriptive, Language Modelling | Code Available | 1 |
| What Makes CLIP More Robust to Long-Tailed Pre-Training Data? A Controlled Study for Transferable Insights | May 31, 2024 | Descriptive, Self-Supervised Learning | Code Available | 1 |
| A Good Foundation is Worth Many Labels: Label-Efficient Panoptic Segmentation | May 29, 2024 | Autonomous Driving, Boundary Detection | Code Available | 1 |
| User-Friendly Customized Generation with Multi-Modal Prompts | May 26, 2024 | Descriptive, Image Generation | Code Available | 1 |
| Mozart's Touch: A Lightweight Multi-modal Music Generation Framework Based on Pre-Trained Large Models | May 5, 2024 | Descriptive, Language Modeling | Code Available | 1 |
| Aligning LLM Agents by Learning Latent Preference from User Edits | Apr 23, 2024 | Descriptive, Language Modelling | Code Available | 1 |
| Mixture of Low-rank Experts for Transferable AI-Generated Image Detection | Apr 7, 2024 | Descriptive, parameter-efficient fine-tuning | Code Available | 1 |
| A Linear Time and Space Local Point Cloud Geometry Encoder via Vectorized Kernel Mixture (VecKM) | Apr 2, 2024 | Descriptive | Code Available | 1 |
| Textual Knowledge Matters: Cross-Modality Co-Teaching for Generalized Visual Class Discovery | Mar 12, 2024 | Descriptive, Retrieval | Code Available | 1 |
| FontCLIP: A Semantic Typography Visual-Language Model for Multilingual Font Applications | Mar 11, 2024 | Attribute, Descriptive | Code Available | 1 |
| TV-SAM: Increasing Zero-Shot Segmentation Performance on Multimodal Medical Images Using GPT-4 Generated Descriptive Prompts Without Human Annotation | Feb 24, 2024 | Descriptive, Language Modeling | Code Available | 1 |
| Contrastive Learning and Mixture of Experts Enables Precise Vector Embeddings | Jan 28, 2024 | Contrastive Learning, Descriptive | Code Available | 1 |
| Mining Fine-Grained Image-Text Alignment for Zero-Shot Captioning via Text-Only Training | Jan 4, 2024 | Descriptive, Image Captioning | Code Available | 1 |
| VideoStudio: Generating Consistent-Content and Multi-Scene Videos | Jan 2, 2024 | Descriptive, Video Generation | Code Available | 1 |
| SPU-PMD: Self-Supervised Point Cloud Upsampling via Progressive Mesh Deformation | Jan 1, 2024 | Descriptive, point cloud upsampling | Code Available | 1 |
| A Semantic Space is Worth 256 Language Descriptions: Make Stronger Segmentation Models with Descriptive Properties | Dec 21, 2023 | Common Sense Reasoning, Descriptive | Code Available | 1 |
| Ins-HOI: Instance Aware Human-Object Interactions Recovery | Dec 15, 2023 | Descriptive, Disentanglement | Code Available | 1 |
| Unveiling Parts Beyond Objects: Towards Finer-Granularity Referring Expression Segmentation | Dec 13, 2023 | Descriptive, Object | Code Available | 1 |
| NuScenes-MQA: Integrated Evaluation of Captions and QA for Autonomous Driving Datasets using Markup Annotations | Dec 11, 2023 | Autonomous Driving, Descriptive | Code Available | 1 |
| JAMMIN-GPT: Text-based Improvisation using LLMs in Ableton Live | Dec 6, 2023 | Descriptive | Code Available | 1 |
| OST: Refining Text Knowledge with Optimal Spatio-Temporal Descriptor for General Video Recognition | Nov 30, 2023 | Descriptive, Language Modelling | Code Available | 1 |
| MMoE: Enhancing Multimodal Models with Mixtures of Multimodal Interaction Experts | Nov 16, 2023 | Binary Classification, Descriptive | Code Available | 1 |
| Zero-shot audio captioning with audio-language model guidance and audio context keywords | Nov 14, 2023 | Audio captioning, Descriptive | Code Available | 1 |
| FaithScore: Fine-grained Evaluations of Hallucinations in Large Vision-Language Models | Nov 2, 2023 | Descriptive, Instruction Following | Code Available | 1 |
| This is not a Dataset: A Large Negation Benchmark to Challenge Large Language Models | Oct 24, 2023 | Descriptive, Negation | Code Available | 1 |