| ReID5o: Achieving Omni Multi-modal Person Re-identification in a Single Model | Jun 11, 2025 | cross-modal alignment, Descriptive | Code Available | 2 | 5 |
| VAU-R1: Advancing Video Anomaly Understanding via Reinforcement Fine-Tuning | May 29, 2025 | Anomaly Detection, Descriptive | Code Available | 2 | 5 |
| Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision | Jul 8, 2024 | Action Quality Assessment, Descriptive | Code Available | 2 | 5 |
| Solving Data Quality Problems with Desbordante: a Demo | Jul 27, 2023 | Anomaly Detection, Descriptive | Code Available | 2 | 5 |
| What does a platypus look like? Generating customized prompts for zero-shot image classification | Sep 7, 2022 | Descriptive, image-classification | Code Available | 2 | 5 |
| Language-driven Semantic Segmentation | Jan 10, 2022 | Descriptive, Few-Shot Semantic Segmentation | Code Available | 2 | 5 |
| FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression | Dec 5, 2024 | Descriptive, Visual Question Answering | Code Available | 2 | 5 |
| MedCalc-Bench: Evaluating Large Language Models for Medical Calculations | Jun 17, 2024 | Descriptive, Medical Diagnosis | Code Available | 2 | 5 |
| DGR-MIL: Exploring Diverse Global Representation in Multiple Instance Learning for Whole Slide Image Classification | Jul 4, 2024 | Descriptive, Diversity | Code Available | 2 | 5 |
| Fine-grained Image Captioning with CLIP Reward | May 26, 2022 | Caption Generation, Descriptive | Code Available | 2 | 5 |
| FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression | Jan 1, 2025 | Descriptive | Code Available | 2 | 5 |
| GRiT: A Generative Region-to-text Transformer for Object Understanding | Dec 1, 2022 | Decoder, Dense Captioning | Code Available | 2 | 5 |
| K-LITE: Learning Transferable Visual Models with External Knowledge | Apr 20, 2022 | Benchmarking, Descriptive | Code Available | 2 | 5 |
| CausalVQA: A Physically Grounded Causal Reasoning Benchmark for Video Models | Jun 11, 2025 | counterfactual, Descriptive | Code Available | 2 | 5 |
| Q-Insight: Understanding Image Quality via Visual Reinforcement Learning | Mar 28, 2025 | Descriptive, Image Quality Assessment | Code Available | 2 | 5 |
| Depicting Beyond Scores: Advancing Image Quality Assessment through Multi-modal Language Models | Dec 14, 2023 | Descriptive, Image Quality Assessment | Code Available | 2 | 5 |
| AmadeusGPT: a natural language interface for interactive animal behavioral analysis | Jul 10, 2023 | Descriptive | Code Available | 2 | 5 |
| RS-Agent: Automating Remote Sensing Tasks through Intelligent Agent | Jun 11, 2024 | AI Agent, Descriptive | Code Available | 2 | 5 |
| Composed Image Retrieval for Remote Sensing | May 24, 2024 | Composed Image Retrieval (CoIR), Descriptive | Code Available | 2 | 5 |
| Scalable 3D Captioning with Pretrained Models | Jun 12, 2023 | Descriptive, Image Captioning | Code Available | 2 | 5 |
| SCAMPS: Synthetics for Camera Measurement of Physiological Signals | Jun 8, 2022 | Descriptive, Diversity | Code Available | 2 | 5 |
| SonicVerse: Multi-Task Learning for Music Feature-Informed Captioning | Jun 18, 2025 | Caption Generation, Descriptive | Code Available | 2 | 5 |
| SpeechCraft: A Fine-grained Expressive Speech Dataset with Natural Language Description | Aug 24, 2024 | Descriptive, Speech Synthesis | Code Available | 2 | 5 |
| Deep Graph Matching under Quadratic Constraint | Mar 11, 2021 | Descriptive, Graph Matching | Code Available | 1 | 5 |
| Deep Implicit Statistical Shape Models for 3D Medical Image Delineation | Apr 7, 2021 | Descriptive, Liver Segmentation | Code Available | 1 | 5 |