| Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation | Jun 14, 2024 | Audio-Visual Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 3 | 5 |
| ComfyBench: Benchmarking LLM-based Agents in ComfyUI for Autonomously Designing Collaborative AI Systems | Sep 2, 2024 | Benchmarking, Instruction Following | Code Available | 3 | 5 |
| AutoScraper: A Progressive Understanding Web Agent for Web Scraper Generation | Apr 19, 2024 | Action Generation | Code Available | 3 | 5 |
| Anatomy-informed Data Augmentation for Enhanced Prostate Cancer Detection | Sep 7, 2023 | Anatomy, Data Augmentation | Code Available | 3 | 5 |
| Improving Model Evaluation using SMART Filtering of Benchmark Datasets | Oct 26, 2024 | Chatbot, Diversity | Code Available | 3 | 5 |
| ZO2: Scalable Zeroth-Order Fine-Tuning for Extremely Large Language Models with Limited GPU Memory | Mar 16, 2025 | CPU, GPU | Code Available | 3 | 5 |
| A new face swap method for image and video domains: a technical report | Feb 7, 2022 | Action Recognition In Videos, Face Recognition | Code Available | 3 | 5 |
| MooER: LLM-based Speech Recognition and Translation Models from Moore Threads | Aug 9, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 3 | 5 |
| Reinforcement Learning Enhanced LLMs: A Survey | Dec 5, 2024 | reinforcement-learning, Reinforcement Learning | Code Available | 3 | 5 |
| PF3plat: Pose-Free Feed-Forward 3D Gaussian Splatting | Oct 29, 2024 | 3DGS, 3D Reconstruction | Code Available | 3 | 5 |
| RT-DETRv3: Real-time End-to-End Object Detection with Hierarchical Dense Positive Supervision | Sep 13, 2024 | Decoder, object-detection | Code Available | 3 | 5 |
| AesBench: An Expert Benchmark for Multimodal Large Language Models on Image Aesthetics Perception | Jan 16, 2024 | MLLM Evaluation: Aesthetics | Code Available | 3 | 5 |
| One-Prompt-One-Story: Free-Lunch Consistent Text-to-Image Generation Using a Single Prompt | Jan 23, 2025 | Image Generation, Story Generation | Code Available | 3 | 5 |
| An Imitative Reinforcement Learning Framework for Autonomous Dogfight | Jun 17, 2024 | Imitation Learning, reinforcement-learning | Code Available | 3 | 5 |
| FusionBench: A Comprehensive Benchmark of Deep Model Fusion | Jun 5, 2024 | image-classification, Image Classification | Code Available | 3 | 5 |
| FedMKT: Federated Mutual Knowledge Transfer for Large and Small Language Models | Jun 4, 2024 | Text Generation, Transfer Learning | Code Available | 3 | 5 |
| Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought Reasoning | Mar 25, 2024 | Visual Question Answering (VQA) | Code Available | 3 | 5 |
| From Panels to Prose: Generating Literary Narratives from Comics | Mar 30, 2025 | Optical Character Recognition (OCR) | Code Available | 3 | 5 |
| TorchCP: A Python Library for Conformal Prediction | Feb 20, 2024 | Conformal Prediction, Deep Learning | Code Available | 3 | 5 |
| RAG and RAU: A Survey on Retrieval-Augmented Language Model in Natural Language Processing | Apr 30, 2024 | Computational Efficiency, Hallucination | Code Available | 3 | 5 |
| Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs | Mar 3, 2025 | Reinforcement Learning (RL) | Code Available | 3 | 5 |
| Oryx MLLM: On-Demand Spatial-Temporal Understanding at Arbitrary Resolution | Sep 19, 2024 | document understanding, Video Question Answering | Code Available | 3 | 5 |
| X-LoRA: Mixture of Low-Rank Adapter Experts, a Flexible Framework for Large Language Models with Applications in Protein Mechanics and Molecular Design | Feb 11, 2024 | graph construction, Knowledge Graphs | Code Available | 3 | 5 |
| Segment Any Medical Model Extended | Mar 26, 2024 | Data Augmentation, Image Segmentation | Code Available | 3 | 5 |
| An Image is Worth 32 Tokens for Reconstruction and Generation | Jun 11, 2024 | Image Generation, Image Reconstruction | Code Available | 3 | 5 |