| Policy-Guided Diffusion | Apr 9, 2024 | | Code Available | 2 |
| HyperDiffusion: Generating Implicit Neural Fields with Weight-Space Diffusion | Mar 29, 2023 | | Code Available | 2 |
| Self-Calibrated CLIP for Training-Free Open-Vocabulary Segmentation | Nov 24, 2024 | Semantic Segmentation | Code Available | 2 |
| Hybrid Transformer with Multi-level Fusion for Multimodal Knowledge Graph Completion | May 4, 2022 | Information Retrieval, Knowledge Graph Completion | Code Available | 2 |
| Proactive Agents for Multi-Turn Text-to-Image Generation Under Uncertainty | Dec 9, 2024 | Image Generation, Text to Image Generation | Code Available | 2 |
| BeLLM: Backward Dependency Enhanced Large Language Model for Sentence Embeddings | Nov 9, 2023 | Language Modeling, Language Modelling | Code Available | 2 |
| SuperSVG: Superpixel-based Scalable Vector Graphics Synthesis | Jun 14, 2024 | Superpixels, Vector Graphics | Code Available | 2 |
| Empirical Sample Complexity of Neural Network Mixed State Reconstruction | Jul 4, 2023 | | Code Available | 2 |
| DayDreamer: World Models for Physical Robot Learning | Jun 28, 2022 | Deep Reinforcement Learning, Navigate | Code Available | 2 |
| MediCLIP: Adapting CLIP for Few-shot Medical Image Anomaly Detection | May 18, 2024 | Anomaly Detection, Decision Making | Code Available | 2 |
| Medical MLLM is Vulnerable: Cross-Modality Jailbreak and Mismatched Attacks on Medical Multimodal Large Language Models | May 26, 2024 | | Code Available | 2 |
| OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation | Dec 12, 2024 | | Code Available | 2 |
| Foundation Models for Video Understanding: A Survey | May 6, 2024 | Survey, Video Understanding | Code Available | 2 |
| ODIN: A Single Model for 2D and 3D Segmentation | Jan 4, 2024 | 3D Instance Segmentation, 3D Semantic Segmentation | Code Available | 2 |
| Tactics2D: A Highly Modular and Extensible Simulator for Driving Decision-making | Nov 18, 2023 | Autonomous Driving, Decision Making | Code Available | 2 |
| RelTR: Relation Transformer for Scene Graph Generation | Jan 27, 2022 | Decoder, Graph Generation | Code Available | 2 |
| Intrinsic Image Diffusion for Indoor Single-view Material Estimation | Dec 19, 2023 | | Code Available | 2 |
| V_kD: Improving Knowledge Distillation using Orthogonal Projections | Mar 10, 2024 | Image Generation, Knowledge Distillation | Code Available | 2 |
| Social4Rec: Distilling User Preference from Social Graph for Video Recommendation in Tencent | Feb 20, 2023 | Knowledge Distillation, Recommendation Systems | Code Available | 2 |
| SkyEyeGPT: Unifying Remote Sensing Vision-Language Tasks via Instruction Tuning with Large Language Model | Jan 18, 2024 | Instruction Following, Language Modeling | Code Available | 2 |
| Efficient Minimum Bayes Risk Decoding using Low-Rank Matrix Completion Algorithms | Jun 5, 2024 | Low-Rank Matrix Completion, Machine Translation | Code Available | 2 |
| Robust Reflection Removal with Flash-only Cues in the Wild | Nov 5, 2022 | Reflection Removal | Code Available | 2 |
| Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning | Feb 1, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| MCL: Multi-view Enhanced Contrastive Learning for Chest X-ray Report Generation | Nov 15, 2024 | Contrastive Learning, Diagnostic | Code Available | 2 |
| Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want | Mar 29, 2024 | Instruction Following, Language Modelling | Code Available | 2 |
| AGIEval: A Human-Centric Benchmark for Evaluating Foundation Models | Apr 13, 2023 | Decision Making, Math | Code Available | 2 |
| Verif.ai: Towards an Open-Source Scientific Generative Question-Answering System with Referenced and Verifiable Answers | Feb 9, 2024 | Generative Question Answering, Information Retrieval | Code Available | 2 |
| Recurrent neural network wave functions for Rydberg atom arrays on kagome lattice | May 30, 2024 | | Code Available | 2 |
| RING++: Roto-translation Invariant Gram for Global Localization on a Sparse Scan Map | Oct 12, 2022 | Translation | Code Available | 2 |
| AceVFI: A Comprehensive Survey of Advances in Video Frame Interpolation | Jun 1, 2025 | Mamba, Motion Compensation | Code Available | 2 |
| Shikra: Unleashing Multimodal LLM's Referential Dialogue Magic | Jun 27, 2023 | Image Captioning, Referring Expression Segmentation | Code Available | 2 |
| Generative Multiplane Images: Making a 2D GAN 3D-Aware | Jul 21, 2022 | | Code Available | 2 |
| Wayformer: Motion Forecasting via Simple & Efficient Attention Networks | Jul 12, 2022 | Autonomous Driving, Decoder | Code Available | 2 |
| Real-Time Polygonal Semantic Mapping for Humanoid Robot Stair Climbing | Nov 4, 2024 | Computational Efficiency, GPU | Code Available | 2 |
| LDRE: LLM-based Divergent Reasoning and Ensemble for Zero-Shot Composed Image Retrieval | Jul 11, 2024 | Image Retrieval, Image to text | Code Available | 2 |
| AgentSims: An Open-Source Sandbox for Large Language Model Evaluation | Aug 8, 2023 | Language Model Evaluation, Language Modeling | Code Available | 2 |
| Crystal-GFN: sampling crystals with desirable properties and constraints | Oct 7, 2023 | Formation Energy | Code Available | 2 |
| GoMAvatar: Efficient Animatable Human Modeling from Monocular Video Using Gaussians-on-Mesh | Apr 11, 2024 | Computational Efficiency | Code Available | 2 |
| MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models | Aug 5, 2024 | Image Comprehension, Multiple-choice | Code Available | 2 |
| MULDE: Multiscale Log-Density Estimation via Denoising Score Matching for Video Anomaly Detection | Mar 21, 2024 | Anomaly Detection, Anomaly Detection In Surveillance Videos | Code Available | 2 |
| FaceID-6M: A Large-Scale, Open-Source FaceID Customization Dataset | Mar 10, 2025 | | Code Available | 2 |
| TAGLAS: An atlas of text-attributed graph datasets in the era of large graph and language models | Jun 20, 2024 | Graph Question Answering, Node Classification | Code Available | 2 |
| DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding | Mar 17, 2025 | Domain Generalization, Multimodal Reasoning | Code Available | 2 |
| Universal Few-shot Learning of Dense Prediction Tasks with Visual Token Matching | Mar 27, 2023 | Decoder, Few-Shot Learning | Code Available | 2 |
| HIT-UAV: A high-altitude infrared thermal dataset for Unmanned Aerial Vehicle-based object detection | Apr 7, 2022 | Object, object-detection | Code Available | 2 |
| Generative Medical Segmentation | Mar 27, 2024 | Decoder, Domain Generalization | Code Available | 2 |
| Vision-Centric BEV Perception: A Survey | Aug 4, 2022 | Survey | Code Available | 2 |
| KNighter: Transforming Static Analysis with LLM-Synthesized Checkers | Mar 12, 2025 | | Code Available | 2 |
| STaR: Bootstrapping Reasoning With Reasoning | Mar 28, 2022 | Common Sense Reasoning, Language Modeling | Code Available | 2 |
| VPGS-SLAM: Voxel-based Progressive 3D Gaussian SLAM in Large-Scale Scenes | May 25, 2025 | 3DGS | Code Available | 2 |