| Tuning Large Multimodal Models for Videos using Reinforcement Learning from AI Feedback | Feb 6, 2024 | Video-based Generative Performance Benchmarking | Code Available | 2 | 5 |
| EarthGPT: A Universal Multi-modal Large Language Model for Multi-sensor Image Comprehension in Remote Sensing Domain | Jan 30, 2024 | Image Comprehension, Instruction Following | Code Available | 2 | 5 |
| Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs | Oct 10, 2024 | Active Learning, Language Modeling | Code Available | 2 | 5 |
| Evaluating Quantized Large Language Models | Feb 28, 2024 | Mamba, Quantization | Code Available | 2 | 5 |
| Edu-ConvoKit: An Open-Source Library for Education Conversation Data | Feb 7, 2024 | | Code Available | 2 | 5 |
| Calibrated Self-Rewarding Vision Language Models | May 23, 2024 | Hallucination, Language Modelling | Code Available | 2 | 5 |
| PERT: Pre-training BERT with Permuted Language Model | Mar 14, 2022 | Language Modeling, Language Modelling | Code Available | 2 | 5 |
| LLaSE-G1: Incentivizing Generalization Capability for LLaMA-based Speech Enhancement | Mar 1, 2025 | Language Modeling, Language Modelling | Code Available | 2 | 5 |
| Training Diffusion Models with Reinforcement Learning | May 22, 2023 | Decision Making, Denoising | Code Available | 2 | 5 |
| GoLLIE: Annotation Guidelines improve Zero-Shot Information-Extraction | Oct 5, 2023 | Event Argument Extraction, Event Extraction | Code Available | 2 | 5 |
| All in One: Exploring Unified Video-Language Pre-training | Mar 14, 2022 | All, Language Modelling | Code Available | 2 | 5 |
| A Survey on Multimodal Large Language Models for Autonomous Driving | Nov 21, 2023 | Autonomous Driving | Code Available | 2 | 5 |
| Towards A Unified Conformer Structure: from ASR to ASV Task | Nov 14, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 2 | 5 |
| DocPrompting: Generating Code by Retrieving the Docs | Jul 13, 2022 | Code Generation | Code Available | 2 | 5 |
| AllSpark: Reborn Labeled Features from Unlabeled in Transformer for Semi-Supervised Semantic Segmentation | Mar 4, 2024 | Semantic Segmentation, Semi-Supervised Semantic Segmentation | Code Available | 2 | 5 |
| Open CaptchaWorld: A Comprehensive Web-based Platform for Testing and Benchmarking Multimodal LLM Agents | May 30, 2025 | Benchmarking, Blocking | Code Available | 2 | 5 |
| Exploring Video Quality Assessment on User Generated Contents from Aesthetic and Technical Perspectives | Nov 9, 2022 | Disentanglement, Video Generation | Code Available | 2 | 5 |
| Unsupervised Representation Learning from Pre-trained Diffusion Probabilistic Models | Dec 26, 2022 | Image Reconstruction, Representation Learning | Code Available | 2 | 5 |
| TGL: A General Framework for Temporal GNN Training on Billion-Scale Graphs | Mar 28, 2022 | CPU, GPU | Code Available | 2 | 5 |
| Hidet: Task-Mapping Programming Paradigm for Deep Learning Tensor Programs | Oct 18, 2022 | Deep Learning, Scheduling | Code Available | 2 | 5 |
| Prompting Large Language Models to Tackle the Full Software Development Lifecycle: A Case Study | Mar 13, 2024 | Code Generation | Code Available | 2 | 5 |
| REEF: Representation Encoding Fingerprints for Large Language Models | Oct 18, 2024 | | Code Available | 2 | 5 |
| Modeling the Label Distributions for Weakly-Supervised Semantic Segmentation | Mar 20, 2024 | Semantic Segmentation, Weakly supervised Semantic Segmentation | Code Available | 2 | 5 |
| MotionDiffuse: Text-Driven Human Motion Generation with Diffusion Model | Aug 31, 2022 | Denoising, Motion Generation | Code Available | 2 | 5 |
| Large language models surpass human experts in predicting neuroscience results | Mar 4, 2024 | | Code Available | 2 | 5 |
| Owl-1: Omni World Model for Consistent Long Video Generation | Dec 12, 2024 | Video Generation | Code Available | 2 | 5 |
| Diving Deeper Into Pedestrian Behavior Understanding: Intention Estimation, Action Prediction, and Event Risk Assessment | Jun 29, 2024 | Prediction | Code Available | 2 | 5 |
| K2: A Foundation Language Model for Geoscience Knowledge Understanding and Utilization | Jun 8, 2023 | Language Modeling, Language Modelling | Code Available | 2 | 5 |
| GenSim: A General Social Simulation Platform with Large Language Model based Agents | Oct 6, 2024 | Language Modeling, Language Modelling | Code Available | 2 | 5 |
| Metric Flow Matching for Smooth Interpolations on the Data Manifold | May 23, 2024 | Trajectory Prediction | Code Available | 2 | 5 |
| Harmonizer: Learning to Perform White-Box Image and Video Harmonization | Jul 4, 2022 | Image Harmonization, Video Harmonization | Code Available | 2 | 5 |
| Android in the Zoo: Chain-of-Action-Thought for GUI Agents | Mar 5, 2024 | Language Modeling, Language Modelling | Code Available | 2 | 5 |
| Knowledge Circuits in Pretrained Transformers | May 28, 2024 | In-Context Learning, knowledge editing | Code Available | 2 | 5 |
| PyMIC: A deep learning toolkit for annotation-efficient medical image segmentation | Aug 19, 2022 | Deep Learning, Image Segmentation | Code Available | 2 | 5 |
| PHemoNet: A Multimodal Network for Physiological Signals | Sep 13, 2024 | Brain Computer Interface, EEG | Code Available | 2 | 5 |
| From Sparse to Soft Mixtures of Experts | Aug 2, 2023 | | Code Available | 2 | 5 |
| ColorizeDiffusion: Adjustable Sketch Colorization with Reference Image and Text | Jan 2, 2024 | Colorization, Sketch Colorization | Code Available | 2 | 5 |
| Diving into Underwater: Segment Anything Model Guided Underwater Salient Instance Segmentation and A Large-scale Dataset | Jun 10, 2024 | Instance Segmentation, Salient Object Detection | Code Available | 2 | 5 |
| DifIISR: A Diffusion Model with Gradient Guidance for Infrared Image Super-Resolution | Mar 3, 2025 | Autonomous Driving, Image Super-Resolution | Code Available | 2 | 5 |
| nuScenes: A multimodal dataset for autonomous driving | Mar 26, 2019 | 3D Object Detection, Autonomous Driving | Code Available | 2 | 5 |
| An LLM-Assisted Easy-to-Trigger Backdoor Attack on Code Completion Models: Injecting Disguised Vulnerabilities against Strong Detection | Jun 10, 2024 | Backdoor Attack, Code Completion | Code Available | 2 | 5 |
| Shape, Light, and Material Decomposition from Images using Monte Carlo Rendering and Denoising | Jun 7, 2022 | 3D Reconstruction, Denoising | Code Available | 2 | 5 |
| Video Prediction Transformers without Recurrence or Convolution | Oct 7, 2024 | Decoder, Prediction | Code Available | 2 | 5 |
| TinyLLaVA-Video-R1: Towards Smaller LMMs for Video Reasoning | Apr 13, 2025 | Question Answering, reinforcement-learning | Code Available | 2 | 5 |
| DeepFilterNet: A Low Complexity Speech Enhancement Framework for Full-Band Audio based on Deep Filtering | Oct 11, 2021 | Speech Enhancement | Code Available | 2 | 5 |
| PoseScript: Linking 3D Human Poses and Natural Language | Oct 21, 2022 | Cross-Modal Retrieval, Image Captioning | Code Available | 2 | 5 |
| SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations | Aug 2, 2021 | Denoising, Image Generation | Code Available | 2 | 5 |
| Satellite Image Time Series Semantic Change Detection: Novel Architecture and Analysis of Domain Shift | Jul 10, 2024 | Change Detection, Disaster Response | Code Available | 2 | 5 |
| LLaMEA: A Large Language Model Evolutionary Algorithm for Automatically Generating Metaheuristics | May 30, 2024 | Language Modeling, Language Modelling | Code Available | 2 | 5 |
| Unsupervised Universal Image Segmentation | Dec 28, 2023 | Image Segmentation, Instance Segmentation | Code Available | 2 | 5 |