| MeMOTR: Long-Term Memory-Augmented Transformer for Multi-Object Tracking | Jul 28, 2023 | Multi-Object Tracking, Multiple Object Tracking | Code Available | 2 |
| GaussianWorld: Gaussian World Model for Streaming 3D Occupancy Prediction | Dec 13, 2024 | Autonomous Driving, Prediction | Code Available | 2 |
| ZAPBench: A Benchmark for Whole-Brain Activity Prediction in Zebrafish | Mar 4, 2025 | Activity Prediction, Multivariate Time Series Forecasting | Code Available | 2 |
| Effective Diffusion Transformer Architecture for Image Super-Resolution | Sep 29, 2024 | Image Generation, Image Super-Resolution | Code Available | 2 |
| Towards Diverse Binary Segmentation via A Simple yet General Gated Network | Mar 18, 2023 | Decoder, Segmentation | Code Available | 2 |
| LLaVA-KD: A Framework of Distilling Multimodal Large Language Models | Oct 21, 2024 | | Code Available | 2 |
| MolCRAFT: Structure-Based Drug Design in Continuous Parameter Space | Apr 18, 2024 | Drug Design | Code Available | 2 |
| GlyphControl: Glyph Conditional Control for Visual Text Generation | May 29, 2023 | Optical Character Recognition (OCR), Text Generation | Code Available | 2 |
| UnIVAL: Unified Model for Image, Video, Audio and Language Tasks | Jul 30, 2023 | Out-of-Distribution Generalization | Code Available | 2 |
| LongVideoBench: A Benchmark for Long-context Interleaved Video-Language Understanding | Jul 22, 2024 | Multiple-choice, Question Answering | Code Available | 2 |
| Varformer: Adapting VAR's Generative Prior for Image Restoration | Dec 30, 2024 | Image Reconstruction, Image Restoration | Code Available | 2 |
| Guiding Generative Protein Language Models with Reinforcement Learning | Dec 17, 2024 | Diversity, reinforcement-learning | Code Available | 2 |
| WavCaps: A ChatGPT-Assisted Weakly-Labelled Audio Captioning Dataset for Audio-Language Multimodal Research | Mar 30, 2023 | Audio captioning, Event Detection | Code Available | 2 |
| On Discrete Prompt Optimization for Diffusion Models | Jun 27, 2024 | Adversarial Attack, Prompt Engineering | Code Available | 2 |
| Cascade-CLIP: Cascaded Vision-Language Embeddings Alignment for Zero-Shot Semantic Segmentation | Jun 2, 2024 | Segmentation, Semantic Segmentation | Code Available | 2 |
| CrystalFormer-RL: Reinforcement Fine-Tuning for Materials Design | Apr 3, 2025 | Band Gap, Dielectric Constant | Code Available | 2 |
| HISTAI: An Open-Source, Large-Scale Whole Slide Image Dataset for Computational Pathology | May 17, 2025 | Diagnostic, Diversity | Code Available | 2 |
| D-Bot: Database Diagnosis System using Large Language Models | Dec 3, 2023 | | Code Available | 2 |
| Dynamic Parametric Retrieval Augmented Generation for Test-time Knowledge Enhancement | Mar 31, 2025 | Hallucination, RAG | Code Available | 2 |
| XPSR: Cross-modal Priors for Diffusion-based Image Super-Resolution | Mar 8, 2024 | Image Super-Resolution, Super-Resolution | Code Available | 2 |
| Graph-Based Multimodal and Multi-view Alignment for Keystep Recognition | Jan 7, 2025 | Graph Learning, Node Classification | Code Available | 2 |
| Frequency Adaptive Normalization For Non-stationary Time Series Forecasting | Sep 30, 2024 | Time Series, Time Series Forecasting | Code Available | 2 |
| Deep Reinforcement Learning for Multi-Agent Interaction | Aug 2, 2022 | BIG-bench Machine Learning, Causal Inference | Code Available | 2 |
| Distributional Soft Actor-Critic with Three Refinements | Oct 9, 2023 | Decision Making, Reinforcement Learning (RL) | Code Available | 2 |
| Distilling Diffusion Models to Efficient 3D LiDAR Scene Completion | Dec 4, 2024 | Autonomous Vehicles, Lidar Scene Completion | Code Available | 2 |
| Navigation Variable-based Multi-objective Particle Swarm Optimization for UAV Path Planning with Kinematic Constraints | Jan 3, 2025 | Metaheuristic Optimization | Code Available | 2 |
| SRAI: Towards Standardization of Geospatial AI | Oct 19, 2023 | | Code Available | 2 |
| Promptriever: Instruction-Trained Retrievers Can Be Prompted Like Language Models | Sep 17, 2024 | Information Retrieval, Retrieval | Code Available | 2 |
| PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel | Apr 21, 2023 | | Code Available | 2 |
| MoME: Mixture of Multimodal Experts for Generalist Multimodal Large Language Models | Jul 17, 2024 | | Code Available | 2 |
| Minstrel: Structural Prompt Generation with Multi-Agents Coordination for Non-AI Experts | Sep 20, 2024 | Prompt Engineering | Code Available | 2 |
| Contrastive Search Is What You Need For Neural Text Generation | Oct 25, 2022 | Contrastive Learning, Language Modeling | Code Available | 2 |
| MOTRv2: Bootstrapping End-to-End Multi-Object Tracking by Pretrained Object Detectors | Nov 17, 2022 | Multi-Object Tracking, Multiple Object Tracking | Code Available | 2 |
| Enhancing Spatiotemporal Disease Progression Models via Latent Diffusion and Prior Knowledge | May 6, 2024 | | Code Available | 2 |
| Open World Scene Graph Generation using Vision Language Models | Jun 9, 2025 | Graph Generation, Scene Graph Generation | Code Available | 2 |
| Exposure Bracketing Is All You Need For A High-Quality Image | Jan 1, 2024 | All, Deblurring | Code Available | 2 |
| ProxyCLIP: Proxy Attention Improves CLIP for Open-Vocabulary Segmentation | Aug 9, 2024 | Open Vocabulary Semantic Segmentation, Open-Vocabulary Semantic Segmentation | Code Available | 2 |
| TEOChat: A Large Vision-Language Assistant for Temporal Earth Observation Data | Oct 8, 2024 | Change Detection, Earth Observation | Code Available | 2 |
| MLLM-as-a-Judge: Assessing Multimodal LLM-as-a-Judge with Vision-Language Benchmark | Feb 7, 2024 | | Code Available | 2 |
| An Image Is Worth 1000 Lies: Adversarial Transferability across Prompts on Vision-Language Models | Mar 14, 2024 | | Code Available | 2 |
| zkLLM: Zero Knowledge Proofs for Large Language Models | Apr 24, 2024 | | Code Available | 2 |
| FinReport: Explainable Stock Earnings Forecasting via News Factor Analyzing Model | Mar 5, 2024 | Stock Market Prediction | Code Available | 2 |
| X^2-VLM: All-In-One Pre-trained Model For Vision-Language Tasks | Nov 22, 2022 | All, Cross-Modal Retrieval | Code Available | 2 |
| Git-Theta: A Git Extension for Collaborative Development of Machine Learning Models | Jun 7, 2023 | | Code Available | 2 |
| Starting From Non-Parametric Networks for 3D Point Cloud Analysis | Jan 1, 2023 | | Code Available | 2 |
| Foundational Large Language Models for Materials Research | Dec 12, 2024 | Domain Adaptation, Model Selection | Code Available | 2 |
| Exploring the Effect of Dataset Diversity in Self-Supervised Learning for Surgical Computer Vision | Jul 25, 2024 | Diversity, Medical Image Analysis | Code Available | 2 |
| AdaParse: An Adaptive Parallel PDF Parsing and Resource Scaling Engine | Apr 23, 2025 | | Code Available | 2 |
| Re3: Generating Longer Stories With Recursive Reprompting and Revision | Oct 13, 2022 | Language Modeling, Language Modelling | Code Available | 2 |
| 2DPASS: 2D Priors Assisted Semantic Segmentation on LiDAR Point Clouds | Jul 10, 2022 | 3D Semantic Segmentation, Autonomous Driving | Code Available | 2 |