| FlowMap: High-Quality Camera Poses, Intrinsics, and Depth via Gradient Descent | Apr 23, 2024 | Novel View Synthesis, Optical Flow Estimation | Code Available | 4 |
| Panoptic-FlashOcc: An Efficient Baseline to Marry Semantic Occupancy with Panoptic via Instance Center | Jun 15, 2024 | | Code Available | 4 |
| Liquid: Language Models are Scalable Multi-modal Generators | Dec 5, 2024 | Language Modeling, Language Modelling | Code Available | 4 |
| Dimension Reduction with Locally Adjusted Graphs | Dec 19, 2024 | Dimensionality Reduction | Code Available | 4 |
| Mastering Diverse Domains through World Models | Jan 10, 2023 | Atari Games 100k, Decision Making | Code Available | 4 |
| StableRep: Synthetic Images from Text-to-Image Models Make Strong Visual Representation Learners | Jun 1, 2023 | Contrastive Learning | Code Available | 4 |
| IBD-PSC: Input-level Backdoor Detection via Parameter-oriented Scaling Consistency | May 16, 2024 | | Code Available | 4 |
| Thin-Plate Spline Motion Model for Image Animation | Mar 27, 2022 | Face Reenactment, Image Animation | Code Available | 4 |
| DenoDet: Attention as Deformable Multi-Subspace Feature Denoising for Target Detection in SAR Images | Jun 5, 2024 | 2D Object Detection, Denoising | Code Available | 4 |
| Guaranteed Approximation Bounds for Mixed-Precision Neural Operators | Jul 27, 2023 | GPU, Operator learning | Code Available | 4 |
| DN-DETR: Accelerate DETR Training by Introducing Query DeNoising | Mar 2, 2022 | Decoder, Object Detection | Code Available | 4 |
| BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | Jan 30, 2023 | Generative Visual Question Answering, Image Captioning | Code Available | 4 |
| Graph of Thoughts: Solving Elaborate Problems with Large Language Models | Aug 18, 2023 | | Code Available | 4 |
| Neural Operators with Localized Integral and Differential Kernels | Feb 26, 2024 | Operator learning | Code Available | 4 |
| Self-Supervised Pre-Training for Table Structure Recognition Transformer | Feb 23, 2024 | Representation Learning | Code Available | 4 |
| Dora: Sampling and Benchmarking for 3D Shape Variational Auto-Encoders | Dec 23, 2024 | 3D Shape Modeling, Benchmarking | Code Available | 4 |
| Autonomous LLM-driven research from data to human-verifiable research papers | Apr 24, 2024 | scientific discovery | Code Available | 4 |
| Can We Generate Images with CoT? Let's Verify and Reinforce Image Generation Step by Step | Jan 23, 2025 | Image Generation, Text-to-Image Generation | Code Available | 4 |
| One-Shot Diffusion Mimicker for Handwritten Text Generation | Sep 6, 2024 | Handwriting generation, Text Generation | Code Available | 4 |
| SigmaRL: A Sample-Efficient and Generalizable Multi-Agent Reinforcement Learning Framework for Motion Planning | Aug 14, 2024 | CPU, Motion Planning | Code Available | 4 |
| Mean Flows for One-step Generative Modeling | May 19, 2025 | | Code Available | 4 |
| Tag2Text: Guiding Vision-Language Model via Image Tagging | Mar 10, 2023 | Language Modeling, Language Modelling | Code Available | 4 |
| In Search of Needles in a 11M Haystack: Recurrent Memory Finds What LLMs Miss | Feb 16, 2024 | RAG | Code Available | 4 |
| ImgEdit: A Unified Image Editing Dataset and Benchmark | May 26, 2025 | Image Editing | Code Available | 4 |
| Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling | Nov 1, 2023 | Hallucination, Knowledge Distillation | Code Available | 4 |
| Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models | Feb 12, 2024 | Hallucination, Object Localization | Code Available | 4 |
| Image Fusion via Vision-Language Model | Feb 3, 2024 | Decoder, Language Modeling | Code Available | 4 |
| Looking Backward: Streaming Video-to-Video Translation with Feature Banks | May 24, 2024 | GPU, Translation | Code Available | 4 |
| Restructuring Vector Quantization with the Rotation Trick | Oct 8, 2024 | Quantization | Code Available | 4 |
| ViDoRAG: Visual Document Retrieval-Augmented Generation via Dynamic Iterative Reasoning Agents | Feb 25, 2025 | Question Answering, RAG | Code Available | 4 |
| SimpleDeepSearcher: Deep Information Seeking via Web-Powered Reasoning Trajectory Synthesis | May 22, 2025 | Diversity, Information Retrieval | Code Available | 4 |
| JoyVASA: Portrait and Animal Image Animation with Diffusion-Based Audio-Driven Facial Dynamics and Head Motion Generation | Nov 14, 2024 | Image Animation, Motion Generation | Code Available | 4 |
| TrueTeacher: Learning Factual Consistency Evaluation with Large Language Models | May 18, 2023 | Natural Language Inference, Synthetic Data Generation | Code Available | 4 |
| Parameter-Efficient Fine-Tuning for Pre-Trained Vision Models: A Survey | Feb 3, 2024 | parameter-efficient fine-tuning, Transfer Learning | Code Available | 4 |
| OpenAgents: An Open Platform for Language Agents in the Wild | Oct 16, 2023 | 2D Object Detection | Code Available | 4 |
| Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis | Jun 1, 2023 | Audio Synthesis, Computational Efficiency | Code Available | 4 |
| A Survey on Diffusion Models for Time Series and Spatio-Temporal Data | Apr 29, 2024 | Anomaly Detection, Imputation | Code Available | 4 |
| OmniMedVQA: A New Large-Scale Comprehensive Evaluation Benchmark for Medical LVLM | Feb 14, 2024 | Medical Visual Question Answering, Question Answering | Code Available | 4 |
| Factorio Learning Environment | Mar 6, 2025 | Program Synthesis, Spatial Reasoning | Code Available | 4 |
| GraphGen: Enhancing Supervised Fine-Tuning for LLMs with Knowledge-Driven Synthetic Data Generation | May 26, 2025 | Question Answering, Synthetic Data Generation | Code Available | 4 |
| SimPO: Simple Preference Optimization with a Reference-Free Reward | May 23, 2024 | Chatbot, Instruction Following | Code Available | 4 |
| FedML Parrot: A Scalable Federated Learning System via Heterogeneity-aware Scheduling on Sequential and Hierarchical Training | Mar 3, 2023 | Federated Learning, GPU | Code Available | 4 |
| Uni3C: Unifying Precisely 3D-Enhanced Camera and Human Motion Controls for Video Generation | Apr 21, 2025 | Video Generation | Code Available | 4 |
| ParkingE2E: Camera-based End-to-end Parking Network, from Images to Planning | Aug 4, 2024 | Decoder, Imitation Learning | Code Available | 4 |
| A Survey of State of the Art Large Vision Language Models: Alignment, Benchmark, Evaluations and Challenges | Jan 4, 2025 | Fairness, Hallucination | Code Available | 4 |
| TencentPretrain: A Scalable and Flexible Toolkit for Pre-training Models of Different Modalities | Dec 13, 2022 | Decoder | Code Available | 4 |
| LESS: Selecting Influential Data for Targeted Instruction Tuning | Feb 6, 2024 | | Code Available | 4 |
| ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search | Jun 6, 2024 | | Code Available | 4 |
| No Time to Train: Empowering Non-Parametric Networks for Few-shot 3D Scene Segmentation | Apr 5, 2024 | Few-Shot Learning, Scene Segmentation | Code Available | 4 |
| AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors | Aug 21, 2023 | | Code Available | 4 |