| HumanVid: Demystifying Training Data for Camera-controllable Human Image Animation | Jul 24, 2024 | Benchmarking, Human Animation | Code Available | 3 | 5 |
| Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision | May 4, 2023 | Diversity, In-Context Learning | Code Available | 3 | 5 |
| PromptDresser: Improving the Quality and Controllability of Virtual Try-On via Generative Textual Prompt and Prompt-aware Mask | Dec 22, 2024 | In-Context Learning, Virtual Try-on | Code Available | 3 | 5 |
| Interpretable Differencing of Machine Learning Models | Jun 10, 2023 | Classification | Code Available | 3 | 5 |
| Enhancing End-to-End Autonomous Driving with Latent World Model | Jun 12, 2024 | Autonomous Driving, NavSim | Code Available | 3 | 5 |
| GNM: A General Navigation Model to Drive Any Robot | Oct 7, 2022 | | Code Available | 3 | 5 |
| Cut and Learn for Unsupervised Object Detection and Instance Segmentation | Jan 26, 2023 | Instance Segmentation, object-detection | Code Available | 3 | 5 |
| FedLLM-Bench: Realistic Benchmarks for Federated Learning of Large Language Models | Jun 7, 2024 | Federated Learning | Code Available | 3 | 5 |
| Differentiable Voxel-based X-ray Rendering Improves Sparse-View 3D CBCT Reconstruction | Nov 28, 2024 | 3D Reconstruction, Diagnostic | Code Available | 3 | 5 |
| COCOLA: Coherence-Oriented Contrastive Learning of Musical Audio Representations | Apr 25, 2024 | Contrastive Learning, Music Generation | Code Available | 3 | 5 |
| From human experts to machines: An LLM supported approach to ontology and knowledge graph construction | Mar 13, 2024 | graph construction, Knowledge Graphs | Code Available | 3 | 5 |
| vAttention: Dynamic Memory Management for Serving LLMs without PagedAttention | May 7, 2024 | GPU, Management | Code Available | 3 | 5 |
| VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning | Mar 17, 2025 | Grounded Video Question Answering, Question Answering | Code Available | 3 | 5 |
| AP-LDM: Attentive and Progressive Latent Diffusion Model for Training-Free High-Resolution Image Generation | Oct 8, 2024 | Denoising, Image Generation | Code Available | 3 | 5 |
| Advancing Speech Language Models by Scaling Supervised Fine-Tuning with Over 60,000 Hours of Synthetic Speech Dialogue Data | Dec 2, 2024 | Language Modeling, Language Modelling | Code Available | 3 | 5 |
| Deep Learning Alternatives of the Kolmogorov Superposition Theorem | Oct 2, 2024 | Deep Learning, Kolmogorov-Arnold Networks | Code Available | 3 | 5 |
| Transolver: A Fast Transformer Solver for PDEs on General Geometries | Feb 4, 2024 | | Code Available | 3 | 5 |
| FNSPID: A Comprehensive Financial News Dataset in Time Series | Feb 9, 2024 | Financial Analysis, Time Series | Code Available | 3 | 5 |
| An Improved RaftStereo Trained with A Mixed Dataset for the Robust Vision Challenge 2022 | Oct 23, 2022 | Stereo Matching | Code Available | 3 | 5 |
| In-Context Learning for Extreme Multi-Label Classification | Jan 22, 2024 | Classification, Extreme Multi-Label Classification | Code Available | 3 | 5 |
| SpeechGPT: Empowering Large Language Models with Intrinsic Cross-Modal Conversational Abilities | May 18, 2023 | Language Modeling, Language Modelling | Code Available | 3 | 5 |
| M3D: Advancing 3D Medical Image Analysis with Multi-Modal Large Language Models | Mar 31, 2024 | Image-text Retrieval, Language Modeling | Code Available | 3 | 5 |
| ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders | Jan 2, 2023 | Object Detection, Representation Learning | Code Available | 3 | 5 |
| SoccerNet Game State Reconstruction: End-to-End Athlete Tracking and Identification on a Minimap | Apr 17, 2024 | Camera Calibration, Game State Reconstruction | Code Available | 3 | 5 |
| Cold Diffusion: Inverting Arbitrary Image Transforms Without Noise | Aug 19, 2022 | Image Restoration, Variational Inference | Code Available | 3 | 5 |
| Long-Context Autoregressive Video Modeling with Next-Frame Prediction | Mar 25, 2025 | Text Generation, Video Generation | Code Available | 3 | 5 |
| ID-Animator: Zero-Shot Identity-Preserving Human Video Generation | Apr 23, 2024 | Attribute, Video Generation | Code Available | 3 | 5 |
| Gaussian Splatting on the Move: Blur and Rolling Shutter Compensation for Natural Camera Motion | Mar 20, 2024 | 3DGS, Novel View Synthesis | Code Available | 3 | 5 |
| Data-Copilot: Bridging Billions of Data and Humans with Autonomous Workflow | Jun 12, 2023 | | Code Available | 3 | 5 |
| Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer | Jul 2, 2019 | Depth Estimation, Monocular Depth Estimation | Code Available | 3 | 5 |
| Consistency Models Made Easy | Jun 20, 2024 | Computational Efficiency, GPU | Code Available | 3 | 5 |
| Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers | Sep 6, 2024 | Experimental Design, scientific discovery | Code Available | 3 | 5 |
| UniTraj: A Unified Framework for Scalable Vehicle Trajectory Prediction | Mar 22, 2024 | Diversity, Prediction | Code Available | 3 | 5 |
| Scalable Optimization in the Modular Norm | May 23, 2024 | | Code Available | 3 | 5 |
| SupeRANSAC: One RANSAC to Rule Them All | Jun 5, 2025 | All, Pose Estimation | Code Available | 3 | 5 |
| Wordflow: Social Prompt Engineering for Large Language Models | Jan 25, 2024 | Prompt Engineering | Code Available | 3 | 5 |
| HackSynth: LLM Agent and Evaluation Framework for Autonomous Penetration Testing | Dec 2, 2024 | Language Modeling, Language Modelling | Code Available | 3 | 5 |
| Visible-Thermal Tiny Object Detection: A Benchmark Dataset and Baselines | Jun 20, 2024 | Diversity, object-detection | Code Available | 3 | 5 |
| Self-play with Execution Feedback: Improving Instruction-following Capabilities of Large Language Models | Jun 19, 2024 | Instruction Following | Code Available | 3 | 5 |
| Face Anonymization Made Simple | Nov 1, 2024 | Attribute, Face Anonymization | Code Available | 3 | 5 |
| Locating and Editing Factual Associations in GPT | Feb 10, 2022 | counterfactual, Model Editing | Code Available | 3 | 5 |
| OmDet: Large-scale vision-language multi-dataset pre-training with multimodal detection network | Sep 10, 2022 | Continual Learning, Object | Code Available | 3 | 5 |
| DreamScene4D: Dynamic Multi-Object Scene Generation from Monocular Videos | May 3, 2024 | Depth Estimation, Depth Prediction | Code Available | 3 | 5 |
| ImageInWords: Unlocking Hyper-Detailed Image Descriptions | May 5, 2024 | Image Generation, Specificity | Code Available | 3 | 5 |
| Flow Q-Learning | Feb 4, 2025 | Action Generation, D4RL | Code Available | 3 | 5 |
| MedReason: Eliciting Factual Medical Reasoning Steps in LLMs via Knowledge Graphs | Apr 1, 2025 | Knowledge Graphs, Mathematical Reasoning | Code Available | 3 | 5 |
| CoCoCo: Improving Text-Guided Video Inpainting for Better Consistency, Controllability and Compatibility | Mar 18, 2024 | Image Inpainting, Video Alignment | Code Available | 3 | 5 |
| Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition | Dec 12, 2024 | EgoSchema | Code Available | 3 | 5 |
| The Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report | Apr 16, 2024 | Image Super-Resolution, Super-Resolution | Code Available | 3 | 5 |
| LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding | Oct 22, 2024 | Token Reduction, Video Question Answering | Code Available | 3 | 5 |