| VideoREPA: Learning Physics for Video Generation through Relational Alignment with Foundation Models | May 29, 2025 | Self-Supervised Learning, Video Generation | Code Available | 2 | 5 |
| Collaborative Gym: A Framework for Enabling and Evaluating Human-Agent Collaboration | Dec 20, 2024 | Human Agent Collaboration | Code Available | 2 | 5 |
| HENet: Hybrid Encoding for End-to-end Multi-task 3D Perception from Multi-view Cameras | Apr 3, 2024 | 3D Object Detection, Autonomous Driving | Code Available | 2 | 5 |
| MedM-VL: What Makes a Good Medical LVLM? | Apr 6, 2025 | Medical Image Analysis, Question Answering | Code Available | 2 | 5 |
| Self-Explore: Enhancing Mathematical Reasoning in Language Models with Fine-grained Rewards | Apr 16, 2024 | GSM8K, Math | Code Available | 2 | 5 |
| MathBench: Evaluating the Theory and Application Proficiency of LLMs with a Hierarchical Mathematics Benchmark | May 20, 2024 | College Mathematics, GSM8K | Code Available | 2 | 5 |
| ScaleCrafter: Tuning-free Higher-Resolution Visual Generation with Diffusion Models | Oct 11, 2023 | Image Generation | Code Available | 2 | 5 |
| All for One and One for All: Improving Music Separation by Bridging Networks | Oct 8, 2020 | All, Music Source Separation | Code Available | 2 | 5 |
| Swin2SR: SwinV2 Transformer for Compressed Image Super-Resolution and Restoration | Sep 22, 2022 | Compressed Image Super-resolution, Image Restoration | Code Available | 2 | 5 |
| MAS-GPT: Training LLMs to Build LLM-based Multi-Agent Systems | Mar 5, 2025 | | Code Available | 2 | 5 |
| Mixture of LoRA Experts | Apr 21, 2024 | | Code Available | 2 | 5 |
| Neighboring Autoregressive Modeling for Efficient Visual Generation | Mar 12, 2025 | Image Generation, Text to Image Generation | Code Available | 2 | 5 |
| The Calysto Scheme Project | Oct 16, 2023 | | Code Available | 2 | 5 |
| ANAH-v2: Scaling Analytical Hallucination Annotation of Large Language Models | Jul 5, 2024 | Hallucination, Long Form Question Answering | Code Available | 2 | 5 |
| Exploring Plain Vision Transformer Backbones for Object Detection | Mar 30, 2022 | Cross-Domain Few-Shot Object Detection, Instance Segmentation | Code Available | 2 | 5 |
| Twin-Merging: Dynamic Integration of Modular Expertise in Model Merging | Jun 17, 2024 | | Code Available | 2 | 5 |
| Hidden Biases of End-to-End Driving Models | Jun 13, 2023 | Autonomous Driving, Bench2Drive | Code Available | 2 | 5 |
| LaserMix for Semi-Supervised LiDAR Semantic Segmentation | Jun 30, 2022 | LIDAR Semantic Segmentation, Segmentation | Code Available | 2 | 5 |
| IPDnet: A Universal Direct-Path IPD Estimation Network for Sound Source Localization | May 11, 2024 | Sound Source Localization | Code Available | 2 | 5 |
| GestureLSM: Latent Shortcut based Co-Speech Gesture Generation with Spatial-Temporal Modeling | Jan 31, 2025 | Denoising, Gesture Generation | Code Available | 2 | 5 |
| Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering | Oct 21, 2024 | Open-Domain Question Answering, Question Answering | Code Available | 2 | 5 |
| SR-LIVO: LiDAR-Inertial-Visual Odometry and Mapping with Sweep Reconstruction | Dec 28, 2023 | Pose Estimation, Visual Odometry | Code Available | 2 | 5 |
| Can Language Models Solve Olympiad Programming? | Apr 16, 2024 | | Code Available | 2 | 5 |
| Improving Autoformalization using Type Checking | Jun 11, 2024 | Informal-to-formal Style Transfer | Code Available | 2 | 5 |
| Prototype based Masked Audio Model for Self-Supervised Learning of Sound Event Detection | Sep 26, 2024 | Event Detection, Representation Learning | Code Available | 2 | 5 |
| Masked Autoencoders for Point Cloud Self-supervised Learning | Mar 13, 2022 | 3D Part Segmentation, 3D Point Cloud Classification | Code Available | 2 | 5 |
| MMEarth: Exploring Multi-Modal Pretext Tasks For Geospatial Representation Learning | May 4, 2024 | Earth Observation, image-classification | Code Available | 2 | 5 |
| Mitigate the Gap: Investigating Approaches for Improving Cross-Modal Alignment in CLIP | Jun 25, 2024 | cross-modal alignment, Image Classification | Code Available | 2 | 5 |
| SmartRefine: A Scenario-Adaptive Refinement Framework for Efficient Motion Prediction | Mar 18, 2024 | Autonomous Vehicles, motion prediction | Code Available | 2 | 5 |
| ZooPFL: Exploring Black-box Foundation Models for Personalized Federated Learning | Oct 8, 2023 | Federated Learning, Personalized Federated Learning | Code Available | 2 | 5 |
| Fancy123: One Image to High-Quality 3D Mesh Generation via Plug-and-Play Deformation | Nov 25, 2024 | Image to 3D | Code Available | 2 | 5 |
| Fast-Poly: A Fast Polyhedral Framework For 3D Multi-Object Tracking | Mar 20, 2024 | 3D Multi-Object Tracking, CPU | Code Available | 2 | 5 |
| Attention Concatenation Volume for Accurate and Efficient Stereo Matching | Mar 4, 2022 | Patch Matching, Stereo Depth Estimation | Code Available | 2 | 5 |
| Crafting Interpretable Embeddings by Asking LLMs Questions | May 26, 2024 | Question Answering | Code Available | 2 | 5 |
| PodAgent: A Comprehensive Framework for Podcast Generation | Mar 1, 2025 | Audio Generation, Speech Synthesis | Code Available | 2 | 5 |
| FLAME: Financial Large-Language Model Assessment and Metrics Evaluation | Jan 3, 2025 | Language Modeling, Language Modelling | Code Available | 2 | 5 |
| Octopus: Embodied Vision-Language Programmer from Environmental Feedback | Oct 12, 2023 | Benchmarking, Code Generation | Code Available | 2 | 5 |
| Unsupervised Information Refinement Training of Large Language Models for Retrieval-Augmented Generation | Feb 28, 2024 | Code Generation, In-Context Learning | Code Available | 2 | 5 |
| Learning Human-Inspired Force Strategies for Robotic Assembly | Mar 22, 2023 | | Code Available | 2 | 5 |
| Self-Supervised Learning for Real-World Super-Resolution from Dual and Multiple Zoomed Observations | May 3, 2024 | Optical Flow Estimation, Reference-based Super-Resolution | Code Available | 2 | 5 |
| MFTCoder: Boosting Code LLMs with Multitask Fine-Tuning | Nov 4, 2023 | Multi-Task Learning | Code Available | 2 | 5 |
| When is Tree Search Useful for LLM Planning? It Depends on the Discriminator | Feb 16, 2024 | Mathematical Reasoning, Re-Ranking | Code Available | 2 | 5 |
| ScreenAI: A Vision-Language Model for UI and Infographics Understanding | Feb 7, 2024 | Chart Question Answering, Language Modeling | Code Available | 2 | 5 |
| Learning to Prompt for Vision-Language Models | Sep 2, 2021 | Domain Generalization, Few-shot Age Estimation | Code Available | 2 | 5 |
| EmoFace: Audio-driven Emotional 3D Face Animation | Jul 17, 2024 | 3D Face Animation | Code Available | 2 | 5 |
| OmniBench: Towards The Future of Universal Omni-Language Models | Sep 23, 2024 | Instruction Following | Code Available | 2 | 5 |
| ADATIME: A Benchmarking Suite for Domain Adaptation on Time Series Data | Mar 15, 2022 | Benchmarking, Domain Adaptation | Code Available | 2 | 5 |
| ConceptExpress: Harnessing Diffusion Models for Single-image Unsupervised Concept Extraction | Jul 9, 2024 | Image Generation, Text to Image Generation | Code Available | 2 | 5 |
| InteractRank: Personalized Web-Scale Search Pre-Ranking with Cross Interaction Features | Apr 9, 2025 | Computational Efficiency | Code Available | 2 | 5 |
| Specializing Smaller Language Models towards Multi-Step Reasoning | Jan 30, 2023 | Math, Model Selection | Code Available | 2 | 5 |