| Fairness Implications of Encoding Protected Categorical Attributes | Jan 27, 2022 | Fairness, Feature Engineering | Code Available | 4 |
| Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models | Jun 3, 2024 | Language Modeling, Language Modelling | Code Available | 4 |
| LISA++: An Improved Baseline for Reasoning Segmentation with Large Language Model | Dec 28, 2023 | Instance Segmentation, Language Modeling | Code Available | 4 |
| Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention | Apr 10, 2024 | Book summarization, Language Modeling | Code Available | 4 |
| X^2-Gaussian: 4D Radiative Gaussian Splatting for Continuous-time Tomographic Reconstruction | Mar 27, 2025 | CT Reconstruction, Decoder | Code Available | 4 |
| LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing | Nov 1, 2023 | All, Image Generation | Code Available | 4 |
| Knowledge-tuning Large Language Models with Structured Medical Knowledge Bases for Reliable Response Generation in Chinese | Sep 8, 2023 | Domain Adaptation, Hallucination | Code Available | 4 |
| MV-DUSt3R+: Single-Stage Scene Reconstruction from Sparse Views In 2 Seconds | Dec 9, 2024 | Camera Calibration, Camera Pose Estimation | Code Available | 4 |
| BLOOM: A 176B-Parameter Open-Access Multilingual Language Model | Nov 9, 2022 | Decoder, Language Modeling | Code Available | 4 |
| Gender Representation in TV and Radio: Automatic Information Extraction methods versus Manual Analyses | Jun 14, 2024 | | Code Available | 4 |
| BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View Recognition via Perspective Supervision | Nov 18, 2022 | 3D Object Detection | Code Available | 4 |
| NeRDi: Single-View NeRF Synthesis with Language-Guided Diffusion as General Image Priors | Dec 6, 2022 | 3D Generation, 3D geometry | Code Available | 4 |
| RealisDance-DiT: Simple yet Strong Baseline towards Controllable Character Animation in the Wild | Apr 21, 2025 | | Code Available | 4 |
| COS-Mix: Cosine Similarity and Distance Fusion for Improved Information Retrieval | Jun 2, 2024 | Information Retrieval, RAG | Code Available | 4 |
| UniScene: Unified Occupancy-centric Driving Scene Generation | Dec 6, 2024 | Autonomous Driving, Scene Generation | Code Available | 4 |
| Motion-X++: A Large-Scale Multimodal 3D Whole-body Human Motion Dataset | Jan 9, 2025 | Human Mesh Recovery, Motion Generation | Code Available | 4 |
| Lotus: Diffusion-based Visual Foundation Model for High-quality Dense Prediction | Sep 26, 2024 | 3D Reconstruction, Denoising | Code Available | 4 |
| Goldfish: Vision-Language Understanding of Arbitrarily Long Videos | Jul 17, 2024 | Retrieval, Video Understanding | Code Available | 4 |
| When Does Perceptual Alignment Benefit Vision Representations? | Oct 14, 2024 | Depth Estimation, Image Generation | Code Available | 4 |
| MLPerf Power: Benchmarking the Energy Efficiency of Machine Learning Systems from Microwatts to Megawatts for Sustainable AI | Oct 15, 2024 | Benchmarking | Code Available | 4 |
| Vchitect-2.0: Parallel Transformer for Scaling Up Video Diffusion Models | Jan 14, 2025 | Benchmarking, Text-to-Video Generation | Code Available | 4 |
| A foundation model for human-AI collaboration in medical literature mining | Jan 27, 2025 | Literature Mining, Systematic Literature Review | Code Available | 4 |
| Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation | Oct 9, 2023 | Action Recognition, Image Generation | Code Available | 4 |
| PRISM: A Multi-Modal Generative Foundation Model for Slide-Level Histopathology | May 16, 2024 | whole slide images | Code Available | 4 |
| FFCV: Accelerating Training by Removing Data Bottlenecks | Jun 21, 2023 | CPU, GPU | Code Available | 4 |