| Global Estimation of Building-Integrated Facade and Rooftop Photovoltaic Potential by Integrating 3D Building Footprint and Spatio-Temporal Datasets | Dec 2, 2024 | | Code Available | 2 | 5 |
| Standing Between Past and Future: Spatio-Temporal Modeling for Multi-Camera 3D Multi-Object Tracking | Feb 7, 2023 | 3D Multi-Object Tracking, Multi-Object Tracking | Code Available | 2 | 5 |
| 3DGraphLLM: Combining Semantic Graphs and Large Language Models for 3D Scene Understanding | Dec 24, 2024 | Natural Language Understanding, Scene Understanding | Code Available | 2 | 5 |
| SEGAN: Speech Enhancement Generative Adversarial Network | Mar 28, 2017 | Generative Adversarial Network, Speech Enhancement | Code Available | 2 | 5 |
| Knowledge Distillation in YOLOX-ViT for Side-Scan Sonar Object Detection | Mar 14, 2024 | Knowledge Distillation, Novel Object Detection | Code Available | 2 | 5 |
| Progressive Distillation for Fast Sampling of Diffusion Models | Feb 1, 2022 | Density Estimation, Image Generation | Code Available | 2 | 5 |
| Contrastive learning of Class-agnostic Activation Map for Weakly Supervised Object Localization and Semantic Segmentation | Mar 25, 2022 | Contrastive Learning, image-classification | Code Available | 2 | 5 |
| SC-DepthV3: Robust Self-supervised Monocular Depth Estimation for Dynamic Scenes | Nov 7, 2022 | Depth Estimation, Indoor Monocular Depth Estimation | Code Available | 2 | 5 |
| Think While You Generate: Discrete Diffusion with Planned Denoising | Oct 8, 2024 | Denoising, Image Generation | Code Available | 2 | 5 |
| VDT: General-purpose Video Diffusion Transformers via Mask Modeling | May 22, 2023 | Autonomous Driving, Video Generation | Code Available | 2 | 5 |
| Rethinking Efficient and Effective Point-based Networks for Event Camera Classification and Regression: EventMamba | May 9, 2024 | Action Recognition, Mamba | Code Available | 2 | 5 |
| Lost in the Middle: How Language Models Use Long Contexts | Jul 6, 2023 | Language Modelling, Position | Code Available | 2 | 5 |
| Representation Engineering: A Top-Down Approach to AI Transparency | Oct 2, 2023 | Question Answering | Code Available | 2 | 5 |
| WeatherGS: 3D Scene Reconstruction in Adverse Weather Conditions via Gaussian Splatting | Dec 25, 2024 | 3DGS, 3D Reconstruction | Code Available | 2 | 5 |
| The Surprising Effectiveness of Multimodal Large Language Models for Video Moment Retrieval | Jun 26, 2024 | Action Localization, Moment Retrieval | Code Available | 2 | 5 |
| Aligning Text-to-Image Diffusion Models with Reward Backpropagation | Oct 5, 2023 | Denoising, Image Generation | Code Available | 2 | 5 |
| Temporal Graph Benchmark for Machine Learning on Temporal Graphs | Jul 3, 2023 | Node Property Prediction, Property Prediction | Code Available | 2 | 5 |
| A Survey on Data Augmentation in Large Model Era | Jan 27, 2024 | Audio Signal Processing, Data Augmentation | Code Available | 2 | 5 |
| On the Efficacy of Eviction Policy for Key-Value Constrained Generative Language Model Inference | Feb 9, 2024 | GPU, Language Modeling | Code Available | 2 | 5 |
| Active-Learning-as-a-Service: An Automatic and Efficient MLOps System for Data-Centric AI | Jul 19, 2022 | Active Learning, AutoML | Code Available | 2 | 5 |
| XRAG: eXamining the Core -- Benchmarking Foundational Components in Advanced Retrieval-Augmented Generation | Dec 20, 2024 | Benchmarking, Diagnostic | Code Available | 2 | 5 |
| GPT-4 Is Too Smart To Be Safe: Stealthy Chat with LLMs via Cipher | Aug 12, 2023 | Ethics, Red Teaming | Code Available | 2 | 5 |
| AlphaMaze: Enhancing Large Language Models' Spatial Intelligence via GRPO | Feb 20, 2025 | Autonomous Navigation, Navigate | Code Available | 2 | 5 |
| Seeing from Another Perspective: Evaluating Multi-View Understanding in MLLMs | Apr 21, 2025 | Attribute, Camera Pose Estimation | Code Available | 2 | 5 |
| AnyLoc: Towards Universal Visual Place Recognition | Aug 1, 2023 | Image Retrieval, Visual Place Recognition | Code Available | 2 | 5 |
| Anchor3DLane++: 3D Lane Detection via Sample-Adaptive Sparse 3D Anchor Regression | Dec 22, 2024 | 3D Lane Detection, Lane Detection | Code Available | 2 | 5 |
| Flow Priors for Linear Inverse Problems via Iterative Corrupted Trajectory Matching | May 29, 2024 | compressed sensing, Deblurring | Code Available | 2 | 5 |
| MOSEL: 950,000 Hours of Speech Data for Open-Source Speech Foundation Model Training on EU Languages | Oct 1, 2024 | Automatic Speech Recognition, speech-recognition | Code Available | 2 | 5 |
| ColorMNet: A Memory-based Deep Spatial-Temporal Feature Propagation Network for Video Colorization | Apr 9, 2024 | Colorization | Code Available | 2 | 5 |
| KVQ: Kwai Video Quality Assessment for Short-form Videos | Feb 11, 2024 | Form, Video Quality Assessment | Code Available | 2 | 5 |
| MedPromptX: Grounded Multimodal Prompting for Chest X-ray Diagnosis | Mar 22, 2024 | Medical Diagnosis, Medical Visual Question Answering | Code Available | 2 | 5 |
| On Embeddings for Numerical Features in Tabular Deep Learning | Mar 10, 2022 | Deep Learning | Code Available | 2 | 5 |
| 3D Vision with Transformers: A Survey | Aug 8, 2022 | Pose Estimation, Survey | Code Available | 2 | 5 |
| DSVT: Dynamic Sparse Voxel Transformer with Rotated Sets | Jan 15, 2023 | 3D Object Detection, object-detection | Code Available | 2 | 5 |
| Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs | Mar 30, 2016 | | Code Available | 2 | 5 |
| How to Merge Your Multimodal Models Over Time? | Dec 9, 2024 | | Code Available | 2 | 5 |
| PIXIU: A Large Language Model, Instruction Data and Evaluation Benchmark for Finance | Jun 8, 2023 | Conversational Question Answering, Language Modeling | Code Available | 2 | 5 |
| DS-1000: A Natural and Reliable Benchmark for Data Science Code Generation | Nov 18, 2022 | Code Generation, Memorization | Code Available | 2 | 5 |
| Luminance-GS: Adapting 3D Gaussian Splatting to Challenging Lighting Conditions with View-Adaptive Curve Adjustment | Apr 2, 2025 | 3DGS, NeRF | Code Available | 2 | 5 |
| MM-IFEngine: Towards Multimodal Instruction Following | Apr 10, 2025 | Instruction Following | Code Available | 2 | 5 |
| MoFE-Time: Mixture of Frequency Domain Experts for Time-Series Forecasting Models | Jul 9, 2025 | Mixture-of-Experts, Time Series | Code Available | 2 | 5 |
| Animal Avatars: Reconstructing Animatable 3D Animals from Casual Videos | Mar 25, 2024 | 3D Reconstruction, Animal Pose Estimation | Code Available | 2 | 5 |
| CFBench: A Comprehensive Constraints-Following Benchmark for LLMs | Aug 2, 2024 | | Code Available | 2 | 5 |
| Maintaining Plasticity in Deep Continual Learning | Jun 23, 2023 | Binary Classification, Continual Learning | Code Available | 2 | 5 |
| Text-Only Training for Image Captioning using Noise-Injected CLIP | Nov 1, 2022 | Decoder, Image Captioning | Code Available | 2 | 5 |
| DOCBENCH: A Benchmark for Evaluating LLM-based Document Reading Systems | Jul 15, 2024 | Language Modelling, Large Language Model | Code Available | 2 | 5 |
| Leveraging Temporal Contextualization for Video Action Recognition | Apr 15, 2024 | Action Recognition, Temporal Action Localization | Code Available | 2 | 5 |
| Towards Building Text-To-Speech Systems for the Next Billion Users | Nov 17, 2022 | Diversity, Speech Synthesis | Code Available | 2 | 5 |
| FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression | Jan 1, 2025 | Descriptive | Code Available | 2 | 5 |
| u-HuBERT: Unified Mixed-Modal Speech Pretraining And Zero-Shot Transfer to Unlabeled Modality | Jul 14, 2022 | Speaker Verification, speech-recognition | Code Available | 2 | 5 |