| LLM Processes: Numerical Predictive Distributions Conditioned on Natural Language | May 21, 2024 | regression | Code Available | 2 |
| BYOL for Audio: Exploring Pre-trained General-purpose Audio Representations | Apr 15, 2022 | Self-Supervised Learning | Code Available | 2 |
| Learning local equivariant representations for quantum operators | Jul 8, 2024 | Computational Efficiency | Code Available | 2 |
| Temporally Consistent Unbalanced Optimal Transport for Unsupervised Action Segmentation | Apr 1, 2024 | Action Segmentation, Segmentation | Code Available | 2 |
| Libriheavy: a 50,000 hours ASR corpus with punctuation casing and context | Sep 15, 2023 | | Code Available | 2 |
| ByT5 model for massively multilingual grapheme-to-phoneme conversion | Apr 6, 2022 | Grapheme-to-Phoneme Conversion | Code Available | 2 |
| Global Estimation of Building-Integrated Facade and Rooftop Photovoltaic Potential by Integrating 3D Building Footprint and Spatio-Temporal Datasets | Dec 2, 2024 | | Code Available | 2 |
| Standing Between Past and Future: Spatio-Temporal Modeling for Multi-Camera 3D Multi-Object Tracking | Feb 7, 2023 | 3D Multi-Object Tracking, Multi-Object Tracking | Code Available | 2 |
| 3DGraphLLM: Combining Semantic Graphs and Large Language Models for 3D Scene Understanding | Dec 24, 2024 | Natural Language Understanding, Scene Understanding | Code Available | 2 |
| SEGAN: Speech Enhancement Generative Adversarial Network | Mar 28, 2017 | Generative Adversarial Network, Speech Enhancement | Code Available | 2 |
| Knowledge Distillation in YOLOX-ViT for Side-Scan Sonar Object Detection | Mar 14, 2024 | Knowledge Distillation, Novel Object Detection | Code Available | 2 |
| Progressive Distillation for Fast Sampling of Diffusion Models | Feb 1, 2022 | Density Estimation, Image Generation | Code Available | 2 |
| Contrastive learning of Class-agnostic Activation Map for Weakly Supervised Object Localization and Semantic Segmentation | Mar 25, 2022 | Contrastive Learning, image-classification | Code Available | 2 |
| SC-DepthV3: Robust Self-supervised Monocular Depth Estimation for Dynamic Scenes | Nov 7, 2022 | Depth Estimation, Indoor Monocular Depth Estimation | Code Available | 2 |
| Think While You Generate: Discrete Diffusion with Planned Denoising | Oct 8, 2024 | Denoising, Image Generation | Code Available | 2 |
| VDT: General-purpose Video Diffusion Transformers via Mask Modeling | May 22, 2023 | Autonomous Driving, Video Generation | Code Available | 2 |
| Rethinking Efficient and Effective Point-based Networks for Event Camera Classification and Regression: EventMamba | May 9, 2024 | Action Recognition, Mamba | Code Available | 2 |
| Lost in the Middle: How Language Models Use Long Contexts | Jul 6, 2023 | Language Modelling, Position | Code Available | 2 |
| Representation Engineering: A Top-Down Approach to AI Transparency | Oct 2, 2023 | Question Answering | Code Available | 2 |
| WeatherGS: 3D Scene Reconstruction in Adverse Weather Conditions via Gaussian Splatting | Dec 25, 2024 | 3DGS, 3D Reconstruction | Code Available | 2 |
| The Surprising Effectiveness of Multimodal Large Language Models for Video Moment Retrieval | Jun 26, 2024 | Action Localization, Moment Retrieval | Code Available | 2 |
| Aligning Text-to-Image Diffusion Models with Reward Backpropagation | Oct 5, 2023 | Denoising, Image Generation | Code Available | 2 |
| Temporal Graph Benchmark for Machine Learning on Temporal Graphs | Jul 3, 2023 | Node Property Prediction, Property Prediction | Code Available | 2 |
| A Survey on Data Augmentation in Large Model Era | Jan 27, 2024 | Audio Signal Processing, Data Augmentation | Code Available | 2 |
| On the Efficacy of Eviction Policy for Key-Value Constrained Generative Language Model Inference | Feb 9, 2024 | GPU, Language Modeling | Code Available | 2 |
| Active-Learning-as-a-Service: An Automatic and Efficient MLOps System for Data-Centric AI | Jul 19, 2022 | Active Learning, AutoML | Code Available | 2 |
| XRAG: eXamining the Core -- Benchmarking Foundational Components in Advanced Retrieval-Augmented Generation | Dec 20, 2024 | Benchmarking, Diagnostic | Code Available | 2 |
| GPT-4 Is Too Smart To Be Safe: Stealthy Chat with LLMs via Cipher | Aug 12, 2023 | Ethics, Red Teaming | Code Available | 2 |
| AlphaMaze: Enhancing Large Language Models' Spatial Intelligence via GRPO | Feb 20, 2025 | Autonomous Navigation, Navigate | Code Available | 2 |
| Seeing from Another Perspective: Evaluating Multi-View Understanding in MLLMs | Apr 21, 2025 | Attribute, Camera Pose Estimation | Code Available | 2 |
| AnyLoc: Towards Universal Visual Place Recognition | Aug 1, 2023 | Image Retrieval, Visual Place Recognition | Code Available | 2 |
| Anchor3DLane++: 3D Lane Detection via Sample-Adaptive Sparse 3D Anchor Regression | Dec 22, 2024 | 3D Lane Detection, Lane Detection | Code Available | 2 |
| Flow Priors for Linear Inverse Problems via Iterative Corrupted Trajectory Matching | May 29, 2024 | compressed sensing, Deblurring | Code Available | 2 |
| MOSEL: 950,000 Hours of Speech Data for Open-Source Speech Foundation Model Training on EU Languages | Oct 1, 2024 | Automatic Speech Recognition, speech-recognition | Code Available | 2 |
| ColorMNet: A Memory-based Deep Spatial-Temporal Feature Propagation Network for Video Colorization | Apr 9, 2024 | Colorization | Code Available | 2 |
| KVQ: Kwai Video Quality Assessment for Short-form Videos | Feb 11, 2024 | Form, Video Quality Assessment | Code Available | 2 |
| MedPromptX: Grounded Multimodal Prompting for Chest X-ray Diagnosis | Mar 22, 2024 | Medical Diagnosis, Medical Visual Question Answering | Code Available | 2 |
| On Embeddings for Numerical Features in Tabular Deep Learning | Mar 10, 2022 | Deep Learning | Code Available | 2 |
| 3D Vision with Transformers: A Survey | Aug 8, 2022 | Pose Estimation, Survey | Code Available | 2 |
| DSVT: Dynamic Sparse Voxel Transformer with Rotated Sets | Jan 15, 2023 | 3D Object Detection, object-detection | Code Available | 2 |
| Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs | Mar 30, 2016 | | Code Available | 2 |
| How to Merge Your Multimodal Models Over Time? | Dec 9, 2024 | | Code Available | 2 |
| PIXIU: A Large Language Model, Instruction Data and Evaluation Benchmark for Finance | Jun 8, 2023 | Conversational Question Answering, Language Modeling | Code Available | 2 |
| DS-1000: A Natural and Reliable Benchmark for Data Science Code Generation | Nov 18, 2022 | Code Generation, Memorization | Code Available | 2 |
| Luminance-GS: Adapting 3D Gaussian Splatting to Challenging Lighting Conditions with View-Adaptive Curve Adjustment | Apr 2, 2025 | 3DGS, NeRF | Code Available | 2 |
| MM-IFEngine: Towards Multimodal Instruction Following | Apr 10, 2025 | Instruction Following | Code Available | 2 |
| MoFE-Time: Mixture of Frequency Domain Experts for Time-Series Forecasting Models | Jul 9, 2025 | Mixture-of-Experts, Time Series | Code Available | 2 |
| Animal Avatars: Reconstructing Animatable 3D Animals from Casual Videos | Mar 25, 2024 | 3D Reconstruction, Animal Pose Estimation | Code Available | 2 |
| CFBench: A Comprehensive Constraints-Following Benchmark for LLMs | Aug 2, 2024 | | Code Available | 2 |
| Maintaining Plasticity in Deep Continual Learning | Jun 23, 2023 | Binary Classification, Continual Learning | Code Available | 2 |