MaskGCT: Zero-Shot Text-to-Speech with Masked Generative Codec Transformer | Sep 1, 2024 | Self-Supervised Learning text-to-speech | Code Available | 95
Metis: A Foundation Speech Generation Model with Masked Generative Pre-training | Feb 5, 2025 | Self-Supervised Learning Speech Enhancement | Code Available | 95
V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning | Jun 11, 2025 | Action Anticipation Large Language Model | Code Available | 75
Marigold: Affordable Adaptation of Diffusion-Based Image Generators for Image Analysis | May 14, 2025 | Denoising Depth Estimation | Code Available | 75
What's Behind the Mask: Understanding Masked Graph Modeling for Graph Autoencoders | May 20, 2022 | Contrastive Learning Link Prediction | Code Available | 65
Transformers without Normalization | Mar 13, 2025 | Self-Supervised Learning | Code Available | 55
AniTalker: Animate Vivid and Diverse Talking Faces through Identity-Decoupled Facial Motion Encoding | May 6, 2024 | Metric Learning Self-Supervised Learning | Code Available | 55
Learning to (Learn at Test Time): RNNs with Expressive Hidden States | Jul 5, 2024 | 16k 8k | Code Available | 55
Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think | Oct 9, 2024 | Denoising Image Generation | Code Available | 55
Awesome Multi-modal Object Tracking | May 23, 2024 | Autonomous Driving Knowledge Distillation | Code Available | 55
Know Your Self-supervised Learning: A Survey on Image-based Generative and Discriminative Training | May 23, 2023 | Contrastive Learning Self-Supervised Learning | Code Available | 55
SSL4EO-L: Datasets and Foundation Models for Landsat Imagery | Jun 15, 2023 | Cloud Detection Earth Observation | Code Available | 45
TorchAudio 2.1: Advancing speech recognition, self-supervised learning, and audio processing components for PyTorch | Oct 27, 2023 | Self-Supervised Learning Speech Enhancement | Code Available | 45
Sonata: Self-Supervised Learning of Reliable Point Representations | Mar 20, 2025 | 3D Semantic Segmentation Self-Supervised Learning | Code Available | 45
GigaAM: Efficient Self-Supervised Learner for Speech Recognition | Jun 1, 2025 | Automatic Speech Recognition Language Modeling | Code Available | 45
A Survey on Large Language Models for Recommendation | May 31, 2023 | Recommendation Systems | Code Available | 45
Architecture-Agnostic Masked Image Modeling -- From ViT back to CNN | May 27, 2022 | Image Classification Instance Segmentation | Code Available | 45
Generalized Recorrupted-to-Recorrupted: Self-Supervised Learning Beyond Gaussian Noise | Dec 5, 2024 | Denoising Image Restoration | Code Available | 45
Multimodal Whole Slide Foundation Model for Pathology | Nov 29, 2024 | Cross-Modal Retrieval model | Code Available | 45
A Framework For Contrastive Self-Supervised Learning And Designing A New Approach | Aug 31, 2020 | Data Augmentation Image Classification | Code Available | 45
TSLANet: Rethinking Transformers for Time Series Representation Learning | Apr 12, 2024 | Anomaly Detection Computational Efficiency | Code Available | 45
STORM: Spatio-Temporal Reconstruction Model for Large-Scale Outdoor Scenes | Dec 31, 2024 | Dynamic Reconstruction Scene Flow Estimation | Code Available | 35
The T05 System for The VoiceMOS Challenge 2024: Transfer Learning from Deep Image Classifier to Naturalness MOS Prediction of High-Quality Synthetic Speech | Sep 14, 2024 | Self-Supervised Learning Transfer Learning | Code Available | 35
ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders | Jan 2, 2023 | Object Detection Representation Learning | Code Available | 35
SceneSplat: Gaussian Splatting-based Scene Understanding with Vision-Language Pretraining | Mar 23, 2025 | 3DGS Benchmarking | Code Available | 35
Robust and Efficient Medical Imaging with Self-Supervision | May 19, 2022 | Diagnostic Representation Learning | Code Available | 35
Pushing the limits of raw waveform speaker recognition | Mar 16, 2022 | Self-Supervised Learning Speaker Recognition | Code Available | 35
SARATR-X: Toward Building A Foundation Model for SAR Target Recognition | May 15, 2024 | 2D Object Detection Earth Observation | Code Available | 35
Spikformer V2: Join the High Accuracy Club on ImageNet with an SNN Ticket | Jan 4, 2024 | image-classification Image Classification | Code Available | 35
VoCo: A Simple-yet-Effective Volume Contrastive Learning Framework for 3D Medical Image Analysis | Feb 27, 2024 | Contrastive Learning Medical Image Analysis | Code Available | 35
Calibre: Towards Fair and Accurate Personalized Federated Learning with Self-Supervised Learning | Dec 28, 2024 | Fairness Federated Learning | Code Available | 35
Moving Object Segmentation: All You Need Is SAM (and Flow) | Apr 18, 2024 | All Motion Segmentation | Code Available | 35
Locate 3D: Real-World Object Localization via Self-Supervised Learning in 3D | Apr 19, 2025 | Decoder Object Localization | Code Available | 35
Accelerating Goal-Conditioned RL Algorithms and Research | Aug 20, 2024 | GPU reinforcement-learning | Code Available | 35
MTP: Advancing Remote Sensing Foundation Model via Multi-Task Pretraining | Mar 20, 2024 | Aerial Scene Classification Building change detection for remote sensing images | Code Available | 35
Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs | Jan 11, 2024 | Representation Learning Self-Supervised Learning | Code Available | 35
emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation | Dec 23, 2023 | Emotion Recognition Self-Supervised Learning | Code Available | 35
ESPnet-SPK: full pipeline speaker embedding toolkit with reproducible recipes, self-supervised front-ends, and off-the-shelf models | Jan 30, 2024 | Self-Supervised Learning Speaker Recognition | Code Available | 35
Efficient and Generalizable Speaker Diarization via Structured Pruning of Self-Supervised Models | Jun 23, 2025 | Domain Adaptation GPU | Code Available | 35
EfficientTrain++: Generalized Curriculum Learning for Efficient Visual Backbone Training | May 14, 2024 | Data Augmentation Self-Supervised Learning | Code Available | 35
Automatic Data Curation for Self-Supervised Learning: A Clustering-Based Approach | May 24, 2024 | Clustering Self-Supervised Learning | Code Available | 35
Leveraging Self-Supervised Learning for Speaker Diarization | Sep 14, 2024 | Self-Supervised Learning speaker-diarization | Code Available | 35
EAT: Self-Supervised Pre-Training with Efficient Audio Transformer | Jan 7, 2024 | Audio Classification Self-Supervised Learning | Code Available | 35
Beyond Appearance: a Semantic Controllable Self-Supervised Learning Framework for Human-Centric Visual Tasks | Mar 30, 2023 | Human Parsing Pedestrian Attribute Recognition | Code Available | 35
A Survey on Self-Supervised Learning for Non-Sequential Tabular Data | Feb 2, 2024 | Contrastive Learning Descriptive | Code Available | 35
EEGPT: Pretrained Transformer for Universal and Reliable Representation of EEG Signals | Jan 1, 2024 | EEG Representation Learning | Code Available | 35
Emergence of Segmentation with Minimalistic White-Box Transformers | Aug 30, 2023 | Segmentation Self-Supervised Learning | Code Available | 35
MuQ: Self-Supervised Music Representation Learning with Mel Residual Vector Quantization | Jan 2, 2025 | Contrastive Learning Key Detection | Code Available | 35
How to build the best medical image segmentation algorithm using foundation models: a comprehensive empirical study with Segment Anything Model | Apr 15, 2024 | Decoder Image Segmentation | Code Available | 35
wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations | Jun 20, 2020 | Quantization Self-Supervised Learning | Code Available | 35