Metis: A Foundation Speech Generation Model with Masked Generative Pre-training | Feb 5, 2025 | Self-Supervised Learning; Speech Enhancement
MaskGCT: Zero-Shot Text-to-Speech with Masked Generative Codec Transformer | Sep 1, 2024 | Self-Supervised Learning; text-to-speech
Marigold: Affordable Adaptation of Diffusion-Based Image Generators for Image Analysis | May 14, 2025 | Denoising; Depth Estimation
V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning | Jun 11, 2025 | Action Anticipation; Large Language Model
What's Behind the Mask: Understanding Masked Graph Modeling for Graph Autoencoders | May 20, 2022 | Contrastive Learning; Link Prediction
Transformers without Normalization | Mar 13, 2025 | Self-Supervised Learning
Awesome Multi-modal Object Tracking | May 23, 2024 | Autonomous Driving; Knowledge Distillation
Learning to (Learn at Test Time): RNNs with Expressive Hidden States | Jul 5, 2024 | 16k; 8k
Know Your Self-supervised Learning: A Survey on Image-based Generative and Discriminative Training | May 23, 2023 | Contrastive Learning; Self-Supervised Learning
AniTalker: Animate Vivid and Diverse Talking Faces through Identity-Decoupled Facial Motion Encoding | May 6, 2024 | Metric Learning; Self-Supervised Learning
Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think | Oct 9, 2024 | Denoising; Image Generation
SSL4EO-L: Datasets and Foundation Models for Landsat Imagery | Jun 15, 2023 | Cloud Detection; Earth Observation
TorchAudio 2.1: Advancing speech recognition, self-supervised learning, and audio processing components for PyTorch | Oct 27, 2023 | Self-Supervised Learning; Speech Enhancement
Sonata: Self-Supervised Learning of Reliable Point Representations | Mar 20, 2025 | 3D Semantic Segmentation; Self-Supervised Learning
Generalized Recorrupted-to-Recorrupted: Self-Supervised Learning Beyond Gaussian Noise | Dec 5, 2024 | Denoising; Image Restoration
Architecture-Agnostic Masked Image Modeling -- From ViT back to CNN | May 27, 2022 | Image Classification; Instance Segmentation
Multimodal Whole Slide Foundation Model for Pathology | Nov 29, 2024 | Cross-Modal Retrieval; model
A Survey on Large Language Models for Recommendation | May 31, 2023 | Recommendation Systems
A Framework For Contrastive Self-Supervised Learning And Designing A New Approach | Aug 31, 2020 | Data Augmentation; Image Classification
GigaAM: Efficient Self-Supervised Learner for Speech Recognition | Jun 1, 2025 | Automatic Speech Recognition; Language Modeling
TSLANet: Rethinking Transformers for Time Series Representation Learning | Apr 12, 2024 | Anomaly Detection; Computational Efficiency
STORM: Spatio-Temporal Reconstruction Model for Large-Scale Outdoor Scenes | Dec 31, 2024 | Dynamic Reconstruction; Scene Flow Estimation
The T05 System for The VoiceMOS Challenge 2024: Transfer Learning from Deep Image Classifier to Naturalness MOS Prediction of High-Quality Synthetic Speech | Sep 14, 2024 | Self-Supervised Learning; Transfer Learning
SceneSplat: Gaussian Splatting-based Scene Understanding with Vision-Language Pretraining | Mar 23, 2025 | 3DGS; Benchmarking
ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders | Jan 2, 2023 | Object Detection; Representation Learning
Robust and Efficient Medical Imaging with Self-Supervision | May 19, 2022 | Diagnostic; Representation Learning
Pushing the limits of raw waveform speaker recognition | Mar 16, 2022 | Self-Supervised Learning; Speaker Recognition
SARATR-X: Toward Building A Foundation Model for SAR Target Recognition | May 15, 2024 | 2D Object Detection; Earth Observation
Spikformer V2: Join the High Accuracy Club on ImageNet with an SNN Ticket | Jan 4, 2024 | Image Classification
VoCo: A Simple-yet-Effective Volume Contrastive Learning Framework for 3D Medical Image Analysis | Feb 27, 2024 | Contrastive Learning; Medical Image Analysis
Beyond Appearance: a Semantic Controllable Self-Supervised Learning Framework for Human-Centric Visual Tasks | Mar 30, 2023 | Human Parsing; Pedestrian Attribute Recognition
Locate 3D: Real-World Object Localization via Self-Supervised Learning in 3D | Apr 19, 2025 | Decoder; Object Localization
How to build the best medical image segmentation algorithm using foundation models: a comprehensive empirical study with Segment Anything Model | Apr 15, 2024 | Decoder; Image Segmentation
Calibre: Towards Fair and Accurate Personalized Federated Learning with Self-Supervised Learning | Dec 28, 2024 | Fairness; Federated Learning
Leveraging Self-Supervised Learning for Speaker Diarization | Sep 14, 2024 | Self-Supervised Learning; speaker-diarization
Moving Object Segmentation: All You Need Is SAM (and Flow) | Apr 18, 2024 | All; Motion Segmentation
ESPnet-SPK: full pipeline speaker embedding toolkit with reproducible recipes, self-supervised front-ends, and off-the-shelf models | Jan 30, 2024 | Self-Supervised Learning; Speaker Recognition
Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs | Jan 11, 2024 | Representation Learning; Self-Supervised Learning
Accelerating Goal-Conditioned RL Algorithms and Research | Aug 20, 2024 | GPU; reinforcement-learning
Automatic Data Curation for Self-Supervised Learning: A Clustering-Based Approach | May 24, 2024 | Clustering; Self-Supervised Learning
Emergence of Segmentation with Minimalistic White-Box Transformers | Aug 30, 2023 | Segmentation; Self-Supervised Learning
emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation | Dec 23, 2023 | Emotion Recognition; Self-Supervised Learning
EEGPT: Pretrained Transformer for Universal and Reliable Representation of EEG Signals | Jan 1, 2024 | EEG; Representation Learning
Efficient and Generalizable Speaker Diarization via Structured Pruning of Self-Supervised Models | Jun 23, 2025 | Domain Adaptation; GPU
A Survey on Self-Supervised Learning for Non-Sequential Tabular Data | Feb 2, 2024 | Contrastive Learning; Descriptive
EAT: Self-Supervised Pre-Training with Efficient Audio Transformer | Jan 7, 2024 | Audio Classification; Self-Supervised Learning
EfficientTrain++: Generalized Curriculum Learning for Efficient Visual Backbone Training | May 14, 2024 | Data Augmentation; Self-Supervised Learning
MuQ: Self-Supervised Music Representation Learning with Mel Residual Vector Quantization | Jan 2, 2025 | Contrastive Learning; Key Detection
MTP: Advancing Remote Sensing Foundation Model via Multi-Task Pretraining | Mar 20, 2024 | Aerial Scene Classification; Building change detection for remote sensing images
wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations | Jun 20, 2020 | Quantization; Self-Supervised Learning