Metis: A Foundation Speech Generation Model with Masked Generative Pre-training | Feb 5, 2025 | Self-Supervised Learning, Speech Enhancement | Code available (9)
MaskGCT: Zero-Shot Text-to-Speech with Masked Generative Codec Transformer | Sep 1, 2024 | Self-Supervised Learning, text-to-speech | Code available (9)
V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning | Jun 11, 2025 | Action Anticipation, Large Language Model | Code available (7)
Marigold: Affordable Adaptation of Diffusion-Based Image Generators for Image Analysis | May 14, 2025 | Denoising, Depth Estimation | Code available (7)
What's Behind the Mask: Understanding Masked Graph Modeling for Graph Autoencoders | May 20, 2022 | Contrastive Learning, Link Prediction | Code available (6)
AniTalker: Animate Vivid and Diverse Talking Faces through Identity-Decoupled Facial Motion Encoding | May 6, 2024 | Metric Learning, Self-Supervised Learning | Code available (5)
Learning to (Learn at Test Time): RNNs with Expressive Hidden States | Jul 5, 2024 | 16k, 8k | Code available (5)
Know Your Self-supervised Learning: A Survey on Image-based Generative and Discriminative Training | May 23, 2023 | Contrastive Learning, Self-Supervised Learning | Code available (5)
Transformers without Normalization | Mar 13, 2025 | Self-Supervised Learning | Code available (5)
Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think | Oct 9, 2024 | Denoising, Image Generation | Code available (5)
Awesome Multi-modal Object Tracking | May 23, 2024 | Autonomous Driving, Knowledge Distillation | Code available (5)
TorchAudio 2.1: Advancing speech recognition, self-supervised learning, and audio processing components for PyTorch | Oct 27, 2023 | Self-Supervised Learning, Speech Enhancement | Code available (4)
A Framework For Contrastive Self-Supervised Learning And Designing A New Approach | Aug 31, 2020 | Data Augmentation, Image Classification | Code available (4)
SSL4EO-L: Datasets and Foundation Models for Landsat Imagery | Jun 15, 2023 | Cloud Detection, Earth Observation | Code available (4)
Sonata: Self-Supervised Learning of Reliable Point Representations | Mar 20, 2025 | 3D Semantic Segmentation, Self-Supervised Learning | Code available (4)
Generalized Recorrupted-to-Recorrupted: Self-Supervised Learning Beyond Gaussian Noise | Dec 5, 2024 | Denoising, Image Restoration | Code available (4)
Architecture-Agnostic Masked Image Modeling -- From ViT back to CNN | May 27, 2022 | Image Classification, Instance Segmentation | Code available (4)
GigaAM: Efficient Self-Supervised Learner for Speech Recognition | Jun 1, 2025 | Automatic Speech Recognition, Language Modeling | Code available (4)
Multimodal Whole Slide Foundation Model for Pathology | Nov 29, 2024 | Cross-Modal Retrieval, model | Code available (4)
A Survey on Large Language Models for Recommendation | May 31, 2023 | Recommendation Systems | Code available (4)
TSLANet: Rethinking Transformers for Time Series Representation Learning | Apr 12, 2024 | Anomaly Detection, Computational Efficiency | Code available (3)
STORM: Spatio-Temporal Reconstruction Model for Large-Scale Outdoor Scenes | Dec 31, 2024 | Dynamic Reconstruction, Scene Flow Estimation | Code available (3)
The T05 System for The VoiceMOS Challenge 2024: Transfer Learning from Deep Image Classifier to Naturalness MOS Prediction of High-Quality Synthetic Speech | Sep 14, 2024 | Self-Supervised Learning, Transfer Learning | Code available (3)
SceneSplat: Gaussian Splatting-based Scene Understanding with Vision-Language Pretraining | Mar 23, 2025 | 3DGS, Benchmarking | Code available (3)
Robust and Efficient Medical Imaging with Self-Supervision | May 19, 2022 | Diagnostic, Representation Learning | Code available (3)
MuQ: Self-Supervised Music Representation Learning with Mel Residual Vector Quantization | Jan 2, 2025 | Contrastive Learning, Key Detection | Code available (3)
SARATR-X: Toward Building A Foundation Model for SAR Target Recognition | May 15, 2024 | 2D Object Detection, Earth Observation | Code available (3)
Spikformer V2: Join the High Accuracy Club on ImageNet with an SNN Ticket | Jan 4, 2024 | image-classification, Image Classification | Code available (3)
VoCo: A Simple-yet-Effective Volume Contrastive Learning Framework for 3D Medical Image Analysis | Feb 27, 2024 | Contrastive Learning, Medical Image Analysis | Code available (3)
Calibre: Towards Fair and Accurate Personalized Federated Learning with Self-Supervised Learning | Dec 28, 2024 | Fairness, Federated Learning | Code available (3)
Moving Object Segmentation: All You Need Is SAM (and Flow) | Apr 18, 2024 | All, Motion Segmentation | Code available (3)
Locate 3D: Real-World Object Localization via Self-Supervised Learning in 3D | Apr 19, 2025 | Decoder, Object Localization | Code available (3)
A Survey on Self-Supervised Learning for Non-Sequential Tabular Data | Feb 2, 2024 | Contrastive Learning, Descriptive | Code available (3)
MTP: Advancing Remote Sensing Foundation Model via Multi-Task Pretraining | Mar 20, 2024 | Aerial Scene Classification, Building change detection for remote sensing images | Code available (3)
Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs | Jan 11, 2024 | Representation Learning, Self-Supervised Learning | Code available (3)
ESPnet-SPK: full pipeline speaker embedding toolkit with reproducible recipes, self-supervised front-ends, and off-the-shelf models | Jan 30, 2024 | Self-Supervised Learning, Speaker Recognition | Code available (3)
emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation | Dec 23, 2023 | Emotion Recognition, Self-Supervised Learning | Code available (3)
Efficient and Generalizable Speaker Diarization via Structured Pruning of Self-Supervised Models | Jun 23, 2025 | Domain Adaptation, GPU | Code available (3)
EfficientTrain++: Generalized Curriculum Learning for Efficient Visual Backbone Training | May 14, 2024 | Data Augmentation, Self-Supervised Learning | Code available (3)
Accelerating Goal-Conditioned RL Algorithms and Research | Aug 20, 2024 | GPU, reinforcement-learning | Code available (3)
Automatic Data Curation for Self-Supervised Learning: A Clustering-Based Approach | May 24, 2024 | Clustering, Self-Supervised Learning | Code available (3)
Leveraging Self-Supervised Learning for Speaker Diarization | Sep 14, 2024 | Self-Supervised Learning, speaker-diarization | Code available (3)
EAT: Self-Supervised Pre-Training with Efficient Audio Transformer | Jan 7, 2024 | Audio Classification, Self-Supervised Learning | Code available (3)
Beyond Appearance: a Semantic Controllable Self-Supervised Learning Framework for Human-Centric Visual Tasks | Mar 30, 2023 | Human Parsing, Pedestrian Attribute Recognition | Code available (3)
EEGPT: Pretrained Transformer for Universal and Reliable Representation of EEG Signals | Jan 1, 2024 | EEG, Representation Learning | Code available (3)
Emergence of Segmentation with Minimalistic White-Box Transformers | Aug 30, 2023 | Segmentation, Self-Supervised Learning | Code available (3)
How to build the best medical image segmentation algorithm using foundation models: a comprehensive empirical study with Segment Anything Model | Apr 15, 2024 | Decoder, Image Segmentation | Code available (3)
Pushing the limits of raw waveform speaker recognition | Mar 16, 2022 | Self-Supervised Learning, Speaker Recognition | Code available (3)
ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders | Jan 2, 2023 | Object Detection, Representation Learning | Code available (3)
wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations | Jun 20, 2020 | Quantization, Self-Supervised Learning | Code available (3)