SOTAVerified

Contrastive Learning

Contrastive Learning is a deep learning technique for unsupervised representation learning. The goal is to learn an embedding space in which similar instances lie close together while dissimilar instances lie far apart.

It has been shown to be effective in various computer vision and natural language processing tasks, including image retrieval, zero-shot learning, and cross-modal retrieval. In these tasks, the learned representations can be used as features for downstream tasks such as classification and clustering.

(Image credit: Schroff et al. 2015)
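The objective described above is commonly instantiated as an InfoNCE-style loss: each anchor is pulled toward its paired positive and pushed away from the other samples in the batch. A minimal NumPy sketch is below; the function name, temperature value, and test data are illustrative, not taken from any specific paper.

```python
import numpy as np

def info_nce_loss(anchors, positives, temperature=0.1):
    """Contrastive (InfoNCE-style) loss: for each anchor, its paired
    positive should score higher than every other sample in the batch."""
    # L2-normalize so dot products become cosine similarities
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = (a @ p.T) / temperature  # (N, N): anchor i vs. every positive
    # Cross-entropy with the matching pair on the diagonal (stable log-softmax)
    m = logits.max(axis=1, keepdims=True)
    log_softmax = logits - m - np.log(np.exp(logits - m).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_softmax)))

# Correctly aligned pairs (anchor i matches positive i) give a low loss;
# deliberately mismatched pairs give a much higher one.
rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
aligned = info_nce_loss(z, z + 0.01 * rng.normal(size=(8, 16)))
shuffled = info_nce_loss(z, np.roll(z, 1, axis=0))
```

In practice the anchor/positive pairs come from two augmented views of the same input (or, in cross-modal settings such as CLIP, from an image and its caption), and the temperature is treated as a tunable or learned hyperparameter.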

Papers

Showing 1–50 of 6661 papers

Title | Status | Hype
T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy | Code | 7
Scaling Vision Pre-Training to 4K Resolution | Code | 7
PowerPM: Foundation Model for Power Systems | Code | 7
InternVideo2: Scaling Foundation Models for Multimodal Video Understanding | Code | 7
Rethinking the Sample Relations for Few-Shot Classification | Code | 7
What's Behind the Mask: Understanding Masked Graph Modeling for Graph Autoencoders | Code | 6
Secrets of RLHF in Large Language Models Part II: Reward Modeling | Code | 5
Time-series attribution maps with regularized contrastive learning | Code | 5
LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders | Code | 5
Chinese CLIP: Contrastive Vision-Language Pretraining in Chinese | Code | 5
Know Your Self-supervised Learning: A Survey on Image-based Generative and Discriminative Training | Code | 5
Semi-Mamba-UNet: Pixel-Level Contrastive and Pixel-Level Cross-Supervised Visual Mamba-based UNet for Semi-Supervised Medical Image Segmentation | Code | 4
StableRep: Synthetic Images from Text-to-Image Models Make Strong Visual Representation Learners | Code | 4
LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment | Code | 4
Prototypical Verbalizer for Prompt-based Few-shot Tuning | Code | 4
GLIPv2: Unifying Localization and Vision-Language Understanding | Code | 4
MAVIS: Mathematical Visual Instruction Tuning with an Automatic Data Engine | Code | 4
InternVideo: General Video Foundation Models via Generative and Discriminative Learning | Code | 4
FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects | Code | 4
LLM2CLIP: Powerful Language Model Unlocks Richer Visual Representation | Code | 4
Evaluating Pre-trained Convolutional Neural Networks and Foundation Models as Feature Extractors for Content-based Medical Image Retrieval | Code | 4
AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities | Code | 4
Multi-label Cluster Discrimination for Visual Representation Learning | Code | 4
When LLMs are Unfit Use FastFit: Fast and Effective Text Classification with Many Classes | Code | 3
Video Prediction Policy: A Generalist Robot Policy with Predictive Visual Representations | Code | 3
VoCo: A Simple-yet-Effective Volume Contrastive Learning Framework for 3D Medical Image Analysis | Code | 3
Sigmoid Loss for Language Image Pre-Training | Code | 3
Visual Causal Scene Refinement for Video Question Answering | Code | 3
Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling | Code | 3
W2v-BERT: Combining Contrastive Learning and Masked Language Modeling for Self-Supervised Speech Pre-Training | Code | 3
SkySense: A Multi-Modal Remote Sensing Foundation Model Towards Universal Interpretation for Earth Observation Imagery | Code | 3
Trial and Error: Exploration-Based Trajectory Optimization for LLM Agents | Code | 3
MuQ: Self-Supervised Music Representation Learning with Mel Residual Vector Quantization | Code | 3
Min-Max Similarity: A Contrastive Semi-Supervised Deep Learning Network for Surgical Tools Segmentation | Code | 3
Tokenization, Fusion, and Augmentation: Towards Fine-grained Multi-modal Entity Representation | Code | 3
Large Language Model based Long-tail Query Rewriting in Taobao Search | Code | 3
Large-Scale 3D Medical Image Pre-training with Geometric Context Priors | Code | 3
Grad: Guided Relation Diffusion Generation for Graph Augmentation in Graph Fraud Detection | Code | 3
A Survey on Self-Supervised Learning for Non-Sequential Tabular Data | Code | 3
Augmentation-Free Graph Contrastive Learning of Invariant-Discriminative Representations | Code | 3
Generalized Robot 3D Vision-Language Model with Fast Rendering and Pre-Training Vision-Language Alignment | Code | 3
Focused Transformer: Contrastive Training for Context Scaling | Code | 3
Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion | Code | 3
COCOLA: Coherence-Oriented Contrastive Learning of Musical Audio Representations | Code | 3
Momentum Contrast for Unsupervised Visual Representation Learning | Code | 3
ECG-FM: An Open Electrocardiogram Foundation Model | Code | 3
FruitNeRF++: A Generalized Multi-Fruit Counting Method Utilizing Contrastive Learning and Neural Radiance Fields | Code | 3
Denoising as Adaptation: Noise-Space Domain Adaptation for Image Restoration | Code | 2
Think Twice Before You Act: Enhancing Agent Behavioral Safety with Thought Correction | Code | 2
Decoupling Static and Hierarchical Motion Perception for Referring Video Segmentation | Code | 2
Page 1 of 134

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | ResNet50 | ImageNet Top-1 Accuracy | 73.6 | | Unverified
2 | ResNet50 | ImageNet Top-1 Accuracy | 73 | | Unverified
3 | ResNet50 | ImageNet Top-1 Accuracy | 71.1 | | Unverified
4 | ResNet50 | ImageNet Top-1 Accuracy | 69.3 | | Unverified
5 | ResNet50 (v2) | ImageNet Top-1 Accuracy | 67.6 | | Unverified
6 | ResNet50 (v2) | ImageNet Top-1 Accuracy | 63.8 | | Unverified
7 | ResNet50 | ImageNet Top-1 Accuracy | 63.6 | | Unverified
8 | ResNet50 | ImageNet Top-1 Accuracy | 61.5 | | Unverified
9 | ResNet50 | ImageNet Top-1 Accuracy | 61.5 | | Unverified
10 | ResNet50 (4×) | ImageNet Top-1 Accuracy | 61.3 | | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | | 10..5sec | 1 | | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | IPCL (ResNet18) | Accuracy (Top-1) | 84.77 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | IPCL (ResNet18) | Accuracy (Top-1) | 85.55 | | Unverified