SOTAVerified

Contrastive Learning

Contrastive Learning is a deep learning technique for unsupervised representation learning. The goal is to learn a representation of data such that similar instances are close together in the representation space, while dissimilar instances are far apart. In practice, "similar" pairs are often two augmented views of the same instance (positives), while other instances in the batch serve as negatives.

It has been shown to be effective in various computer vision and natural language processing tasks, including image retrieval, zero-shot learning, and cross-modal retrieval. In these tasks, the learned representations can be used as features for downstream tasks such as classification and clustering.
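As a concrete illustration of the idea above, here is a minimal NumPy sketch of an InfoNCE-style contrastive loss, the objective family used by many of the methods listed below. The function name, the temperature value, and the batch layout are illustrative choices, not taken from any specific paper on this page.

```python
import numpy as np

def info_nce_loss(z_i, z_j, temperature=0.1):
    """Minimal InfoNCE-style contrastive loss (illustrative sketch).

    z_i, z_j: (N, D) embeddings of two augmented views of the same N
    instances; row k of z_i and row k of z_j form a positive pair, and
    all other rows act as negatives.
    """
    # L2-normalize so dot products become cosine similarities.
    z_i = z_i / np.linalg.norm(z_i, axis=1, keepdims=True)
    z_j = z_j / np.linalg.norm(z_j, axis=1, keepdims=True)

    # (N, N) similarity matrix; positives sit on the diagonal.
    logits = z_i @ z_j.T / temperature

    # Log-softmax over each row, then take the diagonal (positive) terms.
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))
```

Minimizing this loss pulls each positive pair together relative to the in-batch negatives, which is exactly the "similar close, dissimilar far" objective described above; lower temperatures sharpen the penalty on hard negatives.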

(Image credit: Schroff et al. 2015)

Papers

Showing 1–25 of 6,661 papers

| Title | Status | Hype |
|---|---|---|
| InternVideo2: Scaling Foundation Models for Multimodal Video Understanding | Code | 7 |
| Scaling Vision Pre-Training to 4K Resolution | Code | 7 |
| PowerPM: Foundation Model for Power Systems | Code | 7 |
| T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy | Code | 7 |
| Rethinking the Sample Relations for Few-Shot Classification | Code | 7 |
| What's Behind the Mask: Understanding Masked Graph Modeling for Graph Autoencoders | Code | 6 |
| LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders | Code | 5 |
| Know Your Self-supervised Learning: A Survey on Image-based Generative and Discriminative Training | Code | 5 |
| Chinese CLIP: Contrastive Vision-Language Pretraining in Chinese | Code | 5 |
| Secrets of RLHF in Large Language Models Part II: Reward Modeling | Code | 5 |
| Time-series attribution maps with regularized contrastive learning | Code | 5 |
| LLM2CLIP: Powerful Language Model Unlocks Richer Visual Representation | Code | 4 |
| InternVideo: General Video Foundation Models via Generative and Discriminative Learning | Code | 4 |
| LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment | Code | 4 |
| Semi-Mamba-UNet: Pixel-Level Contrastive and Pixel-Level Cross-Supervised Visual Mamba-based UNet for Semi-Supervised Medical Image Segmentation | Code | 4 |
| FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects | Code | 4 |
| StableRep: Synthetic Images from Text-to-Image Models Make Strong Visual Representation Learners | Code | 4 |
| Evaluating Pre-trained Convolutional Neural Networks and Foundation Models as Feature Extractors for Content-based Medical Image Retrieval | Code | 4 |
| AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities | Code | 4 |
| MAVIS: Mathematical Visual Instruction Tuning with an Automatic Data Engine | Code | 4 |
| GLIPv2: Unifying Localization and Vision-Language Understanding | Code | 4 |
| Multi-label Cluster Discrimination for Visual Representation Learning | Code | 4 |
| Prototypical Verbalizer for Prompt-based Few-shot Tuning | Code | 4 |
| Large Language Model based Long-tail Query Rewriting in Taobao Search | Code | 3 |
| Large-Scale 3D Medical Image Pre-training with Geometric Context Priors | Code | 3 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | ResNet50 | ImageNet Top-1 Accuracy | 73.6 | | Unverified |
| 2 | ResNet50 | ImageNet Top-1 Accuracy | 73 | | Unverified |
| 3 | ResNet50 | ImageNet Top-1 Accuracy | 71.1 | | Unverified |
| 4 | ResNet50 | ImageNet Top-1 Accuracy | 69.3 | | Unverified |
| 5 | ResNet50 (v2) | ImageNet Top-1 Accuracy | 67.6 | | Unverified |
| 6 | ResNet50 (v2) | ImageNet Top-1 Accuracy | 63.8 | | Unverified |
| 7 | ResNet50 | ImageNet Top-1 Accuracy | 63.6 | | Unverified |
| 8 | ResNet50 | ImageNet Top-1 Accuracy | 61.5 | | Unverified |
| 9 | ResNet50 | ImageNet Top-1 Accuracy | 61.5 | | Unverified |
| 10 | ResNet50 (4×) | ImageNet Top-1 Accuracy | 61.3 | | Unverified |
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | | 10..5sec | 1 | | Unverified |
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | IPCL (ResNet18) | Accuracy (Top-1) | 84.77 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | IPCL (ResNet18) | Accuracy (Top-1) | 85.55 | | Unverified |