Semantic Textual Similarity

Semantic textual similarity (STS) is the task of determining how similar two pieces of text are in meaning. A common formulation assigns each pair a score from 1 to 5. Related tasks include paraphrase identification and duplicate detection.
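
As a rough illustration of scoring a text pair, the sketch below compares two texts by the cosine similarity of their word-count vectors and maps the result onto the 1-to-5 range. Everything here is an assumption made for illustration: the function names, the bag-of-words representation, and the linear mapping are not taken from any paper listed on this page, and the systems in the tables below generally rely on learned sentence embeddings rather than word counts.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase the text and count its alphanumeric word tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors, in [0, 1]."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty text shares nothing with any other text
    return dot / (norm_a * norm_b)


def similarity_score(text_a: str, text_b: str) -> float:
    """Map cosine similarity linearly onto 1..5 (hypothetical mapping)."""
    cos = cosine_similarity(bag_of_words(text_a), bag_of_words(text_b))
    return 1.0 + 4.0 * cos


if __name__ == "__main__":
    print(similarity_score("A man is playing a guitar.", "A man plays the guitar."))
    print(similarity_score("A man is playing a guitar.", "The stock market fell today."))
```

Word overlap is only a lexical proxy: two paraphrases that share few words would score low here, which is the gap that embedding-based models are meant to close.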

Image source: Learning Semantic Textual Similarity from Conversations

Papers

Showing 176–200 of 2381 papers

Title	Date	Tasks	Status	Hype
DuSSS: Dual Semantic Similarity-Supervised Vision-Language Model for Semi-Supervised Medical Image Segmentation	Dec 17, 2024	Contrastive Learning, Image Segmentation	Code Available	1
Token Prepending: A Training-Free Approach for Eliciting Better Sentence Embeddings from LLMs	Dec 16, 2024	Prompt Engineering, Semantic Textual Similarity	Unverified	0
Quantifying Positional Biases in Text Embedding Models	Dec 13, 2024	Information Retrieval, Position	Code Available	0
Familiarity: Better Evaluation of Zero-Shot Named Entity Recognition by Quantifying Label Shifts in Synthetic Training Data	Dec 13, 2024	Named Entity Recognition	Code Available	1
Single-View Graph Contrastive Learning with Soft Neighborhood Awareness	Dec 12, 2024	Contrastive Learning, Semantic Similarity	Code Available	0
jina-clip-v2: Multilingual Multimodal Embeddings for Text and Images	Dec 11, 2024	Contrastive Learning, Cross-Modal Information Retrieval	Unverified	0
Multilingual LLMs Inherently Reward In-Language Time-Sensitive Semantic Alignment for Low-Resource Languages	Dec 11, 2024	In-Context Learning, Semantic Similarity	Code Available	0
Generating Knowledge Graphs from Large Language Models: A Comparative Study of GPT-4, LLaMA 2, and BERT	Dec 10, 2024	Knowledge Graphs, Semantic Similarity	Unverified	0
SiReRAG: Indexing Similar and Related Information for Multihop Reasoning	Dec 9, 2024	RAG, Reranking	Unverified	0
Detecting Redundant Health Survey Questions Using Language-agnostic BERT Sentence Embedding (LaBSE)	Dec 5, 2024	Computational Efficiency, Question Similarity	Unverified	0
Human Variability vs. Machine Consistency: A Linguistic Analysis of Texts Generated by Humans and Large Language Models	Dec 4, 2024	Semantic Similarity, Semantic Textual Similarity	Unverified	0
VidHalluc: Evaluating Temporal Hallucinations in Multimodal Large Language Models for Video Understanding	Dec 4, 2024	Hallucination, Instruction Following	Unverified	0
Interpretable Company Similarity with Sparse Autoencoders	Dec 3, 2024	Large Language Model, Semantic Similarity	Unverified	0
TSCheater: Generating High-Quality Tibetan Adversarial Texts via Visual Similarity	Dec 3, 2024	Adversarial Robustness, Adversarial Text	Code Available	0
Quantifying perturbation impacts for large language models	Dec 1, 2024	Semantic Similarity, Semantic Textual Similarity	Unverified	0
Vid-Morp: Video Moment Retrieval Pretraining from Unlabeled Videos in the Wild	Dec 1, 2024	Moment Retrieval, Retrieval	Code Available	1
Generative Semantic Communication for Joint Image Transmission and Segmentation	Nov 27, 2024	Feature Selection, Image Reconstruction	Unverified	0
RelCon: Relative Contrastive Learning for a Motion Foundation Model for Wearable Data	Nov 27, 2024	Activity Recognition, Contrastive Learning	Code Available	1
Isolating authorship from content with semantic embeddings and contrastive learning	Nov 27, 2024	Contrastive Learning, Disentanglement	Unverified	0
In-Context Experience Replay Facilitates Safety Red-Teaming of Text-to-Image Diffusion Models	Nov 25, 2024	Red Teaming, Semantic Similarity	Unverified	0
BanglaEmbed: Efficient Sentence Embedding Models for a Low-Resource Language Using Cross-Lingual Distillation Techniques	Nov 22, 2024	Hate Speech Detection, Knowledge Distillation	Unverified	0
FAST-Splat: Fast, Ambiguity-Free Semantics Transfer in Gaussian Splatting	Nov 20, 2024	Dimensionality Reduction, GPU	Unverified	0
HNCSE: Advancing Sentence Embeddings via Hybrid Contrastive Learning with Hard Negatives	Nov 19, 2024	Contrastive Learning, Representation Learning	Unverified	0
Advancing Large Language Models for Spatiotemporal and Semantic Association Mining of Similar Environmental Events	Nov 19, 2024	Articles, Reranking	Unverified	0
Membership Inference Attack against Long-Context Large Language Models	Nov 18, 2024	Inference Attack, Membership Inference Attack	Unverified	0

Datasets: STS Benchmark, MTEB, MRPC, SICK, STS13, STS14, STS12, STS15, STS16, MRPC Dev, SentEval, SICK-R

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	SMART-RoBERTa	Dev Pearson Correlation	92.8	—	Unverified
2	DeBERTa (large)	Accuracy	92.5	—	Unverified
3	SMART-BERT	Dev Pearson Correlation	90	—	Unverified
4	MT-DNN-SMART	Pearson Correlation	0.93	—	Unverified
5	StructBERTRoBERTa ensemble	Pearson Correlation	0.93	—	Unverified
6	Mnet-Sim	Pearson Correlation	0.93	—	Unverified
7	XLNet (single model)	Pearson Correlation	0.93	—	Unverified
8	T5-11B	Pearson Correlation	0.93	—	Unverified
9	ALBERT	Pearson Correlation	0.93	—	Unverified
10	RoBERTa	Pearson Correlation	0.92	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	AnglE-UAE	Spearman Correlation	84.54	—	Unverified
2	ST5-XXL	Spearman Correlation	82.63	—	Unverified
3	ST5-Large	Spearman Correlation	81.83	—	Unverified
4	ST5-XL	Spearman Correlation	81.66	—	Unverified
5	ST5-Base	Spearman Correlation	81.14	—	Unverified
6	MPNet-multilingual	Spearman Correlation	80.73	—	Unverified
7	SGPT-5.8B-nli	Spearman Correlation	80.53	—	Unverified
8	MPNet	Spearman Correlation	80.28	—	Unverified
9	MiniLM-L12	Spearman Correlation	79.8	—	Unverified
10	SimCSE-BERT-sup	Spearman Correlation	79.12	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	MT-DNN-SMART	Accuracy	93.7	—	Unverified
2	ALBERT	Accuracy	93.4	—	Unverified
3	RoBERTa (ensemble)	Accuracy	92.3	—	Unverified
4	BigBird	F1	91.5	—	Unverified
5	StructBERTRoBERTa ensemble	Accuracy	91.5	—	Unverified
6	FLOATER-large	Accuracy	91.4	—	Unverified
7	SMART	Accuracy	91.3	—	Unverified
8	RoBERTa-large 355M (MLP quantized vector-wise, fine-tuned)	Accuracy	91	—	Unverified
9	RoBERTa-large 355M + Entailment as Few-shot Learner	F1	91	—	Unverified
10	SpanBERT	Accuracy	90.9	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	PromCSE-RoBERTa-large (0.355B)	Spearman Correlation	0.82	—	Unverified
2	PromptEOL+CSE+LLaMA-30B	Spearman Correlation	0.82	—	Unverified
3	PromptEOL+CSE+OPT-13B	Spearman Correlation	0.82	—	Unverified
4	SimCSE-RoBERTa-large	Spearman Correlation	0.82	—	Unverified
5	PromptEOL+CSE+OPT-2.7B	Spearman Correlation	0.81	—	Unverified
6	SentenceBERT	Spearman Correlation	0.75	—	Unverified
7	SRoBERTa-NLI-base	Spearman Correlation	0.74	—	Unverified
8	SRoBERTa-NLI-large	Spearman Correlation	0.74	—	Unverified
9	Dino (STS/̄🦕)	Spearman Correlation	0.74	—	Unverified
10	SBERT-NLI-large	Spearman Correlation	0.74	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	AnglE-LLaMA-7B	Spearman Correlation	0.91	—	Unverified
2	AnglE-LLaMA-7B-v2	Spearman Correlation	0.91	—	Unverified
3	PromptEOL+CSE+LLaMA-30B	Spearman Correlation	0.9	—	Unverified
4	PromptEOL+CSE+OPT-13B	Spearman Correlation	0.9	—	Unverified
5	PromptEOL+CSE+OPT-2.7B	Spearman Correlation	0.9	—	Unverified
6	PromCSE-RoBERTa-large (0.355B)	Spearman Correlation	0.89	—	Unverified
7	Trans-Encoder-BERT-large-bi (unsup.)	Spearman Correlation	0.89	—	Unverified
8	Trans-Encoder-BERT-large-cross (unsup.)	Spearman Correlation	0.88	—	Unverified
9	Trans-Encoder-RoBERTa-large-cross (unsup.)	Spearman Correlation	0.88	—	Unverified
10	SimCSE-RoBERTa-large	Spearman Correlation	0.87	—	Unverified
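
Most rows in the tables above report Pearson or Spearman correlation between a model's predicted similarity scores and human-annotated gold scores. The sketch below shows how the two coefficients can be computed in plain Python; the function names and the toy data are hypothetical and are not drawn from any benchmark listed here. Some rows appear to report the coefficient multiplied by 100 (for example 84.54) and others on the 0-to-1 scale (for example 0.93); that reading is an assumption based on the values shown.

```python
import math


def pearson(x, y):
    """Pearson correlation: covariance normalized by both standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    gold = [1.0, 2.0, 3.0, 4.0, 5.0]        # hypothetical human scores
    predicted = [0.1, 0.2, 0.4, 0.5, 0.95]  # hypothetical model similarities
    print("Pearson:", pearson(gold, predicted))
    print("Spearman:", spearman(gold, predicted))
```

Spearman depends only on the ordering of the predictions, so a model whose scores are monotonically but not linearly related to the gold scores can reach a Spearman value of 1 while its Pearson value stays below 1.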