Semantic Textual Similarity

Semantic textual similarity is the task of determining how similar two pieces of text are. This can take the form of assigning each pair a similarity score, for example on a scale from 1 to 5. Related tasks include paraphrase identification and duplicate identification.
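The sketch below shows the score-based formulation as a toy lexical baseline. It assumes a bag-of-words representation and a linear mapping of cosine similarity onto the 1-to-5 scale. Both are illustrative choices, not the method of any model listed on this page, and the helper names `bag_of_words`, `cosine_similarity`, and `sts_score` are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors (0.0 if either is empty)."""
    # Counter returns 0 for missing keys, so tokens absent from b contribute nothing.
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def sts_score(text1: str, text2: str) -> float:
    """Map cosine similarity (in [0, 1] for counts) linearly onto a 1-5 scale (assumed mapping)."""
    return 1.0 + 4.0 * cosine_similarity(bag_of_words(text1), bag_of_words(text2))


print(sts_score("A man is playing a guitar.", "A man is playing a flute."))
```

Two sentences with no words in common score 1.0, and identical bags of words score 5.0 up to floating-point rounding. Bi-encoder models such as Sentence-BERT and SimCSE typically compare learned sentence embeddings with cosine similarity in the same way; only the representation changes.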

Image source: Learning Semantic Textual Similarity from Conversations

Papers


Showing 251–275 of 2381 papers

Title	Date	Tasks	Status	Hype
Prompt Obfuscation for Large Language Models	Sep 17, 2024	Large Language Model, Semantic Similarity	Unverified	0
Cross-Lingual News Event Correlation for Stock Market Trend Prediction	Sep 16, 2024	Articles, Financial Analysis	Unverified	0
beeFormer: Bridging the Gap Between Semantic and Interaction Similarity in Recommender Systems	Sep 16, 2024	Collaborative Filtering, Recommendation Systems	Code available	2
Distilling Monolingual and Crosslingual Word-in-Context Representations	Sep 13, 2024	Language Modeling	Code available	0
Retro-li: Small-Scale Retrieval Augmented Generation Supporting Noisy Similarity Searches and Domain Shift Generalization	Sep 12, 2024	Language Modeling	Code available	0
An Unsupervised Dialogue Topic Segmentation Model Based on Utterance Rewriting	Sep 12, 2024	Representation Learning, Segmentation	Unverified	0
SubRegWeigh: Effective and Efficient Annotation Weighing with Subword Regularization	Sep 10, 2024	Document Classification, Named Entity Recognition	Code available	0
Ethereum Fraud Detection via Joint Transaction Language Model and Graph Representation Learning	Sep 9, 2024	Attribute, Fraud Detection	Unverified	0
DataSculpt: Crafting Data Landscapes for Long-Context LLMs through Multi-Objective Partitioning	Sep 2, 2024	Code Completion, Combinatorial Optimization	Code available	1
Self-Judge: Selective Instruction Following with Alignment Self-Evaluation	Sep 2, 2024	Instruction Following, Semantic Similarity	Code available	0
LanguaShrink: Reducing Token Overhead with Psycholinguistics	Sep 1, 2024	Articles, Semantic Similarity	Unverified	0
GMFL-Net: A Global Multi-geometric Feature Learning Network for Repetitive Action Counting	Aug 31, 2024	Pose Estimation, Repetitive Action Counting	Code available	0
FlowRetrieval: Flow-Guided Data Retrieval for Few-Shot Imitation Learning	Aug 29, 2024	Few-Shot Imitation Learning, Imitation Learning	Code available	0
ConCSE: Unified Contrastive Learning and Augmentation for Code-Switched Embeddings	Aug 28, 2024	Contrastive Learning, Natural Language Inference	Code available	0
Contrastive Learning Subspace for Text Clustering	Aug 26, 2024	Clustering, Contrastive Learning	Unverified	0
HTS-Attack: Heuristic Token Search for Jailbreaking Text-to-Image Models	Aug 25, 2024	Heuristic Search, Image Generation	Unverified	0
The Russian-focused embedders' exploration: ruMTEB benchmark and Russian embedding model design	Aug 22, 2024	Information Retrieval, Reranking	Unverified	0
GSTran: Joint Geometric and Semantic Coherence for Point Cloud Segmentation	Aug 21, 2024	Point Cloud Segmentation, Semantic Similarity	Code available	0
Improving embedding with contrastive fine-tuning on small datasets with expert-augmented scores	Aug 19, 2024	Retrieval, Semantic Textual Similarity	Unverified	0
Distinguish Confusion in Legal Judgment Prediction via Revised Relation Knowledge	Aug 18, 2024	Articles, Inductive Bias	Code available	1
KGV: Integrating Large Language Models with Knowledge Graphs for Cyber Threat Intelligence Credibility Assessment	Aug 15, 2024	Fact Checking, Knowledge Graphs	Unverified	0
Extracting Sentence Embeddings from Pretrained Transformer Models	Aug 15, 2024	Clustering, Retrieval-augmented Generation	Unverified	0
reCSE: Portable Reshaping Features for Sentence Embedding in Self-supervised Contrastive Learning	Aug 9, 2024	Contrastive Learning, Data Augmentation	Code available	0
Unsupervised Episode Detection for Large-Scale News Events	Aug 9, 2024	Articles, Event Detection	Code available	1
Semantics or spelling? Probing contextual word embeddings with orthographic noise	Aug 8, 2024	Language Modeling	Code available	0


Datasets: STS Benchmark, MTEB, MRPC, SICK, STS13, STS14, STS12, STS15, STS16, MRPC Dev, SentEval, SICK-R

Benchmark Results

STS Benchmark

#	Model	Metric	Claimed	Verified	Status
1	SMART-RoBERTa	Dev Pearson Correlation	92.8	—	Unverified
2	DeBERTa (large)	Accuracy	92.5	—	Unverified
3	SMART-BERT	Dev Pearson Correlation	90	—	Unverified
4	MT-DNN-SMART	Pearson Correlation	0.93	—	Unverified
5	StructBERTRoBERTa ensemble	Pearson Correlation	0.93	—	Unverified
6	Mnet-Sim	Pearson Correlation	0.93	—	Unverified
7	XLNet (single model)	Pearson Correlation	0.93	—	Unverified
8	ALBERT	Pearson Correlation	0.93	—	Unverified
9	T5-11B	Pearson Correlation	0.93	—	Unverified
10	RoBERTa	Pearson Correlation	0.92	—	Unverified
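The rows above are scored mostly by Pearson correlation between a system's predicted similarity scores and the human gold scores. The claimed values mix two scales: 92.8 and 90 appear to be correlations multiplied by 100, while 0.93 is reported as a raw coefficient. A minimal standard-library sketch of the coefficient follows; the function name `pearson` is our own, not a library API.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances, left unnormalised because the 1/n factors cancel.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant list")
    return cov / math.sqrt(var_x * var_y)


gold = [4.8, 1.2, 3.5, 2.0]       # illustrative gold scores, not from any dataset
predicted = [4.5, 1.0, 3.9, 2.2]  # illustrative system outputs
print(round(100 * pearson(gold, predicted), 1))  # the "x 100" reporting convention
```

Pearson measures linear agreement, so a system whose scores are a nonlinear but order-preserving function of the gold scores is still penalised.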

MTEB

#	Model	Metric	Claimed	Verified	Status
1	AnglE-UAE	Spearman Correlation	84.54	—	Unverified
2	ST5-XXL	Spearman Correlation	82.63	—	Unverified
3	ST5-Large	Spearman Correlation	81.83	—	Unverified
4	ST5-XL	Spearman Correlation	81.66	—	Unverified
5	ST5-Base	Spearman Correlation	81.14	—	Unverified
6	MPNet-multilingual	Spearman Correlation	80.73	—	Unverified
7	SGPT-5.8B-nli	Spearman Correlation	80.53	—	Unverified
8	MPNet	Spearman Correlation	80.28	—	Unverified
9	MiniLM-L12	Spearman Correlation	79.8	—	Unverified
10	SimCSE-BERT-sup	Spearman Correlation	79.12	—	Unverified
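The table above reports Spearman rank correlation on a 0-100 scale, while the later Spearman tables on this page use a 0-1 scale. Spearman's rho is the Pearson correlation of the two rank vectors, so it rewards ordering the sentence pairs correctly and is unchanged by any strictly increasing transformation of the predicted scores. The sketch below assumes tied values receive the average of the ranks they span; `average_ranks` and `spearman` are hypothetical names, not a library API.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of equal values that starts at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j correspond to ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho, computed as the Pearson correlation of the rank vectors."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two items")
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# A monotonic but nonlinear relationship still gets a perfect rank correlation.
print(spearman([1, 2, 3, 4], [1, 4, 9, 16]))
```

scipy.stats.spearmanr computes the same statistic with average ranks for ties if SciPy is available.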

MRPC

#	Model	Metric	Claimed	Verified	Status
1	MT-DNN-SMART	Accuracy	93.7	—	Unverified
2	ALBERT	Accuracy	93.4	—	Unverified
3	RoBERTa (ensemble)	Accuracy	92.3	—	Unverified
4	BigBird	F1	91.5	—	Unverified
5	StructBERTRoBERTa ensemble	Accuracy	91.5	—	Unverified
6	FLOATER-large	Accuracy	91.4	—	Unverified
7	SMART	Accuracy	91.3	—	Unverified
8	RoBERTa-large 355M (MLP quantized vector-wise, fine-tuned)	Accuracy	91	—	Unverified
9	RoBERTa-large 355M + Entailment as Few-shot Learner	F1	91	—	Unverified
10	SpanBERT	Accuracy	90.9	—	Unverified
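The table above reports accuracy and F1, the usual metrics for binary paraphrase identification, one of the related tasks named in the introduction. The sketch below assumes gold and predicted labels are 0/1 and that F1 is computed for the positive (paraphrase) class; `accuracy_and_f1` is a hypothetical helper name.

```python
def accuracy_and_f1(gold, pred, positive=1):
    """Accuracy over all pairs and F1 for the positive (paraphrase) label."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("need two non-empty lists of equal length")
    pairs = list(zip(gold, pred))
    correct = sum(g == p for g, p in pairs)
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    accuracy = correct / len(gold)
    # F1 = 2TP / (2TP + FP + FN); defined here as 0.0 when no positives occur at all.
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


gold = [1, 1, 1, 0]  # illustrative labels: 1 = paraphrase, 0 = not a paraphrase
pred = [1, 1, 0, 0]
print(accuracy_and_f1(gold, pred))
```

Reporting F1 alongside accuracy matters when the two classes are imbalanced, since a system that always predicts the majority class can still reach a high accuracy.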

SICK

#	Model	Metric	Claimed	Verified	Status
1	PromCSE-RoBERTa-large (0.355B)	Spearman Correlation	0.82	—	Unverified
2	PromptEOL+CSE+LLaMA-30B	Spearman Correlation	0.82	—	Unverified
3	PromptEOL+CSE+OPT-13B	Spearman Correlation	0.82	—	Unverified
4	SimCSE-RoBERTa-large	Spearman Correlation	0.82	—	Unverified
5	PromptEOL+CSE+OPT-2.7B	Spearman Correlation	0.81	—	Unverified
6	SentenceBERT	Spearman Correlation	0.75	—	Unverified
7	SRoBERTa-NLI-base	Spearman Correlation	0.74	—	Unverified
8	SRoBERTa-NLI-large	Spearman Correlation	0.74	—	Unverified
9	Dino (STS/🦕)	Spearman Correlation	0.74	—	Unverified
10	SBERT-NLI-large	Spearman Correlation	0.74	—	Unverified

STS13

#	Model	Metric	Claimed	Verified	Status
1	AnglE-LLaMA-7B	Spearman Correlation	0.91	—	Unverified
2	AnglE-LLaMA-7B-v2	Spearman Correlation	0.91	—	Unverified
3	PromptEOL+CSE+LLaMA-30B	Spearman Correlation	0.9	—	Unverified
4	PromptEOL+CSE+OPT-13B	Spearman Correlation	0.9	—	Unverified
5	PromptEOL+CSE+OPT-2.7B	Spearman Correlation	0.9	—	Unverified
6	PromCSE-RoBERTa-large (0.355B)	Spearman Correlation	0.89	—	Unverified
7	Trans-Encoder-BERT-large-bi (unsup.)	Spearman Correlation	0.89	—	Unverified
8	Trans-Encoder-BERT-large-cross (unsup.)	Spearman Correlation	0.88	—	Unverified
9	Trans-Encoder-RoBERTa-large-cross (unsup.)	Spearman Correlation	0.88	—	Unverified
10	SimCSE-RoBERTa-large	Spearman Correlation	0.87	—	Unverified