SOTAVerified

Semantic Textual Similarity

Semantic textual similarity deals with determining how similar two pieces of text are. This can take the form of assigning a score from 1 to 5. Related tasks include paraphrase identification and duplicate detection.
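
As a rough illustration of the scoring idea described above, the sketch below rates a sentence pair with a bag-of-words cosine similarity and maps the result onto the 1 to 5 range. This is a toy lexical baseline chosen for brevity, not the method of any paper listed on this page; the function names (`tokenize`, `cosine_similarity`, `to_1_5_scale`) and the linear rescaling are assumptions made for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the token-count vectors of two texts (0.0 to 1.0)."""
    counts_a, counts_b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    dot = sum(counts_a[t] * counts_b[t] for t in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # at least one text has no tokens
    return dot / (norm_a * norm_b)


def to_1_5_scale(similarity):
    """Hypothetical linear mapping from [0, 1] onto the 1-5 range."""
    return 1.0 + 4.0 * similarity


if __name__ == "__main__":
    pair = ("A man is playing a guitar.", "A person plays the guitar.")
    print(round(to_1_5_scale(cosine_similarity(*pair)), 2))
```

Word overlap ignores synonyms and word order, which is why the systems in the tables below rely on learned sentence representations instead.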

Image source: Learning Semantic Textual Similarity from Conversations

Papers

Showing 1-50 of 2381 papers

| Title | Status | Hype |
|---|---|---|
| SemCSE: Semantic Contrastive Sentence Embeddings Using LLM-Generated Summaries For Scientific Abstracts | | 0 |
| SARA: Selective and Adaptive Retrieval-augmented Generation with Context Compression | | 0 |
| FA: Forced Prompt Learning of Vision-Language Models for Out-of-Distribution Detection | Code | 0 |
| LineRetriever: Planning-Aware Observation Reduction for Web Agents | | 0 |
| DALR: Dual-level Alignment Learning for Multimodal Sentence Representation Learning | | 0 |
| Enhancing Automatic Term Extraction with Large Language Models via Syntactic Retrieval | | 0 |
| Leveraging Vision-Language Models to Select Trustworthy Super-Resolution Samples Generated by Diffusion Models | | 0 |
| Intrinsic vs. Extrinsic Evaluation of Czech Sentence Embeddings: Semantic Relevance Doesn't Help with MT Evaluation | | 0 |
| PrivacyXray: Detecting Privacy Breaches in LLMs through Semantic Consistency and Probability Certainty | | 0 |
| Semantic similarity estimation for domain specific data using BERT and other techniques | | 0 |
| ImpliRet: Benchmarking the Implicit Fact Retrieval Challenge | Code | 0 |
| Similarity = Value? Consultation Value Assessment and Alignment for Personalized Search | | 0 |
| InsertRank: LLMs can reason over BM25 scores to Improve Listwise Reranking | | 0 |
| GrFormer: A Novel Transformer on Grassmann Manifold for Infrared and Visible Image Fusion | | 0 |
| FindMeIfYouCan: Bringing Open Set metrics to near, far and farther Out-of-Distribution Object Detection | | 0 |
| Inv-Entropy: A Fully Probabilistic Framework for Uncertainty Quantification in Language Models | Code | 1 |
| Hierarchical Scoring with 3D Gaussian Splatting for Instance Image-Goal Navigation | | 0 |
| Trend-Aware Fashion Recommendation with Visual Segmentation and Semantic Similarity | Code | 0 |
| Statistical Hypothesis Testing for Auditing Robustness in Language Models | | 0 |
| Conservative Bias in Large Language Models: Measuring Relation Predictions | | 0 |
| Denoising Programming Knowledge Tracing with a Code Graph-based Tuning Adaptor | | 0 |
| KNN-Defense: Defense against 3D Adversarial Point Clouds using Nearest-Neighbor Search | Code | 0 |
| Plugging Schema Graph into Multi-Table QA: A Human-Guided Framework for Reducing LLM Reliance | | 0 |
| MCP-Zero: Active Tool Discovery for Autonomous LLM Agents | | 0 |
| IRT-Router: Effective and Interpretable Multi-LLM Routing via Item Response Theory | Code | 1 |
| VUDG: A Dataset for Video Understanding Domain Generalization | | 0 |
| GATE: General Arabic Text Embedding for Enhanced Semantic Textual Similarity with Matryoshka Representation Learning and Hybrid Loss Training | | 0 |
| Category-aware EEG image generation based on wavelet transform and contrast semantic loss | Code | 0 |
| PRISM: A Framework for Producing Interpretable Political Bias Embeddings with Political-Aware Cross-Encoder | Code | 0 |
| Label-Guided In-Context Learning for Named Entity Recognition | Code | 1 |
| Document Valuation in LLM Summaries: A Cluster Shapley Approach | | 0 |
| Improving Brain-to-Image Reconstruction via Fine-Grained Text Bridging | | 0 |
| LLMs as Better Recommenders with Natural Language Collaborative Signals: A Self-Assessing Retrieval Approach | | 0 |
| Position: Mechanistic Interpretability Should Prioritize Feature Consistency in SAEs | Code | 0 |
| The Avengers: A Simple Recipe for Uniting Smaller Language Models to Challenge Proprietary Giants | Code | 1 |
| Hypercube-RAG: Hypercube-Based Retrieval-Augmented Generation for In-domain Scientific Question-Answering | Code | 0 |
| CrosGrpsABS: Cross-Attention over Syntactic and Semantic Graphs for Aspect-Based Sentiment Analysis in a Low-Resource Language | | 0 |
| Sparse VideoGen2: Accelerate Video Generation with Sparse Attention via Semantic-Aware Permutation | | 0 |
| Smoothie: Smoothing Diffusion on Token Embeddings for Text Generation | Code | 1 |
| Accidental Misalignment: Fine-Tuning Language Models Induces Unexpected Vulnerability | Code | 0 |
| Omni TM-AE: A Scalable and Interpretable Embedding Model Using the Full Tsetlin Machine State Space | | 0 |
| Automated Feedback Loops to Protect Text Simplification with Generative AI from Information Loss | | 0 |
| LLMs Are Not Scorers: Rethinking MT Evaluation with Generation-Based Methods | Code | 0 |
| EquivPruner: Boosting Efficiency and Quality in LLM-Based Search via Action Pruning | Code | 0 |
| EcomScriptBench: A Multi-task Benchmark for E-commerce Script Planning via Step-wise Intention-Driven Product Association | | 0 |
| Language Specific Knowledge: Do Models Know Better in X than in English? | | 0 |
| Leveraging the Powerful Attention of a Pre-trained Diffusion Model for Exemplar-based Image Colorization | Code | 0 |
| InstructSAM: A Training-Free Framework for Instruction-Oriented Remote Sensing Object Recognition | Code | 2 |
| MultiHal: Multilingual Dataset for Knowledge-Graph Grounded Evaluation of LLM Hallucinations | Code | 0 |
| R2MED: A Benchmark for Reasoning-Driven Medical Retrieval | Code | 1 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | SMARTRoBERTa | Dev Pearson Correlation | 92.8 | | Unverified |
| 2 | DeBERTa (large) | Accuracy | 92.5 | | Unverified |
| 3 | SMART-BERT | Dev Pearson Correlation | 90 | | Unverified |
| 4 | MT-DNN-SMART | Pearson Correlation | 0.93 | | Unverified |
| 5 | StructBERTRoBERTa ensemble | Pearson Correlation | 0.93 | | Unverified |
| 6 | Mnet-Sim | Pearson Correlation | 0.93 | | Unverified |
| 7 | XLNet (single model) | Pearson Correlation | 0.93 | | Unverified |
| 8 | T5-11B | Pearson Correlation | 0.93 | | Unverified |
| 9 | ALBERT | Pearson Correlation | 0.93 | | Unverified |
| 10 | RoBERTa | Pearson Correlation | 0.92 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | AnglE-UAE | Spearman Correlation | 84.54 | | Unverified |
| 2 | ST5-XXL | Spearman Correlation | 82.63 | | Unverified |
| 3 | ST5-Large | Spearman Correlation | 81.83 | | Unverified |
| 4 | ST5-XL | Spearman Correlation | 81.66 | | Unverified |
| 5 | ST5-Base | Spearman Correlation | 81.14 | | Unverified |
| 6 | MPNet-multilingual | Spearman Correlation | 80.73 | | Unverified |
| 7 | SGPT-5.8B-nli | Spearman Correlation | 80.53 | | Unverified |
| 8 | MPNet | Spearman Correlation | 80.28 | | Unverified |
| 9 | MiniLM-L12 | Spearman Correlation | 79.8 | | Unverified |
| 10 | SimCSE-BERT-sup | Spearman Correlation | 79.12 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | MT-DNN-SMART | Accuracy | 93.7 | | Unverified |
| 2 | ALBERT | Accuracy | 93.4 | | Unverified |
| 3 | RoBERTa (ensemble) | Accuracy | 92.3 | | Unverified |
| 4 | BigBird | F1 | 91.5 | | Unverified |
| 5 | StructBERTRoBERTa ensemble | Accuracy | 91.5 | | Unverified |
| 6 | FLOATER-large | Accuracy | 91.4 | | Unverified |
| 7 | SMART | Accuracy | 91.3 | | Unverified |
| 8 | RoBERTa-large 355M (MLP quantized vector-wise, fine-tuned) | Accuracy | 91 | | Unverified |
| 9 | RoBERTa-large 355M + Entailment as Few-shot Learner | F1 | 91 | | Unverified |
| 10 | SpanBERT | Accuracy | 90.9 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | PromCSE-RoBERTa-large (0.355B) | Spearman Correlation | 0.82 | | Unverified |
| 2 | PromptEOL+CSE+LLaMA-30B | Spearman Correlation | 0.82 | | Unverified |
| 3 | PromptEOL+CSE+OPT-13B | Spearman Correlation | 0.82 | | Unverified |
| 4 | SimCSE-RoBERTalarge | Spearman Correlation | 0.82 | | Unverified |
| 5 | PromptEOL+CSE+OPT-2.7B | Spearman Correlation | 0.81 | | Unverified |
| 6 | SentenceBERT | Spearman Correlation | 0.75 | | Unverified |
| 7 | SRoBERTa-NLI-base | Spearman Correlation | 0.74 | | Unverified |
| 8 | SRoBERTa-NLI-large | Spearman Correlation | 0.74 | | Unverified |
| 9 | Dino (STS/̄🦕) | Spearman Correlation | 0.74 | | Unverified |
| 10 | SBERT-NLI-large | Spearman Correlation | 0.74 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | AnglE-LLaMA-7B | Spearman Correlation | 0.91 | | Unverified |
| 2 | AnglE-LLaMA-7B-v2 | Spearman Correlation | 0.91 | | Unverified |
| 3 | PromptEOL+CSE+LLaMA-30B | Spearman Correlation | 0.9 | | Unverified |
| 4 | PromptEOL+CSE+OPT-13B | Spearman Correlation | 0.9 | | Unverified |
| 5 | PromptEOL+CSE+OPT-2.7B | Spearman Correlation | 0.9 | | Unverified |
| 6 | PromCSE-RoBERTa-large (0.355B) | Spearman Correlation | 0.89 | | Unverified |
| 7 | Trans-Encoder-BERT-large-bi (unsup.) | Spearman Correlation | 0.89 | | Unverified |
| 8 | Trans-Encoder-BERT-large-cross (unsup.) | Spearman Correlation | 0.88 | | Unverified |
| 9 | Trans-Encoder-RoBERTa-large-cross (unsup.) | Spearman Correlation | 0.88 | | Unverified |
| 10 | SimCSE-RoBERTa-large | Spearman Correlation | 0.87 | | Unverified |
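
Most of the leaderboards above report Pearson or Spearman correlation between predicted and gold similarity scores. As a rough illustration of how the two metrics differ, the sketch below computes both in plain Python. The helper names (`pearson`, `average_ranks`, `spearman`) and the sample scores are made up for this example and do not come from any listed paper. The claimed values appear on two scales (for example 0.93 and 92.8), which presumably reflects reporting the raw coefficient in some tables and the coefficient multiplied by 100 in others.

```python
import math


def pearson(x, y):
    """Pearson correlation: strength of the linear relationship between x and y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def average_ranks(values):
    """1-based ranks in ascending order; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    gold = [1.0, 2.0, 3.0, 4.0, 5.0]          # hypothetical human scores
    predicted = [0.10, 0.15, 0.40, 0.45, 0.95]  # hypothetical model scores
    print("Pearson: ", pearson(gold, predicted))
    print("Spearman:", spearman(gold, predicted))
```

Because Spearman only looks at the ordering, a model whose scores are monotonic in the gold scores but not linear can reach a perfect Spearman correlation while its Pearson correlation stays below 1.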