SOTAVerified

Text Clustering

Text clustering is the task of grouping a set of texts so that texts in the same group (called a cluster) are more similar to each other, under some chosen similarity measure, than to texts in other clusters. (Source: Adapted from Wikipedia)
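To make the definition concrete, here is a minimal sketch in plain Python. It is not taken from any paper listed on this page. It assumes a crude bag-of-words representation, cosine similarity as the similarity measure, and a single-pass greedy ("leader") assignment; the function names and the threshold value are illustrative choices only.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercased whitespace-token counts; a deliberately crude representation."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def greedy_cluster(texts, threshold=0.3):
    """Single-pass leader clustering (illustrative, hypothetical helper).

    Each text joins the first cluster whose seed text is at least
    `threshold` similar; otherwise it becomes the seed of a new cluster.
    """
    seeds, labels = [], []
    for text in texts:
        vec = bag_of_words(text)
        for idx, seed in enumerate(seeds):
            if cosine(vec, seed) >= threshold:
                labels.append(idx)
                break
        else:
            # No existing cluster was similar enough: start a new one.
            seeds.append(vec)
            labels.append(len(seeds) - 1)
    return labels


texts = [
    "the cat sat on the mat",
    "a cat lay on the mat",
    "stock prices fell sharply today",
    "stock markets fell again today",
]
labels = greedy_cluster(texts)
print(labels)
```

The two sentences about cats share a cluster and the two about stocks share another, because their word overlap exceeds the threshold. Approaches like those listed below typically replace the bag-of-words vectors with learned embeddings and the greedy pass with a stronger clustering algorithm.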

Papers

Showing 1-50 of 123 papers

| Title | Status | Hype |
| --- | --- | --- |
| MTEB: Massive Text Embedding Benchmark | Code | 4 |
| ComStreamClust: a communicative multi-agent approach to text clustering in streaming data | Code | 1 |
| DeepLens: Interactive Out-of-distribution Data Detection in NLP Models | Code | 1 |
| EASE: Entity-Aware Contrastive Learning of Sentence Embedding | Code | 1 |
| Discovering New Intents with Deep Aligned Clustering | Code | 1 |
| ClusterLLM: Large Language Models as a Guide for Text Clustering | Code | 1 |
| Neural Topic Modeling with Bidirectional Adversarial Training | Code | 1 |
| Enhancement of Short Text Clustering by Iterative Classification | Code | 1 |
| Short Text Clustering via Convolutional Neural Networks | Code | 1 |
| Dissimilarity Mixture Autoencoder for Deep Clustering | Code | 1 |
| Training Effective Neural Sentence Encoders from Automatically Mined Paraphrases | Code | 1 |
| Text Clustering as Classification with LLMs | Code | 1 |
| Proposition-Level Clustering for Multi-Document Summarization | Code | 1 |
| Robust Representation Learning with Reliable Pseudo-labels Generation via Self-Adaptive Optimal Transport for Short Text Clustering | Code | 1 |
| Proposition-Level Clustering for Multi-Document Summarization | Code | 1 |
| Large Language Models Enable Few-Shot Clustering | Code | 1 |
| Supporting Clustering with Contrastive Learning | Code | 1 |
| Clustering Urdu News Using Headlines | Code | 0 |
| Subspace Co-clustering with Two-Way Graph Convolution | Code | 0 |
| Very Large Language Model as a Unified Methodology of Text Mining | Code | 0 |
| Efficient Sparse Spherical k-Means for Document Clustering | Code | 0 |
| Task-Oriented Clustering for Dialogues | Code | 0 |
| Discriminative Representation learning via Attention-Enhanced Contrastive Learning for Short Text Clustering | Code | 0 |
| Discovering New Intents via Constrained Deep Adaptive Clustering with Cluster Refinement | Code | 0 |
| On the Use of ArXiv as a Dataset | Code | 0 |
| Clustering Similar Amendments at the Italian Senate | Code | 0 |
| Reliable Pseudo-labeling via Optimal Transport with Attention for Short Text Clustering | Code | 0 |
| Self-Taught Convolutional Neural Networks for Short Text Clustering | Code | 0 |
| More Discriminative Sentence Embeddings via Semantic Graph Smoothing | Code | 0 |
| CSE-SFP: Enabling Unsupervised Sentence Representation Learning via a Single Forward Pass | Code | 0 |
| Learn The Big Picture: Representation Learning for Clustering | Code | 0 |
| Guiding Sentiment Analysis with Hierarchical Text Clustering: Analyzing the German X/Twitter Discourse on Face Masks in the 2020 COVID-19 Pandemic | Code | 0 |
| Human-interpretable clustering of short-text using large language models | Code | 0 |
| A Self-Training Approach for Short Text Clustering | Code | 0 |
| ELKI: A large open-source library for data analysis - ELKI Release 0.7.5 "Heidelberg" | Code | 0 |
| Influence of various text embeddings on clustering performance in NLP | Code | 0 |
| NeurCAM: Interpretable Neural Clustering via Additive Models | Code | 0 |
| Translation Transformers Rediscover Inherent Data Domains | Code | 0 |
| ClusTop: An unsupervised and integrated text clustering and topic extraction framework | | 0 |
| Clustering tweets using Wikipedia concepts | | 0 |
| An Unsupervised Bayesian Modelling Approach for Storyline Detection on News Articles | | 0 |
| A Method of Accounting Bigrams in Topic Models | | 0 |
| Clustering-Induced Generative Incomplete Image-Text Clustering (CIGIT-C) | | 0 |
| An Online Semantic-enhanced Dirichlet Model for Short Text Stream Clustering | | 0 |
| Cluster Analysis of Online Mental Health Discourse using Topic-Infused Deep Contextualized Representations | | 0 |
| Effects of Creativity and Cluster Tightness on Short Text Clustering Performance | | 0 |
| CLTC: A Chinese-English Cross-lingual Topic Corpus | | 0 |
| An enhanced Teaching-Learning-Based Optimization (TLBO) with Grey Wolf Optimizer (GWO) for text feature selection and clustering | | 0 |
| A Graph-based Text Similarity Measure That Employs Named Entity Information | | 0 |
| AdapterSoup: Weight Averaging to Improve Generalization of Pretrained Language Models | | 0 |

Benchmark Results

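| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | ST5-XXL | V-Measure | 43.71 | | Unverified |
| 2 | MPNet | V-Measure | 43.69 | | Unverified |
| 3 | GTR-XXL | V-Measure | 42.42 | | Unverified |
| 4 | MiniLM-L6 | V-Measure | 42.35 | | Unverified |
| 5 | ST5-XL | V-Measure | 42.34 | | Unverified |
| 6 | MiniLM-L12 | V-Measure | 41.81 | | Unverified |
| 7 | ST5-Large | V-Measure | 41.65 | | Unverified |
| 8 | GTR-Large | V-Measure | 41.6 | | Unverified |
| 9 | GTR-XL | V-Measure | 41.51 | | Unverified |
| 10 | Contriever | V-Measure | 41.1 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | G-BAT | Accuracy | 41.25 | | Unverified |
| 2 | BAT | Accuracy | 35.66 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Vector Space Model | Related Headlines | 85 | | Unverified |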
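
For readers unfamiliar with the V-Measure metric used in the first benchmark table: it is the harmonic mean of homogeneity (each cluster contains members of a single class) and completeness (all members of a class land in the same cluster). The sketch below computes it from scratch in plain Python. The function names are illustrative, and it is an assumption that the scores reported above are this quantity scaled to a 0-100 range; the page itself does not say so.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def conditional_entropy(target, given):
    """H(target | given), computed from joint label counts."""
    n = len(target)
    joint = Counter(zip(target, given))
    given_counts = Counter(given)
    return -sum(
        (c / n) * math.log(c / given_counts[g]) for (_, g), c in joint.items()
    )


def v_measure(true_labels, pred_labels, beta=1.0):
    """Weighted harmonic mean of homogeneity and completeness, in [0, 1]."""
    h_true = entropy(true_labels)
    h_pred = entropy(pred_labels)
    # By convention, a zero-entropy labelling is trivially homogeneous/complete.
    homogeneity = (
        1.0 if h_true == 0
        else 1.0 - conditional_entropy(true_labels, pred_labels) / h_true
    )
    completeness = (
        1.0 if h_pred == 0
        else 1.0 - conditional_entropy(pred_labels, true_labels) / h_pred
    )
    if homogeneity + completeness == 0:
        return 0.0
    return (1 + beta) * homogeneity * completeness / (beta * homogeneity + completeness)


# Cluster ids are arbitrary: a relabelled but otherwise perfect clustering scores 1.
perfect = v_measure([0, 0, 1, 1], [1, 1, 0, 0])
# Putting everything in one cluster is complete but not homogeneous, so it scores 0.
collapsed = v_measure([0, 0, 1, 1], [0, 0, 0, 0])
print(perfect, collapsed)
```

Because the score depends only on how the two labellings partition the data, it needs no mapping between cluster ids and class names, which is one reason it suits unsupervised benchmarks.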