SOTAVerified

Protoformer: Embedding Prototypes for Transformers

2022-06-25 · PAKDD 2022: Advances in Knowledge Discovery and Data Mining · Code Available

Ashkan Farhangi, Ning Sui, Nan Hua, Haiyan Bai, Arthur Huang, Zhishan Guo



Abstract

Transformers have been widely applied in text classification. Unfortunately, real-world data contain anomalies and noisy labels that pose challenges for state-of-the-art Transformers. This paper proposes Protoformer, a novel self-learning framework for Transformers that can leverage problematic samples for text classification. Protoformer features a selection mechanism for embedding samples that allows us to efficiently extract and utilize anomaly prototypes and difficult class prototypes. We demonstrate these capabilities on datasets with diverse textual structures (e.g., Twitter, IMDB, ArXiv). We also apply the framework to several models. The results indicate that Protoformer can improve current Transformers in various empirical settings.
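The abstract does not spell out how the selection mechanism works, but a common centroid-based approach to prototype selection can serve as a minimal sketch: embed each sample, compute per-class centroids, and treat the samples closest to a centroid as class prototypes (the farthest candidates would instead flag anomalies or difficult samples). The function name `select_prototypes` and all parameters below are hypothetical, not the paper's actual implementation.

```python
import numpy as np

def select_prototypes(embeddings, labels, n_proto=3):
    """Pick, for each class, the samples nearest the class centroid.

    embeddings: (N, D) array of sample embeddings (e.g., [CLS] vectors)
    labels:     (N,) integer class labels
    Returns a dict mapping class label -> indices of its prototypes.
    (Anomaly/difficult prototypes could use the *lowest*-similarity samples.)
    """
    prototypes = {}
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        cls_emb = embeddings[idx]
        centroid = cls_emb.mean(axis=0)
        # cosine similarity of each class member to the class centroid
        sims = (cls_emb @ centroid) / (
            np.linalg.norm(cls_emb, axis=1) * np.linalg.norm(centroid) + 1e-12
        )
        order = np.argsort(-sims)  # most similar first
        prototypes[c] = idx[order[:n_proto]]
    return prototypes
```

This is only one plausible instantiation; the paper's mechanism may differ in how similarity is measured and how difficult samples are reused during self-learning.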

Tasks

Benchmark Results

Dataset  | Model       | Metric   | Claimed | Verified | Status
arXiv-10 | Protoformer | Accuracy | 0.79    |          | Unverified

Reproductions