SOTAVerified

Document Classification

Document classification is the task of assigning one or more labels to a document from a predetermined set of labels.

Source: Long-length Legal Document Classification
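
As a toy illustration of the definition above, the sketch below assigns zero, one, or several labels from a fixed label set to a document. It is not the method of any paper listed on this page: the label names, the keyword lists, and the `classify` helper are all hypothetical, and keyword matching stands in for the learned models used in practice.

```python
from typing import Dict, List, Set

# Hypothetical predetermined label set with illustrative keywords (assumption).
LABEL_KEYWORDS: Dict[str, Set[str]] = {
    "legal": {"court", "contract", "plaintiff", "statute"},
    "medical": {"patient", "diagnosis", "clinical", "dose"},
    "finance": {"revenue", "loan", "dividend", "audit"},
}


def classify(document: str, min_hits: int = 1) -> List[str]:
    """Return, in sorted order, every label with at least `min_hits`
    distinct keywords present in the document (multi-label output)."""
    # Lowercase each whitespace-separated token and strip surrounding punctuation.
    tokens = {tok.strip(".,;:!?()\"'").lower() for tok in document.split()}
    return sorted(
        label
        for label, keywords in LABEL_KEYWORDS.items()
        if len(tokens & keywords) >= min_hits
    )


if __name__ == "__main__":
    print(classify("The court reviewed the diagnosis."))
    print(classify("Weather report for Tuesday"))
```

A document can receive several labels, one label, or none. Raising `min_hits` makes the assignment stricter.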

Papers

Showing 1-50 of 641 papers

| Title | Status | Hype |
| --- | --- | --- |
| Can Reasoning LLMs Enhance Clinical Document Classification? | Code | 0 |
| Text Chunking for Document Classification for Urban System Management using Large Language Models | Code | 0 |
| Evaluating Negative Sampling Approaches for Neural Topic Models | Code | 0 |
| Converting Transformers into DGNNs Form | Code | 0 |
| Cross-Entropy Attacks to Language Models via Rare Event Simulation | Code | 0 |
| On Importance of Layer Pruning for Smaller BERT Models and Low Resource Languages | - | 0 |
| Data-Driven Self-Supervised Graph Representation Learning | Code | 0 |
| Extreme Multi-label Completion for Semantic Document Labelling with Taxonomy-Aware Parallel Learning | - | 0 |
| Zero-Shot Prompting and Few-Shot Fine-Tuning: Revisiting Document Image Classification Using Large Language Models | - | 0 |
| Label Errors in the Tobacco3482 Dataset | Code | 0 |
| WordVIS: A Color Worth A Thousand Words | - | 0 |
| Can Large Language Models Serve as Effective Classifiers for Hierarchical Multi-Label Classification of Scientific Documents at Industrial Scale? | - | 0 |
| HEAL: Hierarchical Embedding Alignment Loss for Improved Retrieval and Representation Learning | Code | 1 |
| Language Model Meets Prototypes: Towards Interpretable Text Classification Models through Prototypical Networks | - | 0 |
| Enhancing Document AI Data Generation Through Graph-Based Synthetic Layouts | - | 0 |
| Clustering Algorithms and RAG Enhancing Semi-Supervised Text Classification with Large LLMs | - | 0 |
| Weakly-supervised diagnosis identification from Italian discharge letters | - | 0 |
| Medical-GAT: Cancer Document Classification Leveraging Graph-Based Residual Network for Scenarios with Limited Data | - | 0 |
| ChuLo: Chunk-Level Key Information Representation for Long Document Processing | Code | 0 |
| Text Classification using Graph Convolutional Networks: A Comprehensive Survey | - | 0 |
| Orthogonal Nonnegative Matrix Factorization with the Kullback-Leibler divergence | Code | 0 |
| Efficient Few-shot Learning for Multi-label Classification of Scientific Documents with Many Classes | Code | 1 |
| Manual Verbalizer Enrichment for Few-Shot Text Classification | - | 0 |
| Graph-tree Fusion Model with Bidirectional Information Propagation for Long Document Classification | - | 0 |
| FLAG: Financial Long Document Classification via AMR-based GNN | Code | 0 |
| Document Type Classification using File Names | - | 0 |
| On Importance of Pruning and Distillation for Efficient Low Resource NLP | - | 0 |
| SubRegWeigh: Effective and Efficient Annotation Weighing with Subword Regularization | Code | 0 |
| Out-of-Distribution Detection with Attention Head Masking for Multimodal Document Classification | Code | 0 |
| AutoML-guided Fusion of Entity and LLM-based Representations for Document Classification | Code | 0 |
| Diagnosis extraction from unstructured Dutch echocardiogram reports using span- and document-level characteristic classification | Code | 0 |
| Prompting Encoder Models for Zero-Shot Classification: A Cross-Domain Study in Italian | - | 0 |
| An Improved Method for Class-specific Keyword Extraction: A Case Study in the German Business Registry | Code | 0 |
| Hierarchical Multi-modal Transformer for Cross-modal Long Document Classification | - | 0 |
| Rapid Biomedical Research Classification: The Pandemic PACT Advanced Categorisation Engine | - | 0 |
| SuperGLEBer: German Language Understanding Evaluation Benchmark | Code | 1 |
| DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Language Models | Code | 3 |
| Focus on the Core: Efficient Attention via Pruned Token Compression for Document Classification | - | 0 |
| Auxiliary Knowledge-Induced Learning for Automatic Multi-Label Medical Document Classification | - | 0 |
| Evaluation of large language model performance on the Biomedical Language Understanding and Reasoning Benchmark | - | 0 |
| Length-Aware Multi-Kernel Transformer for Long Document Classification | Code | 0 |
| Improving Long Text Understanding with Knowledge Distilled from Summarization Model | - | 0 |
| CICA: Content-Injected Contrastive Alignment for Zero-Shot Document Image Classification | - | 0 |
| Machine Unlearning for Document Classification | Code | 0 |
| L3Cube-MahaNews: News-based Short Text and Long Document Classification Datasets in Marathi | Code | 0 |
| GuideWalk: A Novel Graph-Based Word Embedding for Enhanced Text Classification | - | 0 |
| BuDDIE: A Business Document Dataset for Multi-task Information Extraction | - | 0 |
| Developing Healthcare Language Model Embedding Spaces | - | 0 |
| Visually Guided Generative Text-Layout Pre-training for Document Intelligence | Code | 2 |
| NextLevelBERT: Masked Language Modeling with Higher-Level Representations for Long Documents | Code | 1 |

Benchmark Results

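| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | ApproxRepSet | Accuracy | 97.17 | - | Unverified |
| 2 | REL-RWMD k-NN | Accuracy | 95.61 | - | Unverified |
| 3 | Orthogonalized Soft VSM | Accuracy | 92.65 | - | Unverified |
| 4 | MAGNET | F1 | 89.9 | - | Unverified |
| 5 | VLAWE | F1 | 89.3 | - | Unverified |
| 6 | KD-LSTMreg | F1 | 88.9 | - | Unverified |
| 7 | LSTM-reg (single model) | F1 | 87 | - | Unverified |
| 8 | SCDV-MS | F1 | 82.71 | - | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | ACNet | Accuracy | 83.5 | - | Unverified |
| 2 | LGCN | Accuracy | 83.3 | - | Unverified |
| 3 | GAT | Accuracy | 83 | - | Unverified |
| 4 | MoNet | Accuracy | 81.7 | - | Unverified |
| 5 | DeepWalk | Accuracy | 67.2 | - | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | BioLinkBERT (large) | F1 | 88.1 | - | Unverified |
| 2 | NCBI_BERT(large) (P) | F1 | 87.3 | - | Unverified |
| 3 | SciFive-large | F1 | 86.08 | - | Unverified |
| 4 | BioGPT | Micro F1 | 85.12 | - | Unverified |
| 5 | PubMedBERT uncased | Micro F1 | 82.32 | - | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | MPAD-path | Accuracy | 99.59 | - | Unverified |
| 2 | Orthogonalized Soft VSM | Accuracy | 97.73 | - | Unverified |
| 3 | ApproxRepSet | Accuracy | 95.73 | - | Unverified |
| 4 | REL-RWMD k-NN | Accuracy | 95.18 | - | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | ApproxRepSet | Accuracy | 94.31 | - | Unverified |
| 2 | Orthogonalized Soft VSM | Accuracy | 93.42 | - | Unverified |
| 3 | REL-RWMD k-NN | Accuracy | 93.03 | - | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | ApproxRepSet | Accuracy | 72.6 | - | Unverified |
| 2 | REL-RWMD k-NN | Accuracy | 71.05 | - | Unverified |
| 3 | Orthogonalized Soft VSM | Accuracy | 69.21 | - | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | KD-LSTMreg | F1 | 72.9 | - | Unverified |
| 2 | MAGNET | F1 | 69.6 | - | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | REL-RWMD k-NN | Accuracy | 96.85 | - | Unverified |
| 2 | ApproxRepSet | Accuracy | 96.24 | - | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Document Classification Using Importance of Sentences | Accuracy | 54.8 | - | Unverified |
| 2 | LSTM-reg (single model) | Accuracy | 52.8 | - | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | ApproxRepSet | Accuracy | 59.06 | - | Unverified |
| 2 | REL-RWMD k-NN | Accuracy | 56.8 | - | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | SPECTER | F1 (micro) | 82 | - | Unverified |
| 2 | SciNCL | F1 (micro) | 81.4 | - | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | SciNCL | F1 (micro) | 88.7 | - | Unverified |
| 2 | SPECTER | F1 (micro) | 86.4 | - | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | ConvTextTM | Accuracy | 91.28 | - | Unverified |
| 2 | HDLTex | Accuracy | 90.93 | - | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | ChuLo | Accuracy | 95.38 | - | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | ChuLo | Accuracy | 64.4 | - | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | MPAD-path | Accuracy | 89.81 | - | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | BilBOWA | Accuracy | 75 | - | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | BilBOWA | Accuracy | 86.5 | - | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | HDLTex | Accuracy | 86.07 | - | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | HDLTex | Accuracy | 76.58 | - | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | KD-LSTMreg | Accuracy | 69.4 | - | Unverified |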