| Leveraging BERT Language Model for Arabic Long Document Classification | May 4, 2023 | Classification, Document Classification | Unverified | 0 |
| A New Information Theory of Certainty for Machine Learning | Apr 25, 2023 | Document Classification | Unverified | 0 |
| HeRo: RoBERTa and Longformer Hebrew Language Models | Apr 18, 2023 | Document Classification, Language Modeling | Unverified | 0 |
| Are Large Language Models Ready for Healthcare? A Comparative Study on Clinical Language Understanding | Apr 9, 2023 | Document Classification, named-entity-recognition | Code Available | 1 |
| Disentangling Structure and Style: Political Bias Detection in News by Inducing Document Hierarchy | Apr 5, 2023 | Articles, Bias Detection | Code Available | 0 |
| A semi-automatic method for document classification in the shipping industry | Mar 29, 2023 | Classification, Document Classification | Unverified | 0 |
| Finding the Needle in a Haystack: Unsupervised Rationale Extraction from Long Text Classifiers | Mar 14, 2023 | Document Classification, Language Modeling | Unverified | 0 |
| Neural Nonnegative Matrix Factorization for Hierarchical Multilayer Topic Modeling | Feb 28, 2023 | Document Classification | Unverified | 0 |
| Elementwise Language Representation | Feb 27, 2023 | Document Classification, Specificity | Unverified | 0 |
| MatKB: Semantic Search for Polycrystalline Materials Synthesis Procedures | Feb 11, 2023 | Articles, Document Classification | Code Available | 0 |
| Bioformer: an efficient transformer language model for biomedical text mining | Feb 3, 2023 | Articles, Document Classification | Code Available | 1 |
| A Comparative Study of Pretrained Language Models for Long Clinical Text | Jan 27, 2023 | Clinical Knowledge, Document Classification | Code Available | 1 |
| FewShotTextGCN: K-hop neighborhood regularization for few-shot learning on graphs | Jan 25, 2023 | Document Classification, Few-Shot Learning | Unverified | 0 |
| ClassBases at CASE-2022 Multilingual Protest Event Detection Tasks: Multilingual Protest News Detection and Automatically Replicating Manually Created Event Datasets | Jan 16, 2023 | Classification, Document Classification | Code Available | 0 |
| Multimodal Side-Tuning for Document Classification | Jan 16, 2023 | Classification, Document Classification | Code Available | 1 |
| Hawk: An Industrial-strength Multi-label Document Classifier | Jan 15, 2023 | Benchmarking, Document Classification | Unverified | 0 |
| Tsetlin Machine Embedding: Representing Words Using Logical Expressions | Jan 2, 2023 | Document Classification, Machine Translation | Code Available | 1 |
| Human in the loop: How to effectively create coherent topics by manually labeling only a few documents per class | Dec 19, 2022 | Document Classification, Few-Shot Learning | Unverified | 0 |
| Generalised Spherical Text Embedding | Nov 30, 2022 | Clustering, Document Classification | Unverified | 0 |
| Text Representation Enrichment Utilizing Graph based Approaches: Stock Market Technical Analysis Case Study | Nov 29, 2022 | Classification, Document Classification | Unverified | 0 |
| Extended Multilingual Protest News Detection -- Shared Task 1, CASE 2021 and 2022 | Nov 21, 2022 | Document Classification, Event Detection | Unverified | 0 |
| Processing Long Legal Documents with Pre-trained Transformers: Modding LegalBERT and Longformer | Nov 2, 2022 | Document Classification | Unverified | 0 |
| BioGPT: Generative Pre-trained Transformer for Biomedical Text Generation and Mining | Oct 19, 2022 | Document Classification, Language Modelling | Code Available | 4 |
| Evaluating Out-of-Distribution Performance on Document Image Classifiers | Oct 14, 2022 | Document Classification | Code Available | 0 |
| Lbl2Vec: An Embedding-Based Approach for Unsupervised Document Retrieval on Predefined Topics | Oct 12, 2022 | Document Classification, Retrieval | Code Available | 1 |
| An Exploration of Hierarchical Attention Transformers for Efficient Long Document Classification | Oct 11, 2022 | Document Classification, GPU | Code Available | 0 |
| Contrastive Training Improves Zero-Shot Classification of Semi-structured Documents | Oct 11, 2022 | Classification, Document Classification | Unverified | 0 |
| CNN-Trans-Enc: A CNN-Enhanced Transformer-Encoder On Top Of Static BERT representations for Document Classification | Sep 13, 2022 | Document Classification, text-classification | Unverified | 0 |
| Flexible Job Classification with Zero-Shot Learning | Aug 30, 2022 | Classification, Document Classification | Unverified | 0 |
| D2GCLF: Document-to-Graph Classifier for Legal Document Classification | Jul 1, 2022 | Classification, Document Classification | Unverified | 0 |
| BL.Research at SemEval-2022 Task 8: Using various Semantic Information to evaluate document-level Semantic Textual Similarity | Jul 1, 2022 | Document Classification, Semantic Textual Similarity | Code Available | 0 |
| Bi-VLDoc: Bidirectional Vision-Language Modeling for Visually-Rich Document Understanding | Jun 27, 2022 | Document Classification, document understanding | Unverified | 0 |
| Supervised Dictionary Learning with Auxiliary Covariates | Jun 14, 2022 | Dictionary Learning, Document Classification | Code Available | 0 |
| ChordMixer: A Scalable Neural Attention Model for Sequences with Different Lengths | Jun 12, 2022 | Chunking, Document Classification | Code Available | 1 |
| Knowledge-based Document Classification with Shannon Entropy | Jun 6, 2022 | BIG-bench Machine Learning, Classification | Unverified | 0 |
| LDRNet: Enabling Real-time Document Localization on Mobile Devices | Jun 5, 2022 | Document Classification | Code Available | 1 |
| UMUTextStats: A linguistic feature extraction tool for Spanish | Jun 1, 2022 | Author Profiling, Authorship Verification | Unverified | 0 |
| Enriching Epidemiological Thematic Features For Disease Surveillance Corpora Classification | Jun 1, 2022 | Articles, Classification | Unverified | 0 |
| ConvTextTM: An Explainable Convolutional Tsetlin Machine Framework for Text Classification | Jun 1, 2022 | Decision Making, Document Classification | Unverified | 0 |
| Approximate Conditional Coverage & Calibration via Neural Model Approximations | May 28, 2022 | Classification, Document Classification | Unverified | 0 |
| FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness | May 27, 2022 | 16k, 4k | Code Available | 6 |
| BabyBear: Cheap inference triage for expensive language models | May 24, 2022 | Document Classification, Named Entity Recognition | Code Available | 0 |
| VLCDoC: Vision-Language Contrastive Pre-Training Model for Cross-Modal Document Classification | May 24, 2022 | Document Classification, Document Image Classification | Unverified | 0 |
| Word Tour: One-dimensional Word Embeddings via the Traveling Salesman Problem | May 4, 2022 | Document Classification, Traveling Salesman Problem | Code Available | 1 |
| Towards Comprehensive Patent Approval Predictions: Beyond Traditional Document Classification | May 1, 2022 | Classification, Document Classification | Unverified | 0 |
| Revisiting Transformer-based Models for Long Document Classification | Apr 14, 2022 | Classification, Document Classification | Code Available | 1 |
| Analysis of Sparse Subspace Clustering: Experiments and Random Projection | Apr 1, 2022 | Clustering, Document Classification | Unverified | 0 |
| LinkBERT: Pretraining Language Models with Document Links | Mar 29, 2022 | Document Classification, Language Modeling | Code Available | 2 |
| An Evaluation Dataset for Legal Word Embedding: A Case Study On Chinese Codex | Mar 29, 2022 | Document Classification, Machine Translation | Code Available | 0 |
| Interpretable Research Replication Prediction via Variational Contextual Consistency Sentence Masking | Mar 28, 2022 | Document Classification, Sentence | Unverified | 0 |