
Fuse and Adapt: Investigating the Use of Pre-Trained Self-Supervising Learning Models in Limited Data NLU problems

2022-12-02 · UOA Thesis 2022

S Siriwardhana


Abstract

Deep Learning (DL) has become a key element of Artificial Intelligence (AI) over the last decade. DL has improved many applications across Natural Language Processing (NLP), Computer Vision (CV), Speech Recognition (SR), and Reinforcement Learning (RL). A significant challenge in applying DL in most domains is the scarcity of labelled data: DL models typically need large amounts of annotated data for training. The paradigm of Self-Supervised Learning (SSL) has become a game changer in Deep Learning because of its ability to address this scarcity of labelled data. SSL can leverage commonly available unlabelled data to train large DL architectures. SSL usually consists of two stages: a pre-training phase and a downstream phase. The pre-training phase typically requires a large amount of unlabelled data and is a computationally expensive process that can cost hundreds of thousands to millions of dollars. However, it has become common practice to open-source the checkpoints of pre-trained SSL models that can represent different modalities of data, such as text, vision, and speech. These expensive and valuable pre-trained SSL models are usually open-sourced by technology companies such as Google, Meta, Amazon, Nvidia, and Microsoft. Pre-trained models have become a vital part of research communities and industry due to their effectiveness in solving many downstream tasks. In this thesis, my focus is on exploring the utilization of pre-trained SSL models in the field of Natural Language Understanding (NLU). NLU is the ability of machines to understand human language, and it has enabled many practical applications such as Emotion Recognition, Sentiment Analysis, Summarization, and Question Answering. This thesis mainly explores the utilization of pre-trained SSL models in the two main areas of multimodal fusion and domain adaptation. Under these two topics, I explore four research questions that introduce novel fusion and adaptation techniques.
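The two-stage SSL pipeline described in the abstract can be sketched in miniature. The following is a deliberately tiny, self-contained stand-in (co-occurrence word vectors plus a logistic-regression head), not the Transformer-based models the thesis studies; the sentences, labels, and function names are invented for illustration. It only shows the shape of the workflow: "pre-train" a representation on unlabelled text, then fit a downstream classifier on a handful of labelled examples.

```python
import math

# --- Stage 1: "pre-training" on unlabelled data -----------------------
# Real SSL pre-training (e.g. masked language modelling) is far more
# complex; here we just build co-occurrence-based word vectors.
unlabelled = [
    "the film was great and moving",
    "the movie was great fun",
    "the plot was dull and slow",
    "the film felt slow and boring",
]

def pretrain(corpus, window=2):
    """Build simple co-occurrence word vectors from unlabelled text."""
    vocab = sorted({w for sent in corpus for w in sent.split()})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = {w: [0.0] * len(vocab) for w in vocab}
    for sent in corpus:
        words = sent.split()
        for i, w in enumerate(words):
            lo, hi = max(0, i - window), min(len(words), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[w][index[words[j]]] += 1.0
    return vectors

vectors = pretrain(unlabelled)
DIM = len(next(iter(vectors.values())))

# --- Stage 2: downstream task with a few labelled examples ------------
def featurise(sentence):
    """Represent a sentence as the sum of its pre-trained word vectors."""
    feats = [0.0] * DIM
    for w in sentence.split():
        for k, v in enumerate(vectors.get(w, [0.0] * DIM)):
            feats[k] += v
    return feats

labelled = [("great fun film", 1), ("dull boring plot", 0)]

def train_classifier(data, epochs=200, lr=0.1):
    """Fit a logistic-regression head on the frozen representation."""
    weights, bias = [0.0] * DIM, 0.0
    for _ in range(epochs):
        for text, label in data:
            x = featurise(text)
            z = sum(w * xi for w, xi in zip(weights, x)) + bias
            grad = 1.0 / (1.0 + math.exp(-z)) - label
            weights = [w - lr * grad * xi for w, xi in zip(weights, x)]
            bias -= lr * grad
    return weights, bias

weights, bias = train_classifier(labelled)

def predict(text):
    z = sum(w * xi for w, xi in zip(weights, featurise(text))) + bias
    return 1 if z > 0 else 0
```

The expensive part in practice is Stage 1, which is why open-sourced checkpoints matter: downstream users only pay the (much cheaper) cost of Stage 2.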
