Fast Query Expansion on an Accounting Corpus using Sub-Word Embeddings

2018-06-01 · WS 2018

Hrishikesh Ganu, Viswa Datha P.


Abstract

We present early results from a system under development that uses sub-word embeddings for query expansion in the presence of misspelled words and other aberrations. We work for a company that creates accounting software, and the end goal is to improve the customer experience when users search for help on our "Customer Care" portal. Our customers use colloquial language and non-standard acronyms, and sometimes misspell words, when they use our Search portal or interact over other channels. However, our Knowledge Base has curated content that uses technical terms and quite formal language. As a result, the answer is not retrieved even though it might actually be present in the documentation (as assessed by a human). We address this problem by using sub-word embeddings to create equivalence classes of words with similar meanings, with the additional property that the mappings to these classes are robust to misspellings, and then use these classes to fine-tune an Elasticsearch index to improve recall. We demonstrate through an end-to-end system that using sub-word embeddings leads to a significant lift in correct answers retrieved for an accounting corpus available in the public domain.
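The idea in the abstract (map noisy query words to equivalence classes in a misspelling-tolerant way, then feed those classes to the search index) can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not name the embedding model, so the code below substitutes a plain bag of character n-grams for a learned sub-word embedding, and compares words by surface form only, so it captures misspellings but not similarity of meaning. The function names, the sample terms, and the 0.3 threshold are all hypothetical. The output uses the Solr-style `a, b => c` rule format, which Elasticsearch's synonym token filter accepts.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(word, n_min=3, n_max=5):
    """Bag of character n-grams with boundary markers around the word.

    Assumption: a stand-in for a learned sub-word embedding, not the real thing.
    """
    marked = f"<{word.lower()}>"
    grams = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(marked) - n + 1):
            grams[marked[i:i + n]] += 1
    return grams


def cosine(a, b):
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest_canonical(term, canonical_terms, threshold=0.3):
    """Return the most similar canonical term, or None if none beats the threshold.

    The threshold value is illustrative, not tuned on any corpus.
    """
    term_vec = char_ngrams(term)
    best, best_score = None, threshold
    for candidate in canonical_terms:
        score = cosine(term_vec, char_ngrams(candidate))
        if score > best_score:
            best, best_score = candidate, score
    return best


def synonym_rules(query_terms, canonical_terms, threshold=0.3):
    """Group noisy query terms by canonical term and emit Solr-style rules."""
    classes = defaultdict(list)
    for term in query_terms:
        canonical = nearest_canonical(term, canonical_terms, threshold)
        if canonical is not None and canonical != term:
            classes[canonical].append(term)
    return [
        f"{', '.join(sorted(variants))} => {canonical}"
        for canonical, variants in sorted(classes.items())
    ]


if __name__ == "__main__":
    # Hypothetical knowledge-base vocabulary and user query terms.
    kb_terms = ["receivable", "invoice", "depreciation"]
    user_terms = ["recievable", "invoise", "payroll", "invoice"]
    for rule in synonym_rules(user_terms, kb_terms):
        print(rule)
```

In a real system the rules would be generated from embeddings trained on the corpus and loaded into the index's analyzer as a synonym list. Overlapping n-grams are what make the mapping tolerant of spelling errors, and that is the property the abstract relies on.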

Tasks

Information Retrieval, Lemmatization, Word Embeddings
