Word Embedding Methods for Word Representation in Deep Learning for Natural Language Processing

Md. Anwar Hussen Wadud; M. F. Mridha; Mohammad Motiur Rahman

doi:10.24996/ijs.2022.63.3.37

Authors

Md. Anwar Hussen Wadud Department of Computer Science and Engineering https://orcid.org/0000-0002-7344-0838
M. F. Mridha Department of Computer Science and Engineering - Bangladesh University of Business and Technology, Dhaka, Bangladesh https://orcid.org/0000-0001-5738-1631
Mohammad Motiur Rahman Department of Computer Science and Engineering - Mawlana Bhashani Science and Technology University, Tangail, Bangladesh https://orcid.org/0000-0003-4417-8276

DOI:

https://doi.org/10.24996/ijs.2022.63.3.37

Keywords:

Word embedding, NLP, FastText, Deep Learning, local and pretrained word vector

Abstract

Natural Language Processing (NLP) deals with analysing, understanding, and generating language the way humans do. One of the challenges of NLP is training computers to learn and use a language as a human does. Every training session involves several types of sentences with different contexts and linguistic structures. The meaning of a sentence depends on the meanings of its main words and on their positions. The same word can act as a noun, an adjective, or another part of speech depending on its position. In NLP, word embedding is a powerful method that is trained on a large collection of texts and encodes general semantic and syntactic information about words. Choosing the right word embedding produces better results than the alternatives. Most papers use pretrained word embedding vectors in deep learning for NLP. The major issue with pretrained word embedding vectors, however, is that they cannot be used for all types of NLP tasks. In this paper, a process for forming local word embedding vectors is proposed, and pretrained and local word embedding vectors are compared for the Bengali language. The Keras framework is used in Python to implement the local word embedding, and the analysis section of this paper shows that the proposed model achieves 87.84% accuracy, which is better than the 86.75% accuracy obtained with fastText pretrained word embedding vectors. Using the proposed method, NLP researchers working on the Bengali language can easily build task-specific word embedding vectors for word representation in Natural Language Processing.
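
To make the idea of a "local" word embedding concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes whitespace tokenisation, a toy two-sentence Bengali corpus, and hypothetical helper names (`build_vocab`, `init_embeddings`, `encode`, `embed`). It shows only the lookup-table part: a vocabulary is built from the task corpus itself, and each word id indexes a row of a small randomly initialised table. In a framework such as Keras, a trainable embedding layer plays this role and updates the rows during training, whereas a pretrained embedding would load the rows from vectors such as fastText instead.

```python
import random

PAD, UNK = "<pad>", "<unk>"  # assumed special tokens, not from the paper


def build_vocab(sentences):
    """Assign an integer id to every distinct whitespace-separated token."""
    vocab = {PAD: 0, UNK: 1}
    for sentence in sentences:
        for token in sentence.split():
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab


def init_embeddings(vocab_size, dim, seed=0):
    """Create a vocab_size x dim table of small random values.

    A trainable embedding layer would start from a table like this and
    adjust it during training; here it is left untrained.
    """
    rng = random.Random(seed)
    return [[rng.uniform(-0.05, 0.05) for _ in range(dim)]
            for _ in range(vocab_size)]


def encode(sentence, vocab, max_len):
    """Convert a sentence to a fixed-length list of ids (truncate or pad)."""
    ids = [vocab.get(tok, vocab[UNK]) for tok in sentence.split()][:max_len]
    return ids + [vocab[PAD]] * (max_len - len(ids))


def embed(ids, table):
    """Look up one embedding row per id."""
    return [table[i] for i in ids]


# Toy corpus (illustrative only): "I eat rice", "I read books".
corpus = ["আমি ভাত খাই", "আমি বই পড়ি"]
vocab = build_vocab(corpus)
table = init_embeddings(len(vocab), dim=4)

ids = encode("আমি বই খাই", vocab, max_len=5)
vectors = embed(ids, table)
print(ids)
print(len(vectors), len(vectors[0]))
```

Because the vocabulary comes from the task corpus, every word in the training data has its own row, which is the practical difference from a pretrained table whose vocabulary is fixed in advance.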