Abstract
This study examines how to preprocess and transform data efficiently so that deep learning techniques can be used to analyze linguistic data. Researchers' interest in deep learning techniques has grown rapidly worldwide; however, linguists often find it difficult to connect linguistics with deep learning techniques or algorithms because they do not know how or where to begin. This study therefore provides a general, practical procedure for training on data with deep learning algorithms. As a concrete case, we focus on how to preprocess and transform Tweet data for sentiment analysis using deep learning techniques. We also introduce BERT, a recent deep learning algorithm, into the data preprocessing and transformation procedure. Data preprocessing is particularly important because the results of deep learning can vary significantly depending on it. Although the preprocessing procedure can differ according to the aim of the research, this study introduces the general approach that experienced researchers frequently use for deep learning algorithms. The study is expected to lower the barriers to applying deep learning techniques to linguistic data and to make it easier for researchers to conduct deep learning research related to linguistics.
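As a rough illustration of the kind of Tweet preprocessing and transformation the abstract refers to, the following minimal sketch cleans a Tweet and arranges it into a fixed-length, BERT-style token sequence. It is not the procedure used in the study: the function names `clean_tweet` and `to_model_input`, the specific cleaning rules, and the `max_len` parameter are assumptions made for this example, and whitespace splitting stands in for BERT's actual WordPiece tokenizer.

```python
import re

# Patterns for common Tweet noise (illustrative choices, not the study's rules).
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#(\w+)")
WHITESPACE_RE = re.compile(r"\s+")


def clean_tweet(text: str) -> str:
    """Remove URLs and @mentions, keep hashtag words, and lowercase the text."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = HASHTAG_RE.sub(r"\1", text)  # "#Happy" -> "Happy"
    text = WHITESPACE_RE.sub(" ", text).strip()
    return text.lower()


def to_model_input(text: str, max_len: int = 16) -> list[str]:
    """Wrap the cleaned words in [CLS]/[SEP] and pad with [PAD] to max_len.

    Whitespace splitting is a stand-in for a real subword tokenizer.
    """
    words = clean_tweet(text).split()[: max_len - 2]  # leave room for [CLS], [SEP]
    tokens = ["[CLS]"] + words + ["[SEP]"]
    return tokens + ["[PAD]"] * (max_len - len(tokens))


if __name__ == "__main__":
    tweet = "@user I LOVE this! https://t.co/abc #Happy"
    print(clean_tweet(tweet))
    print(to_model_input(tweet, max_len=8))
```

In practice, the whitespace split would be replaced by the tokenizer that ships with the chosen BERT model, which also maps tokens to integer IDs and builds the attention mask.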
Original language | English |
---|---|
Pages (from-to) | 42-63 |
Number of pages | 22 |
Journal | Korean Journal of English Language and Linguistics |
Volume | 2020 |
Issue number | 20 |
State | Published - 2020 |
Keywords
- Data preprocessing
- Deep learning
- Sentiment analysis
- Transformation