TY - JOUR
T1 - Intent aware data augmentation by leveraging generative AI for stress detection in social media texts
AU - Saleem, Minhah
AU - Kim, Jihie
N1 - Publisher Copyright: © (2024) Saleem and Kim.
PY - 2024
Y1 - 2024
N2 - Stress is a major issue in modern society. Researchers work on identifying stress in individuals by linking language with mental health, often using social media posts. However, stress classification systems suffer from data scarcity, which makes data augmentation necessary. Approaches such as Back-Translation (BT), Easy Data Augmentation (EDA), and An Easier Data Augmentation (AEDA) are common, but recent studies show the potential of generative AI, notably ChatGPT. This article centers on stress identification using the DREADDIT dataset and the RoBERTa (A Robustly Optimized BERT Pretraining Approach) transformer, emphasizing the use of generative AI for augmentation. We propose two ChatGPT prompting techniques: same-intent and opposite-intent 1-shot intent-aware data augmentation. Same-intent prompts yield posts with similar topics and sentiments, while opposite-intent prompts produce posts with contrasting sentiments. Results show performance increases of 2% and 3% for opposite-intent and same-intent augmentation, respectively. This study pioneers intent-based data augmentation for stress detection and explores advanced mental health text classification methods with generative AI. It concludes that data augmentation has limited benefits and highlights the importance of diverse Reddit data and further research in this field.
KW - Data augmentation
KW - Generative AI
KW - Mental health
KW - Natural language understanding
KW - Pre-trained language models
KW - Sentiment analysis
KW - Stress detection
KW - Text classification
UR - http://www.scopus.com/inward/record.url?scp=85199035507&partnerID=8YFLogxK
U2 - 10.7717/peerj-cs.2156
DO - 10.7717/peerj-cs.2156
M3 - Article
AN - SCOPUS:85199035507
SN - 2376-5992
VL - 10
JO - PeerJ Computer Science
JF - PeerJ Computer Science
M1 - e2156
ER -