Intent-aware data augmentation by leveraging generative AI for stress detection in social media texts

Minhah Saleem, Jihie Kim

Research output: Contribution to journal › Article › peer-review


Abstract

Stress is a major issue in modern society. Researchers have focused on identifying stress in individuals by linking language with mental health, often drawing on social media posts. However, stress classification systems suffer from data scarcity, which makes data augmentation necessary. Common approaches include Back-Translation (BT), Easy Data Augmentation (EDA), and An Easier Data Augmentation (AEDA), but recent studies have shown the potential of generative AI, notably ChatGPT. This article centers on stress identification using the DREADDIT dataset and the Robustly Optimized BERT Pretraining Approach (RoBERTa) transformer, with an emphasis on generative AI for augmentation. We propose two ChatGPT prompting techniques for 1-shot intent-aware data augmentation: same-intent and opposite-intent prompting. Same-intent prompts yield posts with similar topics and sentiments, while opposite-intent prompts produce posts with contrasting sentiments. Results show performance increases of 2% for opposite-intent and 3% for same-intent augmentation. This study pioneers intent-based data augmentation for stress detection and explores advanced mental health text classification with generative AI. It concludes that data augmentation offers limited benefits and highlights the importance of diverse Reddit data and further research in this field.
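To make the 1-shot intent-aware idea concrete, the sketch below shows how such prompting could be driven programmatically with the OpenAI Python client. The prompt wording, the model name (gpt-3.5-turbo), and the augment helper are illustrative assumptions for this summary, not the exact prompts or code used in the paper.

# Minimal sketch of 1-shot intent-aware augmentation via the OpenAI
# chat API. Prompt wording and model choice are assumptions, not the
# paper's exact setup.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SAME_INTENT = (
    'Here is a Reddit post: "{post}"\n'
    "Write one new post that keeps the same topic and the same "
    "sentiment (stressed or not stressed) as the example."
)
OPPOSITE_INTENT = (
    'Here is a Reddit post: "{post}"\n'
    "Write one new post on the same topic but with the opposite "
    "sentiment (flip stressed to not stressed, or vice versa)."
)

def augment(post: str, same_intent: bool = True) -> str:
    """Generate one synthetic post from a single example (1-shot)."""
    template = SAME_INTENT if same_intent else OPPOSITE_INTENT
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": template.format(post=post)}],
        temperature=0.9,  # encourage lexical diversity in the new post
    )
    return response.choices[0].message.content

# Usage: a same-intent copy would keep the original stress label,
# while an opposite-intent copy would be added with the flipped label.
new_post = augment("Deadlines are piling up and I can't sleep.", same_intent=False)

Under this labeling scheme, same-intent outputs simply enlarge each class, whereas opposite-intent outputs grow both classes symmetrically from a single seed post.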

Original language: English
Article number: e2156
Journal: PeerJ Computer Science
Volume: 10
State: Published - 2024

Keywords

  • Data augmentation
  • Generative AI
  • Mental health
  • Natural language understanding
  • Pre-trained language models
  • Sentiment analysis
  • Stress detection
  • Text classification
