TY - JOUR
T1 - Prediction of Sleep Stages Via Deep Learning Using Smartphone Audio Recordings in Home Environments
T2 - Model Development and Validation
AU - Tran, Hai Hong
AU - Hong, Jung Kyung
AU - Jang, Hyeryung
AU - Jung, Jinhwan
AU - Kim, Jongmok
AU - Hong, Joonki
AU - Lee, Minji
AU - Kim, Jeong Whun
AU - Kushida, Clete A.
AU - Lee, Dongheon
AU - Kim, Daewoo
AU - Yoon, In Young
N1 - Publisher Copyright:
© Hai Hong Tran, Jung Kyung Hong, Hyeryung Jang, Jinhwan Jung, Jongmok Kim, Joonki Hong, Minji Lee, Jeong-Whun Kim, Clete A Kushida, Dongheon Lee, Daewoo Kim, In-Young Yoon. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 01.06.2023. This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.
PY - 2023
Y1 - 2023
N2 - Background: The growing public interest and awareness regarding the significance of sleep is driving the demand for sleep monitoring at home. In addition to various commercially available wearable and nearable devices, sound-based sleep staging via deep learning is emerging as a decent alternative for their convenience and potential accuracy. However, sound-based sleep staging has only been studied using in-laboratory sound data. In real-world sleep environments (homes), there is abundant background noise, in contrast to quiet, controlled environments such as laboratories. The use of sound-based sleep staging at homes has not been investigated while it is essential for practical use on a daily basis. Challenges are the lack of and the expected huge expense of acquiring a sufficient size of home data annotated with sleep stages to train a large-scale neural network. Objective: This study aims to develop and validate a deep learning method to perform sound-based sleep staging using audio recordings achieved from various uncontrolled home environments. Methods: To overcome the limitation of lacking home data with known sleep stages, we adopted advanced training techniques and combined home data with hospital data. The training of the model consisted of 3 components: (1) the original supervised learning using 812 pairs of hospital polysomnography (PSG) and audio recordings, and the 2 newly adopted components; (2) transfer learning from hospital to home sounds by adding 829 smartphone audio recordings at home; and (3) consistency training using augmented hospital sound data. Augmented data were created by adding 8255 home noise data to hospital audio recordings. Besides, an independent test set was built by collecting 45 pairs of overnight PSG and smartphone audio recording at homes to examine the performance of the trained model. Results: The accuracy of the model was 76.2% (63.4% for wake, 64.9% for rapid-eye movement [REM], and 83.6% for non-REM) for our test set. The macro F1-score and mean per-class sensitivity were 0.714 and 0.706, respectively. The performance was robust across demographic groups such as age, gender, BMI, or sleep apnea severity (accuracy 73.4%-79.4%). In the ablation study, we evaluated the contribution of each component. While the supervised learning alone achieved accuracy of 69.2% on home sound data, adding consistency training to the supervised learning helped increase the accuracy to a larger degree (+4.3%) than adding transfer learning (+0.1%). The best performance was shown when both transfer learning and consistency training were adopted (+7.0%). Conclusions: This study shows that sound-based sleep staging is feasible for home use. By adopting 2 advanced techniques (transfer learning and consistency training) the deep learning model robustly predicts sleep stages using sounds recorded at various uncontrolled home environments, without using any special equipment but smartphones only.
AB - Background: The growing public interest and awareness regarding the significance of sleep is driving the demand for sleep monitoring at home. In addition to various commercially available wearable and nearable devices, sound-based sleep staging via deep learning is emerging as a decent alternative for their convenience and potential accuracy. However, sound-based sleep staging has only been studied using in-laboratory sound data. In real-world sleep environments (homes), there is abundant background noise, in contrast to quiet, controlled environments such as laboratories. The use of sound-based sleep staging at homes has not been investigated while it is essential for practical use on a daily basis. Challenges are the lack of and the expected huge expense of acquiring a sufficient size of home data annotated with sleep stages to train a large-scale neural network. Objective: This study aims to develop and validate a deep learning method to perform sound-based sleep staging using audio recordings achieved from various uncontrolled home environments. Methods: To overcome the limitation of lacking home data with known sleep stages, we adopted advanced training techniques and combined home data with hospital data. The training of the model consisted of 3 components: (1) the original supervised learning using 812 pairs of hospital polysomnography (PSG) and audio recordings, and the 2 newly adopted components; (2) transfer learning from hospital to home sounds by adding 829 smartphone audio recordings at home; and (3) consistency training using augmented hospital sound data. Augmented data were created by adding 8255 home noise data to hospital audio recordings. Besides, an independent test set was built by collecting 45 pairs of overnight PSG and smartphone audio recording at homes to examine the performance of the trained model. Results: The accuracy of the model was 76.2% (63.4% for wake, 64.9% for rapid-eye movement [REM], and 83.6% for non-REM) for our test set. The macro F1-score and mean per-class sensitivity were 0.714 and 0.706, respectively. The performance was robust across demographic groups such as age, gender, BMI, or sleep apnea severity (accuracy 73.4%-79.4%). In the ablation study, we evaluated the contribution of each component. While the supervised learning alone achieved accuracy of 69.2% on home sound data, adding consistency training to the supervised learning helped increase the accuracy to a larger degree (+4.3%) than adding transfer learning (+0.1%). The best performance was shown when both transfer learning and consistency training were adopted (+7.0%). Conclusions: This study shows that sound-based sleep staging is feasible for home use. By adopting 2 advanced techniques (transfer learning and consistency training) the deep learning model robustly predicts sleep stages using sounds recorded at various uncontrolled home environments, without using any special equipment but smartphones only.
KW - deep learning
KW - home environment
KW - respiratory sounds
KW - sleep stages
KW - smartphone
UR - http://www.scopus.com/inward/record.url?scp=85160970220&partnerID=8YFLogxK
U2 - 10.2196/46216
DO - 10.2196/46216
M3 - Article
C2 - 37261889
AN - SCOPUS:85160970220
SN - 1439-4456
VL - 25
JO - Journal of Medical Internet Research
JF - Journal of Medical Internet Research
M1 - e46216
ER -