TY - JOUR
T1 - Interpretable attention-based multi-encoder transformer based QSPR model for assessing toxicity and environmental impact of chemicals
AU - Kim, Sang Youn
AU - Tariq, Shahzeb
AU - Heo, Sung Ku
AU - Yoo, Chang Kyoo
N1 - Publisher Copyright: © 2023 Elsevier Ltd
PY - 2024/2
Y1 - 2024/2
N2 - Rising demand from the consumer goods and pharmaceutical industries is driving a rapid expansion in the number of newly developed chemicals. Conventional toxicity testing of unknown chemicals is expensive and time-consuming, and it raises ethical concerns. Quantitative structure–property relationship (QSPR) modeling is an efficient computational alternative because it saves time and resources and reduces animal experimentation. Advances in machine learning have improved chemical analysis in QSPR studies, but the real-world application of machine learning-based QSPR models has been limited by the unexplainable ‘black box’ nature of machine learning models. In this study, a multi-encoder structure-to-toxicity (S2T) transformer-based QSPR model was developed to estimate the properties of polychlorinated biphenyls (PCBs) and endocrine disrupting chemicals (EDCs). Simplified molecular input line entry system (SMILES) strings and molecular descriptors calculated with the Dragon 6 software were used simultaneously as inputs to the QSPR model. Furthermore, an attention-based framework was proposed to describe the relationship between the molecular structure and toxicity of hazardous chemicals. The S2T-transformer model achieved the highest R² scores of 0.918, 0.856, and 0.907 for estimating the logarithms of the octanol-water partition coefficient (Log KOW), octanol-air partition coefficient (Log KOA), and bioconcentration factor (Log BCF) of PCBs, respectively. Moreover, the attention weights correctly captured the lateral (meta, para) chlorination associated with PCB toxicity and environmental impact.
AB - Rising demand from the consumer goods and pharmaceutical industries is driving a rapid expansion in the number of newly developed chemicals. Conventional toxicity testing of unknown chemicals is expensive and time-consuming, and it raises ethical concerns. Quantitative structure–property relationship (QSPR) modeling is an efficient computational alternative because it saves time and resources and reduces animal experimentation. Advances in machine learning have improved chemical analysis in QSPR studies, but the real-world application of machine learning-based QSPR models has been limited by the unexplainable ‘black box’ nature of machine learning models. In this study, a multi-encoder structure-to-toxicity (S2T) transformer-based QSPR model was developed to estimate the properties of polychlorinated biphenyls (PCBs) and endocrine disrupting chemicals (EDCs). Simplified molecular input line entry system (SMILES) strings and molecular descriptors calculated with the Dragon 6 software were used simultaneously as inputs to the QSPR model. Furthermore, an attention-based framework was proposed to describe the relationship between the molecular structure and toxicity of hazardous chemicals. The S2T-transformer model achieved the highest R² scores of 0.918, 0.856, and 0.907 for estimating the logarithms of the octanol-water partition coefficient (Log KOW), octanol-air partition coefficient (Log KOA), and bioconcentration factor (Log BCF) of PCBs, respectively. Moreover, the attention weights correctly captured the lateral (meta, para) chlorination associated with PCB toxicity and environmental impact.
KW - Attention mechanism
KW - Molecular descriptor
KW - Multi-encoder transformer structure
KW - Quantitative structure-activity/property relationship
KW - SMILES
UR - http://www.scopus.com/inward/record.url?scp=85182516580&partnerID=8YFLogxK
U2 - 10.1016/j.chemosphere.2023.141086
DO - 10.1016/j.chemosphere.2023.141086
M3 - Article
C2 - 38163464
AN - SCOPUS:85182516580
SN - 0045-6535
VL - 350
JO - Chemosphere
JF - Chemosphere
M1 - 141086
ER -