TY - JOUR
T1 - On modeling and utilizing chemical compound information with deep learning technologies
T2 - A task-oriented approach
AU - Lim, Sangsoo
AU - Lee, Sangseon
AU - Piao, Yinhua
AU - Choi, Min Gyu
AU - Bang, Dongmin
AU - Gu, Jeonghyeon
AU - Kim, Sun
N1 - Publisher Copyright: © 2022 The Author(s)
PY - 2022/1
Y1 - 2022/1
AB - A large number of chemical compounds are available in databases such as PubChem and ZINC. However, the currently known compounds, though numerous, represent only a fraction of all possible compounds, a set known as chemical space. Many of the compounds in these databases are annotated with properties and assay data that can be used for drug discovery efforts. To this end, a number of machine learning algorithms have been developed, and recent deep learning technologies can be used effectively to navigate chemical space, especially for unknown chemical compounds, in drug-related tasks. In this article, we survey how deep learning technologies can model and utilize chemical compound information in a task-oriented way by exploiting the annotated properties and assay data in chemical compound databases. We first compile the tasks that machine learning methods aim to accomplish. Then, we survey deep learning technologies to show their modeling power and current applications in drug-related tasks. Next, we survey deep learning techniques that address the insufficiency of annotated data, for more effective navigation of chemical space. Chemical compound information alone may not be powerful enough for drug-related tasks; we therefore survey what kinds of information, such as assay and gene expression data, can be used to improve the predictive power of deep learning models. Finally, we conclude this survey with four important newly developed technologies that are yet to be fully incorporated into the computational analysis of chemical information.
KW - Chemical information modeling
KW - Chemical space
KW - Computer-aided drug discovery
KW - Data augmentation
KW - Deep learning
UR - http://www.scopus.com/inward/record.url?scp=85135895925&partnerID=8YFLogxK
U2 - 10.1016/j.csbj.2022.07.049
DO - 10.1016/j.csbj.2022.07.049
M3 - Review article
AN - SCOPUS:85135895925
SN - 2001-0370
VL - 20
SP - 4288
EP - 4304
JO - Computational and Structural Biotechnology Journal
JF - Computational and Structural Biotechnology Journal
ER -