TY - JOUR
T1 - A review on compound-protein interaction prediction methods
T2 - Data, format, representation and model
AU - Lim, Sangsoo
AU - Lu, Yijingxiu
AU - Cho, Chang Yun
AU - Sung, Inyoung
AU - Kim, Jungwoo
AU - Kim, Youngkuk
AU - Park, Sungjoon
AU - Kim, Sun
N1 - Publisher Copyright: © 2021 The Authors
PY - 2021/1
Y1 - 2021/1
N2 - There has recently been rapid progress in computational methods for determining the protein targets of small-molecule drugs, a problem we term compound-protein interaction (CPI). In this review, we comprehensively survey topics related to the computational prediction of CPI. CPI data have been accumulated and curated to a significant degree, in both quantity and quality. Computational methods have become more powerful than ever for analyzing such complex data. Thus, recent improvements in the quality of CPI prediction are due to the use of both sophisticated computational techniques and higher-quality information in the databases. The goal of this article is to review topics related to CPI, from data, formats, and representations to computational models, so that researchers can take full advantage of these resources to develop novel prediction methods. Chemical compound and protein data from various resources are discussed in terms of data formats and encoding schemes. We group CPI prediction methods into five categories, ranging from traditional machine learning techniques to state-of-the-art deep learning techniques. In closing, we discuss emerging machine learning topics to help both experimental and computational scientists leverage current knowledge and strategies to develop more powerful and accurate CPI prediction methods.
KW - Chemical descriptors
KW - Compound-protein interaction
KW - Data representation
KW - Deep learning
KW - Interpretable learning
KW - Machine learning
KW - Pharmacophore discovery
KW - Protein descriptors
UR - http://www.scopus.com/inward/record.url?scp=85102865336&partnerID=8YFLogxK
U2 - 10.1016/j.csbj.2021.03.004
DO - 10.1016/j.csbj.2021.03.004
M3 - Review article
AN - SCOPUS:85102865336
SN - 2001-0370
VL - 19
SP - 1541
EP - 1556
JO - Computational and Structural Biotechnology Journal
JF - Computational and Structural Biotechnology Journal
ER -