Abstract
Techniques for analyzing genome sequences in high-performance environments to predict the function and structure of proteins continue to advance. A protein's function is determined by its characteristics and its sequence pattern, and a protein is assigned to a family according to its genealogy and structure. This study determines the protein family of unknown proteins by analyzing a protein sequence database that has been classified using a clustering algorithm. Analysis of the experimental clustering results verified that, by applying the proposed pf_cluster algorithm, the protein family of a new protein can be found from its sequence information.
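The general idea of clustering a sequence database and then using the clusters to label a new sequence can be illustrated with a minimal sketch. This is not the pf_cluster algorithm, whose details are not given in the abstract. It is a simple greedy clustering under assumed choices: k-mer sets as sequence features, Jaccard similarity, a fixed similarity threshold, and a majority vote over the family labels in the nearest cluster. All function names, parameters, and toy sequences are hypothetical.

```python
from collections import Counter


def kmers(seq, k=3):
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two k-mer sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def greedy_cluster(seqs, threshold=0.5, k=3):
    """Greedy clustering (an assumed stand-in, not pf_cluster).

    Each sequence joins the first cluster whose representative (its first
    member) is at least `threshold` similar; otherwise it starts a new cluster.
    """
    clusters = []
    for name, seq in seqs.items():
        for members in clusters:
            rep = seqs[members[0]]
            if jaccard(kmers(seq, k), kmers(rep, k)) >= threshold:
                members.append(name)
                break
        else:
            clusters.append([name])
    return clusters


def predict_family(query, seqs, families, clusters, k=3):
    """Return the majority family label of the cluster whose representative
    is most similar to the query sequence."""
    best = max(
        clusters,
        key=lambda members: jaccard(kmers(query, k), kmers(seqs[members[0]], k)),
    )
    return Counter(families[name] for name in best).most_common(1)[0][0]


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    seqs = {
        "a1": "MKVLAAGIVG",
        "a2": "MKVLAAGIVA",
        "b1": "GSHMPEPTWQ",
        "b2": "GSHMPEPTWR",
    }
    families = {"a1": "famA", "a2": "famA", "b1": "famB", "b2": "famB"}
    clusters = greedy_cluster(seqs)
    print(clusters)
    print(predict_family("MKVLAAGLVG", seqs, families, clusters))
```

Comparing a query only against cluster representatives keeps the number of comparisons proportional to the number of clusters rather than the size of the database, which is one reason representative-based schemes are common for large sequence collections.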
Original language | English |
---|---|
Pages (from-to) | 1878-1896 |
Number of pages | 19 |
Journal | Journal of Supercomputing |
Volume | 72 |
Issue number | 5 |
State | Published - 1 May 2016 |
Keywords
- Clustering algorithm
- High performance
- Protein clustering
- Protein family