Abstract
Techniques for analyzing genome sequences in high-performance environments to predict the function and structure of proteins continue to advance. A protein's function is determined by its characteristics and sequence pattern, and a protein is assigned to a family according to its genealogy and structure. This study determines the protein family of an unknown protein by analyzing a protein sequence database that has been classified with a clustering algorithm. Analysis of the experimental clustering results verified that the proposed pf_cluster algorithm can identify the protein family of a new protein from its sequence information.
| Original language | English |
|---|---|
| Pages (from-to) | 1878-1896 |
| Number of pages | 19 |
| Journal | Journal of Supercomputing |
| Volume | 72 |
| Issue number | 5 |
| State | Published - 1 May 2016 |
Keywords
- Clustering algorithm
- High performance
- Protein clustering
- Protein family