Abstract:
Cluster of differentiation (CD) proteins are found in the cell membranes of leukocytes. They are important because they serve as cell surface markers for many immune cells and can be used as therapeutic and diagnostic targets. Biophysical methods such as X-ray crystallography and nuclear magnetic resonance (NMR) are commonly used to infer protein function from three-dimensional structure. However, these experimental methods work poorly for membrane proteins because of their high flexibility and instability, their partially hydrophobic surface, and the need for highly specific detergents to extract them from phospholipid membranes. To address this problem, we devised a theoretical approach in which type I CD proteins are classified into two functional groups (enzyme and non-enzyme) using physicochemical parameters derived from the primary sequence of each protein. Principal component analysis (PCA) was used to analyze 126 parameters of 244 type I CD proteins. Type I CD proteins with and without enzymatic activity formed two distinct clusters on the score plot, and the separation between the clusters was statistically significant. Cytoplasmic amino acid count was the most important variable for separating enzymes from non-enzymes. The continuous probability densities of CD proteins with and without enzymatic activity were then approximated by kernel density estimation (KDE) of cytoplasmic amino acid count. This is the first time this method has been used to determine the functional class of type I CD proteins, and it appears promising. In the future, this statistical approach could be useful for determining the functional class of newly discovered or poorly characterized type I CD proteins.
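As a rough illustration of the KDE step, the sketch below estimates a separate density of cytoplasmic amino acid count for each class and then asks which density is higher at a given count. It is a minimal example under stated assumptions, not the analysis performed in this work: the counts are invented toy values, the Gaussian kernel and rule-of-thumb bandwidth are assumed choices, and the names `gaussian_kde`, `silverman_bandwidth`, and `more_likely_class` are hypothetical helpers.

```python
import math
import statistics


def silverman_bandwidth(samples):
    """Rule-of-thumb bandwidth for a Gaussian kernel (an assumed choice)."""
    n = len(samples)
    return 1.06 * statistics.stdev(samples) * n ** (-1 / 5)


def gaussian_kde(samples, bandwidth=None):
    """Return a function x -> estimated density built from 1-D samples."""
    h = bandwidth if bandwidth is not None else silverman_bandwidth(samples)
    norm = 1.0 / (len(samples) * h * math.sqrt(2.0 * math.pi))

    def density(x):
        # Average of Gaussian bumps centred on each observed sample.
        return norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)

    return density


# Hypothetical cytoplasmic amino acid counts, invented for illustration only.
enzyme_counts = [280, 310, 345, 400, 520, 610]
non_enzyme_counts = [5, 12, 20, 33, 41, 47, 60, 95]

enzyme_density = gaussian_kde(enzyme_counts)
non_enzyme_density = gaussian_kde(non_enzyme_counts)


def more_likely_class(count):
    """Compare the two class-conditional densities at a given count."""
    if enzyme_density(count) > non_enzyme_density(count):
        return "enzyme"
    return "non-enzyme"


print(more_likely_class(30))
print(more_likely_class(450))
```

This comparison ignores how common each class is. Weighting each density by its class proportion would be a natural refinement if the two groups differ greatly in size.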