dc.contributor.author	K.C., Pramir
dc.date.accessioned	2018-01-09T13:17:57Z
dc.date.available	2018-01-09T13:17:57Z
dc.date.issued	2017-12
dc.identifier.citation	K.C., Pramir. "Analysis of Amino Acid Sequence Characteristics of Type I Cluster of Differentiation (CD) Proteins Using Multivariate Statistics to Determine Their Functional Class," Master's thesis, Valdosta State University, 2017. http://hdl.handle.net/10428/2968.
dc.identifier.other	069C4380-E8A7-DA84-45D2-486E8245B9BD	UUID
dc.identifier.uri	http://hdl.handle.net/10428/2968
dc.description.abstract	Cluster of Differentiation (CD) proteins are proteins found in the cell membranes of leukocytes. These proteins are important because they are cell surface markers for many immune cells and can be used as therapeutic and diagnostic targets. Biophysical methods such as X-ray crystallography and nuclear magnetic resonance (NMR) are commonly used to determine the function of proteins through the generation of their three-dimensional structures. However, these experimental methods do not work well for determining the function of membrane proteins because of their high flexibility and instability, their partially hydrophobic surface, and the requirement of highly specific detergents for their extraction from phospholipid membranes. To address this problem, we devised a theoretical approach in which type I CD proteins can be classified into two functional groups (enzyme and non-enzyme) by using physicochemical parameters related to the primary sequence of the individual CD proteins. Principal component analysis (PCA) was used to analyze 126 parameters of 244 type I CD proteins. Two different clusters of type I CD proteins, one with enzymatic activity and one without, were found on the score plot, and the separation of those clusters was found to be statistically significant. Cytoplasmic amino acid count was found to be the most important variable for separating enzymes and non-enzymes. The continuous probability densities of CD proteins with and without enzymatic activity were then approximated by kernel density estimation (KDE) of cytoplasmic amino acid count. This is the first time this method of determining the functional classes of type I CD proteins has been employed, and it appears quite promising. In the future, this statistical approach could be very useful in determining the functional class of newly discovered or poorly characterized type I CD proteins.	en_US
dc.description.tableofcontents	Chapter I: INTRODUCTION 1 \| Chapter II: LITERATURE REVIEW 5 \| Membrane Proteins 5 \| Principal Component Analysis (PCA) 5 \| Parallel Analysis 6 \| Kernel Density Estimation (KDE) 7 \| Chapter III: MATERIALS AND METHODS 8 \| Data Retrieval 8 \| Retrieval of Type I Protein List 8 \| Retrieval of Protein Sequence 8 \| Retrieval of Regional Amino Acid Count 9 \| Total Amino Acid Count Calculation 9 \| Retrieval of Charges 9 \| Retrieval of Number of Amino Acids 10 \| Retrieval of Theoretical Isoelectric Point (pI) 10 \| Retrieval of Instability Index 10 \| Retrieval of Aliphatic Index 10 \| Retrieval of Grand Average of Hydropathicity 11 \| Retrieval of Glycosylation Site for Extracellular Region 11 \| Retrieval of Phosphorylation Site for Cytoplasmic Domain 11 \| Retrieval of Secondary Structure Content 11 \| Retrieval of Disorder Average and Standard Deviation 12 \| Determination of Function (Enzyme or Non-Enzyme) 12 \| Statistical Analysis 13 \| Principal Component Analysis (PCA) 13 \| Parallel Analysis to Determine the Number of Principal Components to be Retained 13 \| Assessment of Statistical Significance of Separation of Enzymes and Non-Enzymes from PCA Data 13 \| Matrix Plot of Scores Values 15 \| Wilcoxon Ranked-Sum Test and Kernel Density Estimation (KDE) Plot 15 \| Chapter IV: RESULTS 17 \| Principal Component Analysis 17 \| Scree Plot 17 \| Horn’s Parallel Analysis 18 \| Matrix Plot for First 10 PCs Score Values 20 \| Separation of Enzyme (E) and Non-enzyme (NE) clusters 21 \| Loadings Plot for PC1 22 \| Loadings Plot for PC2 23 \| Statistical Difference and Kernel Density Estimation (KDE) 25 \| Chapter V: DISCUSSION 27 \| Prediction of Type I CD Proteins Functional Class Based on Their Cytoplasmic Amino Acid Counts 27 \| REFERENCES 30 \| APPENDIX A: Mahalanobis Distance, Hotelling’s T-squared Values and F-statistics for each Combination of Principal Components for PC1 to PC10 35 \| APPENDIX B: Loadings Values of PC1 to PC10 for 126 Different Physicochemical Properties 39 \| APPENDIX C: Wilcoxon Ranked-Sum Test for Cytoplasmic Amino Acid Count for Enzymes and Non-enzymes for Type I CD Proteins 49 \| LIST OF FIGURES \| Figure 1. Scree plot of physicochemical characteristics of Type I CD proteins 18 \| Figure 2. Horn’s Parallel Analysis plot to determine the optimal number of PCs retained. PCs with adjusted EV greater than 1 were retained 19 \| Figure 3. Matrix plot of score values for PC1 to PC10. All the score plots that involved PC2 yielded the greatest amount of separation between enzymes and non-enzymes 21 \| Figure 4. Loadings plot for 126 protein characteristics using PC1. 23 \| Figure 5. Loadings plot for 126 variables using PC1. 24 \| Figure 6. Kernel density estimation for enzymes and non-enzymes 26	en_US
dc.language.iso	en_US	en_US
dc.subject	Proteins	en_US
dc.subject	Bioinformatics	en_US
dc.subject	Biometry	en_US
dc.subject	Protein Function Prediction	en_US
dc.subject	Cluster of differentiation	en_US
dc.title	Analysis of Amino Acid Sequence Characteristics of Type I Cluster of Differentiation (CD) Proteins Using Multivariate Statistics to Determine Their Functional Class	en_US
dc.type	Thesis	en_US
dc.contributor.department	Department of Biology of the College of Arts and Sciences	en_US
dc.description.advisor	Kang, Jonghoon
dc.description.committee	Anderson, Corey
dc.description.committee	Gosnell, Donna
dc.description.degree	M.S.	en_US
dc.description.major	Biology	en_US