Determining Academic, Background, and Financial Predictors of Community College First Year Retention using Data Mining Techniques

dc.contributor.author Pace, Camille Gasaway
dc.coverage.spatial United States en_US
dc.date.accessioned 2021-07-27T19:32:46Z
dc.date.available 2021-07-27T19:32:46Z
dc.date.issued 2021-06
dc.identifier.other 72062E2E-5BF2-EDB5-4669-4A0F026F28B7 en_US
dc.identifier.uri https://hdl.handle.net/10428/4935
dc.description.abstract Even with extensive retention research dating from the 1960s, community colleges still struggle to identify why students do not return to college. Data mining has allowed retention models to evolve and identify new patterns among student populations and variables. The purpose of this study was to create a predictive model for student retention using background, academic, and financial factors, serving as a guide for other community colleges to use when investigating institutional retention. Four different data mining models (neural networks, random forest trees, support vector machines, and logistic regression) identified significant factors for retention. The models were compared to determine whether one outperformed the others on five different evaluation metrics. The number of credit hours was consistently the most important variable in retention. In addition, the interactions between the number of credit hours, GPA, and financial aid variables were significant for student retention in the first year. The interaction between GPA, financial aid variables, and the number of remedial hours was also crucial for first-year retention. No variables consistently predicted students' nonretention in the first year of their college career across the retention models. Many background predictors (age, gender, race, or ethnicity) were not significant in predicting retained or nonretained students. The comparison of the retention models found that the random forest model had the best performance for accurately classifying the nonretained and retained students overall and the retained students individually. Keywords: Retention, Community College, Data Mining, Academic Factors, Background Factors, Financial Factors en_US
dc.description.tableofcontents Chapter I: INTRODUCTION 1 -- Statement of the Problem 6 -- Purpose of the Study 6 -- Research Questions 7 -- Research Methodology 7 -- Significance of the Study 10 -- Theoretical Basis of the Study 11 -- Limitations of the Study 13 -- Definition of Terms 14 -- Organization of the Study 17 -- Chapter II: LITERATURE REVIEW 19 -- Community College Populations 20 -- Community College Enrollment Trends 22 -- Community College Funding 23 -- Community College Retention 24 -- Bean and Metzner’s Retention Model 25 -- Importance of Individualized Retention Models 27 -- Retention Variables 28 -- Background Variables 28 -- Age 29 -- Gender 31 -- Race or Ethnicity 33 -- High School GPA 36 -- Academic Factors 37 -- College GPA 38 -- Online Courses 40 -- Remedial Courses 44 -- Number of Courses Completed 47 -- Financial Aid Factors 50 -- Amount of Financial Aid Awarded 52 -- Amount of Financial Aid Paid 53 -- FAFSA Completion 55 -- Introduction of Data Science and Big Data 59 -- Data Mining 61 -- Educational Data Mining 62 -- Classifiers 63 -- Cross Validation Methods 64 -- Decision Trees and Random Forest Trees 65 -- Support Vector Machines (SVM) 68 -- Neural Network 72 -- Logistic Regression 74 -- Interpretation of Binary Classifier Models 76 -- Evaluation Metrics for Comparing Classifier Models 77 -- Accuracy, Sensitivity, and Specificity 78 -- F1-Scores 78 -- Receiver Operating Characteristic (ROC) Curves 78 -- Validation of Evaluation Metrics 80 -- Summary 80 -- Chapter III: METHODOLOGY 83 -- Research Design 83 -- Participants 85 -- Instrumentation 87 -- Data Collection 88 -- Data Analysis 88 -- Inferential Statistics 91 -- Random Forest 92 -- Support Vector Machine (SVM) 93 -- Neural Network 93 -- Logistic Regression 93 -- Summary 95 -- Chapter IV: RESULTS 97 -- Demographic Characteristics for Individual Cohorts 98 -- Descriptive Statistics for Students 100 -- Correlation Coefficients for Students 102 -- Categorical Variable Analysis of Combined Cohorts 104 -- Missing Data Analysis of Combined Cohorts 105 -- Cross Validation Method 106 -- Outliers and Normality of Combined Cohorts 106 -- Outlier Capping, Transformation, and Normalization 108 -- Research Question 1 111 -- Random Forest 112 -- Support Vector Machine with Polynomial Kernel 118 -- Support Vector Machine with Radial Kernel 125 -- Neural Network 132 -- Logistic Regression 139 -- Comparison of Variable Importance 150 -- Research Question 2 151 -- Random Forest 153 -- Support Vector Machine with Polynomial Kernel 155 -- Support Vector Machine with Radial Kernel 157 -- Neural Network 159 -- Logistic Regression 161 -- Overall Model Comparison with ROC Curves 163 -- Inferential Tests for Model Comparison 165 -- Summary 173 -- Chapter V: SUMMARY, DISCUSSION, and CONCLUSIONS 176 -- Overview of the Study 177 -- Related Literature 177 -- Classification Models 178 -- Individual and Sector-based Models 178 -- Predictive Factors 178 -- Methodology 180 -- Participants 181 -- Variables Studied 181 -- Background Factors 181 -- Academic Factors 182 -- Financial Factors 183 -- Procedures 183 -- Summary of Findings 184 -- Research Question 1 184 -- Research Question 2 189 -- Discussion of Findings 191 -- Research Question 1 191 -- Research Question 2 193 -- Limitations of the Study 194 -- Implications for Future Research 197 -- Conclusions 198 -- REFERENCES 201 -- APPENDIX A: R Code for Modeling Building and Variable Importance 230 -- APPENDIX B: R Code for Inferential Statistics Tests 259 -- APPENDIX C: Institutional Review Board Protocol Exemption Report 264 -- APPENDIX D: Data Sharing Agreement 266 en_US
dc.format.extent 1 electronic document, 287 pages en_US
dc.format.mimetype application/pdf en_US
dc.language.iso en_US en_US
dc.rights This dissertation is protected by the Copyright Laws of the United States (Public Law 94-553, revised in 1976). Consistent with fair use as defined in the Copyright Laws, brief quotations from this material are allowed with proper acknowledgement. Use of the materials for financial gain without the author's express written permission is not allowed. en_US
dc.subject Dissertations, Academic--United States en_US
dc.subject College dropouts--Prevention en_US
dc.subject Community colleges en_US
dc.subject Data mining en_US
dc.title Determining Academic, Background, and Financial Predictors of Community College First Year Retention using Data Mining Techniques en_US
dc.type Dissertation en_US
dc.contributor.department Department of Curriculum, Leadership, and Technology of the Dewar College of Education and Human Services en_US
dc.description.advisor Brockmeier, Lantry L.
dc.description.committee Bochenko, Michael J.
dc.description.committee Kim, Daesang
dc.description.degree Ed.D. en_US
dc.description.major Education in Leadership en_US

