Predictive Modeling for Loan Eligibility Assessment: A Comparative Study of Logistic Regression, Random Forest, and Support Vector Machine with Detailed Oversampling

Authors

  • Jamel Pandiin
  • Junrie Matias

Keywords:

Predictive Modeling, Loan Eligibility, Genetic Algorithms, Feature Selection

Abstract

This study compares predictive modeling techniques for loan eligibility assessment, evaluating Logistic Regression, Random Forest, and Support Vector Machine (SVM) classifiers combined with detailed oversampling and feature selection methods. Using a Kaggle dataset, several feature selection techniques, including Correlation-Based Selection, Recursive Feature Elimination (RFE), and Lasso Regression, were applied before training the three classifiers, which were optimized through Genetic Algorithms (GA). Performance metrics, including accuracy, precision, recall, and F1-score, alongside cross-validation, were used for model evaluation. Random Forest achieved the highest performance, with an accuracy of 85%, precision of 86%, recall of 84%, and F1-score of 85%. Cross-validation results for Random Forest averaged 92%, demonstrating consistent robustness. Feature importance analysis identified Credit History (26.8%), Applicant Income (19.7%), and Loan Amount (19.2%) as the most influential factors, while demographic attributes such as Gender and Education had minimal impact. SVM excelled in recall (99%) but showed moderate accuracy (71%) and lower precision (63%), reflecting difficulty in minimizing false positives. Logistic Regression showed consistent but lower accuracy (67%) and struggled to model complex relationships in the dataset. The findings highlight Random Forest's strength in delivering balanced predictions, making it the most suitable of the three models for fairness and risk management in loan approvals. Practical deployment via a user-friendly web application demonstrated the usability of machine learning models for operational efficiency in financial institutions. This research advocates integrating Genetic Algorithms with machine learning for enhanced predictive modeling, supporting precise, efficient, and fair decision-making processes.
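
As a reading aid for the metrics reported above, the following minimal Python sketch shows how accuracy, precision, recall, and F1-score follow from confusion-matrix counts, and why very high recall paired with lower precision points to many false positives. The function name `classification_metrics` and the counts are hypothetical, chosen only for illustration; they are not the study's confusion matrix or code.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts.

    tp/fp/fn/tn are true positives, false positives, false negatives,
    and true negatives. Undefined ratios (zero denominator) are returned as 0.0.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts (NOT taken from the study): a classifier that approves
# almost every eligible applicant (few false negatives) but also approves
# many ineligible ones (many false positives).
high_recall_model = classification_metrics(tp=90, fp=50, fn=10, tn=50)

for name, value in high_recall_model.items():
    print(f"{name}: {value:.3f}")
```

In this illustrative case, recall is high because few eligible applicants are missed, while precision and accuracy are pulled down by the false positives, which is the same qualitative pattern the abstract describes for SVM.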

Published

2025-08-19

How to Cite

Pandiin, J., & Matias, J. (2025). Predictive Modeling for Loan Eligibility Assessment: A Comparative Study of Logistic Regression, Random Forest, and Support Vector Machine with Detailed Oversampling. Advances in Engineering and Information Sciences, 1(1), 13–24. Retrieved from https://journals.carsu.edu.ph/AEIS/article/view/178