Using independent components for estimating logistic regression with high-dimensional multicollinear data: Simulation and application
Keywords:
Dimension reduction, Independent components, Logistic regression, Multicollinearity, Breast cancer
Abstract
The logistic regression model is used to predict a binary response variable from a set of explanatory variables. When the predictors are multicollinear, the model parameters are estimated imprecisely and their interpretation in terms of odds ratios may be misleading. A further problem is that a large number of predictors is usually required to explain the response. To improve the estimation of the logistic model parameters under multicollinearity and to reduce the dimension of data with continuous covariates, we propose using a reduced set of optimum independent components of the original predictors as the covariates of the logistic model. A breast cancer data set serves as the real-data application. The performance of the proposed independent component logistic regression model is assessed in a simulation study that compares different methods for selecting the optimum independent components across different regressors, sample sizes, and levels of correlation among the regressors. Compared with the principal component logistic regression model, independent component logistic regression gives better results.
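The general idea described in the abstract, replacing multicollinear predictors with a small set of independent components before fitting the logistic model, can be illustrated with a minimal sketch. This is not the authors' implementation: the use of scikit-learn's FastICA, the simulated data, the choice of three components, and the helper names `simulate_collinear_data` and `ic_logistic_regression` are all assumptions made for illustration, and the sketch does not reproduce the component-selection methods or the results of the study.

```python
import numpy as np
from sklearn.decomposition import FastICA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def simulate_collinear_data(n=400, p=10, seed=0):
    """Hypothetical simulation: p correlated predictors driven by 3 latent sources."""
    rng = np.random.default_rng(seed)
    sources = rng.laplace(size=(n, 3))            # non-Gaussian latent sources
    mixing = rng.normal(size=(3, p))              # each predictor mixes all sources
    X = sources @ mixing + 0.05 * rng.normal(size=(n, p))
    beta = np.array([2.0, -1.5, 1.0])             # assumed effects of the sources
    prob = 1.0 / (1.0 + np.exp(-(sources @ beta)))
    y = rng.binomial(1, prob)                     # binary response
    return X, y


def ic_logistic_regression(n_components=3, seed=0):
    """Pipeline: standardize, extract independent components, fit a logistic model."""
    return make_pipeline(
        StandardScaler(),
        FastICA(n_components=n_components, random_state=seed, max_iter=1000),
        StandardScaler(),                         # removes the arbitrary scale of the components
        LogisticRegression(C=1e6, max_iter=1000), # large C: close to unpenalized maximum likelihood
    )


X, y = simulate_collinear_data()
model = ic_logistic_regression(n_components=3)

# Cross-validated classification accuracy of the reduced model.
cv_accuracy = cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()

# Fit on all data and inspect the covariates that enter the logistic model.
model.fit(X, y)
components = model[:-1].transform(X)
corr = np.corrcoef(components, rowvar=False)
max_offdiag = np.abs(corr - np.eye(corr.shape[0])).max()

print(f"5-fold CV accuracy: {cv_accuracy:.3f}")
print(f"Largest absolute correlation between components: {max_offdiag:.2e}")
```

Because the extracted components are uncorrelated on the training data, the logistic model is fitted on covariates that no longer suffer from multicollinearity. A principal component version for comparison could be sketched by swapping `FastICA` for `sklearn.decomposition.PCA` in the same pipeline.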
Data Availability Statement
The data that support the findings of this study are available from the corresponding author upon reasonable request.
License
Copyright (c) 2025 Sana Ali, Saima Afzal, Nasir Saleem (Author)

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.