
Gene Expression Data Classification with Kernel Independent Component Analysis

Author Affiliations

  • 1 College of Mathematics and Computer Science, Hebei University, Baoding 071002, CHINA

Res. J. Mathematical & Statistical Sci., Volume 2, Issue (5), Pages 1-7, May,12 (2014)


The main challenge in classifying gene expression data is that the number of training samples is far smaller than the number of features. Logistic regression (LR) is a standard statistical method that is widely used for classification in the medical, epidemiology, and bioinformatics communities. On gene expression data, however, LR performs poorly because of multicollinearity and overfitting, so LR must be modified before it can be used to analyze microarray data. Dimension reduction is the usual way to address these problems. Recently, kernel approaches have proven effective for classifying this type of data. Kernel independent component analysis (KICA) is the nonlinear form of independent component analysis (ICA). In this paper, LR is applied to classify the features selected by KICA. To evaluate the classification performance of this technique, it is compared with kernel principal component analysis (KPCA) and ICA. Performance is assessed with accuracy, sensitivity, specificity, precision, F-score, the area under the receiver operating characteristic curve (AUC), and receiver operating characteristic (ROC) analysis.
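The performance metrics named above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`confusion_counts`, `classification_metrics`, `auc`), the 0/1 label coding, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration. AUC is computed here from its rank interpretation, the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels coded as 0/1 (an assumed coding)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def _ratio(num: float, den: float) -> float:
    """Divide, returning 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity, precision, and F-score from hard predictions."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    sensitivity = _ratio(tp, tp + fn)   # true positive rate (recall)
    specificity = _ratio(tn, tn + fp)   # true negative rate
    precision = _ratio(tp, tp + fp)
    return {
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f_score": _ratio(2 * precision * sensitivity, precision + sensitivity),
    }


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): true classes and classifier scores for eight samples.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.6, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
auc_value = auc(y_true, scores)
print(metrics, auc_value)
```

The threshold-based metrics depend on the chosen cutoff, whereas AUC summarizes ranking quality over all cutoffs, which is one reason both kinds of measure are typically reported together.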

