Logistic Regression Classification for Uncertain Data

Musa Abdallah Bashir

Author Affiliations

¹College of Mathematics and Computer Science, Hebei University, Baoding 071002, CHINA

Res. J. Mathematical & Statistical Sci., Volume 2, Issue (2), Pages 1-6, February 12 (2014)

Abstract

Logistic regression (LR) is a well-known classification technique commonly used in statistics, machine learning, and data mining for learning a binary response. It assumes that data values are known precisely, but this assumption does not always hold. Uncertain data arise in many applications because of the data collection methodology, as in repeated measures; outdated sources; and imprecise measurement, as in physical experiments. The study of such uncertain data has become an area of active research interest. Under uncertainty, the value of a data item is typically characterized by multiple values, so machine learning techniques must also be able to handle uncertain data. This paper studies a modification of the LR technique to handle uncertain data. Statistical inference and probability theory are used to obtain a single unbiased estimator that represents the multiple values sufficiently and efficiently. Maximum likelihood estimators (MLE) and the probability density function (PDF) are used to capture the uncertainty. Experimental results on UCI data sets demonstrate that an uncertain LR classifier can be constructed successfully and that its accuracy can be improved by taking the uncertainty information into account.
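
The idea of replacing multiple observed values with a single estimate before fitting LR can be illustrated with a minimal sketch. The abstract does not give the paper's actual estimator or algorithm, so everything below is an assumption made for illustration: each feature is taken to be observed several times with Gaussian noise, the point estimate is the sample mean (the Gaussian MLE of the mean, which is also unbiased), and the classifier is plain LR fitted by gradient ascent. The function names (`mle_mean`, `collapse`, `fit_logistic`, `make_uncertain_data`) and the synthetic data are hypothetical and are not taken from the paper.

```python
import math
import random

def mle_mean(samples):
    """Gaussian MLE of the mean: the sample average (also unbiased)."""
    return sum(samples) / len(samples)

def collapse(uncertain_row):
    """Replace each feature's list of observed values by its point estimate."""
    return [mle_mean(values) for values in uncertain_row]

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.1, epochs=500):
    """Batch gradient ascent on the log-likelihood; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = yi - sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj + lr * g / n for wj, g in zip(w, grad_w)]
        b += lr * grad_b / n
    return w, b

def predict(w, b, x):
    """Predict class 1 when the estimated probability is at least 0.5."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0

def make_uncertain_data(n=200, repeats=5, noise=1.0, seed=0):
    """Synthetic data (assumption): two features, each observed `repeats`
    times with additive Gaussian measurement noise."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        centre = 1.5 if label == 1 else -1.5
        true_x = [rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)]
        rows.append([[rng.gauss(t, noise) for _ in range(repeats)] for t in true_x])
        labels.append(label)
    return rows, labels

rows, labels = make_uncertain_data()
X = [collapse(r) for r in rows]          # one estimate per uncertain feature
w, b = fit_logistic(X, labels)
accuracy = sum(predict(w, b, x) == y for x, y in zip(X, labels)) / len(labels)
print(f"training accuracy: {accuracy:.3f}")
```

Under a different noise model the sample mean would be swapped for the corresponding maximum likelihood estimate; the rest of the pipeline would stay the same.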
