
Logistic Regression Classification for Uncertain Data

Author Affiliations

  • College of Mathematics and Computer Science, Hebei University, Baoding 071002, China

Res. J. Mathematical & Statistical Sci., Volume 2, Issue 2, Pages 1-6, February 12 (2014)

Abstract

Logistic regression (LR) is a well-known classification technique, widely used in statistics, machine learning, and data mining, for learning a binary response. It assumes that data values are known precisely, but this assumption does not always hold. Uncertain data arise in many applications because of the data collection methodology (as in repeated measures), outdated sources, and imprecise measurement (as in physical experiments). The study of uncertain data has therefore become an active area of research. Under uncertainty, the value of a data item is typically characterized by multiple values, so machine learning techniques must also be able to handle uncertain data. This paper studies a modification of the LR technique to handle data with uncertainty. Statistical inference and probability theory are used to obtain a single unbiased estimator that represents the multiple values sufficiently and efficiently. Maximum likelihood estimators (MLE) and the probability density function (PDF) are used to capture the uncertainty. Experiments on UCI data sets demonstrate that the uncertain LR classifier can be constructed successfully, and that its accuracy can be improved by taking the uncertainty information into account.
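
The general idea of replacing the multiple values of each uncertain attribute with a single estimate, and then fitting an ordinary LR classifier, can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not say which estimator or PDF is used, so the sketch assumes Gaussian-distributed repeated readings, for which the sample mean is both the maximum likelihood estimate of the location and unbiased. The helper names (collapse, fit_logistic, predict, make_item), the synthetic data, and the learning-rate and epoch settings are all hypothetical choices made for illustration.

```python
import math
import random


def collapse(samples):
    """Reduce the repeated readings of one attribute to a single value.

    Assumption: readings are Gaussian around the true value, so the
    sample mean is the MLE of the location and is unbiased.
    """
    return sum(samples) / len(samples)


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Fit plain logistic regression by batch gradient ascent
    on the log-likelihood. Returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = yi - p  # gradient of the log-likelihood w.r.t. the logit
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj + lr * g / n for wj, g in zip(w, gw)]
        b += lr * gb / n
    return w, b


def predict(w, b, x):
    """Return the predicted class label (0 or 1) for one item."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if sigmoid(z) >= 0.5 else 0


def make_item(rng, centre, n_readings=5, noise=0.5):
    """Synthetic uncertain item: each attribute is a list of noisy readings."""
    return [[rng.gauss(c, noise) for _ in range(n_readings)] for c in centre]


# Synthetic two-class data (hypothetical, not the UCI data sets).
rng = random.Random(0)
items, labels = [], []
for _ in range(50):
    items.append(make_item(rng, (-2.0, -2.0)))
    labels.append(0)
    items.append(make_item(rng, (2.0, 2.0)))
    labels.append(1)

# Step 1: collapse every uncertain attribute to a single estimate.
X = [[collapse(attr) for attr in item] for item in items]

# Step 2: fit an ordinary LR classifier on the collapsed values.
w, b = fit_logistic(X, labels)

accuracy = sum(predict(w, b, x) == y for x, y in zip(X, labels)) / len(labels)
print(f"training accuracy: {accuracy:.2f}")
```

Collapsing each attribute before fitting keeps the classifier itself unchanged, which makes the effect of the estimator easy to isolate. A different distributional assumption would call for a different collapse function, for example one based on an estimated density rather than the sample mean.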
