
The Analysis of Connected Components and Clustering in Segmentation of Persian Texts

Author Affiliations

  • 1 Faculty Member of Technical and Vocation University, Kerman, IRAN
  • 2 Faculty Member of Technical and Vocation University, Yazd, IRAN
  • 3 Faculty Member of Azad University, South Tehran Branch, Tehran, IRAN
  • 4 Faculty Member of Shahid Bahonar University, Kerman, IRAN

Res. J. Recent Sci., Volume 3, Issue (4), Pages 71-77, April 2 (2014)


As computers play a growing role in daily life and structured electronic documents become more widely used, the need to convert paper documents into electronic form, and with it the use of image processing, has increased. Within this field, the identification of words in text has been studied extensively for languages such as English, Japanese and Chinese. Persian and Arabic, however, still require research because of their complexity: letters are interconnected, and each letter takes different forms depending on its position in the word. Segmentation is one of the most important steps in a letter recognition system, and its accuracy and speed are critical. Segmentation of Persian text is especially hard because of these characteristics of the language. In this study, we aim to present an algorithm for segmenting Persian documents that is faster and more efficient than comparable algorithms. It uses connected components and clustering to identify and group text and image areas. The method is intended for general users and can serve as a preprocessing step for Optical Character Recognition systems. The research was carried out on a collection of 100 images of Persian newspapers and magazines scanned at a resolution of 300 dpi. The simulation results show an accuracy rate of 92.3% and a significant speed advantage over other approaches such as the Voronoi diagram.
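The general idea of combining connected components with clustering can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not state which component features or which clustering method were used, so the choice of bounding-box area as the only feature, the one-dimensional 2-means split, and all function names below are assumptions made purely for illustration.

```python
from collections import deque


def connected_components(img):
    """Return bounding boxes (top, left, bottom, right) of 8-connected
    foreground regions in a binary image given as a list of rows (1 = ink)."""
    rows, cols = len(img), len(img[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if img[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and img[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def box_area(box):
    top, left, bottom, right = box
    return (bottom - top + 1) * (right - left + 1)


def cluster_by_area(boxes, iterations=20):
    """Assumed heuristic: 1-D 2-means on bounding-box area.
    Small components are labelled 'text', large ones 'image'."""
    areas = [box_area(b) for b in boxes]
    lo, hi = float(min(areas)), float(max(areas))  # initial cluster centres
    for _ in range(iterations):
        small = [a for a in areas if abs(a - lo) <= abs(a - hi)]
        large = [a for a in areas if abs(a - lo) > abs(a - hi)]
        if not large:  # all components have similar size
            break
        lo, hi = sum(small) / len(small), sum(large) / len(large)
    return ["text" if abs(a - lo) <= abs(a - hi) else "image" for a in areas]


# Toy page: two small character-like blobs and one large picture-like block.
page = [
    [1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0],
]
boxes = connected_components(page)
labels = cluster_by_area(boxes)
for box, label in zip(boxes, labels):
    print(box, label)
```

On a real 300 dpi scan, a single size feature would probably not be enough. Features such as aspect ratio or ink density, and a merging step that groups neighbouring text components into lines and blocks, would be natural extensions, but those are likewise assumptions rather than details given in the abstract.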


  1. OfGorman, L. and R. Kasturi, Document Image Analysis, Los Alamitos, California: IEEE computer Society Press, (1995)
  2. Haralick, R. Document Image Understanding: Geometric and Logical Layout. in Proc. IEEE Conf. Computer Vision and Pattern Recognition (1994)
  3. Jain, A. and Y. Zhong, Page Segmentation Using Texture Analysis. Pattern Recognition, 29, 743-77 (1996)
  4. Jain, A. and K. Karu, Learning Texture Discrimination Masks. IEEE Trans, Pattern Analysis and Machine Intelligence, 18, 195-20 (1995)
  5. Jain A. and Bhattacharjee S., Text Segmentation Using Gabor Filters for Automatic Document Processing, Machine Vision and Applications, , 169-184 (1992)
  6. Ittner D. and Baird H., Language-Free Layout Analysis. in Proc. Second Int’l Conf. Document Analysis and Recognition, Tsukuba, Japan (1993)
  7. Baird H., Anatomy of a Versatile Page Reader. in Proc. IEEE. (1992)
  8. Antonacopoulos A. and Ritchings R., Flexible Page Segmentation Using the Background. in Proc. 12th Intfl Conf.. Pattern Recognition, Jerusalem (1994)
  9. Amamoto, N., S. Torigoe, and Y. Hirogaki. Block Segmentation and Text Area Extraction of Vertically/Horizontally Written Document. in Proc. Second Intfl Conf.Document Analysis and Recognition. Tsukuba, Japan (1993)
  10. Akindele, O. and A. Belaid. Page Segmentation by Segment Tracing. in Proc. Second Intfl Conf. Document Analysis and Recognition. Tsukuba, Japan (1993)
  11. Nagy, G. and S. Seth. Hierarchical Representation of Optically Scanned Documents. in Proc. Seventh Intfl Conf. Pattern Recognition, Montreal (1984)
  12. Krishnamoorthy M., et al., Syntactic Segmentation and Labelingof Digitized Pages From Technical Journals. IEEE Trans. Pattern Analysis and Machine Intelligence, 15, 743-747 (1993)
  13. Pavlidis, T. and J. Zhou. Page Segmentation by White Streams. in Proc. First Intfl Conf. Document Analysis and Recognition. Saint-Malo, France (1991)
  14. Ingold, R. and D. Armangil. A Top-Down Document Analysis Method for Logical Structure Recognition. in Proc. First Intfl Conf. Document Analysis and Recognition. Saint-Malo, France (1991)
  15. Fujisawa H. and Nakano Y., A Top-Down Approach for the Analysis of Documents. in Proc. 10th Intfl Conf.Pattern Recognition. Atlantic City, N.J (1990)
  16. Chenevoy Y. and A. Belaid. Hypothesis Management for Structured Document Recognition. in Proc. First Intfl Conf. Document Analysis and Recognition, Saint-Malo, France (1991)
  17. Liu J., et al. Adaptive Document Segmentation and Geometric Relation Labeling: Algorithms and Experimental Results. in Proc. 13th Int’l Conf. Pattern Recognition : Vienn (1996)
  18. Esposito F., Malerba D. and Semeraro G., A Knowledge-Based Approach to the Layout Analysis. in Proc. Third Intfl Conf. Document Analysis and Recognition, Montreal (1995)
  19. OfGorman, L., The Document Spectrum for Page Layout Analysis. IEEE Trans. Pattern Analysis and Machine Intelligence. 15, 1162-1173 (1993)
  20. Parhami B. and Tereghi M., Automatic recognition of printed Farsi Text. Pattern Recognition (1981)
  21. Zheng L., H. A.H., and X. tang, A new algorithm for machine printed Arabic Character segmentation, pattern Recogniting Letters. 2(15), (2004)
  22. Yin, F. and Cheng-LinLiu, Handwritten Chinese text line segmentation by clustering with distance metric learning, pattern recognition, 42, 3146-3157 (2009)
  23. Ahmad H.A. and Zitar R.A., Development of an efficient neural-based segmentation technique forArabic handwriting recognition, Pattern Recognition, 43, 2773-2798 (2010)
  24. Romeo-Parker K., Miled H. and Lecourtier Y., A New Approach for Latin/Arabic character segmentation, in in Proc. International Conference on Document Analysis and Recognition (1995)
  25. Rastegarpour M. and J. Shanbezadeh, OFF-Line Hand-written Farsi/Arabic Word segmentation into subword under Overlapped or Connected Conditions. in The Springer Proceeding of International Workshop on Advances in pattern Recognition IWAPR2007. University of Loughborough, Plymouth-UK (2007)
  26. Pechwitz M. and V. Marger, Baseline estimation for Arabic handwriting Recognition (2002)
  27. Hashemi M.R., O. Fatemi, and R. safavi. Persian Cursive script Recognition. in proc. International conference on Document Analysis and Recognition (1995)
  28. Kise K., Sato A. and Iwata M., Segmentation of Page Images Using the Area Voronoi Diagram. Computer Vision and Image Understanding, 70(3), 370-382 (1998)
  29. Azmi R. and Kabir E., A new segmentation technique for omnifont Farsi Text. Pattern Recognition Letters, 22 (2001)
  30. Motawa D., A. Amin, and R. Sabourin ,segmentation of Arabic Cursive Script, in IEEE. (1997)
  31. Safabakhsh R. and P. Adibi, Nastaaligh Hand Written Word Recogntion Using A Continuous-Density Variable-Duration Hmm. The Arabian Journal for Science and Engineering, 30(Number 1 B) (2005)
  32. Jain R., Kasturi R. and Schunck B.G., Machine vision: McGraw-Hill, Inc (1995)
  33. Simon A., Pret J. and Johnson A., A Fast Algorithm for Bottom-Up Document Layout Analysis, IEEE Trans. Pattern Analysis and Machine Intelligence, 19, 273-276 (1997)
  34. Kovacs F., Legany C. and Babos A., Cluster Validity Measurement Techniques, Department of Automation and Applied Informatics, Budapest University of Technology and Economics (2003)
  35. Keller, F., Clustering, in Tutorial Slides, Computer University Saarlandes, (2003)
  36. Muhammad Altaf Khan, Islam S., Murad Ullah, Sher Afzal Khan, G. Zaman, Muhammad Arif1 and Syed Farasat Sadiq, Application of Homotopy Perturbation Method to Vector Host Epidemic Model with Non-Linear Incidences, Research Journal of Recent Sciences, 2(6), 90-95 (2013)
  37. Farooq Ahmad, Sher Afzal Khan, Ilyas Fakhir and Yaser Daanial Khan, A Survey on Linear Algebraic Approaches for the Analysis of Petri Net based Models, Res. J. Recent Sci., 2(5), 21-28 (2013)
  38. Panah Amir, Enhanced SLAM for a Mobile Robot using Unscented Kalman Filter and Radial Basis Function Neural Network, Res. J. Recent Sci.,2(2), 69-75 (2013)
  39. Belsare Satish and Patil Sunil, Study and Evaluation of uses behavior in e-commerce Using Data Mining, Res. J. Recent Sci., 1(ISC-2011), 375-387 (2012)