ASME Press Select Proceedings
International Conference on Advanced Computer Theory and Engineering (ICACTE 2009)
Xie Yi
ASME Press

There are more than 1000 languages and 14 scripts used by 112 million people in India. Documents in all of these scripts can be divided into three kinds of block: text, image, and table. In the 21st century there is a need, for obvious reasons, to convert old printed documents into digital form. Converting them manually is a huge and difficult task, and it is prone to human error. The automated alternative is to use an optical character recognition (OCR) system to convert the entire printed document image into an editable document. This paper develops an OCR technique that converts a printed document into an editable one. First, the scanned document is preprocessed for noise removal and skew correction. This is followed by text/non-text classification, after which text line detection is performed in the text area. No existing method can detect text lines when the image contains a multi-column text area. The main contribution of this paper is to detect the blocks and then detect the text lines within those blocks; a technique that extracts the text lines from a document image is presented. Once the text lines have been extracted, word segmentation, character segmentation, and template matching can be performed.
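
The abstract does not say which algorithm is used for block and text line detection. As a rough illustration only, the sketch below uses projection profiles, a common baseline that is not necessarily the authors' method: blank vertical gaps split the page into column blocks, and blank horizontal gaps inside each block split it into text lines. It assumes the page is already binarized and deskewed (1 = ink, 0 = background); the function names `runs`, `detect_columns`, and `detect_text_lines` are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]  # binary image: 1 = ink, 0 = background


def runs(profile: List[int], min_ink: int = 1) -> List[Tuple[int, int]]:
    """Return half-open (start, end) ranges where profile >= min_ink."""
    spans = []
    start = None
    for i, value in enumerate(profile):
        if value >= min_ink and start is None:
            start = i                      # a run of inked positions begins
        elif value < min_ink and start is not None:
            spans.append((start, i))       # the run ends at a blank position
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


def detect_columns(image: Image) -> List[Tuple[int, int]]:
    """Split the page into column blocks via the vertical projection profile."""
    if not image:
        return []
    width = len(image[0])
    col_profile = [sum(row[x] for row in image) for x in range(width)]
    return runs(col_profile)


def detect_text_lines(image: Image) -> List[Tuple[int, int, int, int]]:
    """Return (left, right, top, bottom) boxes, one per text line per block."""
    boxes = []
    for left, right in detect_columns(image):
        # Count ink per row, restricted to the current column block.
        row_profile = [sum(row[left:right]) for row in image]
        for top, bottom in runs(row_profile):
            boxes.append((left, right, top, bottom))
    return boxes


# Toy two-column page: the lines in each column sit at different heights.
page = [
    [1, 1, 0, 0, 1, 1],
    [0, 0, 0, 0, 1, 1],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 0],
]
print(detect_text_lines(page))
# [(0, 2, 0, 1), (0, 2, 2, 3), (4, 6, 0, 2), (4, 6, 3, 4)]
```

A single horizontal profile over the whole toy page would find no blank row between lines, because the two columns are not aligned; splitting into blocks first avoids that. On real scans the column split would also need a minimum gap width so that ordinary inter-character spacing is not mistaken for a column boundary.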

Key Words
1 Introduction
2 Indian Scripts
3 Pre-Processing of Document
4 Proposed Solution
5 Post Processing of Document
6 Conclusion and Future Work