ASME Press Select Proceedings

International Conference on Advanced Computer Theory and Engineering (ICACTE 2009)

By Xie Yi
ISBN: 9780791802977
No. of Pages: 2012
Publisher: ASME Press
Publication date: 2009

There are more than 1000 languages and 14 scripts used by 112 million people in India. A printed document in any of these scripts can be divided into three kinds of region: text blocks, image blocks, and table blocks. In the 21st century there is, for obvious reasons, a need to convert these old printed documents into digital form. Converting them manually is a huge and difficult task, and it is prone to human error. The automated alternative is to use an optical character recognition (OCR) system to convert the entire printed document image into an editable document. This paper develops an OCR technique that converts a printed document into an editable document. First, the scanned document is preprocessed for noise removal and skew correction. This is followed by text/non-text classification, after which text line detection is performed in the text area. No method is available that can detect text lines when the image contains a multi-column text area. The main contribution of this paper is therefore to detect the blocks and then detect the text lines within each detected block; a technique for extracting the text lines from a document image is presented. After the text lines are extracted, word segmentation, character segmentation, and template matching can be performed.
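The abstract does not say how the blocks and lines are found. A common way to illustrate the "detect blocks first, then lines inside each block" idea is projection-profile analysis, and the sketch below uses that as a swapped-in technique, not as the author's method. It assumes a binarized, deskewed, text-only page (1 = ink, 0 = background) whose columns are separated by straight vertical white gutters. The function names (`runs`, `column_blocks`, `text_lines`, `detect_lines`) and the gap thresholds are hypothetical.

```python
from typing import List, Optional, Tuple

# Assumption: binary, deskewed, text-only page image; 1 = ink, 0 = background.
Image = List[List[int]]
Span = Tuple[int, int]  # half-open index range [start, end)


def runs(profile: List[int], min_gap: int = 1) -> List[Span]:
    """Return [start, end) ranges where the projection profile is non-zero.

    Runs separated by fewer than `min_gap` empty positions are merged, so a
    narrow gap (e.g. the space between words) does not split a block.
    """
    spans: List[Span] = []
    start: Optional[int] = None
    end = 0
    gap = 0
    for i, value in enumerate(profile):
        if value > 0:
            if start is None:
                start = i
            end = i + 1
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:  # gap is wide enough: close the current run
                spans.append((start, end))
                start = None
    if start is not None:
        spans.append((start, end))
    return spans


def column_blocks(img: Image, min_col_gap: int) -> List[Span]:
    """Find column blocks as x-ranges separated by vertical white gutters."""
    width = len(img[0])
    vertical_profile = [sum(row[x] for row in img) for x in range(width)]
    return runs(vertical_profile, min_col_gap)


def text_lines(img: Image, block: Span, min_line_gap: int = 1) -> List[Span]:
    """Find text lines inside one block from its horizontal projection."""
    x0, x1 = block
    horizontal_profile = [sum(row[x0:x1]) for row in img]
    return runs(horizontal_profile, min_line_gap)


def detect_lines(img: Image, min_col_gap: int = 2,
                 min_line_gap: int = 1) -> List[Tuple[int, int, int, int]]:
    """Detect blocks first, then lines per block, as (x0, x1, y0, y1) boxes."""
    if not img or not img[0]:
        return []
    boxes = []
    for block in column_blocks(img, min_col_gap):
        for y0, y1 in text_lines(img, block, min_line_gap):
            boxes.append((block[0], block[1], y0, y1))
    return boxes


if __name__ == "__main__":
    # Two columns whose text lines are vertically offset from each other.
    rows = [
        "##.#........",
        "##.#........",
        "........#.##",
        "........#.##",
        "##.#........",
        "##.#........",
        "........#.##",
        "........#.##",
    ]
    page = [[1 if c == "#" else 0 for c in r] for r in rows]
    print(detect_lines(page))
    # → [(0, 4, 0, 2), (0, 4, 4, 6), (8, 12, 2, 4), (8, 12, 6, 8)]
    # A single page-wide horizontal projection merges everything:
    print(runs([sum(r) for r in page]))
    # → [(0, 8)]
```

The usage at the bottom shows why blocks are detected before lines on a multi-column page: because the two columns' lines are vertically offset, a page-wide horizontal projection sees one continuous band of ink, while per-block projections recover the four separate lines. Real pages would also need the noise removal, skew correction, and text/non-text classification steps described above before this stage.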
