373 Farsi/Arabic Document Image Retrieval Through Sub-Letter Shape Coding
Published: 2011
This paper proposes a novel method for recognition-free Farsi document retrieval. Retrieval is performed by recognizing sub-letters and other letter elements, such as dots and signs like Sarkesh. In the preprocessing phase, lines and words are extracted using the blank space between them. In the next phase, each word is divided into its sub-words, where a sub-word is a group of joined letters. For each sub-word, the connectors between sub-letters are removed from its initial body, and the remaining parts are recognized as sub-letters using their extracted features. The recognized sub-letters are encoded using a dictionary defined in this system. Finally, the whole document content is encoded, and the resulting code can be used to retrieve words that occur in the document. Experimental results show the advantages of this method for retrieving printed Persian documents.
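The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `SHAPE_CODES` dictionary, and the shape labels are all hypothetical, and the feature-based sub-letter recognizer is assumed to exist and to output a list of shape labels per word. The sketch shows only two of the steps: splitting on blank space with a projection profile, and encoding recognized sub-letters so that a query word can be matched by its code.

```python
from typing import Dict, List, Sequence, Tuple


def column_profile(image: Sequence[Sequence[int]]) -> List[int]:
    """Count ink pixels (value 1) in each column of a binary image."""
    return [sum(col) for col in zip(*image)]


def split_on_blank(profile: Sequence[int], min_gap: int = 1) -> List[Tuple[int, int]]:
    """Return end-exclusive (start, end) ranges of ink runs in a profile.

    Two runs are separated only when at least `min_gap` consecutive
    positions are blank (zero). The threshold is an assumption.
    """
    runs: List[Tuple[int, int]] = []
    start = None
    gap = 0
    for i, value in enumerate(profile):
        if value > 0:
            if start is None:
                start = i
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                runs.append((start, i - gap + 1))
                start = None
                gap = 0
    if start is not None:
        runs.append((start, len(profile) - gap))
    return runs


# Hypothetical shape dictionary: one code symbol per sub-letter shape label.
SHAPE_CODES: Dict[str, str] = {
    "vertical_stroke": "A",
    "bowl": "B",
    "loop": "C",
    "dot_above": "d",
    "dot_below": "e",
    "sarkesh": "s",
}


def encode_word(sub_letters: Sequence[str]) -> str:
    """Encode the recognized sub-letter labels of one word as a code string."""
    return "".join(SHAPE_CODES[label] for label in sub_letters)


def find_word(document_codes: Sequence[str], query_code: str) -> List[int]:
    """Return the positions of words whose code equals the query code."""
    return [i for i, code in enumerate(document_codes) if code == query_code]
```

Under these assumptions, a document becomes a list of code strings, one per word, and retrieval reduces to comparing the code of the query word against that list. The same `split_on_blank` helper could be applied to a row profile to separate lines and to a column profile to separate words, with a larger `min_gap` for word boundaries than for gaps inside a word.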