Volume 1, May 2009, ISSN 1797-9617

International Journal of Recent Trends in Engineering

International Journal of Recent Trends in Engineering (IJRTE)

ISSN 1797-9617

Volume 1, Number 1, May 2009

Issue on Computer Science

Page(s): 178-182

An Efficient OCR for Printed Malayalam Text using Novel Segmentation Algorithm and SVM Classifiers

Bindu Philip and R. D. Sudhaker Samuel

Abstract

This paper describes an Optical Character Recognition (OCR) system for printed text documents in Malayalam, a South Indian language. Indian scripts are rich in patterns, and combinations of these patterns make the recognition problem even more complex; the system exploits these complex patterns to arrive at a solution. The system segments the scanned document image into text lines, then words, and finally characters and sub-characters. The proposed segmentation algorithm is motivated by the structure of the script. A novel set of features that are computationally simple to extract is proposed. The approaches used here are based on the distinctive structural features of machine-printed text lines in these scripts. A lateral cross-sectional analysis is performed along each row of the normalized binary image matrix, yielding distinct features. Final recognition is achieved with classifiers based on the Support Vector Machine (SVM) method. The proposed algorithms have been tested on a variety of printed Malayalam characters and currently achieve recognition rates between 90.22% and 95.31%.
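The abstract names the pipeline stages (line, word and character segmentation, row-wise features on a normalized binary matrix, SVM classification) but not the algorithms themselves. The sketch below is only an illustration under stated assumptions: it swaps in the common projection-profile technique for segmentation rather than the authors' script-motivated algorithm, and it uses a per-row count of background-to-ink transitions as a hypothetical stand-in for the lateral cross-sectional features. All names (`runs`, `segment_lines`, `segment_columns`, `normalize`, `row_crossings`) and the `min_gap` and `size` parameters are invented for this example. The SVM stage is omitted; the resulting feature vectors would be handed to an SVM library of your choice.

```python
from typing import List, Tuple

Image = List[List[int]]  # binary image: 1 = ink (foreground), 0 = background


def runs(profile: List[int], min_gap: int = 1) -> List[Tuple[int, int]]:
    """Return half-open (start, end) spans where profile > 0.

    Spans separated by fewer than `min_gap` empty positions are merged.
    """
    spans: List[Tuple[int, int]] = []
    start = None
    for i, value in enumerate(profile + [0]):  # sentinel 0 closes a trailing span
        if value > 0 and start is None:
            start = i
        elif value == 0 and start is not None:
            spans.append((start, i))
            start = None
    merged: List[Tuple[int, int]] = []
    for s, e in spans:
        if merged and s - merged[-1][1] < min_gap:
            merged[-1] = (merged[-1][0], e)  # gap too small: join with previous span
        else:
            merged.append((s, e))
    return merged


def segment_lines(img: Image) -> List[Image]:
    """Split a page into text lines using the horizontal projection profile."""
    row_profile = [sum(row) for row in img]
    return [img[s:e] for s, e in runs(row_profile)]


def segment_columns(img: Image, min_gap: int = 1) -> List[Image]:
    """Split a text line into words (larger min_gap) or characters (min_gap=1)
    using the vertical projection profile."""
    col_profile = [sum(col) for col in zip(*img)]
    return [[row[s:e] for row in img] for s, e in runs(col_profile, min_gap)]


def normalize(img: Image, size: int = 8) -> Image:
    """Nearest-neighbour rescale of a glyph to a size x size binary matrix."""
    h, w = len(img), len(img[0])
    return [[img[r * h // size][c * w // size] for c in range(size)]
            for r in range(size)]


def row_crossings(img: Image) -> List[int]:
    """Hypothetical lateral cross-section feature: for each row, count the
    background-to-ink transitions, i.e. how many strokes that row cuts."""
    feats = []
    for row in img:
        prev, count = 0, 0
        for px in row:
            if px == 1 and prev == 0:
                count += 1
            prev = px
        feats.append(count)
    return feats


if __name__ == "__main__":
    page = [
        [1, 1, 0, 1, 0, 0, 0, 1],
        [1, 0, 0, 1, 0, 0, 0, 1],
        [0, 0, 0, 0, 0, 0, 0, 0],
        [0, 1, 1, 0, 0, 0, 0, 0],
    ]
    lines = segment_lines(page)
    chars = segment_columns(lines[0])
    words = segment_columns(lines[0], min_gap=2)
    print(len(lines), len(words), len(chars))     # 2 2 3
    print(row_crossings(lines[0]))                # [3, 3]
    print(row_crossings(normalize(chars[0], 4)))  # [1, 1, 1, 1]
```

Separating words from characters purely by gap width is a simplification; Malayalam glyphs with detached components (for example vowel signs) would need the kind of script-aware rules the abstract alludes to before the normalized matrices are passed to a classifier.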

Index Terms

Malayalam Script, OCR, Structural approach, Segmentation, Support Vector Machine (SVM) Classifier.

Published by Academy Publisher in cooperation with the ACEEE

© Copyright 2009 ACADEMY PUBLISHER. All rights reserved.