International Journal of Recent Trends in Engineering (IJRTE)

ISSN 1797-9617

Volume 1, Number 2, May 2009

Issue on Computer Science

Page(s): 166-169

Tamil POS Tagging using Linear Programming

Dhanalakshmi V, Anand Kumar, Shivapratap G, Soman KP and Rajendran S


Part-of-speech (POS) tagging is the process of annotating each word in a corpus with its syntactic category. This paper presents a support vector machine (SVM) methodology based on linear programming for implementing an automatic Tamil POS tagger. We designed our own tagset of 32 tags for preparing the annotated Tamil corpus. Features are extracted from a corpus of twenty-five thousand sentences and used to train a linear-programming-based SVM. When tested on 10,000 sentences, the method achieved an overall accuracy of 95.63%.
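
For orientation, one common way to cast SVM training as a linear program is the 1-norm soft-margin formulation sketched below. The abstract does not state the exact formulation used in the paper, so this is an illustrative assumption. The symbols are also assumed here: feature vectors x_i in R^d, binary labels y_i in {-1, +1} (for example, one tag against the other 31 in a one-vs-rest scheme), slack variables xi_i, auxiliary bounds s_j on the weights, and a penalty parameter C.

```latex
% Assumed 1-norm soft-margin SVM written as a linear program.
% Not taken from the paper; shown only to illustrate the general idea.
\begin{aligned}
\min_{w,\,b,\,\xi,\,s}\quad & \sum_{j=1}^{d} s_j \;+\; C \sum_{i=1}^{n} \xi_i \\
\text{subject to}\quad & y_i \left( w^{\top} x_i + b \right) \;\ge\; 1 - \xi_i, && i = 1, \dots, n, \\
& -s_j \;\le\; w_j \;\le\; s_j, && j = 1, \dots, d, \\
& \xi_i \;\ge\; 0, && i = 1, \dots, n.
\end{aligned}
```

At the optimum each s_j equals |w_j|, so the objective is the 1-norm of w plus the weighted slack. Both the objective and the constraints are linear, which is why a standard LP solver can be used in place of a quadratic programming solver.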

Index Terms

Annotated corpus, tokenization, tagging, machine learning

Published by Academy Publisher in cooperation with the ACEEE

© Copyright 2009 ACADEMY PUBLISHER. All rights reserved.