International Journal of Recent Trends in Engineering (IJRTE)

ISSN 1797-9617

Volume 1, Number 2, May 2009

Issue on Computer Science

Page(s): 183-185

A SVM based approach to Telugu Parts Of Speech Tagging using SVMTool

G.Sindhiya Binulal, P. Anand Goud, K.P.Soman

Part-of-speech (POS) tagging, the task of assigning a POS tag to each word of a natural language sentence, is one of the most well-studied problems in Natural Language Processing (NLP), and several different approaches to it exist. POS tagging is a sequence labeling problem. Tagging every word of an un-annotated corpus by hand is very time consuming, which motivates the search for a method to automate the job.

In this paper, SVMTool is applied to the problem of part-of-speech tagging for the Telugu language. POS tagging can be seen as a multiclass classification problem, and this paper mainly explains how a binary classifier can be used to solve a multiclass classification problem. Telugu is written the way it is spoken. The tagset used in this paper consists of 10 tags, and the training corpus consists of 25,000 words. The obtained accuracy is around 95% for Telugu. Better results can be achieved by increasing the corpus size.
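
The reduction from multiclass tagging to binary classification can be illustrated with a one-vs-rest scheme: one binary classifier is trained per tag, and each word receives the tag whose classifier scores it highest. The following is only a minimal sketch under stated assumptions, not SVMTool's implementation or interface. A simple perceptron stands in for the SVM as the binary learner, and the feature set, function names, and the tiny English toy corpus are all hypothetical, chosen only to keep the example self-contained.

```python
from collections import defaultdict


def features(words, i):
    """Hypothetical context features for the word at position i."""
    w = words[i]
    prev_w = words[i - 1] if i > 0 else "<s>"
    next_w = words[i + 1] if i + 1 < len(words) else "</s>"
    return [f"w={w}", f"prev={prev_w}", f"next={next_w}", f"suf2={w[-2:]}"]


def score(weights, feats):
    """Linear score of one binary classifier on a sparse feature list."""
    return sum(weights.get(f, 0.0) for f in feats)


def train_one_vs_rest(sentences, epochs=10):
    """Train one binary linear classifier per tag (tag vs. all other tags).

    Assumption: a perceptron update is used here in place of an SVM solver.
    """
    tags = sorted({t for sent in sentences for _, t in sent})
    models = {t: defaultdict(float) for t in tags}
    for _ in range(epochs):
        for sent in sentences:
            words = [w for w, _ in sent]
            for i, (_, gold) in enumerate(sent):
                feats = features(words, i)
                for t, weights in models.items():
                    y = 1 if t == gold else -1  # binary label for this tag
                    if y * score(weights, feats) <= 0:  # misclassified
                        for f in feats:
                            weights[f] += y
    return models


def tag(models, words):
    """Assign each word the tag whose binary classifier scores highest."""
    return [
        max(sorted(models), key=lambda t: score(models[t], features(words, i)))
        for i in range(len(words))
    ]


# Toy corpus invented for illustration; it is not the paper's data.
toy_corpus = [
    [("the", "DET"), ("dog", "NOUN"), ("runs", "VERB")],
    [("a", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
]
models = train_one_vs_rest(toy_corpus)
print(tag(models, ["a", "cat", "sleeps"]))
```

With a 10-tag tagset such as the one used here, this scheme would train 10 binary classifiers, one per tag.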

Index Terms

SVMTool, Tagged Corpus, Tagset

