
International Journal of Recent Trends in Engineering (IJRTE)

ISSN 1797-9617

Volume 1, Number 1, May 2009

Issue on Computer Science

Page(s): 408-412

Maximum Entropy Approach for Named Entity Recognition in Bengali and Hindi

Mohammad Hasanuzzaman, Asif Ekbal and Sivaji Bandyopadhyay

Abstract

This paper reports the development of a Named Entity Recognition (NER) system for two leading Indian languages, Bengali and Hindi, using the Maximum Entropy (ME) framework. We use the annotated corpora from the IJCNLP-08 NER Shared Task on South and South East Asian Languages (NERSSEAL), which are tagged with a fine-grained Named Entity (NE) tagset of twelve tags. A tag conversion routine has been developed to convert these corpora to a form tagged with four NE tags: Person name, Location name, Organization name, and Miscellaneous name. The system uses the contextual information of the words along with a variety of orthographic word-level features that help predict the four NE classes. We consider language-independent features applicable to both languages as well as language-specific features of Bengali and Hindi. Evaluation results show that the use of linguistic features can improve the performance of the system. The 10-fold cross-validation tests yield overall average recall, precision, and f-score values of 88.01%, 82.63%, and 85.22%, respectively, for Bengali and 86.4%, 79.23%, and 82.66%, respectively, for Hindi.
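As a reading aid, the relationship between the reported recall, precision, and f-score values can be sketched as below. This assumes the f-score is the balanced harmonic mean of recall and precision, which the abstract does not state explicitly; the function name `f_score` and the `reported` dictionary are illustrative only and are not taken from the paper.

```python
def f_score(recall, precision):
    """Balanced F-score: harmonic mean of recall and precision.

    Assumption: the paper uses the balanced (beta = 1) form.
    Inputs and output are in the same units (here, percent).
    """
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)


# Averages quoted in the abstract: (recall, precision, reported f-score).
reported = {
    "Bengali": (88.01, 82.63, 85.22),
    "Hindi": (86.4, 79.23, 82.66),
}

for language, (recall, precision, reported_f) in reported.items():
    recomputed = f_score(recall, precision)
    print(f"{language}: recomputed {recomputed:.2f}, reported {reported_f:.2f}")
```

A value recomputed from the averaged recall and precision need not match the reported figure exactly: if the f-score was averaged fold by fold, which is only a guess here, the mean of per-fold f-scores generally differs slightly from the f-score of the mean recall and precision.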

Index Terms

Named Entity; Named Entity Recognition; Maximum Entropy Model; Bengali; Hindi

Published by Academy Publisher in cooperation with the ACEEE

© Copyright 2009 ACADEMY PUBLISHER. All rights reserved.