Volume 1, May 2009, ISSN 1797-9617

International Journal of Recent Trends in Engineering



International Journal of Recent Trends in Engineering (IJRTE)

ISSN 1797-9617

Volume 1, Number 2, May 2009

Issue on Computer Science

Page(s): 80-83

Selecting Scalable Algorithms to Deal With Missing Values

B. Mehala, P. Ranjit Jeba Thangaiah, and K. Vivekanandan


Abstract

Missing data is a common feature of large data sets. Imputation is a class of procedures that replaces missing values with estimates derived from information available in the data set. One advantage of this approach is that the imputation phase is separated from the analysis phase, so different data mining algorithms can be applied to the completed data set. Options range from naive methods, such as mean or mode imputation, to learning methods based on relationships among attributes. This work analyses how C4.5, a classification algorithm, and K-Means, a clustering algorithm, behave when handling missing data.
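Mean or mode imputation sits at the naive end of this range and is simple enough to show in a few lines. The following is a minimal sketch, not the authors' implementation. The function name `impute_mean_mode`, the representation of a data set as a list of rows, and the use of `None` as the missing-value marker are assumptions for illustration. Numeric columns are filled with the mean of their observed values, and categorical columns with their most frequent value. The function returns a new, complete data set, which mirrors the separation of imputation from analysis described above.

```python
from collections import Counter
from statistics import mean
from typing import Any, List, Optional

# Assumed representation: each row is a list, None marks a missing value.
Row = List[Optional[Any]]


def impute_mean_mode(rows: List[Row], numeric: List[bool]) -> List[Row]:
    """Hypothetical helper: return a copy of `rows` with missing values filled.

    numeric[j] is True if column j is numeric (filled with the mean of the
    observed values) and False if it is categorical (filled with the mode,
    i.e. the most frequent observed value).
    """
    n_cols = len(numeric)
    fill: List[Any] = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        if not observed:
            fill.append(None)  # nothing to estimate from, so leave the gap
        elif numeric[j]:
            fill.append(mean(observed))
        else:
            fill.append(Counter(observed).most_common(1)[0][0])
    # Build new rows so the input data set is not modified.
    return [[fill[j] if r[j] is None else r[j] for j in range(n_cols)] for r in rows]
```

For example, with rows `[1.0, "a"]`, `[None, "b"]`, `[3.0, None]` and `[None, "a"]`, the numeric gaps become `2.0` (the mean of 1.0 and 3.0) and the categorical gap becomes `"a"` (its mode). The completed rows could then be passed to a classifier such as C4.5 or a clusterer such as K-Means. Note that this kind of imputation ignores relationships among attributes, which is what the learning-based methods try to exploit.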

Index Terms

Missing values, imputation, preprocessing, data mining, K-Means, C4.5

Published by Academy Publisher in cooperation with the ACEEE

© Copyright 2009 Academy Publisher. All rights reserved.