Reliability Design for Large Scale Data Warehouses
Kai Du, Zhengbing Hu, Huaimin Wang, Yingwen Chen, Shuqiang Yang, and Zhijian Yuan
Data reliability has drawn considerable attention in large-scale data warehouses holding 1 PB or more
of data. It depends heavily on many interdependent system parameters, such as the replica placement
policy and the number of nodes. Previous work has discussed the impact of each parameter only roughly
and in isolation; it has seldom provided optimal values for these parameters and has not addressed
their optimal combination. In this paper, we present a new object-based-repairing Markov model.
By analyzing this model under three popular replica placement policies, we first determine the
optimal value of each parameter individually and then derive their optimal combination using a
genetic algorithm (GA). Compared with existing models, ours is easier to solve and yields more
comprehensive and practical conclusions. These conclusions can guide designers in building more
reliable large-scale data warehouses.

Index Terms
data reliability, reliability model, large-scale data warehouse