Abstract
A major problem in a Data Grid is how to distribute and replicate data files across Grid sites so that the overall throughput of Grid jobs that access those files is high and stays high over time. A Grid optimisation service [3] should therefore be able to modify the geographic distribution of data files dynamically, triggering file deletion and replication as the sites where data is in high demand (so-called “data hot-spots”) change over time. In this document we propose two prediction functions for evaluating the future usefulness (value) of data files. Grid sites can use these functions to make informed decisions about whether to replicate files locally. Both functions base their predictions on logs of the file requests that jobs have submitted to the site, but they assume different statistical models for the historical data. We have performed preliminary tests of the accuracy of these prediction functions using randomly generated, simulated file access patterns, comparing the predicted values with the simulated ones. In the tests performed, the two functions behave similarly and predict the simulated values reasonably well.