Careful filtering will slow this prodigious rate down to a few petabytes a year. Even so, this is still too much for CERN to deal with on site. The data will therefore be disseminated through a four-tiered system of computer centres, allowing them to be spread over thousands of individual workstations in dozens of universities around the world.
Here, the Grid will come into its element. Repeatedly searching and retrieving large data sets from thousands of computers would be extremely inefficient. So researchers at CERN are looking at ways of processing the data where they are stored, allowing a physicist sitting at any participating workstation to talk to the rest of the complex web of computers and data storage systems as if they were in his own laboratory.
This raises a radical new concept of "virtual data". The idea is that new data, derived from the analysis of raw data, should not be discarded after use but saved for retrieval, in the same way that raw data are stored. In a later calculation, the Grid's software could then automatically detect whether a computation had to be started from scratch, or could take a short cut using a previous result stored somewhere in the system.
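The short cut described above can be illustrated with a minimal sketch in Python. It is not DataGrid's or CERN's actual software: the class name `VirtualDataCatalog`, the run identifiers and the `count_above` analysis are all hypothetical, invented here for illustration. The assumption is that each derived result can be identified by the recipe that produced it (the name of the analysis, the raw data sets it read and its parameters), so that a repeated request can be answered from storage rather than recomputed.

```python
import hashlib
import json


class VirtualDataCatalog:
    """Hypothetical catalogue of derived results, keyed by how each was produced."""

    def __init__(self, raw_data):
        self._raw = raw_data  # data set id -> list of raw values
        self._store = {}      # recipe key -> stored derived result
        self.computed = 0     # number of times real work was done

    @staticmethod
    def _key(transform_name, input_ids, params):
        # The recipe (analysis name, inputs, parameters) identifies the result.
        recipe = json.dumps(
            {"transform": transform_name, "inputs": input_ids, "params": params},
            sort_keys=True,
        )
        return hashlib.sha256(recipe.encode("utf-8")).hexdigest()

    def derive(self, transform_name, func, input_ids, params):
        key = self._key(transform_name, input_ids, params)
        if key in self._store:
            # Short cut: an identical computation has already been stored.
            return self._store[key]
        # Otherwise start from scratch and keep the result for later requests.
        datasets = [self._raw[i] for i in input_ids]
        result = func(datasets, params)
        self.computed += 1
        self._store[key] = result
        return result


def count_above(datasets, params):
    """Toy analysis: count readings above a threshold across all data sets."""
    return sum(1 for d in datasets for x in d if x > params["threshold"])


RAW = {"run-001": [1.0, 5.0, 9.0], "run-002": [2.0, 8.0]}
catalog = VirtualDataCatalog(RAW)

first = catalog.derive("count_above", count_above, ["run-001", "run-002"], {"threshold": 4.0})
second = catalog.derive("count_above", count_above, ["run-001", "run-002"], {"threshold": 4.0})
print(first, second, catalog.computed)
```

In this sketch the second request does no new work, because its recipe matches one already stored; changing the threshold or the list of runs produces a different key and forces a fresh calculation. A real system would also have to decide where results are kept and when they become stale, which this example ignores.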
The European response to all the Grid activity in America is the DataGrid project co-ordinated by CERN. Here, the objective is to develop middleware for research projects in the biological sciences, earth observation and high-energy physics. As befits a project overseen by bureaucrats in Brussels, the search for an overarching Grid standard for many different science projects has become a leitmotif. Fabrizio Gagliardi, the project manager for DataGrid, despairs at the many Grid initiatives already under way. "If we each develop similar solutions," he says, "it simply won't work."
A plethora of Grid standards is a real possibility. After all, even electricity has no worldwide standard of voltage or frequency. Much of the talk at the first Global Grid Forum was about giving Grid development a sense of common cause. But once the commercial potential of the Grid begins to dawn, standard-setting skirmishes will break out between companies and even countries.
While scientists gear up to use the Grid, the question remains whether — beyond the search for such exotica as Higgs bosons, running vast protein-folding calculations or simulating the weather — there is any real need for such massive computing. Cynics reckon that the Grid is merely an excuse by computer scientists to milk the political system for more research grants so they can write yet more lines of useless code. There is some truth in this. Many of the Grid projects running today resemble solutions in search of problems. And given the number of independent Grid initiatives, much of the work under way is going to be redundant in any case. But there is always the hope that the competition will breed a better class of Grid in the end.
A more serious criticism is that today's Grid projects focus too much on storing and analysing large data sets, and making the sort of "embarrassingly parallel" calculations that only scientists need. These are obvious applications of the Grid. The real challenge is to figure out how to achieve the more ambitious goals that Grid enthusiasts have set themselves. Some of these are outlined by Mr Foster and colleagues in a paper that will appear shortly in the International Journal of Supercomputer Applications. The authors argue that the Grid will really come into its own only when people learn to build "virtual organisations". Such a virtual organisation could be a crisis-management team dealing with an earthquake or chemical spill. In such circumstances, the Grid would be ideal for analysing local weather, soil models, water supplies and local demographics. It could even help with communications, allowing field workers to discuss problems with office staff by means of video conferencing.
In another guise, the virtual organisation could be an industrial consortium developing, say, a passenger jet. The Grid would allow the consortium to run simulations of various combinations of components from different manufacturers, while keeping the proprietary know-how associated with each component concealed from other consortium members. In both cases, several unrelated types of calculation, using independent data sets, and running on a range of computers in different organisations that may not fully trust one another, have to be threaded together to achieve a coherent answer.
It is precisely because such undertakings would be a nightmare to negotiate over the Internet that the Grid has come into being. Unlike neatly defined science problems, which amaze more by their petabyte scale than their inherent complexity, virtual organisations are noteworthy because of their constantly shifting landscape of data and computer resources, and the authentication on which they rely. This is where the Grid would have its greatest value. It is such problems that could lead to new business models — just as the Internet created the conditions for e-commerce. Unfortunately, this is where the Grid's developers have barely scratched the surface.
Timing Is All
The idea of distributed computing is as old as electronic computing. When devised more than 35 years ago, Multics — a multi-tasking operating system for mainframes that was finally retired last year, and is a distant ancestor of today's Linux — had many of the Grid's goals in its original mission statement (operation analogous to power services, system configurations that were changeable without having to reorganise the software, and so on). Other Grid precursors abound. The point is that the enthusiasm for the Grid today is not the result of some fresh and sudden insight into computer architecture. It is simply that the hardware has now improved enough to make an old idea work on a global scale.

