[NEC] no subject (file transmission)
list-replies@shirky.com
list-replies@shirky.com
Tue, 20 May 2003 15:26:43 -0400 (EDT)
NEC @ Shirky.com, a mailing list about Networks, Economics, and Culture
Published periodically / # 2.6 / May 20, 2003
Licensed under the Creative Commons Attribution License
Subscribe at http://shirky.com/nec.html
In this issue:
- Introduction
- Essay: Grid Supercomputing: The Next Push
(Also at http://www.shirky.com/writings/grids.html)
- Worth Reading
- Weblog on Social Software: http://corante.com/many/
* Introduction =======================================================
April was indeed the cruelest month -- a big push on a project on
long-term digital preservation with GBN and the Library of Congress, a
crop of especially smart grad students to pay attention to, and weeks
of family flu kept me from publishing in April. My apologies.
This issue's essay is about Grid computing, the generalization of the
SETI@HOME pattern of distributed computation into a sort of
"supercomputing on tap" application. I was asked to be a
"provocateur" on a panel about networked computing, focussing in
particular on Wifi, Grid computing, and the Semantic Web. I had to
write back to the panel organizer and report that I had the least
provocative views possible on both Grids and the Semantic Web, namely
that I thought both would be moderately successful -- not
revolutionary, but not failures either.
This led to a realization: people who try to think clearly about
technology always run the risk of unconsciously gravitating towards
technology it's easy to think clearly about. It was interesting to
write about the Web in the early days because its importance could
hardly be overstated, or about WAP, because its wrongheadedness
ditto. It's harder to think about Grid computing or the semantic web,
because they are not so obviously headed for either ubiquity or
uselessness.
So this essay is an experiment in writing about something --
supercomputing on tap -- that is going to succeed, but will do so in a
way far less important than its proponents believe.
-clay
* Essay ==============================================================
Grid Supercomputing: The Next Push
http://www.shirky.com/writings/grids.html
Grid Computing is, according to the Grid Information Centre
[http://www.gridcomputing.com] a way to "...enable the sharing,
selection, and aggregation of a wide variety of geographically
distributed computational resources." It is, in other words, an
attempt to make Sun's famous pronouncement "The Network Is The
Computer" an even more workable proposition. (It is also an
instantiation of several of the patterns of decentralization that used
to travel together under the name peer-to-peer.)
Despite the potential generality of the Grid, most of the public
pronouncements are focusing on the use of Grids for supercomputing.
IBM defines it more narrowly: Grid Computing is "... applying
resources from many computers in a network -- at the same time -- to a
single problem" [http://www-1.ibm.com/grid/], and the MIT Technology
Review equated Grid technology with supercomputing on tap when it
named Grids one of "Ten Technologies That Will Change the World."
[http://www.technologyreview.com/reports/topicreports_emerging.asp]
This view is wrong. Supercomputing on tap won't live up to this
change-the-world billing, because computation isn't a terribly
important part of what people do with computers. This is a lesson we
learned with PCs, and it looks like we will be relearning it with
Grids.
- The Misnomer of the Personal Computer
Though most computational power lives on the world's hundreds of
millions of PCs, most PCs are not used for computation most of the
time. There are two reasons for this, both of which are bad news for
predictions of a supercomputing revolution. The first is simply that
most people are not sitting at their computer for most hours of the
day. The second is that even when users are at their computers,
they are not tackling computationally hard problems, and especially
not ones that require batch processing -- submit question today, get
answer tomorrow (or next week). Indeed, whenever users encounter
anything that feels even marginally like batch processing -- a
spreadsheet that takes seconds to sort, a Photoshop file that takes a
minute to render -- they begin hankering for a new PC, because they
care about peak performance, not total number of cycles available over
time. The only time the average PC performs any challenging
calculations is rendering the visuals for The Sims or WarCraft.
Therein lies the conundrum of the Grid-as-supercomputer: the
oversupply of cycles the Grid relies on exists because of a lack of
demand. PCs are used as many things -- file cabinets and
communications terminals and typewriters and photo albums and
jukeboxes -- before they are used as literal computers. If most users
had batch applications they were willing to wait for even as long as
overnight, the first place they would look for spare cycles would be
on their own machines, not on some remote distributed supercomputer.
Simply running their own PC round the clock would offer a 10x to 20x
improvement, using hardware they already own.
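One way to read the 10x to 20x figure is as the ratio between a full
day and the hours a PC actually spends doing work. The sketch below
assumes, purely for illustration, a machine that is busy between 1.2
and 2.4 hours a day; those usage numbers are not from any measurement,
they are simply the values that make the ratio come out to 20x and 10x,
and the function name is made up.

```python
HOURS_PER_DAY = 24

def round_the_clock_gain(busy_hours_per_day):
    """Ratio of cycles available in a full day to cycles used today.

    busy_hours_per_day is a hypothetical usage figure, not a measurement.
    """
    return HOURS_PER_DAY / busy_hours_per_day

# Assumed usage levels: a PC doing real work 2.4 or 1.2 hours a day.
for busy in (2.4, 1.2):
    gain = round_the_clock_gain(busy)
    print(f"busy {busy} h/day -> about {gain:.0f}x more cycles if left running")
```

The point of the sketch is only that the spare cycles nearest to hand
are the ones on the user's own desk.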
If users needed Grid-like power, the Grid itself wouldn't work,
because the unused cycles the Grid is going to aggregate wouldn't
exist. Of all the patterns supported by decentralization, from
file-sharing to real-time collaboration to supercomputing,
supercomputing is the _least_ general.
- The Parallel with Push
There is a parallel between Grids and Push technology, that glamorous
flameout of the mid-90s. The idea behind Push, exemplified by the
data-displaying screensaver PointCast, was that because users suffered
from limited bandwidth and periodic disconnection (e.g. laptops on
airplanes), they would sign up to have data pushed to them, which they
could then experience at their leisure. This, we were told, would
create a revolution in the way people use the internet. (This notion
reached its apotheosis in a Wired magazine cover story, "Push!", whose
subtitle read "Kiss your browser goodbye: The radical future of media
beyond the Web".)
As it turned out, users' response to poor connectivity was to agitate
for better connectivity, because, as with CPUs, users want bandwidth that
provides good peak performance, even if that means most of it gets
"wasted." Shortly after the Wired cover, it was PointCast we kissed
goodbye.
Push's collapse was made all the more spectacular because of its name.
The label Push seemed to suggest a sweeping new pattern of great
importance. Had the technology been given a duller but more
descriptive name, like "forward caching," it would have generated much
less interest in the beginning, but might also not have been so
prematurely consigned to the list of failed technologies.
Forward caching is in fact a key part of some applications. In
particular, companies building decentralized groupware, like Groove
[http://www.groove.com], Kubi Software [http://www.kubisoftware.com/],
and Shinkuro [http://www.shinkuro.com/], all use forward caching of
shared files to overcome the difficulties caused by limited bandwidth
and partially disconnected nodes, just the issues Push was supposed to
address. By pushing the name Push, the PointCasts of the world made
it harder to see that though forward caching was not universally
important, it was still valuable in some areas.
- Distributed Batch Processing
So it is with Grids. The evocative name suggests that computation is
so critical that we must have a global infrastructure to provide all
those cycles we'll be needing next time our boss asks us to model an
earthquake, or we have to help our parents crack a cryptographic key.
The broadness of the term masks the specialised nature of the
technology, which should probably be called "distributed batch
processing."
Like forward caching, distributed batch processing is useful in a
handful of areas. The SETI@Home project runs on distributed batch
processing, as does the distributed.net cryptographic key-breaking
tool. The sequencing of the SARS virus happened using distributed
batch processing. Distributed batch processing could be useful in
fields like game theory, where scenarios could be exhaustively tested
on the cheap, or animated film, where small studios or even
individuals could afford access to Pixar-like render farms.
Distributed batch processing is real progress for people who need
supercomputing power, but having supercomputing on tap doesn't make
you a researcher any more than having surfboard wax on tap would make
you a surfer. Indeed, to the consternation of chip manufacturers (and
the delight of researchers who want cheap cycles), people don't even
have much real use for the computational power on the machines they
buy today.
History has not been kind to business predictions based on an
undersupply of cycles, and the business case for selling access to
supercomputing on tap is grim. Assuming that a $750 machine with a 2
gigahertz chip can be used for 3 years, commodity compute time now
costs roughly a penny a gigahertz-hour. If Grid access costs more than
a penny a gigahertz-hour, building a dedicated supercomputer starts to be
an economical proposition, relative to buying cycles from a Grid.
(And of course Moore's Law sees to it that these economics get more
adverse every year.)
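As a rough check on the penny figure, the arithmetic can be sketched in
a few lines of Python. The price, clock speed, and lifetime are the
ones given above; running the machine around the clock and ignoring
electricity and administration costs are simplifying assumptions, and
the function name is made up for illustration.

```python
HOURS_PER_YEAR = 24 * 365  # assumes the machine runs around the clock

def cost_per_ghz_hour(price_usd, clock_ghz, lifetime_years):
    """Hardware cost per gigahertz-hour, ignoring power and admin costs."""
    ghz_hours = clock_ghz * lifetime_years * HOURS_PER_YEAR
    return price_usd / ghz_hours

# Figures from the essay: a $750 machine, a 2 GHz chip, 3 years of use.
rate = cost_per_ghz_hour(750, 2.0, 3)
print(f"${rate:.4f} per GHz-hour")
```

Under these assumptions the result lands between one and two cents,
which is consistent with "roughly a penny"; a machine that is switched
off part of the time would cost proportionally more per gigahertz-hour.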
Most of the for-profit work on supercomputing Grids will be in helping
businesses harness their employees' PCs so that the CFO can close the
books quickly -- cheap, one-shot contracts, in other words, that
mostly displace money from the purchase of new servers. The cost
savings for the average business will be nice of course, but saving
money by deferring server purchases is hardly a revolution.
- People Matter More Than Machines
We have historically overestimated the value of connecting machines to
one another, and underestimated the value of connecting people, and by
emphasizing supercomputing on tap, the proponents of Grids are making
that classic mistake anew. During the last great age of batch
processing, the ARPAnet's designers imagined that the nascent network
would be useful as a way of providing researchers access to batch
processing at remote locations. This was wrong, for two reasons:
first, it turned out researchers were far more interested in getting
their own institutions to buy computers they could use locally than in
using remote batch processing, and Moore's Law made that possible as
time passed. Second, once email was ported to the network, it became a
far more important part of the ARPAnet backbone than batch processing
was. Then as now, access to computing power mattered less to the
average network user than access to one another.
Though Sun was incredibly prescient in declaring "The Network is the
Computer" at a time when PCs didn't even ship with built-in modems,
the phrase is false in some important ways -- a network is a different
kind of thing than a computer. As long ago as 1968, J.C.R. Licklider
[http://memex.org/licklider.pdf] predicted that computers would one
day be more important as devices of communication than of computation,
a prediction that came true when email overtook the spreadsheet as the
core application driving PC purchases.
What was true of the individual PC is true of the network as well --
changes in computational power are nice, but changes in communications
power are profound. As we learned with Push, an intriguing name is no
substitute for general usefulness. Networks are most important as
ways of linking unevenly distributed resources -- I know something you
don't know; you have something I don't have -- and Grid technology
will achieve general importance to the degree that it supports those
kinds of patterns. The network applications that let us communicate
and share in heterogeneous environments, from email to Kazaa, are far
more important uses of the network than making all the underlying
computers behave as a single supercomputer.
-=-
* Worth Reading =========================================================
Corante has launched Many-to-Many, a new weblog tracking social
software, at http://www.corante.com/many/. Contributors are Liz
Lawley, Seb Paquet, Ross Mayfield, Jessica Hammer, and myself, and the
goal of the site is to illuminate "...new developments in the social
software field, and also to provide commentary and conversation on the
uses of social software in varying contexts."
RSS feed at http://www.webcrimson.com/rss/many.rss
* End ====================================================================
This work is licensed under the Creative Commons Attribution License.
The licensor permits others to copy, distribute, display, and perform
the work. In return, licensees must give the original author credit.
To view a copy of this license, visit
http://creativecommons.org/licenses/by/1.0
or send a letter to
Creative Commons, 559 Nathan Abbott Way, Stanford, California 94305, USA.
2003, Clay Shirky