[NEC] 1.7: DNA, P2P, and Privacy
list-replies@shirky.com
Thu, 21 Nov 2002 09:46:28 -0500 (EST)
NEC @ Shirky.com, a mailing list about Networks, Economics, and Culture
Published periodically / # 1.7 / November 21, 2002
Subscribe at http://shirky.com/nec.html
In this issue:
- Introduction
- Essay: DNA, P2P, and Privacy
(Also at http://www.shirky.com/writings/privacy_p2p.html)
- The Three Phase Reaction to BlackPeopleLoveUs.com
- Questions I've Been Asking Myself:
- What can we learn from the constitutions of successful online
communities?
- Why is it so hard to make decisions online? Readers' responses
* Introduction =======================================================
Switching gears from social software for the moment, the essay in this
issue of NEC concerns peer-to-peer architecture, and in particular,
what happens to privacy when DNA becomes a distributed hash-like
identifier allowing individuals to be uniquely, universally and
unambiguously identified.
-clay
* Essay ==============================================================
DNA, P2P, and Privacy
(http://www.shirky.com/writings/privacy_p2p.html)
For decades, the privacy debate has centered on questions about
databases and database interoperability: How much information about
you exists in the world's databases? How easily is it retrieved? How
easily is it compared or combined with other information?
Databases have two key weaknesses that affect this debate. The first
is that they deal badly with ambiguity, and generally have to issue a
unique number, sometimes called a primary key, to every entity they
store information on. The US Social Security number is a primary key
that points to you, the 6-letter Passenger Name Record is a primary
key that points to a particular airline booking, and so on. This leads
to the second weakness: since each database maintains its own set of
primary keys, creating interoperability between different databases is
difficult and expensive, and generally requires significant advance
coordination.
Privacy advocates have relied on these weaknesses in creating legal
encumbrances to issuing and sharing primary keys. They believe,
rightly, that widely shared primary keys pose a danger to privacy.
(The recent case of Princeton using its high school applicants' Social
Security numbers to log in to the Yale admittance database highlights
these dangers.) The current worst-case scenario is a single universal
database in which all records -- federal, state, and local, public and
private -- would be unified with a single set of primary keys.
New technology brings new challenges, however, and in the database
world the new challenge is not a single unified database, but rather
decentralized interoperability, interoperability brought about by a
single universally used ID. The ID is DNA. The interoperability comes
from the curious and unique advantages DNA has as a primary key. And
the effect will put privacy advocates in a position analogous to that
of the RIAA, forcing them to switch from fighting the creation of a
single central database to fighting a decentralized and interoperable
system of peer-to-peer information storage.
- DNA Markers
While much of the privacy debate around DNA focuses on the ethics of
predicting mental and physical fitness for job categories and
insurance premiums, this is too narrow and too long-range a view. We
don't even know yet how many genes there are in the human genome, so
our ability to make really sophisticated medical predictions based on
a person's genome is still some way off. However, long before that day
arrives, DNA will provide a cheap way to link a database record with a
particular person, in a way that is much harder to change or forge
than anything we've ever seen.
Everyone has a biological primary key embedded in every cell of their
body in the form of DNA, and everyone has characteristic zones of DNA
that can be easily read and compared. These zones serve as markers,
and they differ enough from individual to individual that with fewer
than a dozen of them, a person can be positively identified out of the
entire world's population.
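As a back-of-the-envelope check on that claim, here is a small sketch
in Python. The numbers are illustrative assumptions, not measurements:
it pretends each marker independently takes one of a fixed number of
equally likely values, and uses a rough 2002 world population of 6.3
billion. The function name is made up for this example.

```python
def markers_needed(population: int, variants_per_marker: int) -> int:
    """Smallest n such that variants_per_marker ** n >= population."""
    if population < 1 or variants_per_marker < 2:
        raise ValueError("need population >= 1 and at least 2 variants")
    n = 0
    combinations = 1
    # Each extra marker multiplies the number of distinguishable profiles.
    while combinations < population:
        combinations *= variants_per_marker
        n += 1
    return n


WORLD_POPULATION = 6_300_000_000  # rough figure, assumed for illustration

print(markers_needed(WORLD_POPULATION, 10))  # 10
print(markers_needed(WORLD_POPULATION, 8))   # 11
```

This only counts possible combinations. Real markers are not equally
distributed, and avoiding accidental matches takes more headroom than
the bare minimum, but the order of magnitude shows why a handful of
markers is enough.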
DNA-as-marker, in other words, is a nearly perfect primary key, as
close as we can get to being unambiguous and unforgeable. If every
person has a primary key that points to their physical being, then the
debate about who gets to issue such a key is over, because the keys
are issued every time someone is born, and re-issued every time a new
cell is created. And if the keys already exist, then the
technological argument is not about creating new keys, but about
reading existing ones.
The race is on among several biotech firms to be able to sequence a
person's entire genome for $1000. The $1 DNA ID will be a side effect
of this price drop, and it's coming soon. When the price of reading
DNA markers drops below a dollar, it will be almost impossible to
control who has access to reading a person's DNA.
There are few if any legal precedents that would prevent collection of
this data, at least in the US. There are several large populations
that do not enjoy constitutional protections of privacy, such as the
armed services, prisoners, and children. Furthermore, most of the
controls on private databases rely on the silo approach, where an
organization can collect an almost unlimited amount of information
about you, provided they abide by the relatively lax rules that govern
sharing that information.
Even these weak protections have been enough, however, to prevent the
creation of a unified database, because the contents of two databases
cannot be easily merged without some shared primary key, and shared
primary keys require advance coordination. And it is here, in the area
of interoperability, that DNA markers will have the greatest effect on
privacy.
- You're the Same You Everywhere
Right now, things like alternate name spellings or alternate addresses
make positive matching difficult across databases. It's hard to tell if
Eric with the Wyoming driver's license and Shawn with the Florida
arrest record are the same person, unless there is other information
to tie them together. If two rows of two different databases are tied
to the same DNA ID, however, they point to the same person, no matter
what other material is contained in the databases, and no matter how
it is organized or labeled.
No more trying to figure out if Mr. Shuler and Mr. Schuller are the
same person, no more wondering if two John Smiths are different
people, no more trying to guess the gender of J. Lee. Identity
collapses to the body, in a way that is far more effective than
fingerprints, and far more easily compared across multiple databases
than more heuristic measures like retinal scans.
In this model, the single universal database never gets created, not
because privacy advocates prevent it, but because it is no longer
needed. If primary keys are issued by nature, rather than by each
database acting alone, then there is no more need for central
databases or advance coordination, because the contents of any two
DNA-holding databases can be merged on demand in something close to
real time.
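A minimal sketch of what "merged on demand" could look like, in
Python. Everything here is hypothetical: the table names, the field
names, the `dna_id` column, and the placeholder marker value "M-01"
are assumptions made for illustration. The point is only that neither
database had to agree on anything in advance except the key itself.

```python
from collections import defaultdict


def merge_on_key(key, *tables):
    """Combine rows from independently designed tables that share one key.

    Each table is a (name, rows) pair; rows are plain dicts.
    Fields are prefixed with the table name so nothing collides.
    """
    merged = defaultdict(dict)
    for name, rows in tables:
        for row in rows:
            key_value = row.get(key)
            if key_value is None:
                continue  # rows without the shared key cannot be tied to anyone
            for field, value in row.items():
                if field != key:
                    merged[key_value][f"{name}.{field}"] = value
    return dict(merged)


# Two databases with different names on file for the same person.
dmv = ("dmv", [{"dna_id": "M-01", "name": "Eric", "state": "WY"}])
arrests = ("arrests", [{"dna_id": "M-01", "name": "Shawn", "state": "FL"}])

person = merge_on_key("dna_id", dmv, arrests)["M-01"]
print(person["dmv.name"], person["arrests.name"])
```

The names disagree, but the rows still land in the same record,
because the match never looks at the names at all.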
Unlike the creation of a vast central database, even a virtual one,
the change here can come about piecemeal, with only a few DNA-holding
databases. A car dealer, say, could simply submit a DNA marker to a
person's bank asking for a simple yes-or-no match before issuing a
title. In the same way the mid-90s ID requirements for US domestic
travel benefited the airlines because they kept people from transferring
unused tickets to friends or family, we can expect businesses to like
the way DNA ties transactions to a single customer identity.
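The car dealer example can be sketched in a few lines of Python. This
is an illustration under stated assumptions, not a description of any
real system: the class, its methods, the account number, and the
marker strings are all invented, and a real marker profile would be
more than a short string.

```python
class MarkerRegistry:
    """Hypothetical stand-in for one organization's DNA-holding database."""

    def __init__(self):
        self._marker_by_account = {}

    def enroll(self, account_id, marker):
        self._marker_by_account[account_id] = marker

    def is_same_person(self, account_id, marker):
        """Answer only yes or no; never hand back the stored record."""
        stored = self._marker_by_account.get(account_id)
        return stored is not None and stored == marker


bank = MarkerRegistry()
bank.enroll("acct-1001", "M-01")

# The dealer reads a marker from the buyer and asks the bank one question.
print(bank.is_same_person("acct-1001", "M-01"))  # True
print(bank.is_same_person("acct-1001", "M-02"))  # False
```

Nothing about the bank's schema is exposed and nothing was negotiated
beforehand, which is why this kind of use can spread one pair of
organizations at a time.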
The privacy debate tends to be conducted as a religious one, with the
absolutists making the most noise. However, for a large number of
people, privacy is a relative rather than an absolute good. The use of
DNA as an ID will spread in part because people want it to, in the
form of credit cards that cannot be used in other hands or cars that
cannot be driven by other drivers. Likewise, demands that DNA IDs be
derived from populations who do not enjoy constitutional protections,
whether felons or children, will be hard to deflect as the cost of
reading an individual's DNA falls dramatically, and as the public sees
the effective use of DNA in things like rape and paternity cases.
- Peer-to-Peer Collation of Data
In the same way Kazaa has obviated the need for central storage or
coordination for the world's music, the use of DNA as an ID technology
makes radically decentralized data integration possible. With the
primary key problem solved, interoperability will arise as a side
effect, neither mandated nor coordinated centrally. Fighting this will
require different tactics, not least because it is a rear-guard
action. The keys and the readers both exist, and the price and general
availability of the technology all point to ubiquity and vanishingly
low cost within a decade.
This is a different kind of fight over privacy. As the RIAA has
discovered, fighting the growth of a decentralized and latent
capability is much harder than fighting organizations that rely on
central planning and significant resources, because there is no longer
any one place to focus the efforts, and no longer any small list of
organizations who can be targeted for preventive action. In a world
where database interoperability moves from a difficult and costly goal
to one that arises as a byproduct of the system, the important
question for privacy advocates is how they will handle the change.
* The Three Phase Reaction to BlackPeopleLoveUs.com ==================
Jonah Perretti, professional meme-hacker (no, really), launched
http://www.blackpeopleloveus.com earlier this year. BPLU is a site
dedicated to the earnest expression of a white liberal couple's
acceptance by their black friends. As usual with Jonah (who brought
you the "Nike/Sweatshop" meme of a year or so ago), the entire thing
is done with a straight face.
What interested me about it was that the site made not one but two
trips up the blogdex weblog index (http://blogdex.media.mit.edu/). The
first was a few weeks ago, when the blogosphere discovered it, and the
second was this week, shortly after the NY Times Style section in the
Sunday paper ran a story on it.
This is a sort of amplifier/echo chamber effect, where the first
blogdex wave amplified the signal enough to attract the attention of
the mainstream media, who in turn legitimated and publicized it in a
better fashion than the blogosphere itself could (as well as
legitimizing it for other outlets, such as NPR, who interviewed the
Perrettis this week). This was followed by an echo chamber effect,
where a second wave of blogs posted it _because_ it appeared in the
Times.
I wonder if this is the beginning of a stable three-phase pattern: the
early weblog entries act as a kind of weak-signal detector, amplifying
a few signals out of the many, followed by some mainstream publication
lifting the work out of the blogosphere and broadcasting it widely,
followed in turn by blogs that follow mainstream media rather than
leading it.
And if so, I wonder if it would be possible to find signature news
stories where there are both leading and trailing blogs (the current
debate on John Poindexter's "Total Information Awareness" DARPA
program springs to mind), and I wonder if there is a group of blogs
that are consistently in the leading group?
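One way to go looking, sketched in Python under some loud assumptions:
that we could collect (blog, date) pairs for each story, and that we
could pick a single date on which the mainstream press picked it up.
The blog names and dates below are invented placeholders, not real
data, and the function names are made up for this example.

```python
from collections import Counter
from datetime import date


def split_leading_trailing(posts, mainstream_date):
    """Split blogs into those that posted before the mainstream story and the rest."""
    leading, trailing = set(), set()
    for blog, posted in posts:
        if posted < mainstream_date:
            leading.add(blog)
        else:
            trailing.add(blog)
    # A blog that posted both before and after counts as leading.
    return leading, trailing - leading


def consistent_leaders(stories, min_stories=2):
    """Blogs that were in the leading group for at least min_stories stories."""
    counts = Counter()
    for posts, mainstream_date in stories:
        leading, _ = split_leading_trailing(posts, mainstream_date)
        counts.update(leading)
    return sorted(blog for blog, n in counts.items() if n >= min_stories)


stories = [
    ([("blog-a", date(2002, 11, 1)), ("blog-b", date(2002, 11, 18))],
     date(2002, 11, 17)),
    ([("blog-a", date(2002, 11, 5)), ("blog-c", date(2002, 11, 6)),
      ("blog-b", date(2002, 11, 20))],
     date(2002, 11, 10)),
]

print(consistent_leaders(stories))  # ['blog-a']
```

The hard part is not the counting but the inputs: deciding which
stories qualify as signature stories, and which publication date
counts as the moment the mainstream press took over.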
* Questions I Am Asking Myself =======================================
- What can we learn from the constitutions of successful online
communities?
Most successful online communities have a constitution of some sort,
essentially a group-ratified agreement about 'how we do it around
here'. Like Britain, however, most such constitutions are not written
down.
Recently, I've become aware of a literary tradition in online
communities, essentially "The Story of How We Got Governance." A
number of sites have FAQ sections or other documents that outline
roughly the same pattern:
1. We wanted to do something cool, so we built Version 1.0 with little
thought for structure.
2. Certain pressures (scale, anonymity, bad actors, whatever) impinged
on the way the initial system worked.
3. Despair.
4. Re-design of the system to deflect certain behaviors and encourage
others.
5. Optional 5th Step: Repeat Steps 2-4 with Version 1.1
These documents are our best source of concrete wisdom about social
software. Here are a few I find particularly interesting -- I am
looking for more, if anyone has interesting examples.
- LambdaMOO Takes A New Direction, by the Wizards of LambdaMOO
The wizards depart, and then return quite crankily
http://www.cc.gatech.edu/classes/AY2001/cs6470_fall/LTAND.html
- How Did the Moderation System Develop? from the slashdot FAQ.
Gaming the system as the principal concern of system design
http://slashdot.org/faq/com-mod.shtml#cm520
- Our Replies to Our Critics from the Wikipedia FAQ
A committed group of insiders keeps things on an even keel
http://www.wikipedia.org/wiki/Wikipedia%3AOur_Replies_to_Our_Critics
One common feature is that these communities usually have a core
group, sometimes explicit, as with slashdot moderators, and sometimes
implicit, as with the Wikipedia's posse. I wonder if there is
something fundamental about this pattern of community governance?
- Why Is It So Hard to Make Decisions Online? Readers' Responses
Many interesting responses from readers to the question I posed in
Issue 1.5:
Geradline Joffre writes:
Your question is an interesting one but for me the answer is
simple. No matter how advanced we are in technology today, we still
have pretty much the same body and physiology as our ancestors
millions of years ago, i.e. we still share a lot in common with
animals.
If you observe a group of animals, you notice that they don't need
verbal communication to manage their social relations and understand
the group's rules -- non-verbal communication does everything. In a
group of humans, non-verbal communication also plays a major role
but, in an online environment, the non-verbal elements just
disappear. We are so used to making use of those non-verbal
elements to determine our behaviour -and ultimately make decisions-,
that we feel lost when we don't have them anymore.
You are correct that programmer groups such as open-source projects
do work well. But programmers are of a special kind. They are people
whose work is to communicate with machines, i.e. to communicate with
a very defined and unambiguous set of rules, a communication where
subtle non-verbal elements such as emotions have no place at
all. People who have the ability to communicate with computers don't
need as much as other human beings the non-verbal elements for an
effective communication. This is by the way something easier to
achieve for males than females. Open-source projects are
overwhelmingly composed of males (if you have a look at sourceforge,
you will find it very hard to find a woman's name among projects'
members).
Karsten Self writes:
I suspect it's because many forums lack either of two things:
- Empowered members. Or rather, all are equally minimally
empowered: they can talk, or little else.
- Specifically empowered members. The group _can_ launch actions,
but several initiatives are likely to be triggered. These then
fight out amongst themselves for best solution.
Where forums _do_ seem to work (e.g.: LKML, linux-elitists,
free-sklyarov) is where:
- There is a specific, common, focus.
- The purpose of the forum is discussion. Decisionmaking is either
out of band or by acclaim ("I'll do...", "OK, do...").
- There are strong activist members (e.g.: Don Marti, linux-elitists,
or much of the free-sklyarov group).
- There is an actionable function with limited access to which single
members (or small groups of members) have control (e.g.: the
Linux Kernel Mailing List & commit privs on the Linux source tree).
Otherwise, forums largely become...blab sessions.
I've seen other examples. I wrote earlier about IWETHEY
[IWETHEY.org - ed]. This is a group of 20-30 core people, of whom
about a half dozen are good for getting off their asses to do stuff.
As a consequence, there are a mailing list, forum, TWiki, MUD, IRC
channel, largely independently operated from one another. Sort of a
stone-soup user group. In this case, loose coupling works pretty
well.
Mark Kraft writes, mostly about the ability of programmers to get
things done online:
I think I know a lot of the answer... It may be a controversial
answer, but I'm pretty sure it's at the heart of the matter.
Here are the reasons I suspect why decisions are hard to make online:
Culture: There are lots of sites that have large groups of
non-programmers working together towards a common goal. LJ has a
large support team, for instance, and most of them aren't
programmers. However, most of them *ARE* technically oriented with a
good familiarity with the web. To them, working together on the web
just makes sense. They might not know how to program, but they do
know how to utilize tools that programmers have made to create
forums, polls, etc. Still, these non-programmers who collaborate
together to this level are still an exception to the rule - most
people aren't at that technical level. There are also groups that do
a lot of organization online such as Amnesty International. Maybe
your real question is "Why is it hard to organize groups online to
get things done without having programmers be at the core of it
all?" To me, the answer seems to be obvious - the programmers help
enable others to participate in the process.
[...]
Design without coherent structure and intent: Developers tend to
create community-based features as individual "widgets". A poll
feature, a forum feature, etc. What they *don't* do often enough is
design those features with the intent of having them work seamlessly
together in order to allow people to make decisions and get things
done.
[...]
Programmers as facilitators: When a programmer designs a
community-based feature for a website, they are not only empowering
others to use their software -- they are inviting them into the
forum of public opinion. In other words, the user can now exercise
some degree of pressure on the decision making processes of the
project / site. Many programmers do not necessarily want to do this,
however, in that it also means giving up some degree of control.
Lack of coherent decision-making processes: What is the correct way
for people to make a decision? This is something that most every
open source developer needs to ask themselves. Should the lead
developer give his ok? Should a suggestion be posted about, with
coding to proceed based upon the feedback? (If so, does that mean
that the person who suggested the idea should be in charge of the
project?) Should someone code first and ask questions later,
perhaps leaving others to deal with the ramifications of what
they've created? Is the process consistent for all developers, or
are there "special exceptions" for some people? Frankly, I don't
know of any open source project that addresses all these issues both
well and consistently.
* End ====================================================================
Copyright 2002, Clay Shirky
Feel free to reprint, quote, or forward, so long as you credit me.