[NEC] 1.7: DNA, P2P, and Privacy

list-replies@shirky.com
Thu, 21 Nov 2002 09:46:28 -0500 (EST)


NEC @ Shirky.com, a mailing list about Networks, Economics, and Culture 

           Published periodically / # 1.7 / November 21, 2002 
               Subscribe at http://shirky.com/nec.html

In this issue:

 - Introduction
 - Essay: DNA, P2P, and Privacy
    (Also at http://www.shirky.com/writings/privacy_p2p.html)
 - The Three Phase Reaction to BlackPeopleLoveUs.com
 - Questions I've Been Asking Myself:
   - What can we learn from the constitutions of successful online
     communities? 
   - Why is it so hard to make decisions online? Readers' responses

* Introduction =======================================================

This issue of NEC switches gears from social software for the moment:
the essay concerns peer-to-peer architecture, and in particular, what
happens to privacy when DNA becomes a distributed, hash-like identifier
allowing individuals to be uniquely, universally, and unambiguously
identified.

-clay

* Essay ==============================================================

DNA, P2P, and Privacy
  (http://www.shirky.com/writings/privacy_p2p.html)

For decades, the privacy debate has centered on questions about
databases and database interoperability: How much information about
you exists in the world's databases? How easily is it retrieved? How
easily is it compared or combined with other information?

Databases have two key weaknesses that affect this debate. The first
is that they deal badly with ambiguity, and generally have to issue a
unique number, sometimes called a primary key, to every entity they
store information on. The US Social Security number is a primary key
that points to you, the six-character Passenger Name Record locator is
a primary key that points to a particular airline booking, and so on.
This leads to the second weakness: since each database maintains its
own set of primary keys, creating interoperability between different
databases is difficult and expensive, and generally requires
significant advance coordination.
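To make the second weakness concrete, here is a minimal Python sketch using two hypothetical in-memory tables. The table names, keys, and crosswalk mapping are invented for illustration; the point is only that keys issued separately by each database share nothing until someone agrees on a mapping in advance.

```python
# Two hypothetical databases, each issuing its own primary keys.
dmv_records = {
    "WY-0042": {"license_state": "WY"},
}
police_records = {
    "FL-9913": {"arrest_state": "FL"},
}

# The key sets never overlap, so a naive join on the keys finds nothing.
shared_keys = dmv_records.keys() & police_records.keys()

# Interoperability needs a crosswalk that someone must build and agree on
# ahead of time: the "advance coordination" described above.
crosswalk = {"WY-0042": "FL-9913"}


def join_via_crosswalk(left, right, mapping):
    """Pair records from two databases using a pre-agreed key mapping."""
    joined = {}
    for left_key, right_key in mapping.items():
        if left_key in left and right_key in right:
            joined[left_key] = {"left": left[left_key], "right": right[right_key]}
    return joined


joined = join_via_crosswalk(dmv_records, police_records, crosswalk)
print(shared_keys)  # set()
print(joined)
```

Every new pair of databases would need its own crosswalk of this kind, which is where the cost of interoperability comes from.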

Privacy advocates have relied on these weaknesses in creating legal
encumbrances to issuing and sharing primary keys. They believe,
rightly, that widely shared primary keys pose a danger to privacy.
(The recent case of Princeton using its high school applicants' Social
Security numbers to log in to Yale's admissions database highlights
these dangers.) The current worst-case scenario is a single universal
database in which all records -- federal, state, and local, public and
private -- would be unified with a single set of primary keys.

New technology brings new challenges, however, and in the database
world the new challenge is not a single unified database, but rather
decentralized interoperability, brought about by a single universally
used ID. The ID is DNA. The interoperability comes from the curious
and unique advantages DNA has as a primary key. And the effect will
put privacy advocates in a position analogous to that of the RIAA,
forcing them to switch from fighting the creation of a single central
database to fighting a decentralized and interoperable system of
peer-to-peer information storage.

- DNA Markers

While much of  the privacy debate around DNA focuses  on the ethics of
predicting  mental  and  physical   fitness  for  job  categories  and
insurance premiums, this  is too narrow and too  long-range a view. We
don't even know  yet how many genes there are in  the human genome, so
our ability to make  really sophisticated medical predictions based on
a person's genome is still some way off. However, long before that day
arrives, DNA will provide a cheap way to link a database record with a
particular person,  in a way  that is much  harder to change  or forge
than anything we've ever seen.

Everyone has a biological primary  key embedded in every cell of their
body in the form of DNA,  and everyone has characteristic zones of DNA
that can  be easily read and  compared. These zones  serve as markers,
and they differ  enough from individual to individual  that with fewer
than a dozen of them, a person can be positively identified out of the
entire world's population.
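The arithmetic behind "fewer than a dozen" markers can be sketched as follows. This is a simplified model under loudly stated assumptions: each marker is treated as independent, and each is assumed (hypothetically) to match a random unrelated person one time in ten. Real marker frequencies differ by marker and by population, so the numbers are illustrative only.

```python
# Simplified model: independent markers, each matching a random unrelated
# person with the same (hypothetical) probability.
PER_MARKER_MATCH = 0.1              # assumption, not a measured frequency
WORLD_POPULATION = 6_300_000_000    # rough 2002 figure


def random_match_probability(per_marker, n_markers):
    """Chance that an unrelated person matches on all n markers."""
    return per_marker ** n_markers


def expected_chance_matches(per_marker, n_markers, population):
    """Expected number of people who would match purely by coincidence."""
    return population * random_match_probability(per_marker, n_markers)


for n in (5, 8, 11):
    expected = expected_chance_matches(PER_MARKER_MATCH, n, WORLD_POPULATION)
    print(f"{n:2d} markers: about {expected:,.3f} coincidental matches")
```

Under these assumptions the expected number of coincidental matches falls from tens of thousands at five markers to well under one at eleven, because each added marker multiplies the match probability by another small factor.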

DNA-as-marker, in other words, is a nearly perfect primary key, as
close as we can get to being unambiguous and unforgeable. If every
person has a primary key that points to their physical being, then the
debate about who gets to issue such a key is over, because the keys
are issued every time someone is born, and re-issued every time a new
cell is created. And if the keys already exist, then the
technological argument is not about creating new keys, but about
reading existing ones.

The race  is on among several biotech  firms to be able  to sequence a
person's entire genome for $1000. The  $1 DNA ID will be a side effect
of this  price drop, and it's  coming soon. When the  price of reading
DNA  markers drops below  a dollar,  it will  be almost  impossible to
control who has access to reading a person's DNA.

There are few if any legal precedents that would prevent collection of
this data,  at least  in the US.  There are several  large populations
that do not  enjoy constitutional protections of privacy,  such as the
armed  services, prisoners,  and  children. Furthermore,  most of  the
controls  on private  databases rely  on the  silo approach,  where an
organization  can collect  an almost  unlimited amount  of information
about you, provided they abide by the relatively lax rules that govern
sharing that information.

Even these weak protections have  been enough, however, to prevent the
creation of a unified database,  because the contents of two databases
cannot be  easily merged without  some shared primary key,  and shared
primary keys require advance coordination. And it is here, in the area
of interoperability, that DNA markers will have the greatest effect on
privacy.

- You're the Same You Everywhere

Right now, things like alternate name spellings or alternate addresses
make positive matching difficult across databases. It's hard to tell if
Eric with the Wyoming driver's license and Shawn with the Florida
arrest record are the same person, unless there is other information
to tie them together. If two rows of two different databases are tied
to the same DNA ID, however, they point to the same person, no matter
what other material is contained in the databases, and no matter how
it is organized or labeled.
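A minimal Python sketch of that kind of match. The records, field names, and DNA IDs are hypothetical, and a real DNA profile is a set of marker values rather than a single string; here it is treated as an opaque, exactly comparable ID, and each database is assumed to hold at most one row per ID.

```python
# Hypothetical records: the two databases use different column names and
# different name spellings, but both store the same made-up DNA ID.
wyoming_licenses = [
    {"full_name": "Eric Shuler", "dna_id": "D-7781", "license_no": "WY-0042"},
]
florida_arrests = [
    {"suspect": "Shawn Schuller", "marker_profile": "D-7781", "case": "FL-9913"},
]


def index_by(rows, key_field):
    """Build a lookup from each row's DNA ID to the row itself."""
    return {row[key_field]: row for row in rows}


def link_on_dna(left_rows, left_key, right_rows, right_key):
    """Pair rows from two databases whose DNA IDs are identical.

    Only the name of each key column must be known; every other column
    is carried along untouched, however it is named or spelled.
    """
    right_index = index_by(right_rows, right_key)
    return [
        (row, right_index[row[left_key]])
        for row in left_rows
        if row[left_key] in right_index
    ]


links = link_on_dna(wyoming_licenses, "dna_id", florida_arrests, "marker_profile")
for license_row, arrest_row in links:
    print(license_row["full_name"], "<->", arrest_row["suspect"])
```

The names never enter the comparison, so the mismatch between "Eric Shuler" and "Shawn Schuller" does not matter.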

No more  trying to figure out if  Mr. Shuler and Mr.  Schuller are the
same  person, no  more  wondering  if two  John  Smiths are  different
people,  no  more trying  to  guess the  gender  of  J. Lee.  Identity
collapses  to the  body, in  a  way that  is far  more effective  than
fingerprints, and  far more easily compared  across multiple databases
than more heuristic measures like retinal scans.

In this model,  the single universal database never  gets created, not
because  privacy advocates  prevent it,  but because  it is  no longer
needed.  If  primary keys  are issued by  nature, rather than  by each
database  acting  alone,  then  there  is no  more  need  for  central
databases  or advance coordination,  because the  contents of  any two
DNA-holding databases  can be merged  on demand in something  close to
real time.

Unlike the creation of a vast central database, even a virtual one,
the change here can come about piecemeal, with only a few DNA-holding
databases. A car dealer, say, could simply submit a DNA marker to a
person's bank asking for a simple yes-or-no match before issuing a
title. In the same way the mid-90s ID requirements for US domestic
travel benefited the airlines because they kept people from transferring
unused tickets to friends or family, we can expect businesses to like
the way DNA ties transactions to a single customer identity.
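The car-dealer exchange amounts to a very small protocol: one party submits a key, the other answers yes or no. A minimal Python sketch, in which the bank class, account number, and marker values are all hypothetical and a profile is modeled as a plain tuple of numbers:

```python
from dataclasses import dataclass, field


@dataclass
class HypotheticalBank:
    """Stands in for a bank that keeps a DNA marker profile per account."""
    profiles: dict = field(default_factory=dict)

    def enroll(self, account: str, markers: tuple) -> None:
        self.profiles[account] = markers

    def is_same_person(self, account: str, markers: tuple) -> bool:
        """Return only True or False; the stored profile is never sent back."""
        return self.profiles.get(account) == markers


def dealer_checks_buyer(bank: HypotheticalBank, account: str, buyer_markers: tuple) -> bool:
    """The dealer's side: one query to the bank, one yes-or-no answer back."""
    return bank.is_same_person(account, buyer_markers)


bank = HypotheticalBank()
bank.enroll("acct-001", (12, 14, 9, 11, 15))

print(dealer_checks_buyer(bank, "acct-001", (12, 14, 9, 11, 15)))  # True
print(dealer_checks_buyer(bank, "acct-001", (12, 14, 9, 11, 16)))  # False
```

Neither side has to merge or even expose its database; the shared DNA key is the only coordination required, which is why this kind of check could spread one business relationship at a time.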

The privacy debate tends to be conducted as a religious one, with the
absolutists making the most noise. However, for a large number of
people, privacy is a relative rather than an absolute good. The use of
DNA as an ID will spread in part because people want it to, in the
form of credit cards that cannot be used in other hands or cars that
cannot be driven by other drivers. Likewise, demands that DNA IDs be
collected from populations who do not enjoy constitutional protections,
whether felons or children, will be hard to deflect as the cost of
reading an individual's DNA falls dramatically, and as the public sees
the effective use of DNA in things like rape and paternity cases.

- Peer-to-Peer Collation of Data

In the  same way Kazaa  has obviated the  need for central  storage or
coordination for the world's music, the use of DNA as an ID technology
makes  radically  decentralized data  integration  possible. With  the
primary  key problem  solved, interoperability  will arise  as  a side
effect, neither mandated nor coordinated centrally. Fighting this will
require  different  tactics, not  least  because  it  is a  rear-guard
action. The keys and the readers both exist, and the price and general
availability of  the technology all point to  ubiquity and vanishingly
low cost within a decade.
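A minimal sketch of decentralized collation in Python. The peers, their contents, and the DNA IDs are hypothetical, and the query strategy (ask every known peer at once, then merge the answers) is a simple fan-out, not how Kazaa or any real peer-to-peer network routes searches.

```python
from concurrent.futures import ThreadPoolExecutor


class Peer:
    """A hypothetical independent database that answers lookups by DNA ID."""

    def __init__(self, name, records):
        self.name = name
        self._records = records  # DNA ID -> whatever this peer happens to store

    def lookup(self, dna_id):
        return self._records.get(dna_id, {})


def collate(peers, dna_id):
    """Ask every peer in parallel and merge whatever each one knows.

    No central index is consulted; the shared DNA ID is the only
    coordination the peers need.
    """
    with ThreadPoolExecutor() as pool:
        answers = list(pool.map(lambda p: (p.name, p.lookup(dna_id)), peers))
    return {name: record for name, record in answers if record}


peers = [
    Peer("clinic", {"D-7781": {"blood_type": "O+"}}),
    Peer("insurer", {"D-7781": {"policy": "P-55"}}),
    Peer("library", {"D-1234": {"loans": 3}}),
]
profile = collate(peers, "D-7781")
print(profile)
```

Adding another peer requires no change to the others: it only has to answer lookups keyed by the same DNA ID.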

This  is a  different kind  of  fight over  privacy. As  the RIAA  has
discovered,  fighting  the  growth   of  a  decentralized  and  latent
capability  is much harder  than fighting  organizations that  rely on
central planning and significant resources, because there is no longer
any one  place to focus the efforts,  and no longer any  small list of
organizations who  can be targeted  for preventive action. In  a world
where database interoperability moves from a difficult and costly goal
to  one  that arises  as  a byproduct  of  the  system, the  important
question for privacy advocates is how they will handle the change.


* The Three Phase Reaction to BlackPeopleLoveUs.com ==================

Jonah Peretti, professional meme-hacker (no, really), launched
http://www.blackpeopleloveus.com earlier this year. BPLU is a site
dedicated to the earnest expression of a white liberal couple's
acceptance by their black friends. As usual with Jonah (who brought
you the "Nike/Sweatshop" meme of a year or so ago), the entire thing
is done with a straight face.

What interested me about it was that the site made not one but two
trips up the blogdex weblog index (http://blogdex.media.mit.edu/). The
first was a few weeks ago, when the blogosphere discovered it, and the
second was this week, shortly after the NY Times Style section in the
Sunday paper ran a story on it.

This is a sort of amplifier/echo-chamber effect: the first blogdex
wave amplified the signal enough to attract the attention of the
mainstream media, who in turn legitimated and publicized it more
widely than the blogosphere itself could (as well as legitimizing it
for other outlets, such as NPR, which interviewed the Perettis this
week). This was followed by an echo-chamber effect, where a second
wave of blogs posted it _because_ it appeared in the Times.

I wonder if this is the beginning of a stable three-phase pattern: the
early weblog entries act as a kind of weak-signal detector, amplifying
a few signals out of the many; some mainstream publication then lifts
the work out of the blogosphere and broadcasts it widely; and finally
it is picked up by blogs that follow mainstream media rather than
leading it.

And if so, I wonder whether it would be possible to find signature
news stories with both leading and trailing blogs (the current debate
on John Poindexter's "Total Information Awareness" DARPA program
springs to mind), and whether there is a group of blogs that is
consistently in the leading group.
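One way to start on that question, sketched in Python under hypothetical assumptions: for each story, treat the date of its first mainstream-media coverage as the dividing line, call a blog "leading" if its first post on the story came before that date, and count how often each blog leads. The blog names and dates below are invented; a real study would need actual post timestamps, for instance from a weblog index.

```python
from collections import Counter
from datetime import date


def first_post_dates(posts):
    """Map each blog to the earliest date it posted about the story."""
    earliest = {}
    for blog, posted_on in posts:
        if blog not in earliest or posted_on < earliest[blog]:
            earliest[blog] = posted_on
    return earliest


def leading_blogs(posts, mainstream_date):
    """Blogs whose first post on the story predates mainstream coverage."""
    return {
        blog
        for blog, first in first_post_dates(posts).items()
        if first < mainstream_date
    }


def count_leads(stories):
    """Count, across stories, how often each blog was in the leading group."""
    counts = Counter()
    for posts, mainstream_date in stories:
        counts.update(leading_blogs(posts, mainstream_date))
    return counts


# Hypothetical data: (blog, date posted) pairs plus the mainstream date.
stories = [
    ([("blogA", date(2002, 11, 1)), ("blogB", date(2002, 11, 18)),
      ("blogC", date(2002, 11, 2))], date(2002, 11, 17)),
    ([("blogA", date(2002, 11, 10)), ("blogC", date(2002, 11, 21)),
      ("blogB", date(2002, 11, 22))], date(2002, 11, 20)),
]

lead_counts = count_leads(stories)
print(lead_counts.most_common())
```

A blog that shows up near the top of these counts across many stories would be a candidate for the consistently leading group.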

* Questions I Am Asking Myself =======================================

- What can we learn from the constitutions of successful online
  communities?

Most successful online communities have a constitution of some sort,
essentially a group-ratified agreement about 'how we do it around
here'. Like Britain's, however, most such constitutions are not written
down.

Recently, I've become aware of a literary tradition in online
communities, essentially "The Story of How We Got Governance." A
number of sites have FAQ sections or other documents that outline
roughly the same pattern:

1. We wanted to do something cool, so we built Version 1.0 with little
  thought for structure.
2. Certain pressures (scale, anonymity, bad actors, whatever) impinged
  on the way the initial system worked.
3. Despair.
4. Re-design of the system to deflect certain behaviors and encourage
  others. 
5. Optional 5th Step: Repeat Steps 2-4 with Version 1.1

These documents are our best source of concrete wisdom about social
software. Here are a few I find particularly interesting -- I am
looking for more, if anyone has interesting examples.

 - LambdaMOO Takes A New Direction, by the Wizards of LambdaMOO
    The wizards depart, and then return quite crankily
    http://www.cc.gatech.edu/classes/AY2001/cs6470_fall/LTAND.html
 - How Did the Moderation System Develop? from the slashdot FAQ.
    Gaming the system as the principal concern of system design
    http://slashdot.org/faq/com-mod.shtml#cm520
 - Our Replies to Our Critics from the Wikipedia FAQ
    A committed group of insiders keeps things on an even keel
    http://www.wikipedia.org/wiki/Wikipedia%3AOur_Replies_to_Our_Critics

One common feature is that these communities usually have a core
group, sometimes explicit, as with slashdot moderators, and sometimes
implicit, as with Wikipedia's posse. Is there something fundamental
about this pattern of community governance?

- Why Is It So Hard to Make Decisions Online? Readers' Responses

Many interesting responses from readers to the question I posed in
Issue 1.5:

Geraldine Joffre writes:

  Your question is an interesting one, but for me the answer is
  simple. No matter how advanced we are in technology today, we still
  have pretty much the same body and physiology as our ancestors of
  millions of years ago, i.e. we still share a lot in common with
  animals.

  If you observe a group of animals, you notice that they don't need
  verbal communication to manage their social relations and understand
  the group's rules -- non-verbal communication does everything. In a
  group of humans, non-verbal communication also plays a major role
  but, in an online environment, the non-verbal elements just
  disappear. We are so used to making use of those non-verbal
  elements to determine our behaviour (and ultimately make decisions)
  that we feel lost when we don't have them anymore.

  You are correct that programmer groups such as open-source projects
  do work well. But programmers are of a special kind. They are people
  whose work is to communicate with machines, i.e. to communicate with
  a very defined and unambiguous set of rules, a communication where
  subtle non-verbal elements such as emotions have no place at
  all. People who have the ability to communicate with computers don't
  need as much as other human beings the non-verbal elements for an
  effective communication. This is by the way something easier to
  achieve for males than females.  Open-source projects are
  overwhelmingly composed of males (if you have a look at sourceforge,
  you will find it very hard to find a woman's name among projects'
  members).

Karsten Self writes:

  I suspect it's because many forums lack either of two things:

  - Empowered members.  Or rather, all are equally minimally
    empowered: they can talk, or little else.

  - Specifically empowered members.  The group _can_ launch actions,
    but several initiatives are likely to be triggered.  These then
    fight out amongst themselves for best solution.

  Where forums _do_ seem to work (e.g.:  LKML, linux-elitists,
  free-sklyarov) is where:

  - There is a specific, common, focus.
  - The purpose of the forum is discussion.  Decisionmaking is either
    out of band or by acclaim ("I'll do...", "OK, do...").
  - There are strong activist members (e.g.:  Don Marti, linux-elitists,
    or much of the free-sklyarov group).
  - There is an actionable function with limited access to which single
    members (or small groups of members) have control (e.g.:  the
    Linux Kernel Mailing List & commit privs on the Linux source tree).

  Otherwise, forums largely become...blab sessions.

  I've seen other examples.  I wrote earlier about IWETHEY
  [IWETHEY.org - ed].  This is a group of 20-30 core people, of whom
  about a half dozen are good for getting off their asses to do stuff.
  As a consequence, there are a mailing list, forum, TWiki, MUD, IRC
  channel, largely independently operated from one another.  Sort of a
  stone-soup user group.  In this case, loose coupling works pretty
  well.

Mark Kraft writes, mostly about the ability of programmers to get
things done online:

  I think I know a lot of the answer... It may be a controversial
  answer, but I'm pretty sure it's at the heart of the matter.

  Here are the reasons I suspect why decisions are hard to make online:

  Culture: There are lots of sites that have large groups of
  non-programmers working together towards a common goal. LJ has a
  large support team, for instance, and most of them aren't
  programmers. However, most of them *ARE* technically oriented with a
  good familiarity with the web. To them, working together on the web
  just makes sense. They might not know how to program, but they do
  know how to utilize tools that programmers have made to create
  forums, polls, etc. Still, these non-programmers who collaborate
  together to this level are still an exception to the rule - most
  people aren't at that technical level. There are also groups that do
  a lot of organization online, such as Amnesty International. Maybe
  your real question is "Why is it hard to organize groups online to
  get things done without having programmers be at the core of it
  all?" To me, the answer seems to be obvious - the programmers help
  enable others to participate in the process.

[...]

  Design without coherent structure and intent: Developers tend to
  create community-based features as individual "widgets". A poll
  feature, a forum feature, etc. What they *don't* do often enough is
  design those features with the intent of having them work seamlessly
  together in order to allow people to make decisions and get things
  done.

[...]

  Programmers as facilitators: When a programmer designs a
  community-based feature for a website, they are not only empowering
  others to use their software -- they are inviting them into the
  forum of public opinion. In other words, the user can now exercise
  some degree of pressure on the decision making processes of the
  project / site. Many programmers do not necessarily want to do this,
  however, in that it also means giving up some degree of control.

  Lack of coherent decision-making processes: What is the correct way
  for people to make a decision? This is something that most every
  open source developer needs to ask themselves. Should the lead
  developer give his ok?  Should a suggestion be posted about, with
  coding to proceed based upon the feedback? (If so, does that mean
  that the person who suggested the idea should be in charge of the
  project?)  Should someone code first and ask questions later,
  perhaps leaving others to deal with the ramifications of what
  they've created? Is the process consistent for all developers, or
  are there "special exceptions" for some people? Frankly, I don't
  know of any open source project that addresses all these issues both
  well and consistently.

* End ====================================================================

Copyright 2002, Clay Shirky
Feel free to reprint, quote, or forward, so long as you credit me.