Inside The NSA's Secret Tool For Mapping Your Social Network
Edward Snowden revealed the agency’s phone-record tracking program. But thanks to
“precomputed contact chaining,” that database was much more powerful than anyone
knew.
In the summer of 2013, I spent my days sifting through the most
extensive archive of top-secret files that had ever reached the hands of an
American journalist. In a spectacular act of transgression against the
National Security Agency, where he worked as a contractor, Edward
Snowden had transmitted tens of thousands of classified documents to
me, the columnist Glenn Greenwald, and the documentary filmmaker
Laura Poitras.
I decided to delve more deeply. The public debate was missing important
information. It occurred to me that I did not even know what the records
looked like. At first I imagined them in the form of a simple, if gargantuan,
list. I assumed that the NSA cleaned up the list—date goes here, call
duration there—and converted it to the agency’s preferred “atomic sigint
data format.” Otherwise I thought of the records as inert. During a
conversation at the Aspen Security Forum that July, six weeks after
Snowden’s first disclosure and three months after the Boston Marathon
bombing, Admiral Dennis Blair, the former director of national intelligence,
assured me that the records were “stored,” untouched, until the next
Boston bomber came along.
These could be cast as abstract concerns, but I thought them quite real.
By September of that year, it dawned on me that there were also concrete
questions that I had not sufficiently explored. Where in the innards of the
NSA did the phone records live? What happened to them there? The
Snowden archive did not answer those questions directly, but there were
clues.
I stumbled across the first clue later that month. I had become interested in
the NSA’s internal conversation about “bulk collection,” the acquisition of
high-volume data sets in their entirety. Phone records were one of several
kinds. The agency had grown more and more adept, brilliantly creative in
fact, at finding and swallowing other people’s information whole. Lately the
NSA had begun to see that it consumed too much to digest. Midlevel
managers and engineers sounded notes of alarm in briefings prepared for
their chains of command. The cover page of one presentation asked “Is It
the End of the SIGINT World as We Have Come to Know It?” The authors
tried for a jaunty tone but had no sure answer. The surveillance
infrastructure was laboring under serious strain.
One name caught my eye on a chart that listed systems at highest risk:
Mainway. I knew that one. NSA engineers had built Mainway in urgent
haste after September 11, 2001. Vice President Dick Cheney’s office had
drafted orders, signed by President George W. Bush, to do something the
NSA had never done before. The assignment, forbidden by statute, was to
track telephone calls made and received by Americans on American soil.
The resulting operation was the lawless precursor of the broader one that I
was looking at now.
Mainway came to life alongside Stellarwind, the domestic surveillance
program created by Cheney in the first frantic weeks after al Qaeda flew
passenger airplanes into the Pentagon and World Trade Center.
Stellarwind defined the operation; Mainway was a tool to carry it out.
At the time, the NSA knew how to do this sort of thing with foreign
telephone calls, but it did not have the machinery to do it at home.
When NSA director Mike Hayden received the execution order on October
4, 2001 for “the vice president’s special program,” NSA engineers
assembled a system from bare metal and borrowed code within a matter
of days, a stupendous achievement under pressure. They commandeered
50 state-of-the-art computer servers from Dell, which was about to ship
them to another customer, and lashed them into a quick and dirty but
powerful cluster. Hayden cleared out space in a specially restricted wing
of OPS 2B, an inner sanctum of the gleaming, mirrored headquarters
complex at Fort Meade, Maryland. When the cluster expanded,
incorporating some 200 machines, Mainway spilled into an annex in the
Tordella Supercomputer Facility nearby. Trusted lieutenants began calling
in a small group of analysts, programmers, and mathematicians on
October 6 and 7.
It was impossible to hide the hubbub from other NSA personnel, who saw
new equipment arriving under armed escort at a furious pace, but even
among top clearance holders hardly anyone knew what was going on.
Stellarwind was designated as ECI, “exceptionally controlled information,”
the most closely held classification of all. From his West Wing office,
Cheney ordered that Stellarwind be concealed from the judges of the
FISA Court and from members of the intelligence committees in
Congress.
According to my sources and the documents I worked through in the fall of
2013, Mainway soon became the NSA’s most important tool for mapping
social networks—an anchor of what the agency called Large Access
Exploitation. “Large” is not an adjective in casual use at Fort Meade.
Mainway was built for operations at stupendous scale. Other systems
parsed the contents of intercepted communications: voice, video, email
and chat text, attachments, pager messages, and so on. Mainway was
queen of metadata, foreign and domestic, designed to find patterns that
content did not reveal. Beyond that, Mainway was a prototype for still
more ambitious plans. Next-generation systems, their planners wrote,
could amplify the power of surveillance by moving “from the more
traditional analysis of what is collected to the analysis of what to collect.”
Patterns gleaned from call records would identify targets in email or
location databases, and vice versa. Metadata was the key to the NSA’s
plan to “identify, track, store, manipulate and update relationships” across
all forms of intercepted content. An integrated map, presented graphically,
would eventually allow the NSA to display nearly anyone’s movements
and communications on a global scale. In their first mission statement,
planners gave the project the unironic name “the Big Awesome Graph.”
Inevitably it acquired a breezy acronym, “the BAG.”
The crucial discovery on this subject turned up at the bottom right corner
of a large network diagram prepared in 2012. A little box in that corner,
reproduced below, finally answered my question about where the NSA
stashed the telephone records that Blair and I talked about. The records
lived in Mainway. The implications were startling.
For reasons that will become apparent soon, I want to reproduce the entry
for Mainway in the SSO Dictionary, a classified NSA reference document:
---------------
The NSA has many volume problems, actually. Too much information
moving too fast across global networks. Too much to ingest, too much to
store, too much to retrieve through available pipes from distant collection
points. Too much noise drowning too little signal. In the passage I just
quoted, however, the volume problem referred to something else—
something deeper inside the guts of the surveillance machine. It was the
strain of an unbounded appetite on the NSA’s digestive tract. Collection
systems were closing their jaws on more data than they could chew.
Processing, not storage, was the problem.
For a long time, intelligence officials explained away the call records
database by quoting a remark from President Bush. “It seems like to me
that if somebody is talking to al Qaeda, we want to know why,” he had
said.
In fact, that was not at all the way the NSA used the call records. The
program was designed to find out whether, not why, US callers had some
tie to a terrorist conspiracy—and to do so, it searched us all. Working
through the FBI, the NSA assembled a five-year inventory of phone calls
from every account it could touch. Trillions of calls. Nothing like that was
needed to find the numbers on a bad guy’s telephone bill.
Software tools mapped the call records as “nodes” and “edges” on a grid
so large that the human mind, unaided, could not encompass it. Nodes
were dots on the map, each representing a telephone number. Edges
were lines drawn between the nodes, each representing a call. A related
tool called MapReduce condensed the trillions of data points into
summary form that a human analyst could grasp.
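To make the nodes-and-edges picture concrete, here is a minimal sketch in Python. The phone numbers, the record layout, and the function name are all invented for illustration, and collapsing repeated calls into a per-pair count is only a toy stand-in for the kind of summarizing described above, not the NSA's software or MapReduce itself.

```python
from collections import defaultdict

# Hypothetical call records: (caller, callee, duration_seconds).
CALLS = [
    ("555-0101", "555-0102", 60),
    ("555-0102", "555-0101", 30),
    ("555-0101", "555-0103", 120),
    ("555-0104", "555-0103", 15),
]

def build_graph(calls):
    """Return (nodes, edges).

    nodes is the set of phone numbers seen in the records.
    edges maps an unordered pair of numbers to the count of calls
    between them, regardless of who dialed.
    """
    nodes = set()
    edges = defaultdict(int)
    for caller, callee, _duration in calls:
        nodes.update((caller, callee))
        edges[frozenset((caller, callee))] += 1
    return nodes, dict(edges)

nodes, edges = build_graph(CALLS)
print(len(nodes), "nodes,", len(edges), "edges")
```

Four raw records shrink to three edges here because the two calls between the first pair of numbers share a single line on the map.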
Network theory called this map a social graph. It modeled the relationships
and groups that defined each person’s interaction with the world. The size
of the graph grew exponentially as contact chaining progressed. The
whole point of chaining was to push outward from a target’s immediate
contacts to the contacts of contacts, then contacts of contacts of contacts.
Each step in that process was called a hop.
Double a penny once a day and you reach $1 million in less than a month.
That is what exponential growth looks like with a base of two. As contact
chaining steps through its hops, the social graph grows much faster. If the
average person calls or is called by 10 other people a year, then each hop
produces a tenfold increase in the population of the NSA’s contact map.
Most of us talk on the phone with a lot more than 10 others. Whatever that
number, dozens or hundreds, you multiply it by itself to measure the
growth at each hop.
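The arithmetic can be checked in a few lines of Python. This is a rough sketch with hypothetical function names, and it assumes that nobody's contacts overlap, which overstates the real totals because friends tend to share friends.

```python
def doublings_to_reach(target_cents, start_cents=1):
    """Count the doublings needed to grow start_cents to at least target_cents."""
    amount, count = start_cents, 0
    while amount < target_cents:
        amount *= 2
        count += 1
    return count

def people_at_hop(contacts_per_person, hops):
    """Upper bound on people reached at a given hop, assuming no overlap."""
    return contacts_per_person ** hops

# $1 million expressed in cents, starting from one penny.
print("doublings to $1 million:", doublings_to_reach(100_000_000))

# Compare 10 contacts per person with 100 contacts per person.
for hop in (1, 2, 3):
    print("hop", hop, people_at_hop(10, hop), people_at_hop(100, hop))
```

The penny needs 27 doublings to pass $1 million. At 100 contacts apiece, three hops already reach a million people.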
Bacon shared screen credits with a long list of actors. Those were his
direct links, one hop from Bacon himself. Actors who never worked
alongside him, but appeared in a film with someone who had, were two
hops away from Bacon. Scarlett Johansson never worked with Bacon, but
each of them had starred alongside Mickey Rourke: Bacon in Diner,
Johansson in Iron Man 2. Two hops, through Rourke, connected them. If
you kept on playing you discovered that Bacon was seldom more than two
hops away from any actor, however removed in time and movie style. In a
single-industry town like Hollywood, links like these might make intuitive
sense. More surprising, if you did not spend much time around logarithms,
was the distance traveled by one or two hops through the vastly larger
NSA data set. Academic research suggested that an average of three
hops—the same number Inglis mentioned—could trace a path between
any two Americans.
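Counting hops between two people is a textbook breadth-first search. The sketch below uses a tiny, hypothetical co-star graph containing only the Rourke links mentioned above. The function name is an assumption for illustration, and nothing here describes how the NSA's tools were built.

```python
from collections import deque

# Hypothetical co-star graph; only the Rourke links come from the text.
COSTARS = {
    "Kevin Bacon": {"Mickey Rourke"},
    "Mickey Rourke": {"Kevin Bacon", "Scarlett Johansson"},
    "Scarlett Johansson": {"Mickey Rourke"},
}

def hops_between(graph, start, goal):
    """Breadth-first search: fewest hops from start to goal, or None."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        person, dist = queue.popleft()
        for neighbor in graph.get(person, ()):
            if neighbor == goal:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None

print(hops_between(COSTARS, "Kevin Bacon", "Scarlett Johansson"))
```

Because the search explores everyone at one hop before anyone at two, the first path it finds is also the shortest.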
When the Boston Marathon bombs exploded in April 2013, the
Graph-in-Memory was ready. Absent unlucky data gaps, it already held a summary
map of the contacts revealed by the Tsarnaev brothers’ calls. The
underlying details—dates, times, durations, busy signals, missed calls,
and “call waiting events”—were easily retrieved on demand. Mainway had
already processed them. With the first hop precomputed, the Graph-in-
Memory could make much quicker work of the second and the third.
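A small sketch shows why a precomputed first hop speeds up the later ones. It assumes a hypothetical one-hop table, held here as a Python dictionary with made-up identifiers, and is not a description of Mainway's actual design. Each further hop becomes a series of lookups in that table, with no need to rescan raw call records.

```python
# Hypothetical precomputed one-hop map: number -> set of direct contacts.
ONE_HOP = {
    "A": {"B", "C"},
    "B": {"A", "D"},
    "C": {"A"},
    "D": {"B", "E"},
    "E": {"D"},
}

def contacts_within(one_hop, seed, max_hops):
    """Return {number: hop_distance} for everyone within max_hops of seed."""
    distance = {seed: 0}
    frontier = {seed}
    for hop in range(1, max_hops + 1):
        next_frontier = set()
        for number in frontier:
            # One dictionary lookup replaces a scan of the raw records.
            for contact in one_hop.get(number, ()):
                if contact not in distance:
                    distance[contact] = hop
                    next_frontier.add(contact)
        frontier = next_frontier
    return distance

print(contacts_within(ONE_HOP, "A", 2))
print(contacts_within(ONE_HOP, "A", 3))
```

In this toy table, two hops from "A" reach "B", "C", and "D"; a third hop adds "E".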
As I parsed the documents and interviewed sources in the fall of 2013, the
implications finally sank in. The NSA had built a live, ever-updating social
graph of the US.
Our phone records were not in cold storage. They did not sit untouched.
They were arranged in a one-hop contact chain of each to all. All kinds of
secrets—social, medical, political, professional—were precomputed, 24/7.
Ledgett told me he saw no cause for concern because “the links are
unassembled until you launch a query.” I saw a database that was
preconfigured to map anyone’s life at the touch of a button.
I am well aware that a person could take this line of thinking too far.
Maybe I have. The US is not East Germany. As I pieced this picture
together, I had no reason to believe the NSA made corrupt use of its real-
time map of American life. The rules imposed some restrictions on use of
US telephone records, even after Bush’s attorney general, Michael
Mukasey, blew a hole in them. Only 22 top officials, according to the
Privacy and Civil Liberties Oversight Board, had authority to order a
contact chain to be built from data in Mainway’s FISA partitions.
But history has not been kind to the belief that government conduct
always follows rules or that the rules will never change in dangerous
ways. Rules can be bypassed or rewritten—with or without notice, with or
without malignant intent, by a few degrees at a time or more than a few.
Government might decide one day to look in Mainway or a comparable
system for evidence of a violent crime, or any crime, or any suspicion.
Governments have slid down that slope before. Within living memory,
Richard Nixon had ordered wiretaps of his political enemies. The FBI,
judging Martin Luther King Jr. a “dangerous and effective Negro,” used
secret surveillance to record his sexual liaisons. A top lieutenant of J.
Edgar Hoover invited King to kill himself or face exposure.
Meaningful abuse of surveillance had come much more recently. The FBI
illegally planted hundreds of GPS tracking devices without warrants. New
York police spied systematically on mosques. Governments at all levels
used the power of the state most heavy-handedly, sometimes illegally, to
monitor communities disadvantaged by poverty, race, religion, ethnicity,
and immigration status. As a presidential candidate, Donald Trump
threatened explicitly to put his opposing candidate in jail. Once in office,
he asserted the absolute right to control any government agency. He
placed intense pressure on the Justice Department, publicly and privately,
to launch criminal investigations of his critics.