From Algorithms To Stories.
Jonathan Stray
Columbia / ProPublica / Overview
or...
three hard lessons in building tools for computational journalism
Science Journalism
Journalism through Science
Computational Journalism
Stories will emerge from stacks of financial disclosure
forms, court records, legislative hearings, officials'
calendars or meeting notes, and regulators' email messages
that no one today has time or money to mine. With a suite
of reporting tools, a journalist will be able to scan,
transcribe, analyze, and visualize the patterns in these
documents.
- Cohen, Hamilton, Turner, 2011
and then...
nobody used it
three years later...
Lesson 1
Workflow >> Algorithm
User testing!
Loaded confirmation link, which goes to /docsets. "Hmm. What do I do now?" Eventually
clicked import link. "I need more guidance what to do next." Import pane opened to DocumentCloud
login. Looked like he was about to type in credentials. Then: "I can't really do any of these
now." Eventually saw "example document sets" and clicked.
Cloned caracas-cables example set. Waited. Understood when document set import
complete. Then hesitated. Didn't know where to click to open. Eventually clicked.
"In general, you could be way more communicative."
Moved mouse to document list immediately. "For some reason, this drew me." Clicked around
doc list. "What am I looking at?"
Moved to tree view. Clicked + without hesitation to open node. Saw document in viewer
change. "It's not clear what I'm looking at in the viewer." Eventually: "Which document is
showing when I click a node? Is it the first?"
A little later, more conversationally: "I don't know how useful the document list is." He said this
twice at different points. "Is this a comma separated list of documents? It just looks like one
block of text." Suggested a horizontal delimiter of some sort.
Lesson 2
It's humans + machines
By Maria Kiselyova
(Reuters) - Russian mobile phone operator Vimpelcom has become the
latest company to come under scrutiny over its operations in Uzbekistan, an
authoritarian country where rival MTS had its assets confiscated.
U.S.-listed Vimpelcom, Uzbekistan's biggest mobile operator by subscribers,
said on Wednesday that it was being investigated by the U.S. Securities and
Exchange Commission (SEC) and Dutch authorities.
Lesson 3
Real data is messy
PDF dumps
Printed, scanned emails
Scraping thousands of pages from an antique site
CD full of Excel files
...
Overall precision = 77%
Overall recall = 30%
...and this is on the cleanest possible data
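Precision and recall are the standard extraction metrics: precision is the share of extracted items that are correct, recall is the share of the correct items that were found. A minimal Python sketch of the computation from raw counts; the function name and the counts in the usage line are hypothetical illustrations, not the data behind the figures above.

```python
def precision_recall(true_positives: int, false_positives: int,
                     false_negatives: int) -> tuple[float, float]:
    """Return (precision, recall) from raw match counts.

    precision = TP / (TP + FP): fraction of extracted items that are correct.
    recall    = TP / (TP + FN): fraction of correct items that were extracted.
    Returns 0.0 for a metric whose denominator is zero.
    """
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


# Hypothetical counts, for illustration only.
p, r = precision_recall(true_positives=3, false_positives=1, false_negatives=6)
print(f"precision = {p:.0%}, recall = {r:.0%}")
```

Read together, 77% precision and 30% recall mean most of what the tool extracts is right, yet it misses about 70% of the items a reporter would need to find.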
Meta-lesson
You don't know what
the user's problem is
Better metrics?
How many stories got done?
- Are you solving a niche problem?
- Would resources have been better spent on reporting?
Journalism as a cycle
[Diagram: journalism as a cycle, with stages labeled Action, Data, Reporting, User, Distribution, Story]
Use it!
overviewproject.org
Code it!
github.com/overview
Thank you!
Knight Foundation, Google Ideas, Open Syllabus Project