Interact 2009 katsanos et al are 10 participants enough to evaluate scent

Christos Katsanos | ckatsanos@ece.upatras.gr
Nikolaos Tselios | nitse@ece.upatras.gr
Nikolaos Avouris | avouris@ece.upatras.gr
Are Ten Participants Enough for Evaluating Information Scent of Web Page Hyperlinks?
IFIP INTERACT | Uppsala, Sweden | 24-28 August, 2009

Purpose & Motivation
 A critical factor in web navigation is information scent
(Fu & Pirolli, 2007; Blackmon et al, 2005; Miller & Remington, 2004)
 user’s assessment of semantic relevance of navigation
options in a webpage
 Often, participants are asked to evaluate scent by
providing ratings (Miller & Remington, 2004; Brumby & Howes,
2008)
 It remains unclear how many raters are required to
obtain representative estimates of information scent.

Design & Procedures
 Web-based survey
 Rate semantic relevancy of all links to the provided
goal (1=poor relevance, 5=high relevance).
 101 participants
 8 navigation menus, 8 links each
6464 ratings

Analysis Methodology
 Reference case = Scent-ratings from 101 participants
 Select 10 random samples of different size N
 N = 2, 5, 10, 15, 20, 25, 30, 40 and 50
 [Samples-Ratings] VS [All 101 participants Ratings]
 Average Spearman Correlation
 How many raters are enough to represent the ratings
of the whole dataset?
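
The resampling procedure above can be sketched as follows. This is a minimal illustration, not the authors' code: the ratings are synthetic, and two details the slide does not state are assumed here, namely that each sample is summarised by the mean rating per link and that the correlation is taken across all 64 links at once. All function names are hypothetical.

```python
import random
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    ssx = sum((a - mx) ** 2 for a in rx)
    ssy = sum((b - my) ** 2 for b in ry)
    return cov / (ssx * ssy) ** 0.5


def mean_rating_per_link(ratings, raters):
    """Mean rating of each link over the given rater indices (assumed aggregation)."""
    return [mean(ratings[r][k] for r in raters) for k in range(len(ratings[0]))]


def average_sample_correlation(ratings, n, draws=10, seed=0):
    """Average rho between `draws` random N-rater samples and the full dataset."""
    rng = random.Random(seed)
    everyone = range(len(ratings))
    reference = mean_rating_per_link(ratings, everyone)
    rhos = [
        spearman(mean_rating_per_link(ratings, rng.sample(everyone, n)), reference)
        for _ in range(draws)
    ]
    return mean(rhos)


def synthetic_ratings(n_raters=101, n_links=64, seed=1):
    """Made-up 1-5 ratings: a latent scent per link plus per-rater noise."""
    rng = random.Random(seed)
    latent = [rng.uniform(1, 5) for _ in range(n_links)]
    return [
        [min(5, max(1, round(s + rng.gauss(0, 1)))) for s in latent]
        for _ in range(n_raters)
    ]


if __name__ == "__main__":
    data = synthetic_ratings()  # 101 raters x 64 links = 6464 ratings
    for n in (2, 5, 10, 15, 20, 25, 30, 40, 50):
        print(n, round(average_sample_correlation(data, n), 3))
```

The printed values depend entirely on the synthetic noise level, so they say nothing about the study's actual results; the sketch only shows the shape of the computation.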

Results
 10 raters
 84-90% of total variance
 x2 raters
 still the same
 x3 raters
 +5% closer to whole dataset
Error Bars = (rMEAN ± rSD)^2
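
One reading of these annotations, offered as an assumption rather than something the slide states outright: squaring a correlation coefficient gives the proportion of variance shared between the sample ratings and the full-dataset ratings, which would explain why the error bars are squared and the result is reported as a share of total variance. Under that reading, 84-90% corresponds to correlations of roughly 0.92 to 0.95:

```latex
r^2 = 0.84 \;\Rightarrow\; r = \sqrt{0.84} \approx 0.92,
\qquad
r^2 = 0.90 \;\Rightarrow\; r = \sqrt{0.90} \approx 0.95
```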

First-phase: Conclusion
 10 raters appear to be a cost-effective solution for
evaluating information scent without compromising
the quality of results
 But how close are scent-ratings of 10 participants
to observed navigation behavior?

Design & Procedures
 Eye-tracking user study
 Perform the same 8 navigation tasks
used in first-phase
 54 users (not involved in first-phase)
 Two measures of users’ behavior:
 clicks on each link
 fixations-adjusted-for-text-length on each link.
432 recordings
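
The 432 recordings match 54 users performing 8 tasks each. The slide does not say how fixations were adjusted for text length; the sketch below assumes the simplest possible normalisation, fixations divided by the number of characters in the link label. Both the formula and the example labels and counts are hypothetical.

```python
def fixations_per_character(fixation_count, link_text):
    """Hypothetical adjustment: fixations divided by the label's character count."""
    n_chars = len(link_text.strip())
    if n_chars == 0:
        raise ValueError("link text must not be empty")
    return fixation_count / n_chars


# Made-up fixation counts for two link labels of different length.
raw_fixations = {"Contact us": 12, "Frequently asked questions": 20}
adjusted = {
    text: fixations_per_character(count, text)
    for text, count in raw_fixations.items()
}

# Number of recordings implied by the study design: 54 users x 8 tasks.
recordings = 54 * 8
```

With this kind of adjustment a long label that attracts more raw fixations can still rank below a short one, which is the reason to normalise before comparing links.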

Analysis Methodology
 Reference case = Behavioral data from 54 users
 [Scent-ratings from rater samples - 1st phase] VS
[Measures of users’ navigation behavior - 2nd phase]
 Average Spearman Correlation
 How many raters are enough to reach an acceptable
level of correlation with these two measures?
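
A minimal sketch of this second-phase comparison, under stated assumptions: each rater sample is reduced to a mean rating per link, and that vector is correlated with one per-link behavioral measure (clicks here). The function names and the tiny data set are made up for illustration and are not taken from the study.

```python
import random
from statistics import mean


def ranks(values):
    """Average ranks: tied values share the mean of the positions they occupy."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the two rank vectors."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def rating_behavior_correlation(ratings, behavior, n, draws=10, seed=0):
    """Average rho between N-rater mean ratings and a per-link behavioral measure."""
    rng = random.Random(seed)
    rhos = []
    for _ in range(draws):
        sample = rng.sample(ratings, n)                     # n raters, no replacement
        link_means = [mean(col) for col in zip(*sample)]    # mean rating per link
        rhos.append(spearman(link_means, behavior))
    return mean(rhos)


# Hypothetical data: 4 raters x 5 links, and click counts for the same 5 links.
example_ratings = [
    [1, 2, 4, 5, 3],
    [2, 1, 5, 4, 3],
    [1, 3, 4, 5, 2],
    [1, 2, 5, 5, 3],
]
example_clicks = [0, 2, 30, 18, 4]
example_rho = rating_behavior_correlation(example_ratings, example_clicks, n=2)
```

Running the same function with a fixation-based measure in place of `example_clicks` gives the second comparison; only the `behavior` argument changes.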

Results
 Clicks on each link
 r10-raters is 0.7% different from
r101-raters
 r101-raters = 0.80, p<.01
 Fixations on each link
 r10-raters is 7.4% different
from r101-raters
 r101-raters = 0.40, ns
Error Bars = rMEAN ± rSD

Second-phase: Conclusion
 10 participants provide scent-ratings that are close to
 observed link-selection behavior (clicks)
 distribution of attention (fixations)
 However, scent-ratings should be used only as a
rough indicator of users’ distribution of attention
 rs = 0.40, ns

Summary & Questions
 Investigated the well-known debate of “how many
users” in the context of information scent evaluation
 Scent-ratings of 10 participants appeared to be
enough for a discount evaluation of information scent
 More studies are required in highly specialized
domains and/or with varied user group composition
Christos Katsanos | ckatsanos@ece.upatras.gr

First-Phase: Question example

Second-Phase: How many users are enough?
[Figure panels: Clicks Count, Observations Count]
