Read the article carefully; if you don't read it carefully, I'll start a dispute. Finish it within 24 hours.
1. Write an 800-word summary
Read the paper by Leskovec, Kleinberg, and Faloutsos, “Graphs over time:
densification laws, shrinking diameters and possible explanations”, that was
published in KDD 2005. You might recall Kleinberg as the author of the HITS
algorithm and Faloutsos as one of the 3 authors in the {Faloutsos, Faloutsos,
Faloutsos} paper. The “Graphs over Time” paper talks about phenomena that occur
when graphs evolve/grow over time and new graph models that reflect such
phenomena. (You do not have to follow or master the theoretical contributions, just
the basic concepts.) Write an approx. 800-word summary of the paper having two
parts: i) summary of the paper’s contributions, ii) how the conclusions here can
benefit an urban computing scenario (suitably chosen by you).
2. Read and discuss two questions (100 words each)
To read
[required] John M. Carroll, Mary Beth Rosson, George Chin Jr., and Jurgen
Koenemann. Requirements development in scenario-based design. IEEE
Transactions on Software Engineering 24(12): 1156-1170, Dec. 1998.
[required] Mary Beth Rosson and John M. Carroll. Scenario-Based Usability
Engineering, Chapter 3, 1999.

To turn in
Prepare a brief (no more than one page) written answer to the following two questions.
Write up your answer using MS Word
One well-presented paragraph for each question is sufficient.
What do you believe is the central difference between the requirements analysis
approach(es) you studied in 5704 and the “participatory design”-based approach
discussed in the assigned material?
If you were to use this HCI-based approach on a new project, would you worry about
prematurely considering or making important design decisions during requirements
gathering? Why or why not?
Graphs over Time: Densification Laws, Shrinking
Diameters and Possible Explanations
Jure Leskovec
Carnegie Mellon University
[email protected]
Jon Kleinberg

Cornell University
[email protected]
Christos Faloutsos
Carnegie Mellon University
[email protected]
How do real graphs evolve over time? What are “normal”
growth patterns in social, technological, and information
networks? Many studies have discovered patterns in static
graphs, identifying properties in a single snapshot of a large
network, or in a very small number of snapshots; these in-
clude heavy tails for in- and out-degree distributions, com-
munities, small-world phenomena, and others. However,
given the lack of information about network evolution over
long periods, it has been hard to convert these findings into
statements about trends over time.
Here we study a wide range of real graphs, and we observe
some surprising phenomena. First, most of these graphs
densify over time, with the number of edges growing super-
linearly in the number of nodes. Second, the average dis-
tance between nodes often shrinks over time, in contrast
to the conventional wisdom that such distance parameters
should increase slowly as a function of the number of nodes
(like O(log n) or O(log log n)).
Existing graph generation models do not exhibit these
types of behavior, even at a qualitative level. We provide a
new graph generator, based on a “forest fire” spreading pro-
cess, that has a simple, intuitive justification, requires very
few parameters (like the “flammability” of nodes), and produces
graphs exhibiting the full range of properties observed
both in prior work and in the present study.

Work partially supported by the National Science Foundation
under Grants No. IIS-0209107, SENSOR-0329549, IIS-0326322,
CNS-0433540, CCF-0325453, IIS-0329064, CNS-0403340,
CCR-0122581, a David and Lucile Packard Foundation Fellowship,
and also by the Pennsylvania Infrastructure Technology Alliance
(PITA), a partnership of Carnegie Mellon, Lehigh University and
the Commonwealth of Pennsylvania’s Department of Community and
Economic Development (DCED). Any opinions, findings, and
conclusions or recommendations expressed in this material are
those of the author(s) and do not necessarily reflect the views
of the National Science Foundation, or other funding parties.
∗This research was done while on sabbatical leave at CMU.

Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
KDD’05, August 21–24, 2005, Chicago, Illinois, USA.
Copyright 2005 ACM 1-59593-135-X/05/0008 …$5.00.
Categories and Subject Descriptors
H.2.8 [Database Management]: Database Applications –
Data Mining
General Terms
Measurement, Theory

Keywords
densification power laws, graph generators, graph mining,
heavy-tailed distributions, small-world phenomena
1. Introduction

In recent years, there has been considerable interest in
graph structures arising in technological, sociological, and
scientific settings: computer networks (routers or autonomous
systems connected together); networks of users exchanging
e-mail or instant messages; citation networks and hyperlink
networks; social networks (who-trusts-whom, who-talks-to-
whom, and so forth); and countless more [24]. The study
of such networks has proceeded along two related tracks:
the measurement of large network datasets, and the devel-
opment of random graph models that approximate the ob-
served properties.
Many of the properties of interest in these studies are
based on two fundamental parameters: the nodes’ degrees
(i.e., the number of edges incident to each node), and the
distances between pairs of nodes (as measured by shortest-
path length). The node-to-node distances are often studied
in terms of the diameter — the maximum distance — and
a set of closely related but more robust quantities including
the average distance among pairs and the effective diameter
(the 90th percentile distance, a smoothed form of which we
use for our studies).
Almost all large real-world networks evolve over time by
the addition and deletion of nodes and edges. Most of the
recent models of network evolution capture the growth process
in a way that incorporates two pieces of “conventional wisdom”:
(A) Constant average degree assumption: The average node
degree in the network remains constant over time. (Or
equivalently, the number of edges grows linearly in the
number of nodes.)
(B) Slowly growing diameter assumption: The diameter is
a slowly growing function of the network size, as in
“small world” graphs [4, 7, 22, 30].
For example, the intensively-studied preferential attach-
ment model [3, 24] posits a network in which each new node,
when it arrives, attaches to the existing network by a con-
stant number of out-links, according to a “rich-get-richer”
rule. Recent work has given tight asymptotic bounds on the
diameter of preferential attachment networks [6, 9]; depend-
ing on the precise model, these bounds grow logarithmically
or even slower than logarithmically in the number of nodes.
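The preferential attachment process just described can be sketched in a few lines. This is an illustrative simplification of our own (the seed-clique initialization and function names are assumptions, not the exact model of [3, 24]); its point is that each arriving node contributes a constant number of edges:

```python
import random

def preferential_attachment(n, m, seed=0):
    """Grow a graph where each arriving node attaches m out-links to
    existing nodes chosen with probability proportional to degree
    (the "rich-get-richer" rule)."""
    rng = random.Random(seed)
    edges = []
    # Node i appears in `targets` once per incident edge, so uniform
    # sampling from this list is degree-proportional sampling.
    targets = list(range(m))            # m seed nodes to attach to
    for new in range(m, n):
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(targets))
        for t in chosen:
            edges.append((new, t))
            targets += [new, t]         # both endpoints gain degree
    return edges

# Each arriving node adds exactly m edges, so e grows linearly in n:
# assumption (A) is built directly into the model.
edges = preferential_attachment(1000, 3)
```

Because the edge count is forced to grow linearly, no parameter setting of this generator can densify; this is exactly the behavior the paper's measurements contradict.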
How are assumptions (A) and (B) reflected in data on net-
work growth? Empirical studies of large networks to date
have mainly focused on static graphs, identifying properties
of a single snapshot or a very small number of snapshots
of a large network. For example, despite the intense inter-
est in the Web’s link structure, the recent work of Ntoulas
et al. [25] noted the lack of prior empirical research on the
evolution of the Web. Thus, while one can assert based
on these studies that, qualitatively, real networks have rela-
tively small average node degrees and diameters, it has not
been clear how to convert these into statements about trends
over time.
The present work: Densification laws and shrinking
diameters. Here we study a range of different networks,
from several domains, and we focus specifically on the way in
which fundamental network properties vary with time. We
find, based on the growth patterns of these networks, that
principles (A) and (B) need to be reassessed. Specifically,
we show the following for a broad range of networks across
diverse domains.
(A′) Empirical observation: Densification power laws: The
networks are becoming denser over time, with the av-
erage degree increasing (and hence with the number of
edges growing super-linearly in the number of nodes).
Moreover, the densification follows a power-law pattern.
(B′) Empirical observation: Shrinking diameters: The ef-
fective diameter is, in many cases, actually decreasing
as the network grows.
We view the second of these findings as particularly surpris-
ing: Rather than shedding light on the long-running debate
over exactly how slowly the graph diameter grows as a func-
tion of the number of nodes, it suggests a need to revisit
standard models so as to produce graphs in which the ef-
fective diameter is capable of actually shrinking over time.
We also note that, while densification and decreasing diam-
eters are properties that are intuitively consistent with one
another (and are both borne out in the datasets we study),
they are qualitatively distinct in the sense that it is possi-
ble to construct examples of graphs evolving over time that
exhibit one of these properties but not the other.
We can further sharpen the quantitative aspects of these
findings. In particular, the densification of these graphs,
as suggested by (A′), is not arbitrary; we find that as the
graphs evolve over time, they follow a version of the relation
e(t) ∝ n(t)^a    (1)
where e(t) and n(t) denote the number of edges and nodes
of the graph at time t, and a is an exponent that generally
lies strictly between 1 and 2. We refer to such a relation as
a densification power law, or growth power law. (Exponent
a = 1 corresponds to constant average degree over time,
while a = 2 corresponds to an extremely dense graph where
each node has, on average, edges to a constant fraction of
all nodes.)
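A quick numeric check confirms the parenthetical claims about the exponent (the constants c and a below are illustrative choices of ours, not fitted values): under e(t) = c·n(t)^a the average degree is 2e/n = 2c·n^(a-1), which grows when a > 1 and stays constant when a = 1.

```python
def avg_degree(n, c=0.01, a=1.68):
    """Average degree implied by the densification power law
    e(t) = c * n(t)**a, namely 2*e/n = 2*c*n**(a - 1)."""
    return 2 * (c * n ** a) / n

d_10k = avg_degree(10_000)                     # densifying, a > 1
d_100k = avg_degree(100_000)                   # larger graph, larger degree
d_flat = (avg_degree(10_000, a=1.0),
          avg_degree(100_000, a=1.0))          # a = 1: degree constant
```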
What underlying process causes a graph to systematically
densify, with a fixed exponent as in Equation (1), and to
experience a decrease in effective diameter even as its size
increases? This question motivates the second main contri-
bution of this work: we present two families of probabilistic
generative models for graphs that capture aspects of these
properties. The first model, which we refer to as Community
Guided Attachment (CGA), argues that graph densification
can have a simple underlying basis; it is based on a decom-
position of the nodes into a nested set of communities, such
that the difficulty of forming links between communities in-
creases with the community size. For this model, we obtain
rigorous results showing that a natural tunable parameter
in the model can lead to a densification power law with
any desired exponent a. The second model, which is more
sophisticated, exhibits both densification and a decreasing
effective diameter as it grows. This model, which we refer to
as the Forest Fire Model, is based on having new nodes at-
tach to the network by “burning” through existing edges in
epidemic fashion. The mathematical analysis of this model
appears to lead to novel questions about random graphs that
are quite complex, but through simulation we find that for
a range of parameter values the model exhibits realistic be-
havior in densification, distances, and degree distributions.
It is thus the first model, to our knowledge, that exhibits
this full set of desired properties.
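The burning mechanism can be sketched as follows. This is a simplified, undirected variant of our own, not the paper's exact model (which burns along directed links with a forward burning probability and a separate backward burning ratio); it shows the epidemic-style attachment in which each new node links to everything it burns.

```python
import random

def forest_fire(n, p=0.35, seed=1):
    """Each arriving node picks a uniform-random "ambassador", then
    recursively "burns" along existing edges; the new node links to
    every burned node. Fan-out at each burned node is geometric
    with mean p / (1 - p)."""
    rng = random.Random(seed)
    nbrs = {0: set()}                     # node -> set of neighbors
    for v in range(1, n):
        w = rng.randrange(v)              # ambassador
        burned, frontier = {w}, [w]
        while frontier:
            u = frontier.pop()
            x = 0                         # geometric number of links to burn
            while rng.random() < p:
                x += 1
            fresh = [t for t in nbrs[u] if t not in burned][:x]
            burned.update(fresh)
            frontier.extend(fresh)
        nbrs[v] = set(burned)             # v links to all burned nodes
        for u in burned:
            nbrs[u].add(v)
    n_edges = sum(len(s) for s in nbrs.values()) // 2
    return nbrs, n_edges
```

Because a node that reaches a well-connected region burns many edges at once, high-degree nodes keep acquiring links and arriving nodes create more than a constant number of edges, which is the intuition behind densification in this model.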
Accurate properties of network growth, together with mod-
els supporting them, have implications in several contexts.
• Graph generation: Our findings form means for as-
sessing the quality of graph generators. Synthetic graphs are
important for ‘what if’ scenarios, for extrapolations, and for
simulations, when real graphs are impossible to collect (like,
e.g., a very large friendship graph between people).
• Graph sampling: Datasets consisting of huge real-
world graphs are increasingly available, with sizes ranging
from the millions to billions of nodes. There are many known
algorithms to compute interesting measures (shortest paths,
centrality, betweenness, etc.), but most of these algorithms
become impractical for the largest of these graphs. Thus
sampling is essential — but sampling from a graph is a non-
trivial problem. Densification laws can help discard bad
sampling methods, by providing means to reject sampled subgraphs.
• Extrapolations: For several real graphs, we have a
lot of snapshots of their past. What can we say about their
future? Our results help form a basis for validating scenarios
for graph evolution.
• Abnormality detection and computer network man-
agement: In many network settings, “normal” behavior will
produce subgraphs that obey densification laws (with a pre-
dictable exponent) and other properties of network growth.
If we detect activity producing structures that deviate sig-
nificantly from this, we can flag it as an abnormality; this
can potentially help with the detection of e.g. fraud, spam,
or distributed denial of service (DDoS) attacks.
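The abnormality-detection idea above can be sketched as a two-step check: fit the densification exponent on historical snapshots, then flag a new snapshot whose edge count strays from the prediction. The tolerance, the constants, and the toy history below are illustrative assumptions of ours, not values from the paper.

```python
import math

def fit_dpl(snapshots):
    """Least-squares fit of log e = a * log n + log c over (n, e) pairs,
    returning the densification exponent a and the constant c."""
    xs = [math.log(n) for n, _ in snapshots]
    ys = [math.log(e) for _, e in snapshots]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return a, math.exp(my - a * mx)

def is_abnormal(n, e, a, c, tol=0.5):
    """Flag a snapshot whose edge count strays from c * n**a by more
    than tol in log-space (tol = 0.5 is an arbitrary demo threshold)."""
    return abs(math.log(e) - math.log(c * n ** a)) > tol

# "Normal" history following e = 0.01 * n**1.6 exactly.
history = [(10 ** k, 0.01 * (10 ** k) ** 1.6) for k in range(2, 6)]
a, c = fit_dpl(history)
```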
The rest of the paper is organized as follows: Section 2 sur-
veys the related work. Section 3 gives our empirical findings
on real-world networks across diverse domains. Section 4 de-
scribes our proposed models and gives results obtained both
through analysis and simulation. We conclude and discuss
the implications of our findings in Section 5.
2. Related Work

Research over the past few years has identified classes of
properties that many real-world networks obey. One of the
main areas of focus has been on degree power laws, show-
ing that the set of node degrees has a heavy-tailed distri-
bution. Such degree distributions have been identified in
phone call graphs [1], the Internet [11], the Web [3, 14, 20],
click-stream data [5] and for a who-trusts-whom social net-
work [8]. Other properties include the “small-world phe-
nomenon,” popularly known as “six degrees of separation”,
which states that real graphs have surprisingly small (aver-
age or effective) diameter (see [4, 6, 7, 9, 17, 22, 30, 31]).
In parallel with empirical studies of large networks, there
has been considerable work on probabilistic models for graph
generation. The discovery of degree power laws led to the
development of random graph models that exhibited such
degree distributions, including the family of models based
on preferential attachment [2, 3, 10] and the related copying
model [18, 19]. See [23, 24] for surveys of this area.
It is important to note the fundamental contrast between
one of our main findings here — that the average number of
out-links per node is growing polynomially in the network
size — and the body of work on degree power laws. This earlier
work developed models that almost exclusively used the as-
sumption of node degrees that were bounded by constants
(or at most logarithmic functions) as the network grew; our
findings and associated model challenge this assumption, by
showing that networks across a number of domains are be-
coming denser.
The bulk of prior work on the study of network datasets
has focused on static graphs, identifying patterns in a sin-
gle snapshot, or a small number of network snapshots (see
also the discussion of this point by Ntoulas et al. [25]). Two
exceptions are the very recent work of Katz [16], who in-
dependently discovered densification power laws for citation
networks, and the work of Redner [28], who studied the
evolution of the citation graph of Physical Review over the
past century. Katz’s work builds on his earlier research on
power-law relationships between the size and recognition of
professional communities [15]; his work on densification is
focused specifically on citations, and he does not propose a
generative network model to account for the densification
phenomenon, as we do here. Redner’s work focuses on a
range of citation patterns over time that are different from
the network properties we study here.
Our Community Guided Attachment (CGA) model, which
produces densifying graphs, is an example of a hierarchical
graph generation model, in which the linkage probability be-
tween nodes decreases as a function of their relative distance
in the hierarchy [8, 17, 31]. Again, there is a distinction be-
tween the aims of this past work and our model here; where
these earlier network models were seeking to capture proper-
ties of individual snapshots of a graph, we seek to explain a
time evolution process in which one of the fundamental pa-
rameters, the average node degree, is varying as the process
unfolds. Our Forest Fire Model follows the overall frame-
work of earlier graph models in which nodes arrive one at
a time and link into the existing structure; like the copying
model discussed above, for example, a new node creates links by
consulting the links of existing nodes. However, the recursive
process by which nodes in the Forest Fire Model create these
links is quite different, leading to the new properties discussed
in the previous section.

[Figure 1: The average node out-degree over time for (a) arXiv
(by year of publication), (b) Patents (by year granted), (c)
Autonomous Systems (over days), and (d) the Affiliation network
(by year of publication). Notice that it increases in all 4
datasets; that is, all graphs are densifying.]
3. Observations

We study the temporal evolution of several networks, by
observing snapshots of these networks taken at regularly
spaced points in time. We use datasets from four differ-
ent sources; for each, we have information about the time
when each node was added to the network over a period of
several years — this enables the construction of a snapshot
at any desired point in time. For each of the datasets, we find
a version of the densification power law from Equation (1),
e(t) ∝ n(t)a; the exponent a differs across datasets, but
remains remarkably stable over time within each dataset.
We also find that the effective diameter decreases in all the
datasets considered.
The datasets consist of two citation graphs for different
areas in the physics literature, a citation graph for U.S.
patents, a graph of the Internet, and five bipartite affiliation
graphs of authors with papers they authored. Overall, then,
we consider 9 different datasets from 4 different sources.
3.1 Densification Laws
Here we describe the datasets we used, and our findings
related to densification. For each graph dataset, we have,
or can generate, several time snapshots, for which we study
the number of nodes n(t) and the number of edges e(t) at
each timestamp t. We denote by n and e the final number
of nodes and edges. We use the term Densification Power
Law plot (or just DPL plot) to refer to the log-log plot of
number of edges e(t) versus number of nodes n(t).
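The (n(t), e(t)) pairs behind a DPL plot can be assembled directly from node arrival times. A minimal sketch, assuming each edge "exists" once its later endpoint has arrived (the toy data and names are ours):

```python
def dpl_points(node_times, edge_list, timestamps):
    """For each time t, count nodes that have arrived by t and edges
    whose later endpoint has arrived by t, giving the (n(t), e(t))
    pairs plotted (log-log) in a DPL plot."""
    pts = []
    for t in timestamps:
        n = sum(1 for born in node_times.values() if born <= t)
        e = sum(1 for u, v in edge_list
                if max(node_times[u], node_times[v]) <= t)
        pts.append((n, e))
    return pts

# Toy citation-style data: paper -> arrival month; edges cite backward.
births = {"p1": 1, "p2": 1, "p3": 2, "p4": 3}
cites = [("p3", "p1"), ("p3", "p2"), ("p4", "p1"), ("p4", "p3")]
points = dpl_points(births, cites, timestamps=[1, 2, 3])
```

Fitting a line to the logarithms of these pairs yields the densification exponent a.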
3.1.1 ArXiv citation graph
We first investigate a citation graph provided as part of
the 2003 KDD Cup [12]. The HEP–TH (high energy physics
theory) citation graph from the e-print arXiv covers all the
citations within a dataset of n = 29,555 papers with e = 352,807
edges. If a paper i cites paper j, the graph contains a di-
rected edge from i to j. If a paper cites, or is cited by, a
paper outside the dataset, the graph does not contain any
information about this. We refer to this dataset as arXiv.
This data covers papers in the period from January 1993
to April 2003 (124 months). It begins within a few months
of the inception of the arXiv, and thus represents essentially
the complete history of its HEP–TH section. For each month
m (1 ≤ m ≤ 124) we create a citation graph using all papers
published before month m. For each of these graphs, we
plot the number of nodes versus the number of edges on a
logarithmic scale and fit a line.
Figure 2(a) shows the DPL plot; the slope is a = 1.68
and corresponds to the exponent in the densification law.
Notice that a is significantly higher than 1, indicating a
large deviation from linear growth. As noted earlier, when
a graph has a > 1, its average degree increases over time.
Indeed, Figure 1(a) plots the average degree d̄ over time,
and it is clear that d̄ increases. This means that the average
length of the bibliographies of papers increases over time.
There is a subtle point here that we elaborate next: With
almost any network dataset, one does not have data reaching
all the way back to the network’s birth (to the extent that
this is a well-defined notion). We refer to this as the problem
of the “missing past.” Due to this, there will be some ef-
fect of increasing out-degree simply because edges will point
to nodes prior to the beginning of the observation period.
We refer to such nodes as phantom nodes, with a similar
definition for phantom edges. In all our datasets, we find
that this effect is relatively minor once we move away from
the beginning of the observation period; on the other hand,
the phenomenon of increasing degree continues through to
the present. For example, in arXiv, nodes over the most
recent years are primarily referencing non-phantom nodes;
we observe a knee in Figure 1(a) in 1997 that appears to
be attributable in large part to the effect of phantom nodes.
(Later, when we consider a graph of the Internet, we will
see a case where comparable properties hold in the absence
of any “missing past” issues.)
We also experimented with a second citation graph, taken
from the HEP–PH section of the arXiv, which is about the
same size as our first arXiv dataset. It exhibits the same
behavior, with the densification exponent a = 1.56. The
plot is omitted for brevity.
3.1.2 Patents citation graph
Next, we consider a U.S. patent dataset maintained by the
National Bureau of Economic Research [13]. The data set
spans 37 years (January 1, 1963 to December 30, 1999), and
includes all the utility patents granted during that period,
totaling n=3,923,922 patents. The citation graph includes
all citations made by patents granted between 1975 and
1999, totaling e=16,522,438 citations. Because the dataset
begins in 1975, it too has a “missing past” issue, but again
the effect of this is minor as one moves away from the first
few years.
We follow the same procedure as with arXiv. For each
year Y from 1975 to 1999, we create a citation network on
patents up to year Y , and give the DPL plot, in Figure 2(b).
As with the arXiv citation network, we observe a high den-
sification exponent, in this case a = 1.66.
[Figure 2: Number of edges e(t) versus number of nodes n(t),
in log-log scales, for several graphs: (a) arXiv (Jan 1993 to
Apr 2003), fit e(t) = 0.0113 n(t)^1.69 with R^2 = 1.0; (b)
Patents, fit e(t) = 0.0002 n(t)^1.66 with R^2 = 0.99; (c)
Autonomous Systems, fit e(t) = 0.87 n(t)^1.18 with R^2 = 1.00;
(d) Affiliation network, fit e(t) = 0.4255 n(t)^1.15 with
R^2 = 1.0. All 4 graphs obey the Densification Power Law, with
a consistently good fit. Slopes: a = 1.68, 1.66, 1.18 and 1.15,
respectively.]
Figure 1(b) illustrates the increasing out-degree of patents
over time. Note that this plot does not incur any of the
complications of a bounded observation period, since the
patents in the dataset include complete citation lists, and
here we are simply plotting the average size of these as a
function of the year.
3.1.3 Autonomous systems graph
The graph of routers comprising the Internet can be or-
ganized into sub-graphs called Autonomous Systems (AS).
Each AS exchanges traffic flows with some neighbors (peers).
We can construct a communication network of who-talks-to-
whom from the BGP (Border Gateway Protocol) logs.
We use the Autonomous Systems (AS) dataset from [26]. The
dataset contains 735 daily instances which span an interval
of 785 days, from November 8, 1997 to January 2, 2000. In
contrast to citation networks, where nodes and edges only get
added (not deleted) over time, the AS dataset exhibits both
the addition and deletion of nodes and edges over time.
Figure 2(c) shows the DPL plot for the Autonomous Sys-
tems dataset. We observe a clear trend: Even in the pres-
ence of noise, changing external conditions, and disruptions
to the Internet we observe a strong super-linear growth in
the number of edges over more than 700 AS graphs. We
show the increase in the average node degree over time
in Figure 1(c). The densification exponent is a = 1.18,
lower than the one for the citation networks, but still clearly
greater than 1.
3.1.4 Affiliation graphs
Using the arXiv data, we also constructed bipartite affil-
iation graphs. There is a node for each paper, a node for
each person who authored at least one arXiv paper, and an
edge connecting people to the papers they authored. Note
that the more traditional co-authorship network is implicit
in the affiliation network: two people are co-authors if there
is at least one paper joined by an edge to each of them.
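The projection from the bipartite affiliation graph to the implicit co-authorship graph can be sketched as follows (a minimal illustration with made-up names; the paper works with the affiliation graph directly):

```python
from itertools import combinations

def coauthors(paper_authors):
    """Project a bipartite affiliation graph (paper -> authors) onto
    the implicit co-authorship graph: two people are co-authors iff
    some paper is linked to both of them."""
    pairs = set()
    for authors in paper_authors.values():
        pairs.update(combinations(sorted(authors), 2))
    return pairs

# Toy affiliation data.
affil = {"paper1": {"ann", "bob"}, "paper2": {"ann", "bob", "cat"}}
pairs = coauthors(affil)
```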
We studied affiliation networks derived from the five largest
categories in the arXiv (ASTRO–PH, HEP–TH, HEP–PH,
COND–MAT and GR–QC). We place a time-stamp on each
node: the submission date of each paper, and for each per-
son, the date of their first submission to the arXiv. The
data for affiliation graphs covers the period from April 1992
to March 2002. The smallest of the graphs (category GR–
QC) had 19,309 nodes (5,855 authors, 13,454 papers) and
26,169 edges. ASTRO–PH is the largest graph, with 57,381
nodes (19,393 authors, 37,988 papers) and 133,170 edges. It
has an average of 6.87 authors per paper; most of the other categories also
have similarly high numbers of authors per paper.
For all these affiliation graphs we observe similar phe-
nomena, and in particular we have densification exponents
between 1.08 and 1.15. Due to lack of space we present
the complete set of measurements only for ASTRO–PH, the
largest affiliation graph. Figures 1(d) and 2(d) show the
increasing average degree over time, and a densification ex-
ponent of a = 1.15.
3.2 Shrinking Diameters
We now discuss the behavior of the effective diameter over
time, for this collection of network datasets. Following the
conventional wisdom on this topic, we expected the under-
lying question to be whether we could detect the differences
among competing hypotheses concerning the growth rates
of the diameter — for example, the difference between loga-
rithmic and sub-logarithmic growth. Thus, it was with some
surprise that we found the effective diameters to be actually
decreasing over time (Figure 3).
Let us make the definitions underlying the observations
concrete. We say that two nodes in an undirected network
are connected if there is a path between them; for each nat-
ural number d, let g(d) denote the fraction of connected node
pairs whose shortest connecting path has length at most d.
The hop-plot for the network is the set of pairs (d, g(d)); it
thus gives the cumulative distribution of distances between
connected node pairs. We extend the hop-plot to a function
defined over all positive real numbers by linearly interpolat-
ing between the points (d, g(d)) and (d + 1, g(d + 1)) for each
d, and we define the effective diameter of the network to be
the value of d at which this function achieves the value 0.9.
(Note that this varies slightly from an alternate definition
of the effective diameter used in earlier work: the minimum
value d such that at least 90% of the connected node pairs
are at distance at most d. Our variation smooths this defi-
nition by allowing it to take non-integer values.) The effec-
tive diameter is a more robust quantity than the diameter
(defined as the maximum distance over all connected node
pairs), since the diameter is prone to the effects of degener-
ate structures in the graph (e.g. very long chains). However,
the effective diameter and diameter tend to exhibit qualita-
tively similar behavior.
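The smoothed effective diameter just defined can be computed from a hop-plot built by exact BFS. This is a small sketch of ours, practical only for modest graphs (large-graph studies typically approximate the hop-plot rather than run all-pairs BFS):

```python
from collections import deque

def effective_diameter(nbrs, q=0.9):
    """Hop-plot based effective diameter: g(d) is the fraction of
    connected node pairs within distance d; linearly interpolate the
    hop-plot and return the (possibly non-integer) d where it first
    reaches q."""
    counts = {}                           # distance -> ordered-pair count
    for s in nbrs:                        # exact BFS from every node
        dist, dq = {s: 0}, deque([s])
        while dq:
            u = dq.popleft()
            for v in nbrs[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    dq.append(v)
        for u, d in dist.items():
            if u != s:
                counts[d] = counts.get(d, 0) + 1
    total = sum(counts.values())
    cum, g_prev = 0, 0.0
    for d in range(1, max(counts) + 1):
        cum += counts.get(d, 0)
        g = cum / total
        if g >= q:                        # interpolate (d-1, g_prev)-(d, g)
            return (d - 1) + (q - g_prev) / (g - g_prev)
        g_prev = g

# Path 0-1-2-3: g(1) = 6/12, g(2) = 10/12, g(3) = 1, so d_eff = 2.4.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
d_eff = effective_diameter(path)
```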
For each time t (as in the previous subsection), we create
a graph consisting of nodes up to that time, and compute
the effective diameter of the undirected version of the graph.
Figure 3 shows the effective diameter over time; one ob-
serves a decreasing trend for all the graphs. We performed
a comparable analysis to what we describe here for all 9
graph datasets in our study, with very similar results.

[Figure 3: The effective diameter over time for (a) the arXiv
citation graph, (b) the affiliation network, and (c) Patents,
each showing the full graph, a post-’95 (post-’85 for Patents)
subgraph, and that subgraph with no past; and (d) AS, plotted
against the size of the graph (number of nodes) with a linear
fit.]

For the citation networks in our study, the decreasing effective
diameter has the following interpretation: Since all the links
out of a node are “frozen” at the moment it joins the graph,
the decreasing distance between pairs of nodes appears to be
the result of subsequent papers acting as “bridges” by cit-
ing earlier papers from disparate areas. Note that for other
graphs in our study, such as the AS dataset, it is possible for
an edge between two nodes to appear at an arbitrary time
after these two nodes join the graph.
We note that the effective diameter of a graph over time is
necessarily bounded from below, and the decreasing patterns
of the effective diameter in the plots of Figure 3 are consis-
tent with convergence to some asymptotic value. However,
understanding the full “limiting behavior” of the effective
diameter over time, to the extent that this is even a well-
defined notion, remains an open question.
3.2.1 Validating the shrinking diameter conclusion
Given the unexpected nature of this result, we wanted to
verify that the shrinking diameters were not attributable to
artifacts of our datasets or analyses. We explored this issue
in a number of ways, which we now summarize; the conclu-
sion is that the shrinking diameter appears to be a robust,
and intrinsic, phenomenon. Specifically, we performed ex-
periments to account for (a) possible sampling problems, (b)
the effect of …
Copyright © 1999 by Mary Beth Rosson and John M. Carroll
Scenario-Based Usability Engineering
Mary Beth Rosson and John M. Carroll
Department of Computer Science
Virginia Tech
Fall 1999
Chapter 3
Analyzing Requirements
Making work visible. The end goal of requirements analysis can be elusive when work is not
understood in the same way by all participants. Blomberg, Suchman, and Trigg describe this
problem in their exploration of image-processing services for a law firm. Initial studies of
attorneys produced a rich analysis of their document processing needs—for any legal proceeding,
documents often numbering in the thousands are identified as “responsive” (relevant to the case) by
junior attorneys, in order to be submitted for review by the opposing side. Each page of these
documents is given a unique number for subsequent retrieval. An online retrieval index is created
by litigation support workers; the index encodes document attributes such as date, sender,
recipient, and type. The attorneys assumed that their job (making the subjective relevance
decisions) would be facilitated by image processing that encodes a document’s objective attributes
(e.g., date, sender). However, studies of actual document processing revealed activities that were
not objective at all, but rather relied on the informed judgment of the support staff. Something as
simple as a document date was often ambiguous, because it might display the date it was written,
signed, and/or delivered; choosing which date to encode required understanding the document’s content and role
in a case. Even determining what constituted a document required judgment, as papers came with
attachments and no indication of beginning or end. Taking the perspective of the support staff
revealed knowledge-based activities that were invisible to the attorneys, but that had critical limiting
implications for the role of image-processing technologies (see Blomberg, 1995).
What is Requirements Analysis?
The purpose of requirements analysis is to expose the needs of the current situation with
respect to a proposed system or technology. The analysis begins with a mission statement or
orienting goals, and produces a rich description of current activities that will motivate and guide
subsequent development. In the legal office case described above, the orienting mission was
possible applications of image processing technology; the rich description included a view of case
processing from both the lawyers’ and the support staffs’ perspectives. Usability engineers
contribute to this process by analyzing what and how features of workers’ tasks and their work
situation are contributing to problems or successes1. This analysis of the difficulties or
opportunities forms a central piece of the requirements for the system under development: at the
minimum, a project team expects to enhance existing work practices. Other requirements may arise
from issues unrelated to use, for example hardware cost, development schedule, or marketing
strategies. However these pragmatic issues are beyond the scope of this textbook. Our focus is on
analyzing the requirements of an existing work setting and of the workers who populate it.
Understanding Work
What is work? If you were to query a banker about her work, you would probably get a
list of things she does on a typical day, perhaps a description of relevant information or tools, and
maybe a summary of other individuals she answers to or makes requests of. At the least,
describing work means describing the activities, artifacts (data, documents, tools), and social
context (organization, roles, dependencies) of a workplace. No single observation or interview
technique will be sufficient to develop a complete analysis; different methods will be useful for
different purposes.
Tradeoff 3.1: Analyzing tasks into hierarchies of sub-tasks and decision rules brings order
to a problem domain, BUT tasks are meaningful only in light of organizational goals and practices.
A popular approach to analyzing the complex activities that comprise work is to enumerate
and organize tasks and subtasks within a hierarchy (Johnson, 1995). A banker might indicate that
the task of “reviewing my accounts” consists of the subtasks “looking over the account list”,
“noting accounts with recent activity”, and “opening and reviewing active accounts”. Each of these
sub-tasks in turn can be decomposed more finely, perhaps to the level of individual actions such as
picking up or filing a particular document. Some of the tasks will include decision-making, such
1 In this discussion we use “work” to refer broadly to the goal-directed activities that take place in the
problem domain. In some cases, this may involve leisure or educational activities, but in general the same methods
can be applied to any situation with established practices.
as when the banker decides whether or not to open up a specific account based on its level of recent activity.
A strength of task analysis is its step-by-step transformation of a complex space of
activities into an organized set of choices and actions. This allows a requirements analyst to
examine the task’s structure for completeness, complexity, inconsistencies, and so on. However
the goal of systematic decomposition can also be problematic, if analysts become consumed by
representing task elements, step sequences, and decision rules. Individual tasks must be
understood within the larger context of work; over-emphasizing the steps of a task can cause
analysts to miss the forest for the trees. To truly understand the task of reviewing accounts a
usability engineer must learn who is responsible for ensuring that accounts are up to date, how
account access is authorized and managed, and so on.
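The hierarchical decomposition described above can also be pictured as a simple tree data structure. The following Python sketch is purely illustrative (the class, helper method, and exact task names are our own, not part of any standard task-analysis notation), using the banker example:

```python
# Illustrative sketch of a task-analysis hierarchy (not SBUE notation).
from dataclasses import dataclass, field
from typing import List


@dataclass
class Task:
    name: str
    subtasks: List["Task"] = field(default_factory=list)

    def leaf_actions(self) -> List[str]:
        """Flatten the hierarchy into the individual actions at the leaves."""
        if not self.subtasks:
            return [self.name]
        actions: List[str] = []
        for sub in self.subtasks:
            actions.extend(sub.leaf_actions())
        return actions


# The banker's "reviewing my accounts" task, decomposed as in the text;
# the two lowest-level actions are hypothetical examples.
review = Task("reviewing my accounts", [
    Task("looking over the account list"),
    Task("noting accounts with recent activity"),
    Task("opening and reviewing active accounts", [
        Task("picking up the account document"),
        Task("deciding whether to open the account"),
    ]),
])

print(review.leaf_actions())
```

A representation like this makes the analyst’s completeness and complexity checks concrete, but, as the text cautions, it says nothing about the organizational context that gives the steps their meaning.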
The context of work includes the physical, organizational, social, and cultural relationships
that make up the work environment. Actions in a workplace do not take place in a vacuum;
individual tasks are motivated by goals, which in turn are part of larger activities motivated by the
organizations and cultures in which the work takes place (see Activities of a Health Care Center,
below). A banker may report that she is reviewing accounts, but from the perspective of the
banking organization she is “providing customer service” or perhaps “increasing return on
investment”. Many individuals — secretaries, data-entry personnel, database programmers,
executives — work with the banker to achieve these high-level objectives. They collaborate
through interactions with shared tools and information; this collaboration is shaped not only by the
tools that they use, but also by the participants’ shared understanding of the bank’s business
practice — its goals, policies, and procedures.
Tradeoff 3.2: Task information and procedures are externalized in artifacts, BUT the impact
of these artifacts on work is apparent only in studying their use.
A valuable source of information about work practices is the artifacts used to support task
goals (Carroll & Campbell, 1989). An artifact is simply a designed object — in an office setting, it
might be a paper form, a pencil, an in-basket, or a piece of computer software. It is simple and fun
to collect artifacts and analyze their characteristics (Norman, 1990). Consider the shape of a
pencil: it conveys a great deal about the size and grasping features of the humans who use it;
pencil designers will succeed to a great extent by giving their new designs the physical
characteristics of pencils that have been used for years. But artifacts are just part of the picture.
Even an object as simple as a pencil must be analyzed as part of a real world activity, an activity
that may introduce concerns such as erasability (elementary school use), sharpness (architecture
firm drawings), name-brands (pre-teen status brokering), cost (office supplies accounting), and so on.
Usability engineers have adapted ethnographic techniques to analyze the diverse factors
influencing work. Ethnography refers to methods developed within anthropology for gaining
insights into the life experiences of individuals whose everyday reality is vastly different from the
analyst’s (Blomberg, 1990). Ethnographers typically become intensely involved in their study of a
group’s culture and activities, often to the point of becoming members themselves. As used by
HCI and system design communities, ethnography involves observations and interviews of work
groups in their natural setting, as well as collection and analysis of work artifacts (see Team Work
in Air Traffic Control, below). These studies are often carried out in an iterative fashion, where
the interpretation of one set of data raises questions or possibilities that may be pursued more
directly in follow-up observations and interviews.
Figure 3.1: Activity Theory Analysis of a Health Care Center
(after Kuutti and Arvonen, 1992)
Activities of a Health Care Center: Activity Theory (AT) offers a view of individual
work that grounds it in the goals and practices of the community within which the work takes
place. Engeström (1987) describes how an individual (the subject) works on a problem (the
object) to achieve a result (the outcome), but that the work on the problem is mediated by the tools
available (see Figure 3.1). An individual’s work is also mediated by the rules of practice shared
within her community; the object of her work is mediated by that same community’s division of labor.
Kuutti and Arvonen (1992; see also Engeström 1990; 1991; 1993) applied this framework
to their studies of a health care organization in Espoo, Finland. This organization wished to evolve
Figure 3.1 (rendered here as a list) identifies the elements of the physician’s activity:
    Subject involved in activity: one physician in a health care unit
    Object of activity: the complex, multi-dimensional problem of a patient
    Activity outcome: patient problem resolved
    Tools supporting activity: patient record, medicines, etc.
    Community sponsoring activity: all personnel of the health care unit
    Mediating relations: rules of practice; division of labor
from a rather bureaucratic organization with strong separations between its various units (e.g.,
social work, clinics, hospital) to a more service-oriented organization. A key assumption in doing
this was that the different units shared a common general object of work—the “life processes” of
the town’s citizens. This high-level goal was acknowledged to be a complex problem requiring the
integrated services of complementary health care units.
The diagram in Figure 3.1 summarizes an AT analysis developed for one physician in a
clinic. The analysis records the shared object (the health conditions of a patient). At the same time
it shows this physician’s membership in a subcommunity, specifically the personnel at her clinic.
This clinic is both geographically and functionally separated from other health care units, such as
the hospital or the social work office. The tools that the physician uses in her work, the rules that
govern her actions, and her understanding of her goals are mediated by her clinic. As a result, she
has no way of analyzing or finding out about other dimensions of this patient’s problems, for
example the home life problems being followed by a social worker, or emotional problems under
treatment by psychiatric personnel. In AT such obstacles are identified as contradictions which
must be resolved before the activity can be successful.
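For readers who think in data structures, the elements of Figure 3.1 can be collected into a single record. The sketch below is purely illustrative: the field names follow Engeström’s activity triangle and the values come from the health care example, but the class itself is our own shorthand, not an Activity Theory formalism:

```python
# Illustrative record of an Activity Theory analysis (our shorthand).
from dataclasses import dataclass
from typing import List


@dataclass
class Activity:
    subject: str               # who carries out the work
    object: str                # the problem being worked on
    outcome: str               # the intended result
    tools: List[str]           # artifacts mediating the work
    community: str             # the group sponsoring the activity
    rules_of_practice: str     # shared rules mediating the subject's actions
    division_of_labor: str     # how the community partitions the object


physician_activity = Activity(
    subject="one physician in a health care unit",
    object="the complex, multi-dimensional problem of a patient",
    outcome="patient problem resolved",
    tools=["patient record", "medicines"],
    community="all personnel of the health care unit",
    rules_of_practice="clinic procedures mediating the physician's actions",
    division_of_labor="separation among clinic, hospital, and social work",
)

print(physician_activity.outcome)
```

Recording an analysis this way makes the contradictions discussed above easy to spot: the Espoo redesign amounted to editing the community, division_of_labor, and tools fields while the object stayed the same.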
In this case, a new view of community was developed for the activity. For each patient,
email or telephone was used to instantiate a new community, comprised of individuals as relevant
from different health units. Of course the creation of a more differentiated community required
negotiation concerning the division of labor (e.g. who will contact whom and for what purpose),
and rules of action (e.g., what should be done and in what order). Finally, new tools (composite
records, a “master plan”) were constructed that better supported the redefined activity.
[Figure 3.2 will appear here: a copy of the flight progress strip figure provided by Hughes et al. in their ethnographic report.]
Team Work in Air Traffic Control: An ethnographic study of British air traffic
control rooms by Hughes, Randall and Shapiro (CSCW’92) highlighted the central role played by
the paper strips used to chart the progress of individual flights. In this study the field workers
immersed themselves in the work of air traffic controllers for several months. During this time
they observed the activity in the control rooms and talked to the staff; they also discussed with the
staff the observations they were collecting and their interpretation of these data.
The general goal of the ethnography was to analyze the social organization of the work in
the air traffic control rooms. In this the researchers showed how the flight progress strips
supported “individuation”, such that each controller knew what their job was in any given
situation, but also how their tasks were interdependent with the tasks of others. The resulting
division of labor was accomplished in a smooth fashion because the controllers had shared
knowledge of what the strips indicated, and were able to take on and hand off tasks as needed, and
to recognize and address problems that arose.
Each strip displays an airplane’s ID and aircraft type; its current level, heading, and
airspeed; its planned flight path, navigation points on route, estimated arrival at these points; and
departure and destination airports (see Figure 3.2). However a strip is more than an information
display. The strips are work sites, used to initiate and perform control tasks. Strips are printed
from the online database, but then annotated as flight events transpire. This creates a public
history; any controller can use a strip to reconstruct a “trajectory” of what the team has done with a
flight. The strips are used in conjunction with the overview offered by radar to spot exceptions or
problems to standard ordering and arrangement of traffic. An individual strip gets “messy” to the
extent it has deviated from the norm, so a set of strips serves as a sort of proxy for the orderliness
of the skies.
The team interacts through the strips. Once a strip is printed and its initial data verified, it is
placed in a holder color-coded for its direction. It may then be marked up by different controllers,
each using a different ink color; problems or deviations are signaled by moving a strip out of
alignment, so that visual scanning detects problem flights. This has important social consequences
for the active controller responsible for a flight. She knows that other team members are aware of
the flight’s situation and can be consulted; who if anyone has noted specific issues with the flight;
if a particularly difficult problem arises it can be passed on to the team leader without a lot of
explanation; and so on.
The ethnographic analysis documented the complex tasks that revolved around the flight
control strips. At the same time it made clear the constraints of these manually-created and
maintained records. However a particularly compelling element of the situation was the
controllers’ trust in the information on the strips. This was due not to the strips’ physical
characteristics, but rather to the social process they enable—the strips are public, and staying on
top of each others’ problem flights, discussing them informally while working or during breaks, is
taken for granted. Any computerized replacement of the strips must support not just management
of flight information, but also the social fabric of the work that engenders confidence in the
information displayed.
User Involvement
Who are a system’s target users? Clearly this is a critical question for a user-centered
development process. It first comes up during requirements analysis, when the team is seeking to
identify a target population(s), so as to focus in on the activities that will suggest problems and
concerns. Managers or corporation executives are a good source of high-level needs statements
(e.g., reduce data-processing errors, integrate billing and accounting). Such individuals also have
a well-organized view of their subordinates’ responsibilities, and of the conditions under which
various tasks are completed. Because of the hierarchical nature of most organizations, such
individuals are usually easy to identify and comprise a relatively small set. Unfortunately if a
requirements team accepts these requirements too readily, they may miss the more detailed and
situation-specific needs of the individuals who will use a new system in their daily work.
Tradeoff 3.3: Management understands the high-level requirements for a system, BUT is
often unaware of workers’ detailed needs and preferences.
Every system development situation includes multiple stakeholders (Checkland, 1981).
Individuals in management positions may have authorized a system’s purchase or development;
workers with a range of job responsibilities will actually use the system; others may benefit only
indirectly from the tasks a system supports. Each set of stakeholders has its own set of
motivations and problems that the new system might address (e.g., productivity, satisfaction, ease
of learning). What’s more, none of them can adequately communicate the perspectives of the
others — despite the best of intentions, many details of a subordinate’s work activities and
concerns are invisible to those in supervisory roles. Clearly what is needed in requirements
analysis is a broad-based approach that incorporates diverse stakeholder groups into the
observation and interviewing activities.
Tradeoff 3.4: Workers can describe their tasks, BUT work is full of exceptions, and the
knowledge for managing exceptions is often tacit and difficult to externalize.
But do users really understand their own work? We made the point above that a narrow
focus on the steps of a task might cause analysts to miss important workplace context factors. An
analogous point holds with respect to interviews or discussions with users. Humans are
remarkably good (and reliable) at “rationalizing” their behavior (Ericsson & Simon, 1992).
Reports of work practices are no exception — when asked workers will usually first describe a
most-likely version of a task. If an established “procedures manual” or other policy document
exists, the activities described by experienced workers will mirror the official procedures and
policies. However this officially-blessed knowledge is only part of the picture. An experienced
worker will also have considerable “unofficial” knowledge acquired through years of encountering
and dealing with the specific needs of different situations, with exceptions, with particular
individuals who are part of the process, and so on. This expertise is often tacit, in that the
knowledgeable individuals often don’t even realize what they “know” until confronted with their
own behavior or interviewed with situation-specific probes (see Tacit Knowledge in Telephone
Trouble-Shooting, below). From the perspective of requirements analysis, however, tacit
knowledge about work can be critical, as it often contains the “fixes” or “enhancements” that have
developed informally to address the problems or opportunities of day-to-day work.
One effective technique for probing workers’ conscious and unconscious knowledge is
contextual inquiry (Beyer & Holtzblatt, 1994). This analysis method is similar to ethnography, in
that it involves the observation of individuals in the context of their normal work environment.
However it includes the prerogative to interrupt an observed activity at points that seem informative
(e.g., when a problematic situation arises) and to interview the affected individual(s) on the spot
concerning the events that have been observed, to better understand causal factors and options for
continuing the activity. For example, a usability engineer who saw a secretary stop working on a
memo to make a phone call to another secretary, might ask her afterwards to explain what had just
happened between her and her co-worker.
Tacit Knowledge in Telephone Trouble-Shooting: It is common for workers to
see their conversations and interactions with each other as a social aspect of work that is enjoyable
but unrelated to work goals. Sachs (199x) observed this in her case study of telephony workers in
a phone company. The study analyzed the work processes related to detecting, submitting, and
resolving problems on telephone lines; the focus of the study was the Trouble Ticketing System
(TTS), a large database used to record telephone line problems, assign problems (tickets) to
engineers for correction, and keep records of problems detected and resolved.
Sachs argues that TTS takes an organizational view of work, treating work tasks as
modular and well-defined: one worker finds a problem, submits it to the database, TTS assigns it
to the engineer at the relevant site, that engineer picks up the ticket, fixes the problem, and moves
on. The original worker is freed from the problem analysis task once the original ticket is submitted, and the
second worker can move on once the problem has been addressed. TTS replaced a manual system
in which workers contacted each other directly over the phone, often working together to resolve a
problem. TTS was designed to make work more efficient by eliminating unnecessary phone calls.
In her interviews with telephony veterans, Sachs discovered that the phone conversations
were far from unnecessary. The initiation, conduct, and consequences of these conversations
reflected a wealth of tacit knowledge on the part of the worker–selecting the right person to call
(one known to have relevant expertise for this apparent problem), the “filling in” on what the first
worker had or had not determined or tried to this point, sharing of hypotheses and testing methods,
iterating together through tests and results, and carrying the results of this informal analysis into
other possibly related problem areas. In fact, TTS had made work less efficient in many cases,
because in order to do a competent job, engineers developed “workarounds” wherein they used
phone conversations as they had in the past, then used TTS to document the process afterwards.
Of interest was that the telephony workers were not at first aware of how much knowledge
of trouble-shooting they were applying to their jobs. They described the tasks as they understood
them from company policy and procedures. Only after considerable data collection and discussion
did they recognize that their jobs included the skills to navigate and draw upon a rich organizational
network of colleagues. In further work Sachs helped the phone company to develop a fix for the
observed workarounds in the form of a new organizational role: a “turf coordinator”, a senior
engineer responsible for identifying and coordinating the temporary network of workers needed to
collaborate on trouble-shooting a problem. As a result of Sachs’s analysis, work that had been tacit
and informal was elevated to an explicit business responsibility.
Requirements Analysis with Scenarios
As introduced in Chapter 2, requirements refers to the first phase of SBUE. As we also
have emphasized, requirements cannot be analyzed all at once in waterfall fashion. However some
analysis must happen early on to get the ball rolling. User interaction scenarios play an important
role in these early analysis activities. When analysts are observing workers in the world, they are
collecting observed scenarios, episodes of actual interaction among workers that may or may not
involve technology. The analysis goal is to produce a summary that captures the critical aspects of
the observed activities. A central piece of this summary analysis is a set of requirements scenarios.
The development of requirements scenarios begins with determining who are the
stakeholders in a work situation — what their roles and motivations are, what characteristics they
possess that might influence reactions to new technology. A description of these stakeholders’
work practice is then created, through a combination of workplace observation and generation of
hypothetical situations. These sources of data are summarized and combined to generate the
requirements scenarios. A final step is to call out the most critical features of the scenarios, along
with hypotheses about the positive or negative consequences that these features seem to be having
on the work setting.
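The steps just described (identify stakeholders, summarize observed episodes, then call out critical features with their hypothesized consequences) can be summarized as a rough pipeline. The sketch below is our own shorthand, not SBUE terminology, and the function and field names are invented for illustration; the sample data paraphrases the law-firm case from the start of the chapter:

```python
# Illustrative sketch of turning observed scenarios into requirements
# scenarios with feature/consequence claims (names are hypothetical).
from typing import Dict, List


def analyze_requirements(stakeholders: List[str],
                         observed: List[Dict]) -> List[Dict]:
    """Summarize observed episodes into requirements scenarios."""
    requirements_scenarios = []
    for episode in observed:
        requirements_scenarios.append({
            # Which identified stakeholders took part in this episode.
            "stakeholders": [s for s in stakeholders if s in episode["actors"]],
            "summary": episode["summary"],
            # Critical features, each paired with a consequence hypothesis
            # still to be assessed (positive or negative for the work).
            "claims": [(feature, "consequence to be assessed")
                       for feature in episode["features"]],
        })
    return requirements_scenarios


scenarios = analyze_requirements(
    ["junior attorney", "litigation support worker"],
    [{"actors": ["litigation support worker"],
      "summary": "Support staff exercise judgment over ambiguous document "
                 "dates when building the retrieval index.",
      "features": ["manual date encoding"]}],
)
print(scenarios[0]["claims"])
```

The point of the sketch is only to show the shape of the summary analysis: each requirements scenario ties concrete stakeholders and an observed episode to explicit, testable claims about features of the work.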
Introducing the Virtual Science Fair Example Case
The methods of SBUE will be introduced with reference to a single open-ended example
problem, the design of a virtual science fair (VSF). The high-level concept is to use computer-
mediated communication technology (e.g., email, online chat, discussion forums,
videoconferencing) and online archives (e.g., databases, digital libraries) to supplement the
traditional physical science fairs. Such fairs typically involve student creation of science projects
over a period of months. The projects are then exhibited and judged at the science fair event. We
begin with a very loose concept of what a virtual version of such a fair might be — not a
replacement of current fairs, but rather a supplement that expands the boundaries of what might
constitute participation, project construction, project exhibits, judging, and so on.
Stakeholder Analysis
Checkland (1981) offers a mnemonic for guiding development of an early shared vision of
a system’s goals — CATWOE analysis. CATWOE elements include Clients (those people who
will benefit or suffer from the system), Actors (those who interact with the system), a
Transformation (the basic purpose of the system), a Weltanschauung (the world view promoted by
the system), Owners (the individuals commissioning or authorizing the system), and the
Environment (physical constraints on the system). SBUE adapts Checklund’s technique as an aid
in identifying and organizing the concerns of various stakeholders during requirements
analysis.The SBUE adaptation of Checklund’s technique includes the development of thumbnail
scenarios for each element identified. The table includes just one example for each VSF element
called out in the analysis; for a complex situation multiple thumbnails might be needed. Each
scenario sketch is a usage-oriented elaboration of the element itself; the sketch points to a future
situation in which a possible benefit, interaction, environmental constraint, etc., is realized. Thus
the client thumbnails emphasize hoped-for benefits of the VSF; the actor thumbnails suggest a few
interaction variations anticipated for different stakeholders. The thumbnail scenarios generated in
this analysis are not yet design scenarios, they simply allow the analyst to begin to explore the
space of user groups, motivations, and pragmatic constraints.
The CATWOE thumbnail scenarios begin the iterative process of identifying and analyzing
the background, motivations, and preferences that different user groups will bring to the use of the
target system. This initial picture will be elaborated throughout the development process, through
analysis of both existing and envisioned usage situations.
Clients (Students, Community members):
    A high school student learns about road-bed coatings from a retired civil engineer.
    A busy housewife helps a middle school student organize her bibliographic information.
Actors (Students, Community members):
    A …
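As a compact illustration, the CATWOE elements and one thumbnail scenario apiece can be recorded as a simple mapping. In the sketch below the Clients entry paraphrases the table above; the remaining thumbnails, including the Owners and Environment entries, are our assumptions for illustration and are not taken from the chapter’s analysis:

```python
# Illustrative CATWOE record for the Virtual Science Fair (VSF).
# Only the Clients thumbnail comes from the chapter; the rest are
# assumed examples of what such thumbnails might look like.
catwoe = {
    "Clients": "A high school student learns about road-bed coatings "
               "from a retired civil engineer.",
    "Actors": "A student and a community member discuss a project "
              "through the VSF's online chat.",                     # assumed
    "Transformation": "Traditional science fair participation is "
                      "supplemented and extended online.",          # assumed
    "Weltanschauung": "Science learning benefits from broad "
                      "community involvement.",                     # assumed
    "Owners": "The school district commissioning the VSF.",         # assumed
    "Environment": "Home and school computers with network access.",# assumed
}

# A CATWOE analysis is complete when every element has at least
# one thumbnail scenario attached to it.
assert all(catwoe.values())
for element, thumbnail in catwoe.items():
    print(f"{element}: {thumbnail}")
```

Keeping one usage-oriented thumbnail per element, as here, is what turns Checkland’s mnemonic into the starting point for the scenario work of later chapters.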