1 TITLE and RUNNING HEAD: Syndetic Modelling David Duke

TITLE and RUNNING HEAD: Syndetic Modelling
David Duke, University of York
Philip Barnard, MRC Cognition and Brain Sciences Unit
David Duce, Rutherford Appleton Laboratory
Jon May, University of Sheffield
David Duke is a computer scientist with an interest in formal methods,
interactive systems, and computer graphics; he is a Lecturer in the Department
of Computer Science, University of York, United Kingdom. Philip Barnard
is a psychologist with an interest in theories of mental architecture and their
application to complex tasks, emotion and a range of psychopathologies; he is
on the scientific staff of the Medical Research Council’s Cognition and Brain
Sciences Unit. David Duce is a computer scientist with interests in computer
graphics, human-computer interaction and formal techniques; he is a senior staff
researcher in the Department for Computation and Information, Rutherford
Appleton Laboratory, United Kingdom. Jon May is a psychologist with an
interest in the application of unified models of cognition to perception,
particularly with regard to the effects of task and context; he is a Lecturer in the
Department of Psychology at the University of Sheffield.
ABSTRACT
Syndesis n. (pl. syndeses) [mod. L, f. Gk sundesis binding together (sundeo
bind together)].
—The Concise Oxford Dictionary, Seventh Edition, 1986.
User and system models are typically viewed as independent representations
that provide complementary insights into aspects of human-computer
interaction. Within system development it is usual to see the two activities as
separate, or at best loosely coupled, with either the design artefact or some third
‘mediating’ expression providing the context in which the results of modelling
can be related. This paper proposes that formal system models can be combined
directly with a representation of human cognition to yield an integrated view of
human-system interaction: a syndetic model. Aspects of systems that affect
usability can then be described and understood in terms of the conjoint
behaviour of user and computer. This paper introduces and discusses, in
syndetic terms, two scenarios with markedly different properties. We show
how syndesis can provide a formal foundation for reasoning about interaction.
CONTENTS
1. INTRODUCTION
2. THE NEED FOR MULTI-DISCIPLINARY INSIGHT
2.1. Why Theoretical Integration Matters
2.2. Why Mathematics Matters
2.3. Recent Work
3. A MATHEMATICAL MODEL OF A MULTIMODAL USER INTERFACE
4. INTERACTING COGNITIVE SUBSYSTEMS (ICS)
4.1. Configurations
4.2. Blending of Data Streams
4.3. Cyclical Configurations
4.4. Buffering of Transformations
4.5. Interleaving and Oscillation
5. A MATHEMATICAL MODEL OF ICS
5.1. Basic Definitions
5.2. The Architecture
5.3. Information Processing in ICS
5.4. Key Points
6. A SYNDETIC MODEL OF MATIS
7. A SYNDETIC MODEL OF GESTURE
8. PROSPECTS AND CONCLUSIONS
APPENDIX: Glossary of Notation
1. INTRODUCTION
Most disciplines routinely use various kinds of model to represent and reason
about the behaviour of particular phenomena. In science, the model usually
relates or characterises observations that can be made of an extant system. For
engineering, or system design in general, the model often represents the
observations of a system that a designer will be asked to realise. In the process
of mapping a ‘requirements’ model into a delivered artefact the designer may
generate additional models that focus on problematic aspects of the desired
system, in order for example to resolve a choice between design options. In this
case the model allows the designer to bring some external body of theory to bear
on the problem, and clearly the value of carrying out the modelling will depend
on the degree of insight that the theory can provide.
In the context of human-computer interaction, two quite separate classes of
model have emerged. One is the concept of a system model, based on formal or
operational models of software systems and components, that can be used to
prototype or analyse the function and appearance of system components.
Prototype models are poor on theoretical insight, but can be used to assess
performance over specific scenarios. Other, more abstract, models allow some
degree of description and reasoning about what a system might or should do in
particular circumstances. The second class of models are those that address the
cognitive factors within interaction. While these may be grounded in cognitive
psychology, they are often either highly operational, for example GOMS
(Kieras & Polson, 1985), or are expressed in a way that hides the underlying
theory beneath rules or guidelines, for example Cognitive Walkthroughs
(Wharton et al., 1994). In either case there is a common problem: human-computer
interaction is a phenomenon involving (at least) two agents, the ‘computer’
system and its human user. Any theoretically based model of interaction must
therefore draw on insight from both the cognitive and system
perspectives.
In practice designers cannot rely on a single model, but must draw on a variety
of perspectives, particularly in systems that recruit users' cognitive abilities to
understand and interact with software artefacts using novel metaphors, or in
contexts where security or human safety is at risk. The use of disparate
techniques does give extra leverage on design problems, but at the cost of
introducing a new problem: that of integrating the different modelling products
and viewpoints. One reaction to this problem has been the development of
Design Rationale (DR) frameworks, see for example (MacLean et al., 1991).
While DR provides a flexible means of relating modelling output in the context
of design, it does so by separating recommendations from their theoretical
underpinning. Thus it can be difficult to understand why a particular problem
exists, or what changes in either system or user performance might overcome a
design problem or improve usability.
This paper introduces an approach to modelling interaction that integrates the
contrasting representations of user and system theories. By using a common
language to represent the user and system, the approach allows properties of
interaction to be described and understood in terms of the conjoint behaviour of
both agents. We use the term syndetic model to describe this new approach, to
emphasise its bringing together of previously disparate methodologies. Section
2 sets out in more detail the case for such an integrated approach to human-computer
interaction and outlines existing approaches to this problem. It also
explains the motivation for using mathematical structures as the basis for the
common representation.
Theoretical developments in HCI have tended to lag behind technological
innovation. We believe that the framework presented here is important in part
because it can address issues raised by the use of novel interface technologies.
In support of this claim we have chosen two such areas, multimodal input and
gestural interaction, to illustrate syndetic modelling. We begin however by
describing the approaches in isolation. Section 3 shows how mathematical
structures and logic can be used to represent the behaviour of an interface. As an
illustration, we model an experimental interface for multimodal input with
deictic reference (Nigay, 1994; Nigay & Coutaz, 1995). However, although the
expressiveness of the approach makes it possible to represent rich and
potentially complex interface behaviour like deixis, the model itself has no
foundation for supporting claims related to usability. One framework that has
been used to investigate usability from a cognitive perspective is Interacting
Cognitive Subsystems (ICS), which is introduced in Section 4. ICS (Barnard &
May, 1993; Barnard & May, 1995) is a broad framework for understanding
human information processing in terms of the mental representations and
resources that a person needs to deploy in performing particular tasks.
However, in common with other cognitive modelling approaches, ICS does not
itself provide representations for the environment in which cognitive activity is
located.
The unifying step, taken in Section 5, is to show that the structure and
principles of the ICS system can be expressed in the mathematical framework
used to model the interface. It is this expression of both user and system
requirements within a common mathematical framework that we call syndetic
modelling. Armed with this new framework, in Section 6 we return to the
example of multimodal input and deixis. We identify a potential difficulty in
employing deictic reference within the system, and by locating the source of the
problem within the model, suggest a modification to the design that could
alleviate the problem.
Multimodal interaction is an important example because it is one of a growing
number of techniques that do not fit well into ‘stimulus-response’ models of
interaction that have served well for other technologies. Syndetic modelling
itself makes no commitment to any particular model of interaction, and can
therefore accommodate diverse technologies. The expressive power of
mathematical modelling means that a range of abstractions over device
behaviour can be constructed and situated within the framework. To illustrate
this, we show in Section 7 how the cognitive part of the model can be reused in
the context of a different technology, gestural interaction. Multimodal and
gestural interfaces have different characteristics in terms of the resources they
demand of the user, and the analyses show why user and system models are
insufficient in themselves to resolve the design issues contained within each
problem. The paper concludes with a description of ongoing research which
aims at extending the theory into a general model of information processing
within cognitive and computational systems.
2. THE NEED FOR MULTI-DISCIPLINARY INSIGHT
Human-computer interaction has developed from considering the layout and
operation of simple graphical or textual interfaces to examine technologies such
as media-spaces, gesture, and multi-modal interfaces. These touch on a broad
range of issues ranging from ‘hard’ system constraints, through human
perception and cognition, to broad social issues concerning the use of
appropriate systems to support workers' needs. Insight into the design of
interactive systems needs to draw, at times, on any or all of these disciplines.
Forging a coherent explanation of a design problem from these disparate
insights has only recently been addressed by, for example, work on design space
representations (Bellotti, 1993) and collaborative modelling (Young & Abowd,
1994). Building a coherent theory that accounts for a range of design problems
represents a further step. The need to take this step, and its overt structure, are
the topic of this section.
2.1. Why Theoretical Integration Matters
In the introduction we suggested that Design Rationale approaches address the
integration problem, but at the cost of separating modelling results from
modelling theory. If theoretically-based modelling is to provide designers with
insight into how to improve an artefact, then the theory underlying the analysis
must be brought forward so that it can be inspected to reveal why a design issue
is problematic and what modifications of the design would address the issue.
May & Barnard (1995) argued that the trend towards HCI design methods that
dissociate evaluation techniques from their theoretical foundations was
problematic, in that such methods undermined the supportive role that evaluation should
play in redesign, and disguised the contribution that theoretical work in HCI can
make to practice. More fundamentally, we suggest that some form of integration
of user and system representations is necessary even to describe (let alone
resolve) certain design issues. An illustration of this problem can be found in
the design of systems that employ 3D-graphics, for example (Duke, 1995a).
One task that such a system needs to support is the presentation of a
3-dimensional scene (part of the application state) as a 2-dimensional image
presented on some output device. For our purposes, the scene can be
considered as a collection of geometrical solids. An abstract view of the
information contained and presented by the system is shown in Figure 1.
FIGURE 1 ABOUT HERE
Using a model or theory of presentations (for example, Duke & Harrison,
1994b), the ‘image’ might be described as a collection of three objects (the
cube, cone, and sphere). It is possible to state requirements involving this
presentation, for example that the presentation of the sphere should be ‘hidden’
behind that of the cube. This constraint however makes no mention of what
information a user of this system should perceive, and provides no basis for
arguing about whether the proposed form of ‘hidden surface removal’ is either
necessary or sufficient for the user’s tasks. Other design options should perhaps
be considered, for example
• the provision of reference lines, as in the ‘picture’ of the internal state of
  the scene in Figure 1;
• the use of shadows or variable colour intensity;
• allowing the user to change the viewpoint, i.e. to view the scene from
  different angles.
In other words, a designer should consider whether the presentation of a system
will allow the user to construct an appropriate ‘mental representation’ of the
relevant state. To express and reason about these issues with any precision, a
designer will need some appropriate representation of human cognitive abilities.
And as the choice of representation has consequences for the design of the
system, for instance extra functionality to support navigation, the description of
cognitive assumptions should be linked to the model of system requirements.
This paper proposes that user and system modelling can be represented within a
common framework in order to address these issues. By expressing the
concepts and insight of a cognitive model as an approximate formal theory we
aim to achieve two results.
1. To provide a representational framework that can accommodate both the
information carried and presented by the system, and that perceived and
understood by the user. Indeed, we suggest that understanding
properties of interactive systems in terms of the conjoint, syndetic,
behaviour of user and system models is ultimately a necessary step for
HCI, given the development of interactive technology that relies
increasingly on latent human cognitive abilities (Barnard & May, 1995,
in press).
2. Given a suitably expressive framework, we aim to link the analysis of
user and system models directly, by-passing the use of ‘intermediate’
design representations that occlude the underlying theory.
To understand why these results are potentially important, note that empirical
approaches play a role in HCI analogous to the testing phase of the software
development cycle. Both are intended to assess the fitness of some artefact for a
given purpose, and involve carrying out a series of experiments to test or allow
the formulation of some conjecture about the system. In the case of software
testing, either the specification of the system or the software artefact itself is
used to generate the test cases, and if an experiment refutes a given conjecture,
then the rationale that led to the construction of that test case can be used to
determine the cause of the failure. In contrast, the refutation of a conjecture
within experimental HCI raises the question of why a particular outcome
was observed, or how the behaviour of the device or its user(s) might be altered
to address a problem. Without a body of theory in which to situate the
conjecture and experimental results, it is difficult to see how improvements in
either a specific artefact or a body of theory can be made other than through ad
hoc change and intuition.
In summary, our view is that multi-disciplinary co-operation is vital if HCI is to
develop a cohesive and applicable theory that addresses the design of advanced
interfaces. Syndetic modelling aims to achieve this integration at the level of the
theoretical models, rather than through the results of modelling.
2.2. Why Mathematics Matters
In most branches of science, the development of fundamental theory has both
prompted and drawn on advances in mathematical models and methods.
Mathematical techniques are routinely used in disciplines such as physics,
biology, engineering and economics as tools for modelling complex systems
and deriving predictions from such models. The concepts and structure of
mathematics, and the notation used to denote these, have evolved over millennia,
often through the challenge of better understanding natural phenomena and of
making more precise statements and predictions about the behaviour of
artefacts. The development and use of mathematics for the description of
software and hardware systems has also been widely reported, including
applications specific to human-computer interaction (see for example Dix, 1991;
Duke & Harrison, 1993; Paternò & Palanque, 1997, and in particular the series
of annual Eurographics workshops on Design, Specification and Verification of
Interactive Systems, e.g. Harrison & Torres, 1997). While there is an ongoing
debate about the role of these mathematical techniques in the routine practice of
software development (see for example Saiedian, 1996), several key advances
in understanding complex problems in computing have come about through the
development of mathematical abstractions (Hoare, 1996; Milner, 1993).
The need within software development to (i) carry out rigorous checks on
models, and (ii) support the development of models using software tools,
requires that the mathematics used for software specification be defined to a
level of rigour that is not commonly applied in other areas. The results are the
so-called ‘formal methods’, such as VDM, Z and CSP, that define collections of
mathematical structures with a specific syntax and semantics for use in systems
modelling. One difference between mathematical models of computing systems
and those developed in other domains is that of scale. Equations in physics, even
complex ones, typically involve a small number of observables,
for example energy, velocity, mass, and/or derivatives of these. In contrast, the
description of a computing system can require a comparatively large number of
observables to characterise its state space and behaviour. The difficulty this
causes for developing, presenting and reasoning about a model has led to the
development of notational frameworks for structuring the mathematical
description of software systems. Some of these are quite abstract, and are
designed to support compositional development and reasoning about systems,
for example the schema notation of Z (Spivey, 1992).
Any description of interaction requires some representational framework, be it
diagrammatic, informal, tabular or mathematical. Through syndetic modelling
we aim to understand the constraints and principles that govern human
information processing in the context of a given device. To achieve this we
require a framework that can capture both the behaviour of a range of
technologies, and also the constraints imposed by a specific model of human
information processing. From previous experience we know that the
mathematical techniques developed for software specification can capture the
behaviour of the devices, and consequently these techniques have been adopted
as the basis for syndetic modelling.
Obviously, building a model of any system involves selecting abstractions, and
encoding these within the available representations. Mathematical abstractions
also have limits; there are aspects of systems for which suitable mathematical
structures are either unknown, or where the cost and complexity of their
development outweighs the insight that the model will generate. There is a
concern that mathematics by its nature presents a barrier to the non-specialist, or
that it does not adequately cover certain aspects of interaction that are clearly
important in understanding the use of a system (aspects of the work domain
being a particular example). The first point is answered, in part, by noting that
the imprecision and ambiguity of non-mathematical representations can also
create barriers to understanding. More fundamentally, we would expect few
designers to wish or be able to apply the approach in its current form. At
present, it is a theoretical tool for understanding the role of cognitive resources
in interaction with novel technologies. It would be pointless to use such a
framework for design problems for which there is already a body of practical
understanding or design techniques; an example of using the proverbial
sledgehammer to crack a walnut. However, as we will demonstrate, there are
interactive technologies emerging from research work that can benefit from the
kind of analysis that syndetic modelling is able to provide. The history of
applied mathematics suggests that the technique will be adopted by designers as
and when the perceived benefits outweigh the initial costs of learning the
foundations. This, however, is not of concern within the present paper. As for
the second point, syndetic modelling is not an attempt to create a
representation of interaction that is ‘complete’ in any sense. Developing artefacts
in any discipline involves the construction of a range of models, from abstract
mathematical descriptions through to concrete prototypes or informal
arguments, each with their own specific role or function. A syndetic model is
just one of these models; its role is the analysis of the conjoint behaviour of a
human user and an artefact in the environment of that user.
To summarise, mathematics is used in syndetic modelling because it provides an
expressive and concise means of describing the behaviour of complex systems,
at a level of precision above that of diagrams or text. By working with a
representation that is independent of any specific notion of interaction or device,
the theories that we develop with syndesis will be more easily reused and
generalised across domains. In the sections that follow, we will introduce the
particular kind of mathematics needed for this paper, and will show how it can
be used to describe an interface.
2.3. Recent Work
The design space of integrative frameworks can be characterised in terms of a
power-generality trade-off (Blandford & Duke, 1997). In this view, syndesis
falls at the opposite end of the spectrum from design rationale approaches that
endeavour to capture modelling results but which provide no means of relating
them to the underlying theories involved. The presence of a psychologically
grounded cognitive model in the approach also distinguishes syndesis from a
number of recent attempts at extending models of the interface to accommodate
claims about user behaviour (see for example Moher et al., 1996).
Two particular examples of approaches that attempt to put integration on a
theoretical basis reveal some of the further trade-offs that are involved. The
approaches are the Interaction Framework (IF) (Blandford, Harrison &
Barnard, 1995) and the production system model described by Kieras and
Polson (Kieras & Polson, 1985). The former is intended to provide an agent-neutral view, abstracting away from any specific user or system representation
by working with a notion of event trajectory, a set of events ordered in time
representing communication acts between agents in the system. In contrast,
Kieras and Polson combine a production-system model of the user with a
generalised transition network (GTN) representation of the system to obtain a
detailed operational model of both
agents. While both of these methods have contributed to a multidisciplinary
understanding of HCI, they represent extreme approaches. IF operates by
abstracting away from all details of a model that might bias the interpretation of
an agent towards either user or system. In particular, the states of agents are not
represented, making it difficult to express all but quite simple external properties
of behaviour. At the other extreme, the highly operational nature of the GTNs
used by Kieras and Polson means that properties and requirements on
interaction are obscured by the detail needed to express the low-level behaviour
of user and system: the only way to understand such a model is to execute it.
Syndesis represents a third, intermediate, level that integrates the contrasting
representations of user and system theories into a single model. By expressing
the behaviour of user and system in a common language, properties of an
interactive system can be described and understood in terms of the conjoint
behaviour of both agents. This is qualitatively different from the approach of
Kieras and Polson in that the model is fundamentally not executable, and cannot
therefore be used for simulation. The loss of executability is a consequence of
using a much richer and more expressive language.
3. A MATHEMATICAL MODEL OF A MULTIMODAL USER INTERFACE
In this section we demonstrate the use of mathematical techniques to develop the
specification of an interface that accepts multimodal input. MATIS (Nigay,
1994; Nigay & Coutaz, 1995) is an experimental system to investigate this
technology, in particular the use of deictic references in which, for example,
spoken information can be combined with gesture via a mouse to produce a
single command for an application. The domain of MATIS is flight information;
the system allows a user to plan a multi-stage journey by completing ‘query’
forms that can be used to search a database for matching flights. The forms can
be completed using multiple modalities, either individually or in combinations.
The example in Figure 2 shows a user combining spoken natural language with
mouse-based gesture to fill in the second query template. On the left hand side
the user has begun to speak a request that contains a deictic reference, “this
city”, which is resolved when the user clicks on a field containing a city name;
the right hand side of Figure 2 shows the query form after the system has
interpreted the user’s input.
FIGURE 2 ABOUT HERE
A detailed mathematical model of MATIS will not be developed here; this has
been done elsewhere (Duke & Harrison, 1994a). The model of MATIS
developed here is minimal, being sufficient just to illustrate the approach, and
the questions that arise concerning usability. As noted in Section 2.2, the
specification of computing artefacts benefits from the use of structures that
organise the mathematics into useful abstractions. For work on interactive
systems, structures called interactors (Duke & Harrison, 1993; Duke &
Harrison, 1995b) have been developed. The use of interactors for identifying
design issues and suggesting or evaluating design options has been explored in
a number of case studies; see (Duke & Harrison, 1995a; Duke & Harrison,
1995b). The role of interactors within multidisciplinary modelling has also been
described; for this, see (Bellotti et al., 1996) and (Buckingham Shum et al.,
1996).
An interactor consists of an internal state representing some facet of the
application domain, and a presentation that describes the perceivable
components of that state. As interactors are only a framework for a
specification, they can be used with a variety of mathematical techniques for
modelling behaviour. The approach taken here follows widespread practice (see
for example Spivey, 1992) in using discrete structures such as sets, functions,
relations and sequences to describe the structure of the state space of a system,
as well as the presentation. Invariants (properties of the state) and dynamic
behaviour, i.e. the evolution of the system through its state space, are described
here using Modal Action Logic (MAL) (Goldsack, 1988; Ryan, Fiadeiro &
Maibaum, 1991; Kent, Maibaum & Quick, 1993). This, rather than the
operation notation of Z or VDM, has been used because the axioms required for these
examples can be stated and documented concisely.
mathematical structures and notation used in the paper is given in Appendix A.
The process of building a mathematical model of an interface is little different
from that of modelling any other phenomenon. It begins with the identification
of those aspects of the system that we want to model, and the definition of the
concepts needed to describe those features. In the case of MATIS, we are
interested in how a user might carry out the task of constructing a query using a
combination of speech and gesture. Consequently the state of the model
encompasses the contents of the data fields on the form, and the data from the
input devices used by the system. We assume that there exists a type (set of
values) called ‘name’, representing the names of fields on the forms on the
MATIS interface. The type ‘name’ is a ‘given’ type. This means that we indicate
simply that some set of values exists; the structure of the values within the set is
not of interest. A second given type, ‘data’, is similarly introduced to represent
the set of values that might be provided by the user, either by speech or by
pointing.
To model the fusion of information from separate data streams we will represent
both speech and mouse data as a sequence of values. For speech, this sequence
will contain pairs, each consisting of a database field name and an optional data
value. A ‘missing’ data value (represented by the symbol ‘nil’) in the speech
stream will indicate that the user has employed deictic reference; the data for that
field will be provided on the mouse stream. The mouse data stream itself is just
the sequence of values that have been selected. So for example, if the user utters
the query “Flights from this city in the morning to this city”, while using the
mouse to select the values ‘London’ and ‘Paris’ on the display, the
corresponding data streams might look like:
speech = 〈(From, nil), (Time, morning), (To, nil)〉
mouse = 〈London, Paris〉
More generally, we define the type ‘value’ to be the union of ‘data’ and the
constant ‘nil’. A ‘slot’ (on a form, or on the speech input stream) is then a pair
consisting of the field name and a corresponding value.

$$\mathit{value} \mathrel{\hat{=}} \mathit{data} \cup \{\mathit{nil}\}$$
$$\mathit{slot} \mathrel{\hat{=}} \mathit{name} \times \mathit{value}$$
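To make the stream encoding concrete, here is a minimal Python sketch of these definitions applied to the example utterance. It is an illustration under stated assumptions, not part of the MATIS specification: the given types ‘name’ and ‘data’ are modelled as strings, ‘nil’ as None, and the function fuse is hypothetical. It pairs each deictic hole with the next mouse selection in order, a policy this section deliberately leaves unspecified.

```python
from typing import Dict, Iterator, List, Optional, Tuple

# Assumed encodings: the given types 'name' and 'data' are plain strings,
# and the constant 'nil' is None, so value = data ∪ {nil} becomes Optional[str].
Name = str
Data = str
Value = Optional[Data]
Slot = Tuple[Name, Value]          # slot = name × value


def fuse(speech: List[Slot], mouse: List[Data]) -> Dict[Name, Data]:
    """Hypothetical fusion: fill each nil slot with the next mouse selection.

    Resolving deictic holes strictly in utterance order is an assumption made
    for illustration; surplus mouse selections are simply ignored.
    """
    selections: Iterator[Data] = iter(mouse)
    result: Dict[Name, Data] = {}
    for field_name, value in speech:
        if value is None:
            # A 'hole' left by deixis: the data must come from the mouse stream.
            chosen = next(selections, None)
            if chosen is None:
                raise ValueError(f"no mouse selection for field {field_name!r}")
            result[field_name] = chosen
        else:
            result[field_name] = value
    return result


speech = [("From", None), ("Time", "morning"), ("To", None)]
mouse = ["London", "Paris"]
print(fuse(speech, mouse))  # {'From': 'London', 'Time': 'morning', 'To': 'Paris'}
```

The in-order pairing is only one candidate policy; the point is that the speech stream alone tells the system where data is missing, while the mouse stream supplies it.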
In a comprehensive model of MATIS it is convenient to distribute the
specification across a number of interactors (Duke & Harrison, 1994a). Here
however it is simpler to work with just one interactor, the state of which
consists of five components, or attributes. These represent
• the content of each query form;
• the identity of the query form that the user is constructing;
• the sequence of input received along both speech and mouse data
  streams; and
• the query form that would result from the ‘fusion’ of the two input data
  streams.
The text of the mathematical specification begins as follows:
interactor MATIS
  attributes
    vis fields  : qnr × name ⇸ value   - query content
    vis current : qnr                  - current query
        mouse   : seq data             - data stream from mouse
        speech  : seq slot             - data (and holes) from speech
        result  : name ⇸ data          - outcome of resolving deixis
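The attribute declarations can be mirrored, roughly, by a Python dataclass. This is a sketch only: partial functions (⇸) are modelled as dictionaries, the type ‘qnr’ (query-form numbers) is assumed to be int, the names MatisState and VISUAL_PERCEPTS are hypothetical, and the ‘vis’ annotation is recorded as a plain set of attribute names rather than enforced.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Name = str
Data = str
Value = Optional[Data]   # None plays the role of 'nil'
Slot = Tuple[Name, Value]
Qnr = int                # assumption: query-form numbers modelled as ints


@dataclass
class MatisState:
    """Hypothetical mirror of the MATIS interactor's five attributes."""
    # vis fields : qnr × name ⇸ value  (partial function as a dict)
    fields: Dict[Tuple[Qnr, Name], Value] = field(default_factory=dict)
    # vis current : qnr  (the default of 1 is an arbitrary choice)
    current: Qnr = 1
    # mouse : seq data
    mouse: List[Data] = field(default_factory=list)
    # speech : seq slot
    speech: List[Slot] = field(default_factory=list)
    # result : name ⇸ data
    result: Dict[Name, Data] = field(default_factory=dict)


# Attributes annotated 'vis', i.e. potentially perceivable via the visual modality.
VISUAL_PERCEPTS = frozenset({"fields", "current"})

s = MatisState()
s.fields[(2, "From")] = None    # second query form, 'From' slot still empty
print(sorted(VISUAL_PERCEPTS))  # ['current', 'fields']
```

Using default_factory gives every state its own empty streams, so updating one state cannot leak into another.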
The annotation ‘vis’ is used to indicate that a particular observable is part of the
visual presentation of the system. In this case, both the contents of the query
forms and the identity of the current query are (potentially) perceivable by the
user of the system. The
annotation indicates that when these components are perceivable, this is via the
‘visual’ modality. Observables in the presentation are called ‘percepts’ (Duke &
Harrison, 1994b). However, just because a percept is defined in the state
doesn't mean that it is always perceivable; the conditions under which a percept
could be perceived are included in the axioms of the system.
The dynamic behaviour of the MATIS system is described in terms of a number
of actions. Like other observables, each action has a signature which indicates
the kind of information that is involved in the action. Four actions are defined
on the MATIS interactor. The first two actions relate to use of the speech and
mouse modalities, and as indicated by the annotations, will be effected by the
articulatory and limb channels of the user. The third action, ‘fuse’, is used to
define the effect of performing fusion on the data streams. We will not discuss
when fusion should be carried out, nor how the results are inserted into the
surrounding application by the fourth action, ‘fill’.
actions
  art speak  : name × value   - articulate a data value
  lim select : data           - select a data value
  fuse                        - fuse input streams
  fill                        - fill in slots on a query form
The remaining part of the interactor is the collection of axioms that inter-relate
the observables of the system. In this paper the axioms are expressed in modal
action logic. This extends the usual connectives and quantifiers of first-order
logic with a modal operator [A] for each action ‘A’, where ‘[A] P’ asserts that P
holds after A has been performed, and two deontic operators, ‘per’ and ‘obl’,
that express that an action is either permitted or obliged under particular
conditions. The meaning of each axiom is explained in the accompanying
commentary. Axiom 1 defines the effect of the ‘speak’ action on the speech data
stream. If the value of the speech stream is X, the axiom requires that the effect
of speaking a name/value pair is to append that pair to X.
axioms
speech = X ⇒ [speak(nm, d)] speech = X^〈(nm, d)〉   (1)
If the speech stream holds X, then speaking a name/data pair
results in a speech stream with that pair appended to X.
Axiom 2 defines similar behaviour for the ‘select’ action, though here the new
value is a data item that is appended to the stream of mouse input.
mouse = M ⇒ [select(d)] mouse = M^〈d〉   (2)
If the mouse stream holds M, then selecting a data item d results in
a stream in which d is appended to M.
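One way to make the box operator concrete is to treat each action as a function from states to states, so that ‘[A] P’ holds when P is true of the state produced by A. The Python sketch below does this for axioms 1 and 2. It is a simplification under stated assumptions: actions are taken to be deterministic (modal action logic in general also allows nondeterministic actions), only the two stream attributes are modelled, and the names ‘Streams’ and ‘box’ are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional, Tuple

Slot = Tuple[str, Optional[str]]   # (field name, value); None stands for nil


@dataclass(frozen=True)
class Streams:
    """Hypothetical fragment of the MATIS state: the two input streams only."""
    speech: Tuple[Slot, ...] = ()
    mouse: Tuple[str, ...] = ()


def speak(state: Streams, nm: str, d: Optional[str]) -> Streams:
    # Axiom 1: speech = X => [speak(nm, d)] speech = X ^ <(nm, d)>
    return replace(state, speech=state.speech + ((nm, d),))


def select(state: Streams, d: str) -> Streams:
    # Axiom 2: mouse = M => [select(d)] mouse = M ^ <d>
    return replace(state, mouse=state.mouse + (d,))


def box(state: Streams, action: Callable[[Streams], Streams],
        post: Callable[[Streams], bool]) -> bool:
    """[A] P, read for a deterministic action: P holds after performing A."""
    return post(action(state))


s = speak(Streams(), "From", None)
X = s.speech
print(box(s, lambda st: speak(st, "Time", "morning"),
          lambda st: st.speech == X + (("Time", "morning"),)))  # True
```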
The third axiom is an invariant, that is, a property of the system that is true at
all times. It requires that all the data in the current query is available in the
presentation of the interactor. When expressing properties of percepts, we
enclose attributes in boxes to indicate that it is the perceivable representation of
the value, rather than the value itself, that is being referred to:
∀n : name • (current, n) ∈ dom fields ⇒ fields(current, n) in MATIS   (3)
For any field name ‘n’, if the form identified by ‘current’ has an
entry for n, then that data is part of the presentation of MATIS.
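The invariant can also be read as a check on a single state. In the hypothetical encoding below, ‘fields’ is a dictionary keyed by (query form, field name) pairs, and the perceivable presentation is approximated by a set of (field name, value) pairs on the display. The boxed, perceptual reading of the attributes is therefore only approximated here, not captured.

```python
from typing import Dict, Optional, Set, Tuple

Fields = Dict[Tuple[str, str], Optional[str]]   # (form, field name) -> value


def invariant_3(fields: Fields, current: str,
                presented: Set[Tuple[str, Optional[str]]]) -> bool:
    """Axiom 3: every entry of the current form appears in the presentation."""
    return all((name, value) in presented
               for (form, name), value in fields.items()
               if form == current)


fields: Fields = {("q1", "From"): "London", ("q2", "To"): "Paris"}
print(invariant_3(fields, "q1", {("From", "London")}))  # True
print(invariant_3(fields, "q2", {("From", "London")}))  # False
```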
Before appraising the role, value, and limitations of this model, a few comments
on the mode of expression are required. Each part of this particular model has
been given as a fragment of mathematical text, accompanied by a significant
chunk of explanatory prose. It may therefore be tempting to question whether
the mathematics is actually needed, or whether it is just formalism for its own
sake. One way of understanding why this is not the case is to take a
mathematics textbook and find a worked problem. The ‘trick’ of annotating each
formula, used in this example, can be applied just as well to any other piece of
mathematics. Can we therefore conclude that mathematics (or, at least,
mathematical notation) is unnecessary? We hope the answer is ‘no’, and any
reader who has used applied mathematics will recognise why. Adding liberal
doses of natural language commentary to a complete, developed model is
straightforward. However, in comparison with the equivalent mathematical text,
linguistic representations are cumbersome, prone to greater ambiguity, and
significantly more difficult to manipulate in any systematic way. Thus
mathematical representations are more than just terse versions of natural
language statements; crucially, the notation affords the definition and
application of operations on representations in the form of calculation and
proof. In the example so far we have simply used the mathematics to describe a
situation; what will be demonstrated later in the paper is how this description
can be manipulated to derive proofs of properties that can be interpreted as
interesting statements about the usability of the artefact being modelled.
As shown by Duke and Harrison (1994a), the model can be extended to
encompass a detailed description of the components that make up the MATIS
interface, including details such as the effect of the ‘fuse’ action.
These are beyond the needs of this paper. Developing a model like this can be
a useful source of design insight: by encouraging a developer to document the
structure and behaviour of an interface explicitly, it helps to tease out latent
questions and ungrounded assumptions (Bellotti et al., 1996;
Buckingham Shum et al., 1996). What the model does not (and cannot) address
is how the information provided by the system can or should be understood by
users, and how users' perception of the system will mediate execution of the
tasks for which the system was designed. For example, will users be able to
utilise the deixis capability? As interactive systems make increasingly rich use of
different modalities, and rely more on users' often latent knowledge of the
world (Barnard & May, 1995), these questions are increasingly beyond the
capability of any one modelling approach. To answer questions about the
usability of the system captured in the specification, we will need to work
within a framework that can make authoritative statements about human
capabilities and limitations.
4. INTERACTING COGNITIVE SUBSYSTEMS (ICS)
In contrast to models of cognition which seek to simulate the thinking of an
individual person processing some specific information, ICS is a resource-based
framework which describes the overall pattern of information flow
through the human cognitive architecture. Cognition is represented as the
exchange, storage, revival and transformation of information by nine
independent cognitive subsystems, each of which deals with a different level of
mental representation. In principle, this modularity allows the processes within
each ICS subsystem to be modelled mathematically in the same way as the
components of an interactive system. This section presents an overview
of the key aspects of ICS, which will be taken up in the subsequent modelling
examples.
As an overall theory of cognition, ICS consists of ‘architectural’ constraints on
information flow, the use of memory, and the blending of information streams,
as well as ‘local’ theories for specific types of information processing. It
describes cognition in terms of interaction within a collection of subsystems
that each operate on a specific level of mental representation, or ‘code’.
Although specialised to deal with specific codes, all subsystems have a common
architecture, shown in Figure 3.
FIGURE 3 ABOUT HERE
Incoming representations in the appropriate code arrive at an input array, from
which they are copied into an image record representing an unbounded episodic
store of all representations received by that subsystem. In parallel with this basic
copy process, each subsystem also contains transformation processes that have
learned to convert incoming representations into other mental codes (although
not all subsystems can produce all other codes, as will be described below).
These transformed outputs are passed through a data network to other
subsystems. If the incoming representation is incomplete or ‘noisy’ a
transformation process can augment it by accessing similar patterns stored in the
image record (in Figure 3, ‘transform C to X’ is accessing the image record).
This allows the revival of both specific instances of stored representations, and
regular patterns abstracted from representations that have arrived at different
times.
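To fix ideas, the common architecture of Figure 3 can be sketched as a small Python class. Treating a mental representation as a set of features, the rule used to revive a stored pattern, and the feature relabelling used as a stand-in for a real transformation are all hypothetical choices made for illustration; ICS itself does not specify them at this level.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Optional

Repr = FrozenSet[str]   # assumption: a representation is a set of features


@dataclass
class Subsystem:
    """Hypothetical sketch of the common ICS subsystem architecture."""
    code: str
    input_array: List[Repr] = field(default_factory=list)
    image_record: List[Repr] = field(default_factory=list)  # episodic store

    def receive(self, r: Repr) -> None:
        # Arrivals go onto the input array and are copied to the image record.
        self.input_array.append(r)
        self.image_record.append(r)

    def revive(self, partial: Repr) -> Optional[Repr]:
        # Most recent stored pattern containing every feature of 'partial'.
        for stored in reversed(self.image_record):
            if partial <= stored:
                return stored
        return None

    def transform(self, r: Repr, to_code: str, use_record: bool = False) -> Repr:
        # Placeholder transformation: relabel features in the target code,
        # optionally completing a noisy input from the image record first.
        if use_record:
            r = self.revive(r) or r
        return frozenset(f"{to_code}:{feature}" for feature in r)


vis = Subsystem("vis")
vis.receive(frozenset({"edge", "red"}))
print(sorted(vis.transform(frozenset({"red"}), "obj", use_record=True)))
# ['obj:edge', 'obj:red']
```

The call with use_record=True shows an incomplete input (only ‘red’) being augmented from the image record before it is transformed.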
ICS assumes the existence of nine distinct subsystems, each based on the
common architecture described above, and linked together as shown in Figure
4. The subsystems can be defined by the nature of the representations that they
process, and these are briefly described in Figure 5.
FIGURE 4 ABOUT HERE
FIGURE 5 ABOUT HERE
4.1. Configurations
The overall behaviour of the cognitive system is constrained by several
principles of processing (for more detailed information, see Barnard & May,
1993, 1995, in press). The concept most significant for our
understanding of interaction is that of configuration. This describes the way in
which cognitive resources are deployed for a particular processing task. As an
example, the lines in the ICS diagram in Figure 4 show how the subsystems
would be configured for locating some graphical object through gesture, for
instance pointing to an icon on a display using a mouse.
In order to locate the icon, information arriving at the visual system (1) will be
transformed into object code (2) that contains the basic organisation of visual
elements on the display. This transformation is written as ∗vis-obj: where the ∗
indicates information being exchanged with the external world (i.e., arriving
from the senses), and the : indicates information being exchanged internally. At
the same time, the propositional subsystem is copying information about the
desired target (3) to its image record, and using :prop-obj: to produce an object
code representation (4). When this representation can be blended at the OBJ
subsystem with the incoming representation from ∗vis-obj:, :obj-prop: will be
able to return a matching representation (5) to the propositional subsystem to
indicate that a possible target has been found. Finally, motion of the mouse via
the hand is controlled by the limb subsystem through :obj-lim: (6), which in turn
drives the hand via :lim-hand∗.
While this configuration is actively locating an object, a second sequence of
processes could be engaged in producing spoken output, such as “now where is
that icon?” This would require the :prop-mpl: process (7) to produce a
morphonolexical structure to drive the generation of speech (8) via :mpl-art: and
:art-speech∗ processes (the latter actually being a set of processes such as
:art-lips∗, :art-tongue∗ and :art-breath∗). The occurrence of secondary
configurations such as this is constrained by the fact that a transformation
process can only operate on one coherent representation at a time (although, as
is described in the next section, that representation may be a blend of
information from multiple sources) and so can only produce a single output
representation. Furthermore,
the number of possible configurations is limited by the nature of the output
codes each subsystem can produce. Figure 4 indicates which outputs are
possible from each subsystem.
4.2. Blending of Data Streams
Because it deals with information flow, ICS is well placed to model multimodal
cognition. As shown in Figure 4, visual and acoustic representations can be
structurally interpreted (by the ∗vis-obj: and ∗ac-mpl: processes) before being
represented in propositional form (by :mpl-prop: and :obj-prop: processes). At
the same time, the higher level, implicational meaning of these representations
can be directly brought together (by the ∗ac-implic: and ∗vis-implic: processes),
and these can give rise to an internally generated propositional representation
(by :implic-prop:).
A consequence of this is that the input arrays of the central subsystems will
receive several representations, from different sources. Within the subsystem,
each transformation process can independently ‘lock on’ to any part of the input
array, and so will base its output on a subset of the total information that is
available in its own code. The subset that is used can in principle be formed
from any part of the input array, and so can comprise information from more
than one source. It should be noted that as well as having beneficial effects, this
can also result in interference, if a ‘coherent’ representation is actually derived
from an irrelevant source.
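The following Python sketch illustrates a process ‘locking on’ to part of its input array. The selection rule (take every entry that shares a feature with a cue) and the use of set union as the blend are assumptions made for illustration only. The point is that the chosen subset may span several sources, including an irrelevant one, which is the interference case just described.

```python
from typing import FrozenSet, List, Tuple

Entry = Tuple[str, FrozenSet[str]]   # (source process, representation)


def lock_on(input_array: List[Entry],
            cue: FrozenSet[str]) -> Tuple[FrozenSet[str], List[str]]:
    """Blend every entry sharing a feature with the cue; report the sources used."""
    chosen = [(source, r) for source, r in input_array if r & cue]
    blended = frozenset().union(*(r for _, r in chosen))
    return blended, [source for source, _ in chosen]


# Hypothetical contents of the propositional input array.
array = [
    (":obj-prop:", frozenset({"icon", "target"})),
    (":mpl-prop:", frozenset({"icon", "word"})),    # possibly irrelevant
    (":implic-prop:", frozenset({"mood"})),
]
blend, used = lock_on(array, frozenset({"icon"}))
print(sorted(blend), used)
# ['icon', 'target', 'word'] [':obj-prop:', ':mpl-prop:']
```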
4.3. Cyclical Configurations
Coherent representations can be produced when events in the world are detected
through more than one sensory modality, or when single sensory
representations are consistent with ongoing central processing. The latter
situation often occurs when transformation processes form a cyclical
configuration. In the example shown in Figure 4, the :obj-prop: and :prop-obj:
processes are involved in a reciprocal exchange of object and propositional
representations (notationally abbreviated as a ‘POP loop’).
One effect of cyclical configurations is to maintain the stability of processing
over time, with new information being interpreted in the context of ongoing
cognition. They also allow ‘top down’ influences on the interpretation of
sensory data to occur. Two other important cyclical flows involve the
:prop-mpl: and :mpl-prop: processes (a ‘PMP loop’) for the semantic processing
of verbal information, and the :prop-implic: and :implic-prop: processes (a ‘PIP
loop’) for the schematic comprehension and interpretation of propositions. This
latter cycle is so important in the formation of internal goals and the regulation
of mental states that it is known as the ‘central engine’ of cognition (Teasdale &
Barnard, 1993).
The visual and acoustic subsystems cannot take part in direct, internal cycles
because their input arrays do not receive information from the data network, but
only from the external world. They can participate in indirect cycles if they are
being used to observe or listen to the individual’s own actions or speech. The
body state subsystem can also take part in indirect cycles, since the bodily
consequences of the outputs produced by the articulatory and limb subsystems
are sensed as body state representations. These can be used by the ∗bs-art: and
∗bs-lim: transformations to provide proprioceptive feedback, which is useful in
co-ordinating speech and motor output. The body state subsystem can also detect
the somatic and visceral outputs from the implicational subsystem, and produce
feedback by the ∗bs-implic: transformation. This ‘BIB’ loop mediates affective
influences on cognition, such as those of mood, emotion and state-dependent
effects.
4.4. Buffering of Transformations
At steps (3) and (4) in Figure 4 the :prop-obj: transformation is not operating
directly on information as it arrives on the input array of the subsystem, but is
accessing it indirectly, after it has been copied to the image record. This
‘buffering’ of a process allows it to operate on information that is arriving too
quickly for normal, direct processing to cope with, and also allows a form of
pre-processing to occur in the image record, so that the process actually operates
on information that has been integrated over a succession of representations
received on the input array. As with cyclical processing, this can provide
additional stability of processing, in that short-term ambiguities in data can be
overcome, and it also allows a process to continue producing its output at its
own steady time-base.
Because buffering requires access to the image record, and a subsystem’s image
record can only revive information for one transformation at a time, it follows
that only one process within a subsystem can operate in buffered mode at a
time. In fact, because a buffered process operates on its own time-base, freed
from the input rate of information, it determines the rate of flow of information
through the rest of a configuration. It is thought that a configuration can
contain only one buffered process; with more than one, these temporal
constraints would cause it to become unsynchronised and break down. This is
consistent with the
theoretical assumption that buffering is associated with focussed awareness of
information (as distinct from the copy process, which results in diffuse
awareness), for phenomenologically we are only able to focus on one stream of
information at a time (while we remain diffusely aware of others). The
theoretical and architectural limitations on buffering provide another important
constraint upon the configurations that can co-occur.
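The two limits on buffering described above, at most one buffered process per subsystem (because of the shared image record) and at most one per configuration, can be stated as a simple check. In this Python sketch, processes are hypothetical (source, destination) pairs, a process is assumed to belong to its source subsystem, and the function name is illustrative.

```python
from collections import Counter
from typing import List, Set, Tuple

Process = Tuple[str, str]   # (source subsystem, destination subsystem)


def buffering_violations(configs: List[Set[Process]],
                         buffered: Set[Process]) -> List[str]:
    """List breaches of the two buffering limits described in the text."""
    problems = []
    # Buffering needs the image record of the process's own subsystem,
    # which can revive information for only one transformation at a time.
    for subsystem, n in Counter(src for src, _ in buffered).items():
        if n > 1:
            problems.append(f"{subsystem}: {n} buffered processes share one image record")
    # A configuration can contain only one buffered process.
    for i, config in enumerate(configs):
        n = len(config & buffered)
        if n > 1:
            problems.append(f"configuration {i}: {n} buffered processes")
    return problems


pop_loop = {("obj", "prop"), ("prop", "obj")}
print(buffering_violations([pop_loop], {("prop", "obj")}))   # []
print(buffering_violations([pop_loop], {("prop", "obj"), ("obj", "prop")}))
# ['configuration 0: 2 buffered processes']
```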
4.5. Interleaving and Oscillation
The configurations described so far come into being because the flow of
information through the overall architecture both requires them to occur and
supports their occurrence. If the representations on the input array of any
subsystem fail to provide enough information for a process in a configuration to
produce an appropriate output, the configuration can collapse; and if other
representations provide a stronger input to a process, they can displace the
‘configural’ representation, leading to an unexpected output and a change in the
configuration as subsequent processes lock on to other sources of information.
Since mental representations can arise from sensory or central data streams,
human cognition is thus dependent upon both external and internal control,
although there is no controlling ‘agent’ as such: control arises from the dynamic
interaction of subsystems operating independently and in parallel.
While two (or more) configurations can operate simultaneously if they do not
require conflicting resources, it is more likely that at least one shared process,
image record access, or buffer is required. In these situations the configurations
must interleave, with each information flow taking ‘control’ of the architecture
for as long as it can before the demands of the other information flow overcome
it. Interleaving can be thought of as a form of ‘multitasking’, where each task is
processed in alternation rather than truly simultaneously. The rate of interleaving
depends upon the relative ‘strength’ of the configurations, determined by the
degree to which their respective information flows support them, and it is
possible for a weaker configuration to be frozen out until the stronger
configuration fades or concludes.
Interleaving occurs when the competing configurations are operating on
different information flows, producing different effector or central ‘outputs’.
Competition for resources can also occur within a configuration, when, for
example, two processes within a subsystem need image record access, or two
processes require buffering. In these situations, the ‘shared’ resource has to
oscillate between the competing processes. Since they are both part of the same
configuration, it is not possible for one process to gain control of the shared
resource for very long before the absence of output from the other process limits
the stability of the information flow. Oscillations between the processes are thus
self-limiting, and less ‘competitive’ than interleaving of configurations.
The amount of interleaving and oscillation that a task involves can be used as an
indication of its cognitive complexity, with greater requirements for configural
changes tending to result in slower overall rates of processing, less capacity for
other simultaneous task performance, and an increased likelihood of errors
occurring through poor synchronisation of transformation processes and the
exchange of representations between subsystems.
5. A MATHEMATICAL MODEL OF ICS
As a cognitive model, ICS encapsulates a substantial body of theory about
human information processing. Experiences in the application of theoretically
based models to problems of user interface design have been reported for
models such as PUM (Blandford & Young, 1995), and ICS (Barnard & May,
1993). However, these analyses are primarily concerned with the cognitive
aspects of a particular scenario. The techniques themselves are not intended to
provide general frameworks for modelling the behaviour of computing systems.
Syndetic modelling is intended to provide a single framework to represent the
behaviours of both cognitive and computational systems and in so doing allow
both software and cognitive perspectives to be brought to bear on problems of
interaction. In this way, the assumptions and insights of both parties can be
represented and considered explicitly.
Although the idea of bringing together user and system models is in principle
independent of the underlying approaches or representations, we have particular
reasons for selecting ICS as the cognitive foundation for syndesis. First, ICS
has both the breadth of applicability and depth of theory to support the analysis
of the kind of novel and sophisticated technology that is moving out of research
contexts into social and industrial application. Its scope of application ranges
from display structure, through blending of multi-modal data streams, to issues
of affect and emotion that are applicable both to clinical aspects of cognition
(e.g. depression, Teasdale & Barnard, 1993) as well as interface elements
(Barnard & May, 1995). In this section we take the mathematical concepts that
were used to develop a model of the MATIS interface, and with them build a
representation of ICS. We will show subsequently that the mathematics
supports the composition of these two models into a form in which questions
about interaction can be phrased and answered rigorously.
The mathematical development is broken into three sub-sections: the basic
concepts needed to describe ICS, the state space of the ICS architecture, and the
axioms that govern information processing within ICS. Following the
presentation of the model, the section concludes with a review of key points
raised by this approach.
5.1. Basic Definitions
Our model of ICS is based around the main resources described by the model:
transformation processes and mental representations. To begin the process of
formalisation, we define given types (sets) to represent the concepts of
subsystem and representation.
  sys    - ICS subsystems, e.g. vis, prop, obj, etc.
  repr   - mental representations
In this model we do not address the internal structure of mental representations,
although there is a significant body of psychological theory that we could draw
on in order to do so. For example, May, Scott and Barnard (1995) describe a
model in which representations consist of basic units of information organised
into superordinate structures. This level of detail would greatly expand the paper
without adding much insight into the scope or use of syndetic models. Indeed,
one of the advantages of an axiomatic framework is that abstraction can be used
to hide details that do not contribute to the understanding or analysis of a
system. Thus, while coherence of representations depends on several issues,
including the timing of representations, that will not be addressed here, we can
still account for the concept of coherence by defining a relation over
representations.
_≈_ : repr ↔ repr
We can then write ‘p ≈ q’ to express the requirement or property that
representations ‘p’ and ‘q’ are coherent. A more detailed model could then
define this relationship in terms of the structure of the representations. Here, we
can require that any representation is coherent with itself (1), and that the
relationship is symmetric (2).
∀r : repr • r ≈ r   (1)
∀p, q : repr • p ≈ q ⇔ q ≈ p   (2)
As discussed in Section 4.2, representations arriving at a subsystem may be
blended with the result that a transformation effectively operates on a datum
derived from multiple sources. Again, as our model abstracts away from the
constituent elements of representations we cannot give an explicit account of
how blending takes place. We can however model its effect on representations
by defining a relation, ‘E’, such that ‘p E q’ means that the representation ‘p’ is
part of ‘q’; in other words, q is the result of blending p with (possibly) other
representations. This relationship defines a partial ordering over representations,
i.e. it is reflexive, antisymmetric and transitive.
_ E _: repr ↔ repr
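Since the model deliberately leaves the structure of representations abstract, the stated properties of ≈ and E can only be checked against a particular interpretation. The Python sketch below picks one hypothetical interpretation (a representation is a set of features, coherence means ‘identical or sharing a feature’, and E is the subset relation) and checks the required properties over a small finite sample.

```python
from itertools import product
from typing import Callable, FrozenSet, Sequence

Repr = FrozenSet[str]
Rel = Callable[[Repr, Repr], bool]


def coherent(p: Repr, q: Repr) -> bool:      # hypothetical reading of p ≈ q
    return p == q or bool(p & q)


def part_of(p: Repr, q: Repr) -> bool:       # hypothetical reading of p E q
    return p <= q


def reflexive(rel: Rel, xs: Sequence[Repr]) -> bool:
    return all(rel(x, x) for x in xs)


def symmetric(rel: Rel, xs: Sequence[Repr]) -> bool:
    return all(rel(p, q) == rel(q, p) for p, q in product(xs, repeat=2))


def antisymmetric(rel: Rel, xs: Sequence[Repr]) -> bool:
    return all(p == q for p, q in product(xs, repeat=2) if rel(p, q) and rel(q, p))


def transitive(rel: Rel, xs: Sequence[Repr]) -> bool:
    return all(rel(p, r) for p, q, r in product(xs, repeat=3)
               if rel(p, q) and rel(q, r))


sample = [frozenset(), frozenset({"a"}), frozenset({"a", "b"}), frozenset({"c"})]
print(reflexive(coherent, sample), symmetric(coherent, sample))   # True True
print(reflexive(part_of, sample), antisymmetric(part_of, sample),
      transitive(part_of, sample))                                 # True True True
```

Note that under this reading coherence is not transitive in general, which is consistent with the model requiring only reflexivity and symmetry of ≈.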
The next concept to be introduced is that of a transformation between two
mental codes of ICS, for example :obj-prop:. Any transformation can be
identified by the source and destination subsystems, and so the type ‘tr’ of
transformations is modelled as an ordered pair. For convenience, we also define
two functions that extract the first (src) and second (dst) components of a
transformation.
tr ≙ sys × sys   - names of transformation processes
src, dst : tr → sys
This definition is actually rather loose, in that it admits, for example, (obj, vis)
as a transformation as well as (obj, implic). To be rigorous, such non-existent
processes would be eliminated by adding a predicate to enumerate all legal
transformations. Each transformation process within the ICS subsystems
operates on and generates a stream of representations. In most cases, these
streams are carried by the internal data network of the architecture, but clearly if
cognition is to be located in an environment then it must be possible for streams
to both originate (perception) and terminate (action) in the outside world. For
convenience, we will write transformations as ∗src-dst:, :src-dst∗ or :src-dst:,
depending on whether the stream originates in the external world, terminates in
the external world, or is completely internal to the data network of the
architecture. That is, ‘∗’ denotes the external world and ‘:’ is the data network. In the
remainder of the paper, we will use the term ‘stream’ to refer to the input or
output of a transformation.
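The definition of ‘tr’ and the three forms of notation can be mirrored directly in Python. In this sketch the names ‘Tr’, ‘notation’ and ‘parse’ are hypothetical, and the set ‘LEGAL’ is deliberately partial: it lists only transformations mentioned in this paper, as a stand-in for the enumerating predicate mentioned above rather than a complete account of ICS.

```python
from typing import NamedTuple, Tuple

WORLD = ("*", "\u2217")   # the text writes the external world as '*' or '∗'


class Tr(NamedTuple):
    """tr = sys × sys; the fields play the role of src and dst."""
    src: str
    dst: str


# Partial, illustrative list of legal transformations (not exhaustive).
LEGAL = {Tr("vis", "obj"), Tr("obj", "prop"), Tr("prop", "obj"),
         Tr("obj", "lim"), Tr("lim", "hand"), Tr("prop", "mpl"),
         Tr("mpl", "prop"), Tr("prop", "implic"), Tr("implic", "prop")}


def notation(t: Tr, from_world: bool = False, to_world: bool = False) -> str:
    """Render t as :src-dst:, *src-dst: or :src-dst*."""
    left = "*" if from_world else ":"
    right = "*" if to_world else ":"
    return f"{left}{t.src}-{t.dst}{right}"


def parse(text: str) -> Tuple[Tr, bool, bool]:
    """Inverse of notation: (transformation, from_world, to_world)."""
    src, dst = text[1:-1].split("-", 1)
    return Tr(src, dst), text[0] in WORLD, text[-1] in WORLD


print(parse("\u2217vis-obj:"))                     # (Tr(src='vis', dst='obj'), True, False)
print(notation(Tr("lim", "hand"), to_world=True))  # :lim-hand*
print(Tr("obj", "vis") in LEGAL)                   # False
```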
A set of transformations involved in some information processing task is called
a configuration, while the “chain” of transformations involved in information
processing for a particular task is called a flow. For example, Figure 4 shows a
flow containing (amongst others) the following chain of transformations:
〈∗vis-obj:, :obj-prop:, :prop-obj:, :obj-lim:, :lim-hand∗〉
The corresponding configuration includes the set of transformations that appear
in this sequence. In general, a flow consists of a subset of the transformations
that make up a configuration, and a given transformation may occur more than
once. Formally, we define the type ‘Config’ to be a set of transformations, and the
type ‘Flow’ to be a sequence of transformations.
Config ≙ ℙ tr
Flow ≙ seq tr
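In the same spirit, the two types can be transcribed as Python aliases, using a frozen set for the power set of tr and a tuple for a sequence. The first example is the chain quoted above from Figure 4; the second, looping flow is a hypothetical illustration of a transformation occurring more than once in a flow while appearing only once in the corresponding set.

```python
from typing import FrozenSet, Tuple

Tr = Tuple[str, str]        # (src, dst)
Config = FrozenSet[Tr]      # a set of transformations
Flow = Tuple[Tr, ...]       # a sequence of transformations

# <*vis-obj:, :obj-prop:, :prop-obj:, :obj-lim:, :lim-hand*> from Figure 4
locate: Flow = (("vis", "obj"), ("obj", "prop"), ("prop", "obj"),
                ("obj", "lim"), ("lim", "hand"))

# Hypothetical flow in which the POP loop is traversed twice.
looping: Flow = (("vis", "obj"), ("obj", "prop"), ("prop", "obj"),
                 ("obj", "prop"), ("prop", "obj"), ("obj", "lim"))


def transformations(f: Flow) -> Config:
    """The set of transformations appearing in a flow."""
    return frozenset(f)


print(len(locate), len(transformations(locate)))    # 5 5
print(len(looping), len(transformations(looping)))  # 6 4
```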
5.2. The Architecture
The state of the ICS interactor captures the flows of information involved in
processing activities, and the properties of specific transformations such as
stability and coherence which define the quality of processing, or in other
words, user competence at particular tasks. The source of data for each
transformation is represented by a function ‘sources’ that takes each
transformation ‘t’ to the set of transformations from which ‘t’ is taking input. In
general, only a subset of transformations is producing stable output, and this
set is defined by the attribute ‘stable’. The function ‘input’ maps each
transformation to the representation that is currently available to it as input. As
we will see, this input representation may be derived by blending the output of
several other processes.
interactor ICS
attributes
  sources : tr → ℙ tr
  stable  : ℙ tr
  input   : tr → repr
The representations being generated by a transformation are given by the
relation ‘_on_’, where ‘p on t’ means that representation p is available as the
output of t. All representations arriving at a subsystem are copied to the image
record, and the contents of these records are represented by the attribute ‘_@_’
where ‘p@s’ means that representation ‘p’ is part of the image record of
subsystem ‘s’.
  _on_ : repr ↔ tr
  _@_  : repr ↔ sys
As not all representations are coherent, only certain subsets of the data streams
arriving at a subsystem can be employed by a process to generate stable output.
The predicate ‘coherent’ holds for those pairs of transformations whose output in
the current state can be blended. If the inputs to a process are coherent but unstable,
the process can still generate a stable output by buffering the input flow via the
image record and thereby operating on an extended representation. However,
only one process in the configuration can be buffered at any time (this is actually
a simplification for the purposes of this paper, since as explained in Section 4.5
oscillation of the buffer can occur) and this process is identified by the attribute
‘buffered’.
  coherent : tr × tr → 𝔹
  buffered : tr
The configuration itself is defined to be those processes whose output is stable
and which are contributing to the current processing activity. This processing
activity, in turn, consists of a set of flows carrying data through the architecture,
and these are represented by the attribute called ‘flows’.
  config : Config
  flows  : ℙ Flow
Four actions are addressed in this model. The first two, ‘engage’ and
‘disengage’, allow a process to modify the set of streams from which it is
taking information, by adding or removing a stream. A process can enter
buffered mode via the ‘buffer’ action. Lastly, the actual processing of
information is represented by ‘trans’, which allows representations at one
subsystem to be transferred by processing activity to another subsystem.
actions
  engage    : tr × tr
  disengage : tr × tr
  buffer    : tr
  trans
5.3. Information Processing in ICS
The principles of information processing embodied by ICS are expressed as
axioms over the model defined above. Axiom 1 defines coherence of data
streams in terms of coherence of the representations available on those streams.
axioms
coherent(t1, t2) ⇔ dst(t1) = dst(t2) ∧ (∀ p, q : repr • p on t1 ∧ q on t2 ⇒ p ≈ q)   (1)
Streams t1 and t2 are coherent if and only if they have the same
destination, and for any representation p available on t1 and q on
stream t2, p and q are coherent.
The second axiom defines the concept of a stream’s stability. This requires that
the inputs to the transformation generating the stream are at least coherent.
However, coherent input does not guarantee stable output, as the input may only
be a partial representation of the data that the process needs to generate output.
If the input is unstable, then the process will need to be buffered. A
configuration is then the set of processes that are generating output that is both
stable and used elsewhere in the overall processing cycle.
t ∈ stable ⇔ (∀ s1, s2 : sources(t) • coherent(s1, s2)) ∧ (t = buffered ∨ sources(t) ⊆ stable)   (2)
A transformation ‘t’ is stable if and only if every pair of streams on
which it operates is coherent, and either the transformation is
buffered, or the input streams are themselves stable.
t ∈ config ⇔ t ∈ stable ∧ (src(t) ∉ {art, lim} ⇒ ∃ s : tr • t ∈ sources(s))   (3)
A stream or process ‘t’ is part of the processing configuration if
and only if it is stable and, unless it is part of an effector
subsystem, there is some other transformation ‘s’ that is using the
stream from t.
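Axioms 1 to 3 can be turned into executable checks over a finite snapshot of the architecture. The Python sketch below is one hypothetical encoding: ‘on’ maps each transformation to the set of representations on its output stream, representations are feature sets with coherence read as ‘identical or sharing a feature’, and, because axiom 2 characterises ‘stable’ recursively, the sketch checks whether a candidate set satisfies the axiom rather than computing it. The example is the object-location configuration of Figure 4, with sources chosen for illustration.

```python
from itertools import product
from typing import Dict, FrozenSet, Optional, Set, Tuple

Tr = Tuple[str, str]                       # (src, dst)
Repr = FrozenSet[str]
EFFECTORS = {"art", "lim"}


def cohere(p: Repr, q: Repr) -> bool:      # hypothetical reading of p ≈ q
    return p == q or bool(p & q)


def coherent(t1: Tr, t2: Tr, on: Dict[Tr, Set[Repr]]) -> bool:
    # Axiom 1: same destination, and every pair of representations coheres.
    return t1[1] == t2[1] and all(
        cohere(p, q) for p in on.get(t1, set()) for q in on.get(t2, set()))


def axiom2_holds(trs: Set[Tr], sources: Dict[Tr, Set[Tr]], stable: Set[Tr],
                 buffered: Optional[Tr], on: Dict[Tr, Set[Repr]]) -> bool:
    # Axiom 2, checked for every transformation against a candidate 'stable'.
    return all(
        (t in stable) == (
            all(coherent(s1, s2, on) for s1, s2 in product(sources[t], repeat=2))
            and (t == buffered or sources[t] <= stable))
        for t in trs)


def in_config(t: Tr, trs: Set[Tr], sources: Dict[Tr, Set[Tr]],
              stable: Set[Tr]) -> bool:
    # Axiom 3: stable and, unless an effector process, used by some transformation.
    used = any(t in sources[s] for s in trs)
    return t in stable and (t[0] in EFFECTORS or used)


vis_obj, obj_prop, prop_obj = ("vis", "obj"), ("obj", "prop"), ("prop", "obj")
obj_lim, lim_hand = ("obj", "lim"), ("lim", "hand")
trs = {vis_obj, obj_prop, prop_obj, obj_lim, lim_hand}
sources = {vis_obj: set(), obj_prop: {vis_obj, prop_obj}, prop_obj: {obj_prop},
           obj_lim: {vis_obj, prop_obj}, lim_hand: {obj_lim}}
on = {vis_obj: {frozenset({"icon", "blue"})}, prop_obj: {frozenset({"icon"})},
      obj_prop: {frozenset({"icon", "target"})}, obj_lim: {frozenset({"move"})}}

print(axiom2_holds(trs, sources, trs, None, on))            # True
print(all(in_config(t, trs, sources, trs) for t in trs))    # True
```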
Axioms 4 and 5 concern flows. Any transformation that is part of a flow must
be part of the configuration, and similarly if a transformation is in the
configuration it must be part of some flow. This is expressed by axiom 4.
Axiom 5 captures the ‘chaining’ property of flows. If two transformations are
adjacent in a flow, then the first transformation must be one of the sources used
by the second transformation. The symbol ‘^’ is sequence concatenation.
t ∈ config ⇔ (∃ f : flows • t ∈ ran f)   (4)
A transformation ‘t’ is in the configuration if and only if there
exists some flow ‘f’ that contains t.
∀ s, u : Flow; t1, t2 : tr • s^〈t1, t2〉^u ∈ flows ⇔ t1 ∈ sources(t2)   (5)
For arbitrary flows ‘s’ and ‘u’, and transformations ‘t1’ and ‘t2’,
there is a flow in the system containing t1 followed by t2 if and only
if t1 is a source of t2.
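Axioms 4 and 5 can be checked in the same way. As written, the right-to-left direction of axiom 5 quantifies over all flows ‘s’ and ‘u’, so this hypothetical Python sketch follows the reading given in the commentary instead: some flow contains t1 immediately followed by t2 exactly when t1 is a source of t2. The sources used are an illustrative assignment for the chain quoted earlier from Figure 4.

```python
from typing import Dict, Set, Tuple

Tr = Tuple[str, str]
Flow = Tuple[Tr, ...]


def axiom4_holds(config: Set[Tr], flows: Set[Flow]) -> bool:
    # t ∈ config ⇔ some flow contains t
    return config == {t for f in flows for t in f}


def axiom5_holds(sources: Dict[Tr, Set[Tr]], flows: Set[Flow]) -> bool:
    # Adjacent pairs in the flows coincide with the 'is a source of' pairs.
    adjacent = {(f[i], f[i + 1]) for f in flows for i in range(len(f) - 1)}
    linked = {(t1, t2) for t2, ts in sources.items() for t1 in ts}
    return adjacent == linked


locate: Flow = (("vis", "obj"), ("obj", "prop"), ("prop", "obj"),
                ("obj", "lim"), ("lim", "hand"))
sources = {locate[0]: set(), locate[1]: {locate[0]}, locate[2]: {locate[1]},
           locate[3]: {locate[2]}, locate[4]: {locate[3]}}

print(axiom4_holds(set(locate), {locate}), axiom5_holds(sources, {locate}))  # True True
```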
A process will not (normally) engage an unstable stream, a constraint that is
captured in axiom 6 via a deontic predicate. If the output of a process is
unstable, it will either engage a stable stream, disengage an unstable stream, or
try to enter buffered mode (axiom 7).
per(engage(t,
src)) ⇒ src ∈ stable
(6)
A process t is permitted to engage a stream ‘src’ if and only if ‘src'
is stable.
t ∉ stable ⇒
      (∃s : tr • s ∈ stable ∧ s ∉ sources(t) ∧ obl(engage(t, s)))
    ∨ (∃s : tr • s ∉ stable ∧ s ∈ sources(t) ∧ obl(disengage(t, s)))
    ∨ obl(buffer(t))   (7)
If the stream t is not stable, then either (i) there is a stable stream s
that the process isn't using and which it is required to engage, or
(ii) the process is using an unstable stream s and is required to
disengage this stream, or (iii) the process should enter buffered
processing mode.
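Axiom 7 lists three ways out of instability without saying which is taken. The sketch below, under the same hypothetical (source, destination) encoding, merely enumerates the candidate obligations for an unstable process; restricting ‘engage’ to stable streams also respects axiom 6. How a process chooses among the options is not modelled.

```python
from typing import Dict, List, Set, Tuple

Tr = Tuple[str, str]
Obligation = Tuple[str, Tr]


def repair_options(t: Tr, sources: Dict[Tr, Set[Tr]],
                   stable: Set[Tr], streams: Set[Tr]) -> List[Obligation]:
    """Axiom 7: the disjuncts open to a process whose output t is unstable.
    At least one of the returned obligations should be discharged."""
    if t in stable:
        return []  # axiom 7 imposes nothing on a stable process
    srcs = sources.get(t, set())
    # (i) engage a stable stream not yet used (stable only, so axiom 6 holds).
    opts = [("engage", s) for s in sorted(streams) if s in stable and s not in srcs]
    # (ii) disengage an unstable stream that is currently used.
    opts += [("disengage", s) for s in sorted(srcs) if s not in stable]
    # (iii) enter buffered processing mode.
    opts.append(("buffer", t))
    return opts
```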
The effects of the buffer, engage, and disengage actions are straightforward and
are given by axioms 8-10.
[buffer(t)] buffered = t   (8)
After the architecture buffers the process t, t is in buffered mode.
sources(t) = S ⇒ [engage(t, s)] sources(t) = S ∪ {s}   (9)
If the sources of a transformation t are given by the set S of
streams, then after t engages the stream ‘s’, its sources will be S
extended with ‘s’.
sources(t) = S ⇒ [disengage(t, s)] sources(t) = S − {s}   (10)
If the sources of a transformation t are given by the set S of
streams, then after t disengages from the stream ‘s’, its sources
will be the set S minus the element ‘s’.
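Axioms 8 to 10 are simple state updates. The sketch below expresses them as pure functions over a hypothetical snapshot of the architecture; the frame assumption that nothing else changes is ours, since the axioms only describe the observables they mention.

```python
from dataclasses import dataclass, replace
from typing import Dict, FrozenSet, Optional, Tuple

Tr = Tuple[str, str]


@dataclass(frozen=True)
class ArchState:
    """Hypothetical snapshot of the observables touched by axioms 8-10."""
    sources: Dict[Tr, FrozenSet[Tr]]
    buffered: Optional[Tr] = None


def buffer(st: ArchState, t: Tr) -> ArchState:
    # Axiom 8: after buffer(t), buffered = t.
    return replace(st, buffered=t)


def engage(st: ArchState, t: Tr, s: Tr) -> ArchState:
    # Axiom 9: sources(t) becomes S ∪ {s}; other entries are copied unchanged.
    new = dict(st.sources)
    new[t] = new.get(t, frozenset()) | {s}
    return replace(st, sources=new)


def disengage(st: ArchState, t: Tr, s: Tr) -> ArchState:
    # Axiom 10: sources(t) becomes S − {s}; other entries are copied unchanged.
    new = dict(st.sources)
    new[t] = new.get(t, frozenset()) - {s}
    return replace(st, sources=new)
```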
The next two axioms define the effect of information transfer across the
architecture. Axiom 11 describes how the input to a given transformation is
related to the contents of the various data streams, while axiom 12 defines the
action that changes the contents of the streams.
src(t) ∉ {ac, vis, bs} ⇒ ∀s : sources(t) • p on s ⇒ p ∈ input(t)   (11)
If a transformation isn't part of a sensory system, then for any
source s with which t is engaged, any representation p that is on s
will form part of the representation that is input to the
transformation.
t ∈ stable ∧ input(t) = p ⇒ [trans] p on t   (12)
If a process ‘t’ is stable and has input ‘p’, then after transformation
a representation derived from p will be on the output stream
corresponding to ‘t’.
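Axioms 11 and 12 describe one step of information transfer. The sketch below is a minimal reading under stated assumptions: inputs are sets of representations, all stable transformations fire simultaneously (inputs are read from the stream contents before the step), and ‘derived from p’ is delegated to a hypothetical `derive` function because the model abstracts away from representation structure. Sensory streams are left for the environment to supply, reflecting the restriction on axiom 11.

```python
from collections.abc import Hashable
from typing import Callable, Dict, FrozenSet, Set, Tuple

Tr = Tuple[str, str]
SENSORY = {"ac", "vis", "bs"}  # subsystems excluded by axiom 11


def input_of(t: Tr, sources: Dict[Tr, Set[Tr]],
             on: Dict[Tr, Set[Hashable]]) -> Set[Hashable]:
    """Axiom 11: everything on an engaged source is part of t's input."""
    if t[0] in SENSORY:
        raise ValueError("sensory input is only defined once the model has a context")
    return set().union(*(on.get(s, set()) for s in sources.get(t, set())))


def trans(stable: Set[Tr], sources: Dict[Tr, Set[Tr]],
          on: Dict[Tr, Set[Hashable]],
          derive: Callable[[Tr, FrozenSet[Hashable]], Hashable]) -> Dict[Tr, Set[Hashable]]:
    """Axiom 12 for every stable non-sensory t at once. The derived
    representation replaces whatever was on t (an assumption)."""
    new_on = {k: set(v) for k, v in on.items()}
    for t in stable:
        if t[0] not in SENSORY:  # sensory streams are fed by the environment
            new_on[t] = {derive(t, frozenset(input_of(t, sources, on)))}
    return new_on
```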
The constraint that the input rule in axiom 11 applies only to non-sensory
subsystems is important, as it identifies a key point where a link must be made
between the cognitive processes and the environment or system in which they
operate. That is, the input to the sensory systems can only be defined after the
architecture has been located in some context. It is the counterpart to the
constraint in ICS axiom 3 concerning utilisation of output subsystems. Also,
note the phrase ‘derived from p’ in the comment of axiom 12. Clearly, ICS
processes produce output that is substantively different from the input
representation(s). However, without describing the constituent structures, etc., of
representations, we cannot describe the effect of processes on the data. The
model presented here focuses on the flow of data through the system and
abstracts away from representations.
The remaining axiom describes the COPY process that is a part of each
subsystem; any representation carried on a data stream will be copied into the
image record of the destination system.
([trans] p@s) ⇔ p@s ∨ ∃t : sys • p on :t-s:   (13)
After transformation, a representation ‘p’ is in the image record at
subsystem ‘s’ if and only if p was either in the record beforehand, or
there is some subsystem ‘t’ such that p is on the stream from t to s.
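The COPY process of axiom 13 amounts to a set union per destination subsystem. A sketch, assuming image records are keyed by subsystem name and that the stream contents passed in are those before the transformation step, as the right-hand side of the axiom suggests:

```python
from collections.abc import Hashable
from typing import Dict, Set, Tuple

Tr = Tuple[str, str]


def copy_step(records: Dict[str, Set[Hashable]],
              on: Dict[Tr, Set[Hashable]]) -> Dict[str, Set[Hashable]]:
    """Axiom 13 (COPY): after trans, p is in the image record of subsystem s
    iff it was already there or p was on some stream :t-s: beforehand."""
    new = {sub: set(reprs) for sub, reprs in records.items()}
    for (_src, dst), reprs in on.items():
        new.setdefault(dst, set()).update(reprs)
    return new
```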
More axioms have been given than will actually be required to model the two
scenarios in this paper. The point of this model, however, is that it is not
specific to the particular problems examined here, but can be applied to a range
of interface techniques, and could be extended to address aspects of cognitive
processing, e.g. the utilisation of memory records, in greater detail.
5.4. Key Points
The mathematical model of ICS given above is obviously an approximation to
the available body of psychological theory. We have not, for example,
considered how processes access the contents of their system’s image record
when in buffered mode. For the scenarios addressed here, the primary concern
is with the cognitive resources and their utilisation within an overall
configuration of information processing. The important point is that the
mathematical model is general and can therefore be applied or specialised to a
variety of domains. The axioms developed above are not specific to any one
processing task, and indeed in the remainder of the paper we will utilise
particular axioms in reasoning about patterns of interaction with two quite
different systems. That such a model could be developed is not in itself
surprising. There was already evidence, in the form of two ‘expert’ systems
(May, Barnard & Blandford, 1993), that significant principles underlying ICS
could be represented within a formal (computable) framework. In contrast, the
model developed here utilises the general body of mathematics, rather than the
restricted deduction apparatus that underlies an expert system shell. In the
conclusion we will have more to say about future work on the mathematical
foundations of human information processing.
In closing this section, it is worth noting that one further reason for recruiting
ICS as the foundation for syndetic modelling is that the theory has a
fundamentally declarative interpretation that maps well onto the methods that we
had used previously for modelling computing components of interactive
systems. In both cases, observables are used to characterise the intended
behaviour of some system. Mathematical models are insensitive to whether their
subject is computer software and hardware, or cognitive resources, information
flow, and transformation. User and system components both impose constraints
on the processing of information within the overall system.
6. A SYNDETIC MODEL OF MATIS
The specification in Section 3 of MATIS included the capability of the system to
handle deictic input. Deixis is a feature of human-human interaction, so one can
make an informal case that it represents a potentially useful tool for human-computer interaction. To explore whether this is in fact the case, we will
construct a syndetic model by combining the MATIS specification with the
model of ICS, and then make a conjecture about the conditions under which
deixis will be possible to a user of the system. We will then show how we can
reason within the mathematical theory about the validity of this conjecture.
The original specifications were developed independently of each other, and in
bringing them together into a syndetic model we extend the original models with
additional observables (in this case an action) that capture the interplay between
the two agents. For MATIS, we posit a ‘read’ action that allows the user to
locate some lexical item, such as the name of a city, on the presentation. In a
more substantial system model, this action would be bound to the contents of
the query forms (see Figure 2) that were available at any time on the screen.
This degree of detail however is not essential for illustrating the role played by
syndesis in understanding deixis.
interactor MATIS-User
    MATIS            - include the MATIS spec
    ICS              - and the ICS framework
actions
    read : data      - observe the MATIS presentation
The conjoint behaviour of the two agents is captured by three axioms that span
the two sets of observables, in MATIS and ICS. The first axiom defines the
condition under which it is possible for the user to read an item of data from the
presentation. On the system side, there must exist a field on a query such that
the value of the field is the data item. On the user side, the configuration must
include a data flow from the visual system, through the object and
morphonolexical levels, to the propositional subsystem.
axioms
per(read(d)) ⇒ d in MATIS ∧ 〈∗vis-obj:, :obj-mpl:, :mpl-prop:〉 ∈ flows   (1)
It is possible to read some data item ‘d’ if d is part of a field of a
query in the display and the cognitive configuration enables
reading.
In this scenario we are concerned with the representations that are being
processed within a flow. To capture this idea concisely, we define a relational
symbol named ‘on-flow’ as an abbreviation for a condition involving
components of the ICS interactor:
on-flow : repr ↔ Flow
r on-flow f ≙ f ∈ flows ∧ ∀t : ran f • r on t
A representation ‘r’ is on a flow ‘f’ if and only if the flow is part of
the processing configuration, and for all transformations that are in
the range of the sequence defining the flow, the corresponding
representation is available as output of those transformations.
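The ‘on-flow’ abbreviation translates directly into a predicate. A Python sketch under the earlier hypothetical (source, destination) encoding; the representation values used in the checks (city names) are invented for illustration:

```python
from collections.abc import Hashable
from typing import Dict, Sequence, Set, Tuple

Tr = Tuple[str, str]
Flow = Tuple[Tr, ...]


def on_flow(r: Hashable, f: Flow, flows: Sequence[Flow],
            on: Dict[Tr, Set[Hashable]]) -> bool:
    """r on-flow f iff f is a configured flow and r is on every transformation of f."""
    return f in flows and all(r in on.get(t, set()) for t in f)
```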
Axioms 2 and 3 address the cognitive requirements associated with the action of
selecting a data item with the mouse, and uttering some part of a query. As
items on the MATIS display are lexical structures, the configuration for object
search used in the AV scenario is not sufficient. The mpl and prop systems need
to be recruited to find lexical objects (words) on the screen and compare them
with the user's goals. This will require that the representation of the word is on
the flow defined by a search configuration suitable for lexicographical data
derived from visual input. For speech, the data flow will begin within a PIP
loop and then will be processed via the mpl and art subsystems to produce
spoken words.
word-search ≙ 〈∗vis-obj:, :obj-mpl:, :mpl-prop:, :prop-mpl:, :mpl-prop∗〉
speech ≙ PIP^〈:prop-mpl:, :mpl-art:, :art-speech∗〉
Note that the final three transformations in the ‘word-search’ flow define a
processing cycle referred to as a PMP loop. That is, propositional information
produced by mpl may be used by processes in prop to construct new mpl
representations. The PMP loop and PIP loop needed for speech indicate a cyclic
interchange of representations between two or more processes, as described in
Section 4.3.
per(select(d)) ⇒ d on-flow word-search ∧ d in MATIS   (2)
If it is possible to select an item (with the mouse) then the item
must be part of the display, and a representation of the item must
be processed within a flow configured for lexicographical search
and comparison.
per(speak(s)) ⇒ s on-flow speech   (3)
If it is possible to articulate part of a query then a representation of
the phrase must be processed through a data flow that originates as
a PIP loop and then results in the production of speech via the mpl
system.
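Axioms 2 and 3 of MATIS-User give necessary, not sufficient, conditions for selecting and speaking. The sketch below checks those conditions on hypothetical data; the contents of PIP are an assumption, read off the expansion of ‘speech’ used in the proof that follows, and the trailing ∗ on flow ends is ignored.

```python
from collections.abc import Hashable
from typing import Dict, Sequence, Set, Tuple

Tr = Tuple[str, str]
Flow = Tuple[Tr, ...]

WORD_SEARCH: Flow = (("vis", "obj"), ("obj", "mpl"), ("mpl", "prop"),
                     ("prop", "mpl"), ("mpl", "prop"))
# Assumption: PIP as it appears in the expansion of 'speech' in the proof.
PIP: Flow = (("implic", "prop"), ("prop", "implic"), ("implic", "prop"))
SPEECH: Flow = PIP + (("prop", "mpl"), ("mpl", "art"), ("art", "speech"))


def on_flow(r: Hashable, f: Flow, flows: Sequence[Flow],
            on: Dict[Tr, Set[Hashable]]) -> bool:
    return f in flows and all(r in on.get(t, set()) for t in f)


def may_select(d, display, flows, on) -> bool:
    """Necessary condition from axiom 2: False means select(d) is not permitted."""
    return d in display and on_flow(d, WORD_SEARCH, flows, on)


def may_speak(s, flows, on) -> bool:
    """Necessary condition from axiom 3: False means speak(s) is not permitted."""
    return on_flow(s, SPEECH, flows, on)
```

In the sample data used to exercise it, :prop-mpl: ends up carrying both the spoken phrase and the search target, which is the situation the conjecture below is concerned with.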
Since deixis involves operating on two streams of potentially different
representations (one dealing with data to be spoken, the other with data involved
in lexical search), we conjecture that there might be a difficulty in using the
interface if these streams conflict. We construct a hypothesis that in order for the
user to speak a phrase ‘s’ and select a data item ‘d’ concurrently, the
representations of ‘s’ and ‘d’ must be coherent. This can be expressed formally,
as the sequent given below. It reads “From the list of axioms given in the
MATIS-User specification, and the predicate per(speak(s)&select(d)), it can be
shown that the predicate s ≈ d is true”. Note that the axioms of the MATIS-User
specification include those of the interactors that it inherits, i.e. MATIS and
ICS.
MATIS-User ∧ per(speak(s)&select(d)) ≡〉 s ≈ d
Mathematical reasoning takes a variety of forms, depending on the structures
involved and the level of rigour required. In this paper we have chosen to use a
specification logic (MAL). MAL subsumes first order logic, and most of the
reasoning that we will do with specifications here just utilises the inference rules
of classical first order logic (Lemmon, 1993). For the first proof, the only
additional rule needed relates to the deontic operator (per) in the conjecture. As
an aid to readability and understanding, we have opted for a ‘calculational’ style
of presentation, in which the proof is set out much like an algebraic calculation
in ‘standard’ mathematics. The mathematical expressions are interleaved with
text that explains how each step in the argument is supported; consequently, the
argument has the following structure:
Assumption (i.e. per(speak(s)&select(d)))   (1)
Explanation of why assumption justifies statement 2.
≡〉 Statement 2   (2)
Explanation of why statement 2 justifies statement 3.
≡〉 …
Explanation of why statement n − 1 justifies conclusion.
≡〉 Conclusion (i.e. s ≈ d)   (n)
To see the analogy with conventional mathematical argument, simply replace the
‘therefore’ symbol (≡〉) in the outline with equality (=). In practice, the
presentation of a proof tends to be terser, and requires rather more knowledge
of mathematical reasoning than has been assumed here. There is no reason why
this proof could not be expanded into a more structured and formal account,
using the style of (Lemmon, 1993) or indeed expressed in a form suitable for an
automated verification tool such as PVS (Owre et al, 1995).
The proof appears below.
per(speak(s)&select(d))   (1)
The first step is the only step that relies on deontic logic; if it is
permissible to do two actions concurrently, then it is permissible to
do each of the actions.
≡〉 per(speak(s)) ∧ per(select(d))   (2)
The next two steps use Modus Ponens, the inference law that states
that from P, and P ⇒ Q, one can deduce Q. This is applied first to
axiom 3 of MATIS-User; here ‘P’ is the statement
‘per(speak(s))’ and ‘Q’ is the consequent part of axiom 3, ‘s on-flow speech’.
≡〉 s on-flow speech ∧ per(select(d))   (3)
The same process is repeated using axiom 2 of MATIS-User with
the hypothesis per(select(d)). Effectively, we have expanded what
it ‘means’ for the actions speak(s) and select(d) to be permitted.
≡〉 s on-flow speech ∧ d on-flow word-search ∧ d in MATIS   (4)
The definition of ‘on-flow’ is now expanded:
≡〉 speech ∈ flows ∧ (∀t : ran speech • s on t)
    ∧ word-search ∈ flows ∧ (∀t : ran word-search • d on t)   (5)
‘speech’ is a sequence of transformation processes representing a
flow of representations, and one of the transformations within that
flow is :prop-mpl:. Axiom 4 of ICS states that a transformation is
part of the overall configuration if and only if it is part of some flow,
and applying this rule by substituting ‘:prop-mpl:’ for the variable
‘t’ in axiom 4 gives:
≡〉 speech ∈ flows ∧ (∀t : ran speech • s on t) ∧ :prop-mpl: ∈ config
    ∧ word-search ∈ flows ∧ (∀t : ran word-search • d on t)   (6)
An important property of deduction is that we can ‘focus in’ on
part of a problem and work on it in isolation. That is, if we have
the statement ‘A ∧ B’, and an axiom or rule of inference allows us
to deduce ‘C’ from ‘A’, then we can re-write the original statement
as ‘C ∧ B’. Another way of stating this is that, if C follows from
A, it also follows from A ∧ B, so if we wish we can focus in on
one part of a statement. Here, we focus on the requirement related
to the configuration:
≡〉 :prop-mpl: ∈ config ∧ ...   (7)
Necessary and sufficient conditions for a transformation to be a
part of the configuration were given as axiom 3 of the ICS
interactor. Substituting ‘:prop-mpl:’ for the variable ‘t’ in that
axiom results in the following:
≡〉 :prop-mpl: ∈ stable
    ∧ (src(:prop-mpl:) ∉ {art, lim} ⇒ ∃s : tr • :prop-mpl: ∈ sources(s)) ∧ ...   (8)
We now focus in on the need for stability:
≡〉 :prop-mpl: ∈ stable ∧ ...   (9)
The ICS theory provides an axiom (number 2) that gives
conditions necessary for a stream to be stable. The following
sequent results from substituting :prop-mpl: for t in that axiom.
≡〉 (∀s1, s2 : sources(:prop-mpl:) • coherent(s1, s2))
    ∧ (:prop-mpl: = buffered ∨ sources(:prop-mpl:) ⊆ stable)   (10)
To make further progress, at this point we need to find appropriate
transformations to replace the quantified variables s1 and s2 in statement (10);
that is, we need to see if the model makes any statements about the input being
used by :prop-mpl:. Here we make use of the flow rule (axiom 5 of ICS), using
the predicates about flow introduced in step (5). For example, since
speech = PIP^〈:prop-mpl:, :mpl-art:, :art-speech∗〉
       = 〈:implic-prop:, :prop-implic:〉^〈:implic-prop:, :prop-mpl:〉^〈:mpl-art:, :art-speech∗〉
we can apply ICS axiom 5 by making the following substitutions:
s ↦ 〈:implic-prop:, :prop-implic:〉
t1 ↦ :implic-prop:
t2 ↦ :prop-mpl:
u ↦ 〈:mpl-art:, :art-speech∗〉
We can thus conclude that :implic-prop: ∈ sources(:prop-mpl:). Similarly, by
using the ‘word-search’ flow, we can conclude that :mpl-prop: ∈ sources(:prop-mpl:).
That is, to understand whether :prop-mpl: will be stable, we need to look at
processes :mpl-prop: and :implic-prop: that produce data for it. We now
substitute :mpl-prop: and :implic-prop: for s1 and s2 in (10), giving:
≡〉 coherent(:mpl-prop:, :implic-prop:)
    ∧ (:prop-mpl: = buffered ∨ sources(:prop-mpl:) ⊆ stable)   (11)
ICS axiom 1 gives the necessary and sufficient conditions for
coherence, and so the conjunct coherent(:mpl-prop:, :implic-prop:)
in (11) can be rewritten to:
≡〉 dst(:mpl-prop:) = dst(:implic-prop:)
    ∧ (∀p, q : repr • p on :mpl-prop: ∧ q on :implic-prop: ⇒ p ≈ q)   (12)
The first conjunct of this, dealing with the destinations of two
processes, is trivially true from the architecture, and can be
eliminated. That leaves the following requirement:
≡〉 ∀p, q : repr • p on :mpl-prop: ∧ q on :implic-prop: ⇒ p ≈ q   (13)
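Substituting the symbol ‘d’ (the data item being searched for) for the
variable ‘p’, and similarly putting ‘s’ (the phrase spoken by the user)
in place of ‘q’, in the predicate gives:
≡〉 d on :mpl-prop: ∧ s on :implic-prop: ⇒ d ≈ s   (14)
In statement (5), we have the predicates
• ∀t : ran speech • s on t, and
• ∀t : ran word-search • d on t,
generated from the definition of ‘on-flow’. These are still available
for use; we simply haven't been copying them through each of the
steps. Since :mpl-prop: is in the word-search flow, we can conclude
that ‘d on :mpl-prop:’, and similarly, since :implic-prop: is in the
speech flow (as part of its PIP loop), that ‘s on :implic-prop:’. We
can therefore use modus ponens on statement (14) to eliminate the
antecedent, which leaves the requirement that ‘d’ and ‘s’ be coherent.
Because statement (10) quantifies over every pair of sources of
:prop-mpl:, repeating steps (11) to (14) with the roles of :mpl-prop:
and :implic-prop: exchanged gives the same requirement in the order
used in the conjecture:
≡〉 s ≈ d   (15)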
Although this presentation of the proof is fairly lengthy, it is quite
straightforward. It demonstrates that it is possible to calculate properties of
human-computer interaction in a systematic way. When expressed in a suitable
form, a proof such as this can be carried out and checked using a theorem
prover or proof assistant such as MURAL (Jones et al., 1991) or PVS (Owre et
al., 1995). Indeed, the simplicity of the above derivation means that it could
probably be discharged with little or no human intervention by most of the
current generation of theorem-proving tools.
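To give a concrete sense of the mechanical character of the argument, the Python sketch below (a replay on sample data, not a proof and not a theorem prover) derives the sources of :prop-mpl: from the two flows with ICS axiom 5 and then applies the coherence test of step (12) to every pair of them. Reading ≈ as equality of representations is a hypothetical stand-in, the (source, destination) encoding is ours, and PIP is taken from the expansion of ‘speech’ given in the proof.

```python
from typing import Callable, Dict, Iterable, Set, Tuple

Tr = Tuple[str, str]
Flow = Tuple[Tr, ...]

WORD_SEARCH: Flow = (("vis", "obj"), ("obj", "mpl"), ("mpl", "prop"),
                     ("prop", "mpl"), ("mpl", "prop"))
PIP: Flow = (("implic", "prop"), ("prop", "implic"), ("implic", "prop"))
SPEECH: Flow = PIP + (("prop", "mpl"), ("mpl", "art"), ("art", "speech"))


def sources_of(t: Tr, flows: Iterable[Flow]) -> Set[Tr]:
    """ICS axiom 5, left to right: every immediate predecessor of t in a flow."""
    return {t1 for f in flows for t1, t2 in zip(f, f[1:]) if t2 == t}


def coherent(a: Tr, b: Tr, on: Dict[Tr, Set[str]],
             approx: Callable[[str, str], bool]) -> bool:
    """Step (12): same destination, and every pair of representations compatible."""
    return a[1] == b[1] and all(approx(p, q) for p in on.get(a, ()) for q in on.get(b, ()))


def deixis_coherent(s: str, d: str, approx=lambda p, q: p == q) -> bool:
    """First conjunct of (10) for :prop-mpl: when the user speaks s and seeks d."""
    flows = [WORD_SEARCH, SPEECH]
    on: Dict[Tr, Set[str]] = {}
    for t in WORD_SEARCH:
        on.setdefault(t, set()).add(d)  # d on-flow word-search
    for t in SPEECH:
        on.setdefault(t, set()).add(s)  # s on-flow speech
    srcs = sources_of(("prop", "mpl"), flows)
    return all(coherent(a, b, on, approx) for a in srcs for b in srcs)
```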
This result shows that a user will not be able to articulate a phrase at the same
time as they search for a different item on the display, since the two
representations needed as input to the prop-mpl transformation would not be coherent.
Thus the syndetic model shows that, in order to employ the resources defined in
the system model, a user of MATIS may have to ‘interrupt’ a spoken request in
order to locate a value for deictic reference. This need to switch processing
mode will be distracting. In fact, the configuration for deixis requires that the
user has a morphonolexical representation of the search target, and as the user is
already articulating the ‘this ...’ part of the command, they will probably find it
easier just to continue speaking the whole command, rather than switch
modality. If the system also requires selection to occur within some temporal
window around a deictic utterance (Nigay & Coutaz, 1995), the user may not be
able to carry out the context switching and location of an appropriate value in
time. Mathematical techniques exist that would allow the explicit representation
and analysis of such constraints.
Let us summarise the process that led to this result. Starting with a model of a
specific interface, and a general model of a cognitive architecture, we set out to
explore a conjecture that the concurrent use of multiple data streams required to
achieve deictic reference would place a strong requirement on the user.
Reasoning within the syndetic model, we concluded that deictic reference
required that the representations being processed on two data streams would
need to be coherent; interpreting this result in the context of the models led us
to claim that users would find deictic reference difficult.
In reviewing this process it is important to note the role of the mathematics.
Describing MATIS and ICS as a set of axioms did not in itself lead to the
conjecture about usability, or the subsequent proof. Nor would one expect it to.
The use of mathematics here in HCI is no different from its use in any other
scientific discipline; it is a tool for representing models of the world and for
manipulating those models to test conjectures and carry out calculation.
However, this is not to say that the mathematics played no role in discovering
the result. The mathematical model makes the role of data streams explicit, and
by providing a concise vocabulary for describing the properties and behaviour
of these streams, there is a sense in which the formulae afford exploration of
properties related to stream-based processing. In this sense the mathematical
representation enables discovery of these processes in the same way that
powerful and expressive bodies of mathematical theory empower physicists to
calculate properties of electromagnetic fields or quantum states. Indeed, the
successful development of theoretical models to explain and predict the results
of experiments was in part due to the existence of mathematics, such as vector
spaces, operators, and differential equations in which the observations could be
expressed concisely and clearly. More recently, aspects of computing such as
concurrency theory have benefited from the existence of simple mathematical
theories such as CCS (Milner, 1989) that allowed the construction and
manipulation of models that captured what otherwise are apparently complex
interactions.
Identifying potential problems is only one aspect of design; the dual issue is
how to address a problem once identified. Syndetic models are important here,
because they make explicit both the chain of reasoning that leads to problem
identification, and the fundamental principles or assumptions on which this
chain is grounded. In contrast, purely empirical approaches to evaluation can
identify that a problem exists, and may localise the context in which it occurs,
but without an explicit theory base they lack authority to state the cause of the
problem, and consequently do not in themselves provide help in identifying
solutions.
In the case of MATIS, the problem that we have identified is that if the user is to
employ deixis, both the articulated phrase and the target object for gesture must
have coherent propositional representations (see lines 14 and 15 of the
argument). Now, we know that people are able to employ deixis, indeed, it is
because it is such an intrinsic part of human-human communication that
interface designers are interested in recruiting it for human-computer interaction.
So what has gone wrong? The requirement on coherence of representation
follows from the rule that data streams contributing to a processing task be
coherent, and therefore stable (this was followed through in lines 7-12). So the
problem is that the interface is requiring the parallel use of two data streams in a
fundamentally incompatible way. We can trace this back to line 5, where it was
required that both speech and word-search are part of the configuration. This
requirement, and the associated conditions that ‘s’ and ‘d’ must be on particular
streams, leads to the coherence requirement. Could we find different streams
that would allow the user to perform effectively the same task, but which avoid
the need for coherence between ‘s’ and ‘d’? Within the approximate model of
ICS that we have available here, our principal freedom in this context is in
selecting flows; a more sophisticated model might encompass assumptions
about record content (e.g. user training) and properties of the dynamic
oscillation of control, i.e. the use of short term memory and sub-vocal rehearsal
in this case (Barnard & May, 1993). As the conflict involves an exchange
between the propositional and morphonolexical levels, it is useful to consider
whether the processes involved can be by-passed entirely. Now, in human-human communication, deixis typically involves pointing at objects; “... that
person, ...”, “... this button, ...”, etc. Within ICS, the recognition of (general)
visual objects, and the control of limbs necessary for gesturing at objects, is
devolved to processes within the VIS, OBJ, and LIM subsystems, and would
be a highly proceduralised skill. We could recruit this ability, if, rather than
working with purely lexical targets, the targets for deictic reference with the
mouse had some spatial or visual characteristic that allowed them to be detected
visually. How this might be realised in an interface is of course a matter for
design creativity, but one suggestion, given the domain of MATIS (air travel),
would be to utilise a map, showing the location of cities served by the system.
Assuming that the operator was familiar with the approximate location of cities,
pointing to and selecting a city on the map could bypass the need for
morphonolexical processing and thereby eliminate (or at least moderate) the
conflict identified in the analysis.
To conclude this case study, we note that the design issue considered here is
quite fundamental and extends beyond the specific example of MATIS: under
what conditions might deixis be used as a component of any interface? We have
demonstrated that resolution of this issue involves system considerations such
as how and when deictic reference is accepted and resolved, and user issues, for
example what mental resources are needed to interact with the system via
particular modalities, and how those mental resources are constrained.
7. A SYNDETIC MODEL OF GESTURE
This section demonstrates that the core component of syndetic modelling (the
relatively ‘fixed’ cognitive architecture) is generic, i.e. can be reused across
applications. We do so by developing a syndetic model of a second interaction
technique that is qualitatively different from the techniques that have been
addressed by user or system models in isolation. Gestural interaction involves
the use of a series of hand positions or ‘poses’ to control actions within a system.
To illustrate some of the human issues related to this approach, we will use an
example based on the Dynamic Gesture Language (DGL) presented by
Bordegoni & Hemmje (1993). An initial syndetic model of this problem was
presented in (Duke, 1995b). Our analysis is not a critique; rather, we aim to
show how a model of human-system interaction could be used to inform the
development of gestural technology, identifying potential problems with
interaction techniques before they are embedded into large-scale applications.
In DGL, a gesture is defined as a sequence of static postures (poses)
characterised by the position and orientation of a user’s hand as measured
through a data glove. The completion of a gesture is recognised by factors such
as trajectory and posture. One application that could utilise this technology is an
editor for 3-dimensional scenes constructed from visual objects. Four gestures
that could be relevant in this context are given below:
FIGURE 6 ABOUT HERE
Visual feedback about gestures is provided to the user in the form of a ‘cursor’
that can either continuously model the pose of the hand or can be set to a
specific shape within the gesture that is appropriate to the task the user is trying
to perform. For example, a narrow pointer can be more useful than the ‘hand’
cursor for ‘picking’ objects as its rendering obscures less of the scene. Other
types of feedback, for example that some object has been selected, depend on
the surrounding application. The ‘system’ interactor models the cursor by a
value drawn from a given type ‘Image’; all we assume about this type is that it
includes a value ‘init’ representing some default cursor shape. In addition, the
interactor records the history of poses received by the system through the
attribute ‘history’, and provides a mapping that takes any sequence of poses to
the corresponding feedback.
interactor Gesture-Engine
attributes
    vis cursor : Image
    history : seq Pose
    feedback : (seq Pose) → Image
Two actions are provided. The first allows the user to form a pose, while the
second, ‘render’, represents the system updating the shape of the cursor to
reflect the current sequence of poses.
actions
    lim form : Pose
    render
The axioms that govern behaviour of the gesture engine are stated below.
axioms
[] history = 〈 〉 ∧ cursor = init   (1)
Initially, the history is empty, and the cursor is set to the default
initial shape.
history = H ⇒ [form(p)] history = H^〈p〉   (2)
If the history has some value ‘H’, the effect of forming a pose ‘p’
is to make the history equal to H extended by p.
[render] cursor = feedback(history)   (3)
The effect of the render action is to make the cursor equal to the
feedback associated with the current history of poses.
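The Gesture-Engine axioms read naturally as a small state machine. A sketch, assuming poses and images are plain string labels and choosing, purely for illustration, a feedback function that echoes the most recent pose (one of the cursor behaviours described above); that `form` leaves the cursor untouched is a frame assumption, not something the axioms state.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

Pose = str   # hypothetical: poses and images as plain labels
Image = str


@dataclass
class GestureEngine:
    feedback: Callable[[Tuple[Pose, ...]], Image]
    history: Tuple[Pose, ...] = ()   # axiom 1: initially empty
    cursor: Image = "init"           # axiom 1: default cursor shape

    def form(self, p: Pose) -> None:
        # Axiom 2: history becomes H ^ <p> (cursor unchanged: frame assumption).
        self.history = self.history + (p,)

    def render(self) -> None:
        # Axiom 3: cursor = feedback(history).
        self.cursor = self.feedback(self.history)
```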
The syndetic model is created by introducing both the user and system models
into a new interactor and then defining the axioms that govern the conjoint
behaviour of the two agents. Three attributes, ‘glove’, ‘goals’ and ‘interp’, are
used to ‘contextualise’ the generic ICS model to the specific features of the
gestural interface. The first, ‘glove’, represents the posture that the user is
making with their hand at any point in time. A user’s intended behaviour is
represented by ‘goals’; informally, this is the sequence of poses that the user
wants to make, for example to effect a desired command. During interaction,
both proprioceptive and visual feedback of pose formation will be available to
the user. The latter takes the form of the cursor image; the attribute ‘interp’ is
introduced to map the image of the cursor on the display to a user’s
understanding of that shape as a hand pose. Here the mapping is expressed as a
total function, i.e. all possible hand shapes have a unique interpretation as a
pose. This is a simplifying assumption for the purposes of this paper, and could
be altered in a more extensive model to accommodate ambiguity or ill-defined
shapes.
interactor Gesture-User
    Gesture-Engine   - include the system model
    ICS              - and the ICS framework
attributes
    glove : Pose
    goals : seq Pose
    interp : Hand → Pose
The ‘form’ action defined in the gesture engine interactor is driven by the user’s
limb subsystem. In order for the user to consciously form a pose, the
configuration must be set to transform a propositional representation into
musculature control using the following data flow:
PH ≙ 〈:prop-obj:, :obj-lim:, :lim-hand:〉
In order for the object to limb transformation to control movement of the hand,
it will also need access to visually derived data concerning the current cursor
(hand) orientation, and proprioceptive feedback from the body-state system
concerning the current posture of the user’s own hand. These data will be
carried on the following flows:
VO ≙ 〈∗vis-obj:, :obj-lim:〉
BH ≙ 〈∗bs-lim:, :lim-hand∗〉
Axiom 1 describes the constraint on pose formation. It requires that the data
flows given above exist in the system, and that a representation of the current
cursor shape is available on the display. Axiom 2 states that a user works
sequentially through the sequence of poses that make up their current goal. For
brevity, we write ‘hand’ for ‘interp(cursor)’.
axioms
per(form(p)) ⇒ p on-flow PH ∧ hand on-flow VO ∧ glove on-flow BH ∧ buffered = :prop-obj:   (1)
In order to form a pose ‘p’, the cognitive configuration must
include flows for processing gesture formation and posture
feedback (visual and proprioceptive). The user will also need to be
buffering (thinking about) the pose they are intending to construct.
goals = 〈p〉^G ⇒ [form(p)] (glove = p ∧ goals = G)   (2)
If the user has a sequence of goals that begins with a pose ‘p’, then
the effect of forming the pose ‘p’ is that the user’s hand/dataglove
is in that pose, and the goals become the remaining poses.
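Axiom 2 of Gesture-User treats the goals as a queue that is consumed one pose at a time. A sketch with poses as hypothetical string labels; raising an error when the pose formed is not the next goal is our choice, since the axiom is silent about that case.

```python
from typing import Tuple

Pose = str  # hypothetical label for a hand pose


def form_pose(goals: Tuple[Pose, ...], p: Pose) -> Tuple[Pose, Tuple[Pose, ...]]:
    """Gesture-User axiom 2: if goals = <p> ^ G, then after form(p) the
    glove is in pose p and the goals are G. Returns (glove, goals)."""
    if not goals or goals[0] != p:
        # The axiom says nothing about this case; refusing is an assumption.
        raise ValueError(f"{p!r} is not the next goal pose")
    return p, goals[1:]
```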
As in the case of MATIS, a syndetic model can be used to test conjectures about
how users might interact with a system. In the gesture user interactor, pose
formation is permitted explicitly when a representation of some pose is available
on a visually-derived stream. However, the underlying cognitive architecture
imposes implicit constraints on the processing of information. In this particular
case, we wish to explore whether or not the representation shown on the display
(cursor) needs to be coherent with the pose that is to be formed.
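$$\mathit{per}(\mathit{form}(p)) \tag{1}$$
The conditions under which a pose can (normally) be formed are
set out in axiom 1 of the Gesture-User interactor, so the first step is
a simple replacement:
$$\equiv\!\rangle\; p \mathrel{\text{on-flow}} PH \wedge \mathit{hand} \mathrel{\text{on-flow}} VO \wedge \mathit{glove} \mathrel{\text{on-flow}} BH \wedge \mathit{buffered} = \text{:prop-obj:} \tag{2}$$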
As we are concerned with the requirement for coherence between
the intended pose and the display, we focus on the streams
involving these representations, and ignore the proprioceptive
stream.
$$\equiv\!\rangle\; p \mathrel{\text{on-flow}} PH \wedge \mathit{hand} \mathrel{\text{on-flow}} VO \tag{3}$$
We now expand definitions, here replacing ‘on-flow’ with its
defining text.
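$$\equiv\!\rangle\; PH \in \mathit{flows} \wedge (\forall t_1 : \operatorname{ran} PH \bullet p \mathrel{\text{on}} t_1) \wedge VO \in \mathit{flows} \wedge (\forall t_2 : \operatorname{ran} VO \bullet \mathit{hand} \mathrel{\text{on}} t_2) \tag{4}$$
The universal quantifiers (∀) can be eliminated by replacing the
bound variables (t1 and t2) with specific transformations from the
streams; specifically t1 is replaced by :prop-obj: (an element of
ran PH), and t2 by ∗vis-obj: (an element of ran VO).
$$\equiv\!\rangle\; PH \in \mathit{flows} \wedge p \mathrel{\text{on}} \text{:prop-obj:} \wedge VO \in \mathit{flows} \wedge \mathit{hand} \mathrel{\text{on}} \text{*vis-obj:} \tag{5}$$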
Axiom 4 of ICS requires that any transformation that is in a flow
must also be part of the current configuration. We now instantiate
this axiom, with the transformation :obj-lim: (which lies on both PH
and VO) replacing the variable ‘t’ in the axiom, and simplify the result
by removing terms that will play no direct role in the subsequent parts
of the calculation.
$$\equiv\!\rangle\; \text{:obj-lim:} \in \mathit{config} \wedge p \mathrel{\text{on}} \text{:prop-obj:} \wedge \mathit{hand} \mathrel{\text{on}} \text{*vis-obj:} \tag{6}$$
We apply axiom 3 of ICS to expand the meaning of a
transformation being part of a configuration.
$$\equiv\!\rangle\; \text{:obj-lim:} \in \mathit{stable} \wedge (\mathit{src}(\text{:obj-lim:}) \notin \{\mathit{art}, \mathit{lim}\} \Rightarrow \exists s : \mathit{tr} \bullet \text{:obj-lim:} \in \mathit{sources}(s)) \tag{7}$$
The focus of the proof now moves to the requirement that the data
stream from :obj-lim: be stable, so we eliminate the other parts of
the problem at this point.
$$\equiv\!\rangle\; \text{:obj-lim:} \in \mathit{stable} \tag{8}$$
Necessary and sufficient conditions for stream stability are given in
axiom 2 of the ICS interactor, which is substituted in at this point,
with :obj-lim: replacing the variable ‘t’.
$$\equiv\!\rangle\; (\forall s_1, s_2 : \mathit{sources}(\text{:obj-lim:}) \bullet \mathit{coherent}(s_1, s_2)) \wedge (\mathit{buffered} = \text{:obj-lim:} \vee \mathit{sources}(\text{:obj-lim:}) \subseteq \mathit{stable}) \tag{9}$$
The sources of :obj-lim: are defined by the architecture, so we
choose the two sources that are significant in this problem (from
the prop and vis subsystems) and use these to eliminate the
universal quantifier by substituting them for the bound variables s1
and s2.
$$\equiv\!\rangle\; \mathit{coherent}(\text{:prop-obj:}, \text{*vis-obj:}) \wedge (\mathit{buffered} = \text{:obj-lim:} \vee \mathit{sources}(\text{:obj-lim:}) \subseteq \mathit{stable}) \tag{10}$$
We again narrow the focus to a specific part of the requirement
above, that the two streams (from prop and vis) to obj be coherent.
$$\equiv\!\rangle\; \mathit{coherent}(\text{:prop-obj:}, \text{*vis-obj:}) \tag{11}$$
The definition for stream coherence is given by axiom 1 of ICS,
which we use to expand the previous line, substituting :prop-obj:
for t1 and ∗vis-obj: for t2.
$$\equiv\!\rangle\; \mathit{dest}(\text{:prop-obj:}) = \mathit{dest}(\text{*vis-obj:}) \wedge (\forall r, q : \mathit{repr} \bullet r \mathrel{\text{on}} \text{:prop-obj:} \wedge q \mathrel{\text{on}} \text{*vis-obj:} \Rightarrow r \approx q) \tag{12}$$
The requirement that the two streams have the same destination is
trivially satisfied, since both deliver their output to the object
subsystem, and can be eliminated. Earlier (steps 4 and 5) we
established that ‘p’ was on :prop-obj: and ‘hand’ was on
∗vis-obj:, so we can use these constants to eliminate the
remaining universal quantifier by substituting p for r and hand for
q.
$$\equiv\!\rangle\; p \mathrel{\text{on}} \text{:prop-obj:} \wedge \mathit{hand} \mathrel{\text{on}} \text{*vis-obj:} \Rightarrow p \approx \mathit{hand} \tag{13}$$
Using the fact that ‘p’ is available on :prop-obj: and ‘hand’ is
on ∗vis-obj: again, we can satisfy the antecedent, leaving
the conclusion:
$$\equiv\!\rangle\; p \approx \mathit{hand} \tag{14}$$
Thus in order to form a pose, the information about hand position interpreted
from the display must be coherent with the pose that is to be formed. If the two
data flows become unsynchronised, the result is that the :obj-lim: process
cannot remain within the configuration while processing both visual and
propositional input. It will disengage from processing one or other of the
streams, leading to a breakdown of processing. This situation is similar to the
problems that arise when the sound track of a film becomes out of step with the
image, and can be highly disruptive. In terms of the gesture engine, it is
therefore important that (1) the rate of gesture formation is commensurate with
that of the rendering action that updates the display, and (2) the user is able
to interpret the rendering of the hand as a sufficiently accurate model of their
actual hand position. The required bounds of accuracy are properly a subject for
experimental evaluation; the value of the syndetic model here is that it helps to
establish a set of requirements for designing a suitable experiment, and explains
why those requirements exist in the first place.
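The conclusion p ≈ hand can be exercised with a small Python sketch. It assumes, purely for illustration, that a pose is a tuple of per-finger flexion angles and that r ≈ q holds when every angle agrees within a tolerance tol. Both the representation and the value of tol are hypothetical, and the latter is exactly the bound the text leaves to experimental evaluation. The function coherent follows the expansion of ICS axiom 1 at step (12).

```python
from typing import Dict, List, Tuple

Pose = Tuple[float, ...]   # hypothetical: one flexion angle (degrees) per finger


def approx(r: Pose, q: Pose, tol: float) -> bool:
    """Hypothetical reading of r ≈ q: every angle agrees to within tol."""
    return len(r) == len(q) and all(abs(a - b) <= tol for a, b in zip(r, q))


def dest(t: str) -> str:
    """Destination subsystem of a transformation, e.g. ':prop-obj:' -> 'obj'."""
    return t.strip(":*").split("-")[1]


def coherent(t1: str, t2: str, carried: Dict[str, List[Pose]], tol: float) -> bool:
    """Coherence as expanded at step (12): same destination, and every
    representation on t1 is approximately equal to every one on t2."""
    return dest(t1) == dest(t2) and all(
        approx(r, q, tol) for r in carried[t1] for q in carried[t2])


p = (90.0, 90.0, 90.0, 90.0, 90.0)          # intended pose: a fist
in_step = (85.0, 92.0, 88.0, 90.0, 91.0)    # rendered hand keeps up
lagging = (10.0, 10.0, 10.0, 10.0, 10.0)    # rendering still shows an open hand

for hand in (in_step, lagging):
    streams = {":prop-obj:": [p], "*vis-obj:": [hand]}
    print(coherent(":prop-obj:", "*vis-obj:", streams, tol=10.0))
# True
# False
```

In the second case the rendered hand lags the intended pose, so the coherence condition fails, which corresponds to the unsynchronised situation described above.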
In the step from line (2) to line (3) of the argument, we deliberately focused
attention away from proprioceptive feedback via the stream BH. It is
straightforward to see that a similar argument holds if we consider the streams
PH and BH; both of these streams involve the process :lim-hand∗, and we would
find, following the same argument, that there is a requirement for coherence
between the ‘glove’ state perceived by the user and the pose which they are
intending to form. Limb
control is a highly proceduralised activity, which usually involves little if any
conscious control (i.e. focal awareness), and so one would normally expect
little difficulty with this coherence requirement. However, for the first-time user
of a data glove, the physical sensation of gesturing while wearing a
(comparatively) bulky unit may affect the quality of the proprioceptive stream
originating in the body-state subsystem. If the quality of this stream degrades,
i.e. it becomes unstable, then from axiom 7 of ICS we know that the
transformation process (∗bs-lim:) will either try to find a stable source of data,
decouple from the unstable source, or enter buffered mode. In the context of
this model, only the latter option is available, and it will attempt to operate on an
extended representation of posture by accessing its image record, i.e. by
buffering. Dually, from ICS axiom 2, we also know that a process that needs to
appear in a configuration must either operate on stable input streams, or itself
operate in buffered mode. Consequently, :lim-hand∗, which needs the data
stream from ∗bs-lim:, will also attempt to enter buffered mode. The architecture
only allows a single process to be buffered within a configuration, and the
consequence is that utilisation of the buffer will probably oscillate between these
processes. This is unfortunate for the would-be user, as axiom 1 of the Gesture-User
model states that the :prop-obj: transformation must be buffered, since
users need to focus on the poses that they are forming. Consequently, for
novice users of the interface, the need to concentrate on forming and controlling
poses at the motor level will affect their ability to think and reason about the
tasks that they are performing within the interface. However, over time,
experience with the input device will lead to proceduralisation of the limb
control, and hence the difficulty will be alleviated (see Barnard & May, 1993,
for an account of how cognitive activity can be scoped by experience).
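The contention for the buffer can be sketched by propagating instability through a hypothetical source map. The sketch assumes that, with nothing buffered, a transformation fed by an unstable stream is itself unstable (the reading of ICS axiom 2 used at step (9)), and that any transformation the task requires to be buffered (here :prop-obj:, by axiom 1 of the Gesture-User model) also claims the buffer. The name bs-input and the source lists are illustrative assumptions, not part of the published model.

```python
from typing import Dict, Set

# Hypothetical source map: each transformation lists the streams it draws on.
# "bs-input" stands for raw proprioceptive input reaching the body-state system.
SOURCES: Dict[str, Set[str]] = {
    "*bs-lim:": {"bs-input"},
    ":lim-hand*": {"*bs-lim:", ":obj-lim:"},
    ":obj-lim:": {":prop-obj:", "*vis-obj:"},
    ":prop-obj:": set(),
}


def unstable_closure(unstable_inputs: Set[str]) -> Set[str]:
    """With nothing buffered, a transformation fed by an unstable stream is
    itself unstable; propagate that until nothing changes."""
    unstable = set(unstable_inputs)
    changed = True
    while changed:
        changed = False
        for t, srcs in SOURCES.items():
            if t not in unstable and srcs & unstable:
                unstable.add(t)
                changed = True
    return unstable


def buffer_demands(unstable_inputs: Set[str], required: Set[str]) -> Set[str]:
    """Transformations that could only be stable by being buffered, plus any
    the task itself requires to be buffered."""
    unstable = unstable_closure(unstable_inputs)
    return set(required) | {t for t, srcs in SOURCES.items() if srcs & unstable}


novice = buffer_demands({"bs-input"}, required={":prop-obj:"})
expert = buffer_demands(set(), required={":prop-obj:"})
print(sorted(novice), len(novice) <= 1)  # ['*bs-lim:', ':lim-hand*', ':prop-obj:'] False
print(sorted(expert), len(expert) <= 1)  # [':prop-obj:'] True
```

Under these assumptions the novice configuration places three claims on a buffer that can serve only one, which is the source of the predicted oscillation. The ‘expert’ case simply represents a stable proprioceptive stream, leaving :prop-obj: as the only claimant.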
This analysis raises two general questions about the use of syndetic models.
First, the analysis carried out here bears striking similarities with that done for
MATIS, although the two technologies are different. Both involved the potential
for breakdown of processing caused by incompatible representations being
carried on data streams that are necessary to a particular cognitive task. One
might ask whether this reasoning (which is, after all, comparatively simple in
hindsight) could not be captured more easily in a simple model of data streams.
The answer, of course, is yes: given the architecture, one could derive a set of
general principles regarding the specific issue of data stream utilisation. Indeed,
such an approach might be one part of a useful “discount” method based on the
approach. But it is important to understand that being able to abstract such a
theory is a consequence of having a general purpose model in the first place.
To give an example from mathematics, one can give a very simple rule for
computing the definite integral of a polynomial function, once one has
understood what the concept of an integral is in the first place; the value of
theories such as the Riemann or Lebesgue integral is that they also allow you to
apply the process to problems that are not just simple polynomials (Voxman &
Goetschel, 1981). To return to the context of HCI, while we could write a set of
rules or principles for dealing with data streams in ICS, in practice the rules that
govern data stream operation are also bound into broader processing concerns
such as record contents, timing, etc. The aim of syndetic modelling and the
formal description of ICS is to provide a framework in which the principles and
theory underlying these phenomena can be described, examined, and then —
and only then — lifted into a set of discount methods and approximations for
general use.
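For concreteness, the ‘very simple rule’ for polynomials mentioned above is the term-by-term power rule:

$$\int_a^b \sum_{k=0}^{n} c_k x^k \,dx \;=\; \sum_{k=0}^{n} c_k\,\frac{b^{k+1}-a^{k+1}}{k+1}$$

It works only because every term c_k x^k has the explicit antiderivative c_k x^{k+1}/(k+1); the general definitions of the integral are what give it meaning when no such rule is available.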
The second question relates to how much of the work of the analysis was done
by the model, and how much was driven by general knowledge of ICS, and
insight into human information processing. Obviously, any model will only
provide insight up to the level of detail that it addresses. In this paper we have
given an approximate model of ICS, focusing on aspects such as data stream
stability and coherence, and ignoring other issues such as record contents and
the structure of representations. Even given this restriction, we were still able to
use the model to demonstrate requirements on coherence in two quite different
examples. Once that requirement was identified from the model, it was then a
question of reflecting on the meaning of this requirement in terms of the
problem domain and our broader knowledge about ICS that had not been
encoded in the model. This in no way diminishes the value of the model; after
all, exactly the same process occurs in any science when one builds models —
the role of the model is to answer specific questions, or to explore particular
aspects of a problem. Once that has been done, progress in understanding how
to deal with an issue, or how to enrich a theory, takes place outside of the
model. For example, an electronics engineer who uses a circuit model to explore
a particular design does not expect the circuit model to tell her directly how to
improve that design; it should, however, help her to identify the location
(and cause) of any design problems. How those are then addressed depends
on her background knowledge of theory, and if the result is a significant change
in the knowledge that led to the model, it may well be incorporated as part of
the modelling technique itself.
8. PROSPECTS AND CONCLUSIONS
The previous section showed that detailed and sometimes unexpected
constraints on user performance can be deduced from this approach. In contrast,
system models can only describe what the system should do. Any claims or
assumptions made about user performance must be validated separately, either
through appeal to cognitive theory or directly, through prototyping and
experimental evaluation. User models likewise require assumptions about
system behaviour. This is a key limitation for both. By expressing user and
system constraints in equal terms, syndetic models allow direct (and formal)
comparison between the capabilities and limitations of the two parties in an
interactive system. As the underlying cognitive and system theories are built into
the model, the reason why some problem exists, such as difficulty in expressing
deictic queries, can be found. Alternative design solutions can then be driven by
theoretical insight, rather than through a potentially expensive ‘generate and test’
cycle of ad-hoc changes. In the case of a system that builds on the technology of
MATIS, location of cities might be supported better through a graphical display,
for example a map in which spatial location may reduce the need to invoke
:prop-mpl: transformations.
One of the key advantages of syndetic modelling is that it operates with abstract
axiomatic specifications of user and system. It therefore avoids a number of
difficulties associated with models that require detailed specifications of user
and system. Such detail was required with the model of Kieras and Polson
(Kieras & Polson, 1985) discussed earlier. Over time, this kind of framework
has undergone significant development, culminating recently in the
EPIC architecture (Kieras & Meyer, 1998; Meyer & Kieras, 1997). Like
the approach adopted here, this architecture has data flow properties which
encompass multiple sensory and effector processors, together with a single cognitive
processor. Its extended capability now supports the simulation of a far more
extensive range of behaviours than its precursors, including multiple task
performance and working memory phenomena (Kieras et al., 1998). This and
other more recent approaches to integrating user and system representations also
retain a commitment to detailed modelling. So for example, the work of Moher
and Dirda (Moher et al., 1996) uses coloured Petri nets as a unifying
representational formalism for device models, and users' mental models and
task plans. The resulting models of this latter type are again at a lower level of
detail than ours, and seem to derive more from the tradition of programmable
user models (see for example Blandford & Young, 1995) than from a specific
cognitive theory. By avoiding detail, syndetic modelling should support design
reasoning in advance of highly specific interface commitments.
In the absence of a comprehensive model that accounts for the behaviour of both
user and device components of a system, questions such as those given above
can only be addressed by obtaining a separate user analysis of the problem
domain. The problem then is to connect this insight to the overall system
perspective within some integrative framework. For example, design issues
and options might be captured within a notation for design rationale, such as
QOC (MacLean et al., 1991). User and system assessments of these
options can then be expressed as criteria for evaluating options (Bellotti, 1993).
However, this approach is critically dependent upon the success of the
intermediate representation. Unless this ‘mediating’ expression is at least as rich
as the modelling formalism, the process of translation both from the system
modellers to the user modellers, and back again, will lose information. Our
experience with representing user and system analyses within QOC, as part of
the experiment described in (Bellotti et al., 1995), was that the organisation and
approach imposed by a mediating representation like QOC was difficult to
reconcile with the detailed constraints originating from modelling analyses. We
argue that overcoming this problem requires an approach that brings user and
system representations into direct contact. Syndetic modelling effects this
contact by expressing the resources and constraints that describe cognitive
behaviour in the same form as we express the observables that characterise or
prescribe system behaviour. From an observational viewpoint, both the user
and the design artefact are just systems of observables whose behaviour is
amenable to description within a suitable mathematical framework.
Syndesis begins to address two other significant problems in multi-disciplinary
HCI. First, if user and system models are developed separately and only come
into contact through their results, it is not possible to see directly whether the
assumptions of each model concur, or whether changing aspects of one model
would have consequences for the other. This means that potentially expensive
effort may be spent in reasoning about (say) a user’s cognitive demands when
in fact system constraints may support claims that these demands will not be
imposed. Second, the separation of modelling theory from the modelling
analysis actually recorded in a mediating representation makes it difficult to
locate or investigate the theoretical reason or context behind a specific problem
or recommendation. This is particularly acute if the insight derives from the
interplay between user and system constraints. This means that re-design cannot
readily draw on the modelling theory, but instead relies principally on the skills
of the designer. While the element of craft skill is an important part of
innovation, the direct expression of user and system requirements in a syndetic
model means that application of that skill can be directed towards the specific
problem that underlies the design issue.
Like syndetic modelling, Interaction Framework (Blandford, Harrison &
Barnard, 1995) also works at a high level of abstraction. Both approaches can
capitalise upon the re-use of specific interactional principles. However, whereas
Interaction Framework operates with its own event-based representation, the
syndetic approach involves commitment to specific cognitive and system
models. This provides another key benefit of syndetic modelling: it can
cumulatively capitalise upon prior modelling work by making direct use of the
apparatus of existing cognitive and system theories. The results of analysis are
then expressed in terms that can be used to guide behavioural evaluation or
system implementation. Of course, syndetic analyses will also be subject to the
limitations of the cognitive and system models upon which they depend.
Development of a formal account of ICS has reached the point where we can
begin to discuss the details of data transformation, blending, and the ability of
processes to extend their repertoire based on experience.
Our apparently disparate scenarios have shown some fundamentally similar
characteristics. By using appropriate abstractions we are able to draw out the
fundamental properties of each system, discarding superficial differences arising
from the technologies involved. The structures and properties that we were able
to model were clearly dependent on the notation used to express the models.
Modal action logic was chosen for this paper because it is a simple, concise
means of describing states and actions on states. However, syndetic modelling
is not dependent on the use of modal action logic, or on expressing ICS in terms
of states and actions. We have begun to investigate how techniques for
describing real-time systems, for example the duration calculus (Chaochen,
1993) can be used to model and reason about temporal properties of ICS and
interactive systems. We are also beginning to explore how stochastic modelling
techniques might allow the development and exploration of significantly broader
conjectures concerning cognitive performance within interaction.
In summary, syndetic modelling expresses models of systems, users, and their
interaction, within a common mathematical framework. Principles are expressed
in an abstract form and, by avoiding detail, can be used directly to support
design reasoning without requiring simulation of cognitive mechanisms or a
fully elaborated system specification. The value of abstraction has been
illustrated here by demonstrating how general principles, developed to deal with
one design context, can be re-used to support reasoning about another design
with markedly different surface and domain features. In the longer term, we
expect to investigate how reasoning about syndetic models can best be
supported through software tools for developing, maintaining, exercising and
reasoning about a mathematical text, and how other forms of presentation such
as graphics or animation can be used to capture and communicate the insight
obtained from these models.
NOTES
Acknowledgements. We thank A.E. Blandford, T. Green and M.D.
Harrison for their helpful comments. Particular thanks are due to the ‘owners’
of the systems used as the source of scenarios in this paper, both for making
their systems available to the Amodeus project, and for their time in providing
feedback on modelling work that formed the starting point for the material
reported here. We would also like to thank the anonymous referees, whose
detailed comments on the first draft were of significant help in improving the
focus and clarity of the paper.
Support. This work was carried out as part of the Amodeus-2 project,
ESPRIT Basic Research Action 7040 funded by the Commission of the
European Communities. Technical reports from the Amodeus project are
available via the World Wide Web at http://www.mrc-cbu.cam.ac.uk/amodeus/.
Authors’ Addresses.
David Duke: Dept of Computer Science, University of York, Heslington,
York, YO1 5DD, U.K.
Email: duke@cs.york.ac.uk
Philip Barnard: MRC Cognition and Brain Sciences Unit, 15 Chaucer Road,
Cambridge, CB2 2EF, U.K.
Email: philb@mrc-cbu.cam.ac.uk
David Duce: Rutherford Appleton Laboratory, Chilton, Didcot, OX11 0QX,
U.K.
Email: d.a.duce@rl.ac.uk
Jon May: Dept of Psychology, University of Sheffield, Western Bank,
Sheffield, S10 2TP, U.K.
Email: jon.may@sheffield.ac.uk
REFERENCES
Barnard, P. and May, J. (1993). Cognitive modelling for user requirements. In
P.F. Byerley, P.J. Barnard & J. May (Eds.), Computers,
Communication and Usability: Design Issues, Research and Methods
for Integrated Services (pp. 101-145). Amsterdam: Elsevier.
Barnard, P. & May, J. (1995). Interactions with advanced graphical interfaces
and the deployment of latent human knowledge. In F. Paternó (Ed.),
Eurographics Workshop on Design, Specification and Verification of
Interactive Systems (pp. 15-49). Berlin: Springer-Verlag.
Barnard, P. and May, J. (in press). Representing cognitive activity in complex
tasks. Human Computer Interaction.
Bellotti, V. (1993). Integrating theoreticians' and practitioners' perspectives
with design rationale. Proceedings of the INTERCHI'93 Conference
on Human Factors in Computing Systems, 101-106. Addison-Wesley.
Bellotti, V., Blandford, A., Duke, D., MacLean, A., May, J. & Nigay, L.
(1996). Interpersonal access control in computer-mediated
communications: A systematic analysis of the design space. Human-Computer
Interaction, 11, 357-432.
Bellotti, V., Buckingham Shum, S., MacLean, A. & Hammond, N. (1995).
Multidisciplinary modelling in HCI design ... in theory and in
practice. Proceedings of the CHI '95 Conference on Human Factors
in Computing Systems, 146-153. New York: ACM.
Blandford, A. & Duke, D. (1997). Integrating user and computer system
concerns in the design of interactive systems. International Journal of
Human-Computer Studies, 46, 653-679.
Blandford, A., Harrison, M. & Barnard, P. (1995). Using interaction
framework to guide the design of interactive systems. International
Journal of Human-Computer Studies, 43, 101-130.
Blandford, A. & Young, R. (1995). Separating user and device descriptions for
modelling interactive problem solving. In K. Nordby, P. Helmersen,
D.J. Gilmore & S. Arnsen (Eds.), Human-Computer Interaction:
INTERACT'95 (pp. 91-96). London: Chapman and Hall.
Bordegoni, M. & Hemmje, M. (1993). A dynamic gesture language and
graphical feedback for interaction in a 3d user interface. Computer
Graphics Forum, 12(3), 1-11.
Buckingham Shum, S., Blandford, A., Duke, D., Good, J., May, J., Paternó,
F. & Young, R. (1996). Multidisciplinary modelling for user-centred
system design: An air-traffic control case study. Proceedings of
HCI’96: 11th British Computer Society Conference on Human-Computer Interaction, 201-219. London: Springer-Verlag.
Chaochen, Z. (1993). Duration calculi: An overview. In D. Bjørner, M. Broy
& I. Pottosin (Eds.), Formal Techniques in Programming and Their
Applications, volume 735 of Lecture Notes in Computer Science (pp.
256-266). Springer-Verlag.
Dix, A. (1991). Formal Methods for Interactive Systems. Academic Press.
Duke, D. (1995a). Reasoning about gestural interaction. Computer Graphics
Forum, 14(3), 55-66.
Duke, D. (1995b). Time and synchronisation in PREMO: A formal
specification of the NNI proposal. Technical Report OME-116,
ISO/IEC JTC1 SC24/WG6.
ftp://ftp.cwi.nl/premo/RapporteurGroup/Miscellaneous/OME-116.ps.gz
Duke, D. & Harrison, M. (1993). Abstract Interaction Objects. Computer
Graphics Forum, 12(3), C-25 - C-36.
Duke, D. & Harrison, M. (1994a). Matis: A case study in formal specification.
Technical Report SM/WP17, ESPRIT BRA 7040 Amodeus-2.
Available via http://www.mrc-cbu.cam.ac.uk/amodeus/.
Duke, D. & Harrison, M. (1994b). A theory of presentations. In M. Naftalin,
T. Denvir & M. Bertran (Eds.), FME'94: Industrial Benefit of
Formal Methods, volume 873 of Lecture Notes in Computer Science
(pp. 271-290). Berlin: Springer-Verlag.
Duke, D. & Harrison, M. (1995a). From formal models to formal methods. In
R.N. Taylor & J. Coutaz (Eds.), Software Engineering and Human-Computer Interaction: ICSE’94 Workshop on SE-HCI: Joint
Research Issues, volume 896 of Lecture Notes in Computer Science
(pp. 159-173). Springer-Verlag.
Duke, D. & Harrison, M. (1995b). Interaction and task requirements. In P.
Palanque & R. Bastide (Eds.), DSV-IS'95: Eurographics Workshop
on Design, Specification and Verification of Interactive Systems (pp.
54-75). Wien: Springer-Verlag.
Goldsack, S. (1988). Specification of an operating system kernel: FOREST and
VDM compared. In R. Bloomfield, L. Marshall & R. Jones (Eds.),
VDM'88: VDM - The Way Ahead, volume 328 of Lecture Notes in
Computer Science (pp. 88-100). Springer-Verlag.
Harrison, M. & Torres, J. (Eds.). (1997). Design, Specification and
Verification of Interactive Systems'97. Wien: Springer-Verlag.
Hoare, C. (1996). How did software get so reliable without proof? In M.-C.
Gaudel & J. Woodcock (Eds.), FME'96: Industrial Benefit and
Advances in Formal Methods, volume 1051 of Lecture Notes in
Computer Science (pp. 1-17). Springer-Verlag.
Jones, C., Jones, K., Lindsay, P. & Moore, R. (1991). MURAL: A Formal
Development Support System. Springer-Verlag.
Kent, S., Maibaum, T. & Quirk, W. (1993). Formally specifying temporal
constraints and error recovery. Proceedings of the IEEE International
Workshop on Requirements Engineering, (pp. 208-215). IEEE
Press.
Kieras, D. & Meyer, D. (1998). An overview of the EPIC architecture for
cognition and performance with application to human-computer
interaction. Human Computer Interaction. In press.
Kieras, D., Meyer, D., Mueller, S. & Seymour, T. (1998). Insights into
working memory from the perspective of the EPIC architecture for
modelling skilled perceptuo-motor and cognitive human performance.
In A. Miyake & P. Shah (Eds.), Models of Working Memory:
Mechanisms of Active Maintenance and Executive Control.
Cambridge University Press. In press.
Kieras, D. and Polson, P. (1985). An approach to the formal analysis of user
complexity. International Journal of Man-Machine Studies, 22, 365-394.
Lemmon, E. (1993). Beginning Logic (3rd ed.). Chapman and Hall.
MacLean, A., Young, R., Bellotti, V. & Moran, T. (1991). Questions,
options, and criteria: Elements of design space analysis. Human-Computer Interaction, 6(3&4), 201-250.
May, J. & Barnard, P. (1995). The case for supportive evaluation during
design. Interacting with Computers, 7, 115-144.
May, J., Barnard, P. & Blandford, A. (1993). Using structural descriptions of
interfaces to automate the modelling of user cognition. User
Modeling and User-Adapted Interaction, 3, 27-64.
May, J., Scott, S. & Barnard, P. (1995). Structuring Displays: A
Psychological Guide. Eurographics Tutorial Notes PS95 TN4, ISSN
1017-4656. Geneva: European Association for Computer Graphics.
Meyer, D. & Kieras, D. (1997). A computational theory of executive cognitive
processes and multiple-task performance: Part 1. Psychological
Review, 104, 3-65.
Milner, R. (1989). Communication and Concurrency. Series in Computer
Science. London: Prentice Hall International.
Milner, R. (1993). Elements of interaction - 1993 Turing Award Lecture.
Communications of the ACM, 36(1), 78-89.
Moher, T., Dirda, V., Bastide, R. & Palanque, P. (1996). Monolingual,
articulated modeling of users, devices and interfaces. DSV-IS'96:
Eurographics Workshop on Design, Specification and Verification of
Interactive Systems (pp. 312-329). Wien: Springer-Verlag.
Nigay, L. (1994). Conception et modélisation logicielles des systèmes
interactifs. Ph.D. Thèse de l'Université Joseph Fourier, Grenoble.
Nigay, L. & Coutaz, J. (1995). A generic platform for addressing the
multimodal challenge. Proceedings of the CHI '95 Conference on
Human Factors in Computing Systems, 98-105. Addison-Wesley.
Owre, S., Rushby, J., Shankar, N. & von Henke, F. (1995). Formal
verification for fault-tolerant architectures: Prolegomena to the design
of PVS. IEEE Transactions on Software Engineering, 21(2), 107-125.
Paternó, F. & Palanque, P. (Eds.). (1997). Formal Methods in Human
Computer Interaction. Springer-Verlag.
Ryan, M., Fiadeiro, J. & Maibaum, T. (1991). Sharing actions and attributes in
modal action logic. In T. Ito & A. Meyer (Eds.), Theoretical
Aspects of Computer Software, volume 526 of Lecture Notes in
Computer Science (pp. 569-593). Springer-Verlag.
Saiedian, H. (1996). An Invitation to Formal Methods. IEEE Computer, 29(4),
16-17.
Spivey, J. (1992). The Z Notation: A Reference Manual (2nd ed.). London:
Prentice Hall International.
Teasdale, J.D. & Barnard, P. (1993). Affect, Cognition and Change: Remodelling Depressive Thought. Lawrence Erlbaum Associates.
Voxman, W. & Goetschel, Jr., R. (1981). Advanced Calculus: An Introduction
to Modern Analysis. Marcel Dekker Inc.
Wharton, C., Rieman, J., Lewis, C. & Polson, P. (1994). The Cognitive
Walkthrough method: A practitioner’s guide. In J. Nielsen & R.
Mack (Eds.), Usability Inspection Methods (pp. 105-140). Wiley.
Young, R. & Abowd, G. (1994). Multi-perspective modelling of interface
design issues: Undo in a collaborative editor. In G. Cockton, S.W.
Draper & G.R.S. Weir (Eds.), People and Computers IX:
Proceedings of HCI'94 (pp. 249-260). Cambridge, England:
Cambridge University Press.
APPENDIX GLOSSARY OF NOTATION
The data types and notation used in this paper are based on the mathematical
notation of Z (Spivey, 1992) embedded within the structured ‘theory’
presentation of modal action logic (Goldsack, 1988; Ryan, Fiadeiro &
Maibaum, 1991). We assume the existence of the following basic data types:
ℕ            Natural numbers: {0, 1, 2, ...}
ℤ            Integers: {..., -1, 0, 1, ...}
𝔹            Boolean values: {true, false}
Logic
Let P and Q be predicates, and x a variable.
P ∧ Q        Both P and Q hold
P ∨ Q        Either P or Q (or both) hold
P ⇒ Q        P implies Q: if P holds, so must Q
P ⇔ Q        P if and only if Q
∀x : S • P   For all values of x in S, P holds
∃x : S • P   There exists a value of x in S for which P holds
Modal Action Logic
MAL extends classical first order logic with action expressions and a modal
operator. Let P be a predicate, and let A be an action:
[A] P        P must hold after performance of A
per(A)       Permission: the action may occur
obl(A)       Obligation: the action must occur
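As an informal illustration of the modal operator, [A] P can be read as ‘P holds in every state that can result from performing A’. The Python sketch below encodes states as dictionaries and a (possibly nondeterministic) action as a function returning its successor states; form_fist is a hypothetical action loosely modelled on axiom 2 of the Gesture-User interactor. The deontic operators per and obl are not modelled here.

```python
from typing import Callable, Dict, Iterable

State = Dict[str, object]
Action = Callable[[State], Iterable[State]]  # an action may have several outcomes


def box(action: Action, pred: Callable[[State], bool]) -> Callable[[State], bool]:
    """[A] P: P holds in every state that can result from performing A.
    If A has no outcome from a state, [A] P holds there vacuously."""
    return lambda s: all(pred(after) for after in action(s))


def form_fist(s: State) -> Iterable[State]:
    # Hypothetical action: the glove ends in the 'fist' pose and the first
    # goal is consumed; a new dictionary is built, so s is left unchanged.
    return [{**s, "glove": "fist", "goals": s["goals"][1:]}]


start = {"glove": "open", "goals": ["fist", "point"]}
print(box(form_fist, lambda s: s["glove"] == "fist")(start))     # True
print(box(form_fist, lambda s: s["goals"] == ["point"])(start))  # True
```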
Sets
Let S and T be sets, P a predicate, E an expression, and ti terms. Let xi be
variables, and let D be a declaration, e.g. x1:S, x2:T.
∅              Empty set
{t1, ..., tn}  Set enumeration: the set of t1 through to tn
ℙ S            Power set: the set of all subsets of S
E ∈ S          Membership: the value E is a member of the set S
{D | P • E}    Comprehension: the set of all values of E such that P holds, given D
{D | P}        The set of values for D such that P holds
S ∩ T          Set intersection: the set of values in both S and T
S ∪ T          Set union: the set of values in either S or T
S ⊆ T          Containment: S is a subset of T
S × T          Cartesian product: the set of pairs (x, y) such that x ∈ S and y ∈ T
Functions and Relations
Functions and relations are viewed as sets of pairs; a function is then a relation
where every element in the domain of the relation is paired with exactly one
element in the range of the relation. Let S and T be sets, and F and G be
relations or functions:
{}   Empty function
S → T   The set of total functions from S to T; for f ∈ S → T, dom f = S
S ⇸ T   The set of partial functions from S to T; for f ∈ S ⇸ T, dom f ⊆ S
S ↔ T   The set of relations between S and T
{x ↦ y}   The function that maps x to y
dom F   Domain: the set {x | ∃ y • (x, y) ∈ F}
ran F   Range: the set {y | ∃ x • (x, y) ∈ F}
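Because functions are treated as relations, that is, as sets of pairs, the difference between a relation, a partial function and a total function can be checked mechanically on finite examples. The sketch below is illustrative only; the predicate names is_partial_function and is_total_function, and the sample sets, are hypothetical.

```python
# Relations as finite sets of pairs: a function is a relation in which each
# domain element is paired with exactly one range element.
# All names here are illustrative.

def dom(r):
    """dom F = {x | ∃ y • (x, y) ∈ F}"""
    return {x for (x, _) in r}

def ran(r):
    """ran F = {y | ∃ x • (x, y) ∈ F}"""
    return {y for (_, y) in r}

def is_partial_function(r, s, t):
    """F ∈ S ⇸ T: pairs drawn from S × T, each x mapped at most once."""
    in_st = all(x in s and y in t for (x, y) in r)
    # A set of pairs is single-valued exactly when it has one pair per domain element.
    single_valued = len(dom(r)) == len(r)
    return in_st and single_valued

def is_total_function(r, s, t):
    """F ∈ S → T: a partial function whose domain is the whole of S."""
    return is_partial_function(r, s, t) and dom(r) == s

S = {1, 2, 3}
T = {"a", "b"}
f = {(1, "a"), (2, "b")}   # {1 ↦ a, 2 ↦ b}: partial, 3 is unmapped
g = f | {(3, "a")}         # adding the maplet 3 ↦ a makes it total
r = {(1, "a"), (1, "b")}   # 1 is related to two values: not a function

print(is_partial_function(f, S, T), is_total_function(f, S, T))  # True False
print(is_total_function(g, S, T))                                # True
print(is_partial_function(r, S, T))                              # False
print(sorted(dom(f)), sorted(ran(r)))                            # [1, 2] ['a', 'b']
```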
Sequences
A sequence is a function whose domain is either empty (the null sequence) or is
a set of contiguous natural numbers starting at 1, e.g. {1, 2, ..., n} for some n ∈ ℕ. This
means that operators like ‘ran’ can also be applied to sequences. Let X be a set,
and s and t sequences:
〈〉   Empty (null) sequence
seq X   The set of sequences whose range is a subset of X
〈x, y, ..., z〉   Sequence enumeration: the sequence containing x, y, ..., z in that order
s ⁀ t   Concatenation: 〈s₁, ..., sₙ〉 ⁀ 〈t₁, ..., tₘ〉 = 〈s₁, ..., sₙ, t₁, ..., tₘ〉
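The definition of a sequence as a function over {1, ..., n} can be made concrete by representing a sequence as a Python dictionary from positions to elements. In this hypothetical sketch, concatenation keeps the first sequence and shifts every index of the second up by the length of the first; the helper names seq, is_sequence, concat and ran are illustrative only.

```python
# A sequence modelled as a function (here a dict) whose domain is
# {1, ..., n}. The helper names are illustrative only.

def seq(*items):
    """〈x, y, ..., z〉: map positions 1..n to the given items."""
    return {i: x for i, x in enumerate(items, start=1)}

def is_sequence(f):
    """True if dom f is empty or exactly {1, ..., n}."""
    return set(f) == set(range(1, len(f) + 1))

def concat(s, t):
    """s ⁀ t: keep s, then shift every index of t up by the length of s."""
    result = dict(s)
    for i, x in t.items():
        result[i + len(s)] = x
    return result

def ran(f):
    """ran applies unchanged, because a sequence is a function."""
    return set(f.values())

s = seq("a", "b")
t = seq("c")
print(concat(s, t))           # {1: 'a', 2: 'b', 3: 'c'}
print(is_sequence({2: "x"}))  # False: domain {2} is not {1}
print(ran(seq("a", "a")))     # {'a'}
```

Note that ran discards both order and repetition, which is why ran 〈a, a〉 is simply {a}.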
Miscellaneous
P ≡〉 Q   From P it is possible to prove/derive Q
x ≙ E   The name x is defined to be the expression E
p   The perceivable component of attribute p
p in q   The presentation of p is within that of q
FIGURE CAPTIONS
FIGURE 1   Three levels of description
FIGURE 2   Deictic blending of speech and gesture in MATIS
FIGURE 3   Generic structure of an ICS subsystem
FIGURE 4   ICS configured for locating a visual object
FIGURE 5   Brief descriptions of representations processed by ICS subsystems
FIGURE 6   The Dynamic Gesture Language (Bordegoni & Hemmje, 1993)
Fig 1
Fig 2
Fig 3
Fig 4
Sensory subsystems
VIS   visual: hue, contour, etc. from the eyes
AC   acoustic: pitch, rhythm, etc. from the ears
BS   body-state: proprioceptive feedback
Meaning subsystems
PROP   propositional: semantic relationships
IMPLIC   implicational: holistic meaning
Structural subsystems
OBJ   object: mental imagery, shapes, etc.
MPL   morphonolexical: words, lexical forms
Effector subsystems
ART   articulatory: subvocal rehearsal and speech
LIM   limb: motion of limbs, eyes, etc.
Fig 5
Navigation   This allows the user to change their position within the scene
Picking   The 'pistol-like' pose is used to select some object
Gripping   A group of objects can be grabbed and moved with the 'fist' pose
Exit   Finish a session with the editor
Fig 6