TITLE and RUNNING HEAD: Syndetic Modelling

David Duke, University of York
Philip Barnard, MRC Cognition and Brain Sciences Unit
David Duce, Rutherford Appleton Laboratory
Jon May, University of Sheffield

David Duke is a computer scientist with an interest in formal methods, interactive systems, and computer graphics; he is a Lecturer in the Department of Computer Science, University of York, United Kingdom. Philip Barnard is a psychologist with an interest in theories of mental architecture and their application to complex tasks, emotion and a range of psychopathologies; he is on the scientific staff of the Medical Research Council’s Cognition and Brain Sciences Unit. David Duce is a computer scientist with interests in computer graphics, human-computer interaction and formal techniques; he is a senior staff researcher in the Department for Computation and Information, Rutherford Appleton Laboratory, United Kingdom. Jon May is a psychologist with an interest in the application of unified models of cognition to perception, particularly with regard to the effects of task and context; he is a Lecturer in the Department of Psychology at the University of Sheffield.

ABSTRACT

Syndesis n. (pl. ~ es). [mod, L, f. Gk SYNdesis binding together (sundeo bind together)].
The Concise Oxford Dictionary, Seventh Edition, 1986.

User and system models are typically viewed as independent representations that provide complementary insights into aspects of human-computer interaction. Within system development it is usual to see the two activities as separate, or at best loosely coupled, with either the design artefact or some third ‘mediating’ expression providing the context in which the results of modelling can be related. This paper proposes that formal system models can be combined directly with a representation of human cognition to yield an integrated view of human-system interaction: a syndetic model.
Aspects of systems that affect usability can then be described and understood in terms of the conjoint behaviour of user and computer. This paper introduces and discusses, in syndetic terms, two scenarios with markedly different properties. We show how syndesis can provide a formal foundation for reasoning about interaction.

CONTENTS

1. INTRODUCTION
2. THE NEED FOR MULTI-DISCIPLINARY INSIGHT
    2.1. Why Theoretical Integration Matters
    2.2. Why Mathematics Matters
    2.3. Recent Work
3. A MATHEMATICAL MODEL OF A MULTIMODAL USER INTERFACE
4. INTERACTING COGNITIVE SUBSYSTEMS (ICS)
    4.1. Configurations
    4.2. Blending of Data Streams
    4.3. Cyclical Configurations
    4.4. Buffering of Transformations
    4.5. Interleaving and Oscillation
5. A MATHEMATICAL MODEL OF ICS
    5.1. Basic Definitions
    5.2. The Architecture
    5.3. Information Processing in ICS
    5.4. Key Points
6. A SYNDETIC MODEL OF MATIS
7. A SYNDETIC MODEL OF GESTURE
8. PROSPECTS AND CONCLUSIONS
APPENDIX: Glossary of Notation

1. INTRODUCTION

Most disciplines routinely use various kinds of model to represent and reason about the behaviour of particular phenomena. In science, the model usually relates or characterises observations that can be made of an extant system. For engineering, or system design in general, the model often represents the observations of a system that a designer will be asked to realise. In the process of mapping a ‘requirements’ model into a delivered artefact the designer may generate additional models that focus on problematic aspects of the desired system, in order for example to resolve a choice between design options. In this case the model allows the designer to bring some external body of theory to bear on the problem, and clearly the value of carrying out the modelling will depend on the degree of insight that the theory can provide.

In the context of human-computer interaction, two quite separate classes of model have emerged.
One is the concept of a system model, based on formal or operational models of software systems and components, that can be used to prototype or analyse the function and appearance of system components. Prototype models are poor on theoretical insight, but can be used to assess performance over specific scenarios. Other, more abstract, models allow some degree of description and reasoning about what a system might or should do in particular circumstances. The second class comprises models that address the cognitive factors within interaction. While these may be grounded in cognitive psychology, they are often either highly operational, for example GOMS (Kieras & Polson, 1985), or are expressed in a way that hides the underlying theory beneath rules or guidelines, for example Cognitive Walkthroughs (Wharton et al., 1994). In either case there is a common problem: human-computer interaction is a phenomenon involving (at least) two agents, the ‘computer’ system and its human user, and therefore any theoretically-based model of interaction must draw on insight from both the cognitive and system perspectives.

In practice designers cannot rely on a single model, but must draw on a variety of perspectives, particularly in systems that recruit users' cognitive abilities to understand and interact with software artefacts using novel metaphors, or in contexts where security or human safety is at risk. The use of disparate techniques does give extra leverage on design problems, but at the cost of introducing a new problem: that of integrating the different modelling products and viewpoints. One reaction to this problem has been the development of Design Rationale (DR) frameworks; see for example (MacLean et al., 1991). While DR provides a flexible means of relating modelling output in the context of design, it does so by separating recommendations from their theoretical underpinning.
Thus it can be difficult to understand why a particular problem exists, or what changes in either system or user performance might overcome a design problem or improve usability.

This paper introduces an approach to modelling interaction that integrates the contrasting representations of user and system theories. By using a common language to represent the user and system, the approach allows properties of interaction to be described and understood in terms of the conjoint behaviour of both agents. We use the term syndetic model to describe this new approach, to emphasise its bringing together of previously disparate methodologies. Section 2 sets out, in more detail, the case for such an integrated approach to human-computer interaction, and outlines existing approaches to this problem. It also outlines the motivation for using mathematical structures as the basis for the common representation.

Theoretical developments in HCI have tended to lag behind technological innovation. We believe that the framework presented here is important in part because it can address issues raised by the use of novel interface technologies. In support of this claim we have chosen two such areas, multimodal input and gestural interaction, to illustrate syndetic modelling. We begin, however, by describing the approaches in isolation. Section 3 shows how mathematical structures and logic can be used to represent the behaviour of an interface. As an illustration, we model an experimental interface for multimodal input with deictic reference (Nigay, 1994; Nigay & Coutaz, 1995). However, although the expressiveness of the approach makes it possible to represent rich and potentially complex interface behaviour like deixis, the model itself has no foundation for supporting claims related to usability. One framework that has been used to investigate usability from a cognitive perspective is Interacting Cognitive Subsystems (ICS), which is introduced in Section 4.
ICS (Barnard & May, 1993; Barnard & May, 1995) is a broad framework for understanding human information processing in terms of the mental representations and resources that a person needs to deploy in performing particular tasks. However, in common with other cognitive modelling approaches, ICS does not itself provide representations for the environment in which cognitive activity is located. The unifying step, taken in Section 5, is to show that the structure and principles of the ICS system can be expressed in the mathematical framework used to model the interface. It is this expression of both user and system requirements within a common mathematical framework that we call syndetic modelling.

Armed with this new framework, in Section 6 we return to the example of multimodal input and deixis. We identify a potential difficulty in employing deictic reference within the system, and by locating the source of the problem within the model, suggest a modification to the design that could alleviate the problem. Multimodal interaction is an important example because it is one of a growing number of techniques that do not fit well into the ‘stimulus-response’ models of interaction that have served well for other technologies. Syndetic modelling itself makes no commitment to any particular model of interaction, and can therefore accommodate diverse technologies. The expressive power of mathematical modelling means that a range of abstractions over device behaviour can be constructed and situated within the framework. To illustrate this, we show in Section 7 how the cognitive part of the model can be reused in the context of a different technology, gestural interaction. Multimodal and gestural interfaces have different characteristics in terms of the resources they demand of the user, and the analyses show why user and system models are insufficient in themselves to resolve the design issues contained within each problem.
The paper concludes with a description of ongoing research which aims at extending the theory into a general model of information processing within cognitive and computational systems.

2. THE NEED FOR MULTI-DISCIPLINARY INSIGHT

Human-computer interaction has developed from considering the layout and operation of simple graphical or textual interfaces to examine technologies such as media-spaces, gesture, and multi-modal interfaces. These touch on a broad range of issues ranging from ‘hard’ system constraints, through human perception and cognition, to broad social issues concerning the use of appropriate systems to support workers' needs. Insight into the design of interactive systems needs to draw, at times, on any or all of these disciplines. Forging a coherent explanation of a design problem from these disparate insights has only recently been addressed by, for example, work on design space representations (Bellotti, 1993) and collaborative modelling (Young & Abowd, 1994). Building a coherent theory that accounts for a range of design problems represents a further step. The need to take this step, and its overt structure, are the topic of this section.

2.1. Why Theoretical Integration Matters

In the introduction we suggested that Design Rationale approaches address the integration problem, but at the cost of separating modelling results from modelling theory. If theoretically-based modelling is to provide designers with insight into how to improve an artefact, then the theory underlying the analysis must be brought forward so that it can be inspected to reveal why a design issue is problematic and what modifications of the design would address the issue.
May & Barnard (1995) argued that the trend towards HCI design methods that dissociate evaluation techniques from their theoretical foundations was problematic, in that they undermined the supportive role that evaluation should play in redesign, and disguised the contribution that theoretical work in HCI can make to practice.

More fundamentally, we suggest that some form of integration of user and system representations is necessary even to describe (let alone resolve) certain design issues. An illustration of this problem can be found in the design of systems that employ 3D-graphics, for example (Duke, 1995a). One task that such a system needs to support is the presentation of a 3-dimensional scene (part of the application state) as a 2-dimensional image presented on some output device. For our purposes, the scene can be considered as a collection of geometrical solids. An abstract view of the information contained and presented by the system is shown in Figure 1.

FIGURE 1 ABOUT HERE

Using a model or theory of presentations (for example, Duke & Harrison, 1994b), the ‘image’ might be described as a collection of three objects (the cube, cone, and sphere). It is possible to state requirements involving this presentation, for example that the presentation of the sphere should be ‘hidden’ behind that of the cube. This constraint however makes no mention of what information a user of this system should perceive, and provides no basis for arguing about whether the proposed form of ‘hidden surface removal’ is either necessary or sufficient for the user’s tasks. Other design options should perhaps be considered, for example

• the provision of reference lines, as in the ‘picture’ of the internal state of the scene in Figure 1;
• the use of shadows or variable colour intensity;
• allowing the user to change the viewpoint, i.e. to view the scene from different angles.
In other words, a designer should consider whether the presentation of a system will allow the user to construct an appropriate ‘mental representation’ of the relevant state. To express and reason about these issues with any precision, a designer will need some appropriate representation of human cognitive abilities. And as the choice of representation has consequences for the design of the system, for instance extra functionality to support navigation, the description of cognitive assumptions should be linked to the model of system requirements.

This paper proposes that user and system modelling can be represented within a common framework in order to address these issues. By expressing the concepts and insight of a cognitive model as an approximate formal theory we aim to achieve two results.

1. To provide a representational framework that can accommodate both the information carried and presented by the system, and that perceived and understood by the user. Indeed, we suggest that understanding properties of interactive systems in terms of the conjoint, syndetic, behaviour of user and system models is ultimately a necessary step for HCI, given the development of interactive technology that relies increasingly on latent human cognitive abilities (Barnard & May, 1995, in press).

2. Given a suitably expressive framework, we aim to link the analysis of user and system models directly, by-passing the use of ‘intermediate’ design representations that occlude the underlying theory.

To understand why these results are potentially important, note that empirical approaches play a role in HCI analogous to the testing phase of the software development cycle. Both are intended to assess the fitness of some artefact for a given purpose, and involve carrying out a series of experiments to test or allow the formulation of some conjecture about the system.
In the case of software testing, either the specification of the system or the software artefact itself is used to generate the test cases, and if an experiment refutes a given conjecture, then the rationale that led to the construction of that test case can be used to determine the cause of the failure. In contrast, the refutation of a conjecture within experimental HCI raises the question of why a particular outcome was observed, or how the behaviour of the device or its user(s) might be altered to address a problem. Without a body of theory in which to situate the conjecture and experimental results, it is difficult to see how improvements in either a specific artefact or a body of theory can be made other than through ad hoc change and intuition.

In summary, our view is that multi-disciplinary co-operation is vital if HCI is to develop a cohesive and applicable theory that addresses the design of advanced interfaces. Syndetic modelling aims to achieve this integration at the level of the theoretical models, rather than through the results of modelling.

2.2. Why Mathematics Matters

In most branches of science, the development of fundamental theory has both prompted and drawn on advances in mathematical models and methods. Mathematical techniques are routinely used in disciplines such as physics, biology, engineering and economics as tools for modelling complex systems and deriving predictions from such models. The concepts and structure of mathematics, and the notation used to denote these, have evolved over millennia, often through the challenge of better understanding natural phenomena and of making more precise statements and predictions about the behaviour of artefacts.
The development and use of mathematics for the description of software and hardware systems has also been widely reported, including applications specific to human-computer interaction (see for example Dix, 1991; Duke & Harrison, 1993; Paternó & Palanque, 1997, and in particular the series of annual Eurographics workshops on Design, Specification and Verification of Interactive Systems, e.g. Harrison & Torres, 1997). While there is an ongoing debate about the role of these mathematical techniques in the routine practice of software development (see for example Saiedian, 1996), several key advances in understanding complex problems in computing have come about through the development of mathematical abstractions (Hoare, 1996; Milner, 1993).

The need within software development to (i) carry out rigorous checks on models, and (ii) support the development of models using software tools, requires that the mathematics used for software specification be defined to a level of rigour that is not commonly applied in other areas. The results are the so-called ‘formal methods’, such as VDM, Z and CSP, that define collections of mathematical structures with a specific syntax and semantics for use in systems modelling. One difference between mathematical models of computing systems and those developed in other domains is that of scale. Equations in physics, even complex expressions, typically involve a small number of observables, for example energy, velocity, mass, and/or derivatives of these. In contrast, the description of a computing system can require a comparatively large number of observables to characterise its state space and behaviour. The difficulty this causes for developing, presenting and reasoning about a model has led to the development of notational frameworks for structuring the mathematical description of software systems.
Some of these are quite abstract, and are designed to support compositional development and reasoning about systems, for example the schema notation of Z (Spivey, 1992).

Any description of interaction requires some representational framework, be it diagrammatic, informal, tabular or mathematical. Through syndetic modelling we aim to understand the constraints and principles that govern human information processing in the context of a given device. To achieve this we require a framework that can capture both the behaviour of a range of technologies, and also the constraints imposed by a specific model of human information processing. From previous experience we know that the mathematical techniques developed for software specification can capture the behaviour of the devices, and consequently these techniques have been adopted as the basis for syndetic modelling.

Obviously, building a model of any system involves selecting abstractions, and encoding these within the available representations. Mathematical abstractions also have limits; there are aspects of systems for which suitable mathematical structures are either unknown, or where the cost and complexity of their development outweighs the insight that the model will generate. There is a concern that mathematics by its nature presents a barrier to the non-specialist, or that it does not adequately cover certain aspects of interaction that are clearly important in understanding the use of a system (aspects of the work domain being a particular example). The first point is answered, in part, by noting that the imprecision and ambiguity of non-mathematical representations can also create barriers to understanding. More fundamentally, we would expect few designers to wish or be able to apply the approach in its current form. At present, it is a theoretical tool for understanding the role of cognitive resources in interaction with novel technologies.
It would be pointless to use such a framework for design problems for which there is already a body of practical understanding or design techniques; an example of using the proverbial sledgehammer to crack a walnut. However, as we will demonstrate, there are interactive technologies emerging from research work that can benefit from the kind of analysis that syndetic modelling is able to provide. The history of applied mathematics suggests that the technique will be adopted by designers as and when the perceived benefits outweigh the initial costs of learning the foundations. This, however, is not of concern within the present paper.

As for the second point, syndetic modelling is not an attempt to create a representation of interaction that is ‘complete’ in any sense. Developing artefacts in any discipline involves the construction of a range of models, from abstract mathematical descriptions through to concrete prototypes or informal arguments, each with its own specific role or function. A syndetic model is just one of these models; its role is the analysis of the conjoint behaviour of a human user and an artefact in the environment of that user.

To summarise, mathematics is used in syndetic modelling because it provides an expressive and concise means of describing the behaviour of complex systems, at a level of precision above that of diagrams or text. By working with a representation that is independent of any specific notion of interaction or device, the theories that we develop with syndesis will be more easily reused and generalised across domains. In the sections that follow, we will introduce the particular kind of mathematics needed for this paper, and will show how it can be used to describe an interface.

2.3. Recent Work

The design space of integrative frameworks can be characterised in terms of a power-generality trade-off (Blandford & Duke, 1997).
In this view, syndesis falls at the opposite end of the spectrum from design rationale approaches that endeavour to capture modelling results but which provide no means of relating them to the underlying theories involved. The presence of a psychologically grounded cognitive model in the approach also distinguishes syndesis from a number of recent attempts at extending models of the interface to accommodate claims about user behaviour (see for example Moher et al., 1996).

Two particular examples of approaches that attempt to put integration on a theoretical basis reveal some of the further trade-offs that are involved. The approaches are the Interaction Framework (IF) (Blandford, Harrison & Barnard, 1995) and the production system model described by Kieras and Polson (Kieras & Polson, 1985). The former is intended to provide an agent-neutral view, abstracting away from any specific user or system representation by working with a notion of event trajectory, a set of events ordered in time representing communication acts between agents in the system. In contrast, Kieras and Polson combine a production-system model of the user with a generalized transition network (GTN) representation of the system to obtain a detailed operational model of both agents. While both of these methods have contributed to a multidisciplinary understanding of HCI, they represent extreme approaches. IF operates by abstracting away from all details of a model that might bias the interpretation of an agent towards either user or system. In particular, the states of agents are not represented, making it difficult to express all but quite simple external properties of behaviour. At the other extreme, the highly operational nature of the GTNs used by Kieras and Polson means that properties and requirements on interaction are obscured by the detail needed to express the low-level behaviour of user and system: the only way to understand such a model is to execute it.
Syndesis represents a third, intermediate, level that integrates the contrasting representations of user and system theories into a single model. By expressing the behaviour of user and system in a common language, properties of an interactive system can be described and understood in terms of the conjoint behaviour of both agents. This is qualitatively different from the approach of Kieras and Polson in that the model is fundamentally not executable, and cannot therefore be used for simulation. The loss of executability is a consequence of using a much richer and more expressive language.

3. A MATHEMATICAL MODEL OF A MULTIMODAL USER INTERFACE

In this section we demonstrate the use of mathematical techniques to develop the specification of an interface that accepts multimodal input. MATIS (Nigay, 1994; Nigay & Coutaz, 1995) is an experimental system to investigate this technology, in particular the use of deictic references in which, for example, spoken information can be combined with gesture via a mouse to produce a single command for an application. The domain of MATIS is flight information; the system allows a user to plan a multi-stage journey by completing ‘query’ forms that can be used to search a database for matching flights. The forms can be completed using multiple modalities, either individually or in combinations. The example in Figure 2 shows a user combining spoken natural language with mouse-based gesture to fill in the second query template. On the left hand side the user has begun to speak a request that contains a deictic reference, “this city”, which is resolved when the user clicks on a field containing a city name; the right hand side of Figure 2 shows the query form after the system has interpreted the user’s input.

FIGURE 2 ABOUT HERE

A detailed mathematical model of MATIS will not be developed here; this has been done elsewhere (Duke & Harrison, 1994a).
The model of MATIS developed here is minimal, being sufficient just to illustrate the approach, and the questions that arise concerning usability.

It has been mentioned that the specification of computing artefacts benefits from the use of structures that organise the mathematics into useful abstractions. For work on interactive systems, structures called interactors (Duke & Harrison, 1993; Duke & Harrison, 1995b) have been developed. The use of interactors for identifying design issues and suggesting or evaluating design options has been explored in a number of case studies; see (Duke & Harrison, 1995a; Duke & Harrison, 1995b). The role of interactors within multidisciplinary modelling has also been described; for this, see (Bellotti et al., 1996) and (Buckingham Shum et al., 1996). An interactor consists of an internal state representing some facet of the application domain, and a presentation that describes the perceivable components of that state. As interactors are only a framework for a specification, they can be used with a variety of mathematical techniques for modelling behaviour. The approach taken here follows widespread practice (see for example Spivey, 1992) in using discrete structures such as sets, functions, relations and sequences to describe the structure of the state space of a system, as well as the presentation. Invariants (properties of the state) and dynamic behaviour, i.e. the evolution of the system through its state space, are described here using Modal Action Logic (MAL) (Goldsack, 1988; Ryan, Fiadeiro & Maibaum, 1991; Kent, Maibaum & Quick, 1993). This, rather than the operation notation of Z or VDM, has been used because the axioms required for these examples can be stated and documented concisely. A description of the mathematical structures and notation used in the paper is given in Appendix A.

The process of building a mathematical model of an interface is little different from that of modelling any other phenomenon.
It begins with the identification of those aspects of the system that we want to model, and the definition of the concepts needed to describe those features. In the case of MATIS, we are interested in how a user might carry out the task of constructing a query using a combination of speech and gesture. Consequently the state of the model encompasses the contents of the data fields on the form, and the data from the input devices used by the system.

We assume that there exists a type (set of values) called ‘name’, representing the names of fields on the forms on the MATIS interface. The type ‘name’ is a ‘given’ type. This means that we indicate simply that some set of values exists; the structure of the values within the set is not of interest. A second given type, ‘data’, is similarly introduced to represent the set of values that might be provided by the user, either by speech or by pointing.

To model the fusion of information from separate data streams we will represent both speech and mouse data as a sequence of values. For speech, this sequence will contain pairs, each consisting of a database field name and an optional data value. A ‘missing’ data value (represented by the symbol ‘nil’) in the speech stream will indicate that the user has employed deictic reference; the data for that field will be provided on the mouse stream. The mouse data stream itself is just the sequence of values that have been selected. So for example, if the user utters the query “Flights from this city in the morning to this city”, while using the mouse to select the values ‘London’ and ‘Paris’ on the display, the corresponding data streams might look like:

    speech = 〈(From, nil), (Time, morning), (To, nil)〉
    mouse  = 〈London, Paris〉

More generally, we define the type ‘value’ to be the union of ‘data’ and the constant ‘nil’. A ‘slot’ (on a form, or on the speech input stream) then is a pair consisting of the field name and a corresponding value.
    value ≙ data ∪ {nil}
    slot  ≙ name × value

In a comprehensive model of MATIS it is convenient to distribute the specification across a number of interactors (Duke & Harrison, 1994a). Here however it is simpler to work with just one interactor, the state of which consists of five components, or attributes. These represent

• the content of each query form;
• the identity of the query form that the user is constructing;
• the sequence of input received along both speech and mouse data streams; and
• the query form that would result from the ‘fusion’ of the two input data streams.

The text of the mathematical specification begins as follows:

    interactor MATIS
      attributes
        vis fields  : qnr × name ⇸ value    - query content
        vis current : qnr                   - current query
            mouse   : seq data              - data stream from mouse
            speech  : seq slot              - data (and holes) from speech
            result  : name ⇸ data           - outcome of resolving deixis

The annotation ‘vis’ is used to indicate that a particular observable is part of the visual presentation of the system. In this case, both the content of the query forms and the identity of the current query are (potentially) perceivable by the user of the system. The annotation indicates that when these components are perceivable, this is via the ‘visual’ modality. Observables in the presentation are called ‘percepts’ (Duke & Harrison, 1994b). However, just because a percept is defined in the state doesn't mean that it is always perceivable; the conditions under which a percept could be perceived are included in the axioms of the system.

The dynamic behaviour of the MATIS system is described in terms of a number of actions. Like other observables, each action has a signature which indicates the kind of information that is involved in the action. Four actions are defined on the MATIS interactor. The first two actions relate to use of the speech and mouse modalities, and as indicated by the annotations, will be effected by the articulatory and limb channels of the user.
The third action, ‘fuse’, is used to define the effect of performing fusion on the data streams. We will not discuss when fusion should be carried out, or how the results are inserted into the surrounding application by the ‘fill’ action.

actions
  art speak : name × value - articulate a data value
  lim select : data - select a data value
  fuse - fuse input streams
  fill - fill in slots on a query form

The remaining part of the interactor is the collection of axioms that inter-relate the observables of the system. In this paper the axioms are expressed in modal action logic. This extends the usual connectives and quantifiers of first order logic with a modal operator [A] for each action ‘A’, and two deontic operators that can be used to express that an action is either permitted or obliged under particular conditions. The meaning of each axiom is explained in the accompanying commentary.

Axiom 1 defines the effect of the ‘speak’ action on the speech data stream. If the value of the speech stream is X, the axiom requires that the effect of speaking a name-data pair is to append that pair to X.

axioms
  speech = X ⇒ [speak(nm, d)] speech = X^〈(nm, d)〉 (1)

  If the speech stream holds X, then speaking a name/data pair results in a speech stream with that pair appended to X.

Axiom 2 defines similar behaviour for the ‘select’ action, though here the new value is a data item that is appended to the stream of mouse input.

  mouse = M ⇒ [select(d)] mouse = M^〈d〉 (2)

  If the mouse stream holds M, then selecting a data item d results in a stream in which d is appended to M.

The third axiom is an invariant, that is, a property of the system that is true over all time. When expressing properties of percepts, we enclose attributes in boxes to indicate that it is the perceivable representation of the value, rather than the value itself, that is being referred to.
The invariant requires that all the data in the current query is available in the presentation of the interactor:

  ∀n : name • (current, n) ∈ dom fields ⇒ fields(current, n) in MATIS (3)

  For any field name ‘n’, if there is an entry in the fields of the form labelled ‘current’ for n, then that data is part of the presentation of MATIS.

Before appraising the role, value, and limitations of this model, a few comments on the mode of expression are required. Each part of this particular model has been given as a fragment of mathematical text, accompanied by a significant chunk of explanatory prose. It may be tempting therefore to question whether the mathematics is actually needed, or whether it is just formalism for its own sake. One way of understanding why this is not the case is to take a mathematics textbook and find a worked problem. The ‘trick’ of annotating each formula, used in this example, can be applied just as well to any other piece of mathematics. Can we therefore conclude that mathematics (or, at least, mathematical notation) is unnecessary (or formalism for the sake of formalism)? Hopefully, the answer is ‘no’; this will certainly be the case if the reader has ever used applied mathematics.

Applying liberal doses of natural language commentary to a complete, developed model is straightforward. However, in comparison with the equivalent mathematical text, linguistic representations are cumbersome, prone to greater ambiguity, and significantly more difficult to manipulate in any systematic way. Thus mathematical representations are more than just terse versions of natural language statements; crucially, the notation affords the definition and application of operations on representations in the form of calculation and proof.
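To make the description concrete, the speech and mouse attributes and the first two axioms of the MATIS interactor can be mimicked in executable form. The sketch below is an illustration in Python and is not part of the specification: the class name `Matis`, the use of `None` for ‘nil’, and in particular the body of `fuse` are assumptions. The effect of ‘fuse’ is not given in this paper, so the rule used here (each ‘nil’ slot in the speech stream takes the next unused value from the mouse stream) is only one plausible reading of the deixis example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Name = str                      # given type 'name'
Data = str                      # given type 'data'
Value = Optional[Data]          # value = data ∪ {nil}; None stands in for nil
Slot = Tuple[Name, Value]       # slot = name × value


@dataclass
class Matis:
    """Toy rendering of the speech and mouse attributes of the interactor."""
    speech: List[Slot] = field(default_factory=list)
    mouse: List[Data] = field(default_factory=list)

    def speak(self, nm: Name, d: Value) -> None:
        # Axiom 1: speaking appends the pair (nm, d) to the speech stream.
        self.speech = self.speech + [(nm, d)]

    def select(self, d: Data) -> None:
        # Axiom 2: selecting appends d to the mouse stream.
        self.mouse = self.mouse + [d]

    def fuse(self) -> Dict[Name, Data]:
        # ASSUMPTION: not specified in the paper. Each nil slot consumes the
        # next mouse value in order; slots that stay unresolved are omitted.
        pending = iter(self.mouse)
        result: Dict[Name, Data] = {}
        for nm, d in self.speech:
            if d is None:
                d = next(pending, None)
            if d is not None:
                result[nm] = d
        return result


# The query "Flights from this city in the morning to this city".
m = Matis()
m.speak("From", None)
m.select("London")
m.speak("Time", "morning")
m.speak("To", None)
m.select("Paris")
fused = m.fuse()
```

Under the assumed fusion rule, `fused` maps ‘From’ to ‘London’, ‘Time’ to ‘morning’ and ‘To’ to ‘Paris’. A sketch like this can be run, but unlike the axioms it commits to one particular behaviour and offers no direct support for proof.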
In the example so far we have simply used the mathematics to describe a situation; what will be demonstrated later in the paper is how this description can be manipulated to derive proofs of properties that can be interpreted as interesting statements about the usability of the artefact they model.

As shown in Duke & Harrison (1994a), the model can be extended to encompass a detailed description of the components that make up the MATIS interface, and details such as the effect of the ‘fuse’ action can be included. These are beyond the needs of this paper. The development of a model like this can be a useful source of insight in development; by encouraging a developer to document the structure and behaviour of an interface explicitly, latent questions and ungrounded assumptions can be teased out (Bellotti et al., 1996; Buckingham Shum et al., 1996). What the model does not (and cannot) address is how the information provided by the system can or should be understood by users, and how users' perception of the system will mediate execution of the tasks for which the system was designed. For example, will users be able to utilise the deixis capability? As interactive systems make increasingly rich use of different modalities, and rely more on users' often latent knowledge of the world (Barnard & May, 1995), these questions are increasingly beyond the capability of any one modelling approach. To answer questions about the usability of the system captured in the specification, we will need to work within a framework that can make authoritative statements about human capabilities and limitations.

4. INTERACTING COGNITIVE SUBSYSTEMS (ICS)

In contrast to models of cognition which seek to simulate the thinking of an individual person processing some specific information, ICS is a resource-based framework which describes the overall pattern of information flow through the human cognitive architecture.
Cognition is represented as the exchange, storage, revival and transformation of information by nine independent cognitive subsystems, each of which deals with a different level of mental representation. In principle, this modularity allows the processes within each ICS subsystem to be modelled mathematically in the same way that components of the system can be modelled. This section presents an overview of the key aspects of ICS, which will be taken up in the subsequent modelling examples.

As an overall theory of cognition, ICS consists of ‘architectural’ constraints on information flow, the use of memory, and the blending of information streams, as well as ‘local’ theories for specific types of information processing. It describes cognition in terms of interaction within a collection of sub-systems that each operate on a specific level of mental representation, or ‘code’. Although specialised to deal with specific codes, all subsystems have a common architecture, shown in Figure 3.

FIGURE 3 ABOUT HERE

Incoming representations in the appropriate code arrive at an input array, from which they are copied into an image record representing an unbounded episodic store of all representations received by that subsystem. In parallel with this basic copy process, each subsystem also contains transformation processes that have learned to convert incoming representations into other mental codes (although not all subsystems can produce all other codes, as will be described below). These transformed outputs are passed through a data network to other subsystems. If the incoming representation is incomplete or ‘noisy’ a transformation process can augment it by accessing similar patterns stored in the image record (in Figure 3, ‘transform C to X’ is accessing the image record). This allows the revival of both specific instances of stored representations, and regular patterns abstracted from representations that have arrived at different times.
ICS assumes the existence of nine distinct subsystems, each based on the common architecture described above, and linked together as shown in Figure 4. The subsystems can be defined by the nature of the representations that they process, and these are briefly described in Figure 5.

FIGURE 4 ABOUT HERE

FIGURE 5 ABOUT HERE

4.1. Configurations

The overall behaviour of the cognitive system is constrained by several principles of processing (for more detailed information, see Barnard & May, 1993, 1995, in press). The concept most significant for our understanding of interaction is that of configuration. This describes the way in which cognitive resources are deployed for a particular processing task. As an example, the lines in the ICS diagram in Figure 4 show how the subsystems would be configured for locating some graphical object through gesture, for instance pointing to an icon on a display using a mouse.

In order to locate the icon, information arriving at the visual system (1) will be transformed into object code (2) that contains the basic organisation of visual elements on the display. This transformation is written as ∗vis-obj: where the ∗ indicates information being exchanged with the external world (i.e., arriving from the senses), and the : indicates information being exchanged internally. At the same time, the propositional subsystem is copying information about the desired target (3) to its image record, and using :prop-obj: to produce an object code representation (4). When this representation can be blended at the OBJ subsystem with the incoming representation from ∗vis-obj:, :obj-prop: will be able to return a matching representation (5) to the propositional subsystem to indicate that a possible target has been found. Finally motion of the mouse via the hand is controlled by the limb subsystem through :obj-lim∗ (6).
While this configuration is actively locating an object, a second sequence of processes could be engaged in producing spoken output, such as “now where is that icon?” This would require the :prop-mpl: process (7) to produce a morphonolexical structure to drive the generation of speech (8) via :mpl-art: and :art-speech∗ processes (the latter actually being a set of :art-lips∗, :art-tongue∗, :art-breath∗, etc. processes). The occurrence of secondary configurations such as this is constrained by the fact that a transformation process can only operate on one coherent representation at a time (although, as is described in the next section, that representation may be a blend of information from multiple sources) and so can only produce a single output representation. Furthermore, the number of possible configurations is limited by the nature of the output codes each subsystem can produce. Figure 4 indicates which outputs are possible from each subsystem.

4.2. Blending of Data Streams

Because it deals with information flow, ICS is well placed to model multimodal cognition. As shown in Figure 4, visual and acoustic representations can be structurally interpreted (by the ∗vis-obj: and :ac-mpl: processes) before being represented in propositional form (by :mpl-prop: and :obj-prop: processes). At the same time, the higher level, implicational meaning of these representations can be directly brought together (by the ∗ac-implic: and ∗vis-implic: processes), and these can give rise to an internally generated propositional representation (by :implic-prop:).

A consequence of this is that the input arrays of the central subsystems will receive several representations, from different sources. Within the subsystem, each transformation process can independently ‘lock on’ to any part of the input array, and so will base its output on a subset of the total information that is available in its own code.
The subset that is used can in principle be formed from any part of the input array, and so can comprise information from more than one source. It should be noted that as well as having beneficial effects, this can also result in interference, if a ‘coherent’ representation is actually derived from an irrelevant source.

4.3. Cyclical Configurations

Coherent representations can be produced when events in the world are detected through more than one sensory modality, or when single sensory representations are consistent with ongoing central processing. The latter situation often occurs when transformation processes form a cyclical configuration. In the example shown in Figure 4, the :obj-prop: and :prop-obj: processes are involved in a reciprocal exchange of object and propositional representations (notationally abbreviated as a ‘POP loop’). One effect of cyclical configurations is to maintain the stability of processing over time, with new information being interpreted in the context of ongoing cognition. They also allow ‘top down’ influences on the interpretation of sensory data to occur.

Two other important cyclical flows involve the :prop-mpl: and :mpl-prop: processes (a ‘PMP loop’) for the semantic processing of verbal information, and the :prop-implic: and :implic-prop: processes (a ‘PIP loop’) for the schematic comprehension and interpretation of propositions. This latter cycle is so important in the formation of internal goals and the regulation of mental states that it is known as the ‘central engine’ of cognition (Teasdale & Barnard, 1993).

The visual and acoustic subsystems cannot take part in direct, internal cycles because their input arrays do not receive information from the data network, but only from the external world. They can participate in indirect cycles if they are being used to observe or listen to the individual’s own actions or speech.
The body state subsystem can also take part in indirect cycles, since the bodily consequences of the outputs produced by the articulatory and limb subsystems are sensed as body state representations. These can be used by the ∗bs-art: and ∗bs-lim: transformations to provide proprioceptive feedback, which is useful in co-ordinating speech and motor output. It can also detect the somatic and visceral outputs from the implicational subsystem, and produce feedback by the ∗bs-implic: transformation. This ‘BIB’ loop mediates affective influences on cognition, such as those of mood, emotion and state dependent effects.

4.4. Buffering of Transformations

At steps (3) and (4) in Figure 4 the :prop-obj: transformation is not operating directly on information as it arrives on the input array of the subsystem, but is accessing it indirectly, after it has been copied to the image record. This ‘buffering’ of a process allows it to operate on information that is arriving too quickly for normal, direct processing to cope with, and also allows a form of pre-processing to occur in the image record, so that the process actually operates on information that has been integrated over a succession of representations received on the input array. As with cyclical processing, this can provide additional stability of processing, in that short-term ambiguities in data can be overcome, and it also allows a process to continue producing its output at its own steady time-base.

Because buffering requires access to the image record, and a subsystem’s image record can only revive information for one transformation at a time, it follows that only one process within a subsystem can operate in buffered mode at a time. In fact, because a buffered process operates on its own time-base, freed from the input rate of information, it determines the rate of flow of information through the rest of a configuration.
It is thought that a configuration can only contain one buffered process without these temporal constraints resulting in it becoming unsynchronised and breaking down. This is consistent with the theoretical assumption that buffering is associated with focussed awareness of information (as distinct from the copy process, which results in diffuse awareness), for phenomenologically we are only able to focus on one stream of information at a time (while we remain diffusely aware of others). The theoretical and architectural limitations on buffering provide another important constraint upon the configurations that can co-occur.

4.5. Interleaving and Oscillation

The configurations described so far come into being because the flow of information through the overall architecture both requires them to occur and supports their occurrence. If the representations on the input array of any subsystem fail to provide enough information for a process in a configuration to produce an appropriate output, the configuration can collapse; and if other representations provide a stronger input to a process, they can displace the ‘configural’ representation, leading to an unexpected output and a change in the configuration as subsequent processes lock on to other sources of information. Since mental representations can arise from sensory or central data streams, human cognition is thus dependent upon both external and internal control, although there is no controlling ‘agent’ as such: control arises from the dynamic interaction of subsystems operating independently and in parallel.

While two (or more) configurations can operate simultaneously if they do not require conflicting resources, it is more likely that at least one shared process, image record access, or buffer is required. In these situations the configurations must interleave, with each information flow taking ‘control’ of the architecture for as long as it can before the demands of the other information flow overcome it.
Interleaving can be thought of as a form of ‘multitasking’, where each task is processed in alternation rather than truly simultaneously. The rate of interleaving depends upon the relative ‘strength’ of the configurations, determined by the degree to which their respective information flows support them, and it is possible for a weaker configuration to be frozen out until the stronger configuration fades or concludes.

Interleaving occurs when the competing configurations are operating on different information flows, producing different effector or central ‘outputs’. Competition for resources can also occur within a configuration, when, for example, two processes within a subsystem need image record access, or two processes require buffering. In these situations, the ‘shared’ resource has to oscillate between the competing processes. Since they are both part of the same configuration, it is not possible for one process to gain control of the shared resource for very long before the absence of output from the other process limits the stability of the information flow. Oscillations between the processes are thus self-limiting, and less ‘competitive’ than interleaving of configurations.

The amount of interleaving and oscillation that a task involves can be used as an indication of its cognitive complexity, with greater requirements for configural changes tending to result in slower overall rates of processing, less capacity for other simultaneous task performance, and an increased likelihood of errors occurring through poor synchronisation of transformation processes and the exchange of representations between subsystems.

5. A MATHEMATICAL MODEL OF ICS

As a cognitive model, ICS encapsulates a substantial body of theory about human information processing. Experiences in the application of theoretically based models to problems of user interface design have been reported for models such as PUM (Blandford & Young, 1995), and ICS (Barnard & May, 1993).
However, these analyses are primarily concerned with the cognitive aspects of a particular scenario. The techniques themselves are not intended to provide general frameworks for modelling the behaviour of computing systems. Syndetic modelling is intended to provide a single framework to represent the behaviours of both cognitive and computational systems and in so doing allow both software and cognitive perspectives to be brought to bear on problems of interaction. In this way, the assumptions and insights of both parties can be represented and considered explicitly.

Although the idea of bringing together user and system models is in principle independent of the underlying approaches or representations, we have particular reasons for selecting ICS as the cognitive foundation for syndesis. First, ICS has both the breadth of applicability and depth of theory to support the analysis of the kind of novel and sophisticated technology that is moving out of research contexts into social and industrial application. Its scope of application ranges from display structure, through blending of multi-modal data streams, to issues of affect and emotion that are applicable both to clinical aspects of cognition (e.g. depression, Teasdale & Barnard, 1993) as well as interface elements (Barnard & May, 1995).

In this section we take the mathematical concepts that were used to develop a model of the MATIS interface, and with them build a representation of ICS. We will show subsequently that the mathematics supports the composition of these two models into a form in which questions about interaction can be phrased and answered rigorously. The mathematical development is broken into three sub-sections: the basic concepts needed to describe ICS, the state space of the ICS architecture, and the axioms that govern information processing within ICS. Following the presentation of the model, the section concludes with a review of key points raised by this approach.

5.1. Basic Definitions

Our model of ICS is based around the main resources described by ICS: transformation processes and mental representations. To begin the process of formalisation, we define given types (sets) to represent the concepts of subsystem and representation.

sys - ICS subsystems, e.g. vis, prop, obj, etc.
repr - mental representations

In this model we do not address the internal structure of mental representations, although there is a significant body of psychological theory that we could draw on in order to do so. For example, May, Scott & Barnard (1995) describe a model in which representations consist of basic units of information organised into superordinate structures. This level of detail would greatly expand the paper without adding much insight into the scope or use of syndetic models. Indeed, one of the advantages of an axiomatic framework is that abstraction can be used to hide details that do not contribute to the understanding or analysis of a system. Thus, while coherence of representations depends on several issues, including the timing of representations, that will not be addressed here, we can still account for the concept of coherence by defining a relation over representations.

_≈_ : repr ↔ repr

We can then write ‘p ≈ q’ to express the requirement or property that representations ‘p’ and ‘q’ are coherent. A more detailed model could then define this relationship in terms of the structure of the representations. Here, we can require that any representation is consistent with itself (1), and that the relationship is symmetric (2).

(1) ∀r : repr • r ≈ r
(2) ∀p, q : repr • p ≈ q ⇔ q ≈ p

As discussed in Section 4.2, representations arriving at a subsystem may be blended with the result that a transformation effectively operates on a datum derived from multiple sources. Again, as our model abstracts away from the constituent elements of representations we cannot give an explicit account of how blending takes place.
We can however model its effect on representations by defining a relation, ‘E’, such that ‘p E q’ means that the representation ‘p’ is part of ‘q’; in other words, q is the result of blending p with (possibly) other representations. This relationship defines a partial ordering over representations, i.e. it is reflexive, antisymmetric and transitive.

_E_ : repr ↔ repr

The next concept to be introduced is that of a transformation between two mental codes of ICS, for example :obj-prop:. Any transformation can be identified by the source and destination subsystems, and so the type ‘tr’ of transformations is modelled as an ordered pair. For convenience, we also define two functions that extract the first (src) and second (dst) components of a transformation.

tr ≙ sys × sys - names of transformation processes
src, dst : tr → sys

This definition is actually rather loose, in that it admits, for example, (obj,vis) as a transformation as well as (obj,implic). To be rigorous, such non-existent processes would be eliminated by adding a predicate to enumerate all legal transformations.

Each transformation process within the ICS subsystems operates on and generates a stream of representations. In most cases, these streams are carried by the internal data network of the architecture, but clearly if cognition is to be located in an environment then it must be possible for streams to both originate (perception) and terminate (action) in the outside world. For convenience, we will write transformations as :src-dst:, ∗src-dst: or :src-dst∗ depending on whether the stream originates or terminates in the external world (∗tr:, :tr∗) or is completely internal to the data network (:tr:) of the architecture. That is, ‘∗’ denotes the external world and ‘:’ is the data network. In the remainder of the paper, we will use the term ‘stream’ to refer to the input or output of a transformation.
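The looseness of ‘tr’ noted above, and the kind of predicate that would repair it, can be illustrated with a short executable sketch. The Python below is an illustration under stated assumptions rather than a statement of ICS: the subsystem abbreviations follow the text, but the set `NAMED` contains only transformations that are mentioned by name in this paper, so it is a placeholder for the complete enumeration that Figure 4 would supply. The helper `show` is likewise hypothetical and only renders the sensory (‘∗’, written here as `*`) and internal (‘:’) prefixes.

```python
from typing import FrozenSet, Tuple

Sys = str                       # given type 'sys'
Tr = Tuple[Sys, Sys]            # tr = sys × sys


def src(t: Tr) -> Sys:
    """First component of a transformation: its source subsystem."""
    return t[0]


def dst(t: Tr) -> Sys:
    """Second component of a transformation: its destination subsystem."""
    return t[1]


# ASSUMPTION: a partial list, restricted to processes named in the text.
# A rigorous model would enumerate every legal process of the architecture.
NAMED: FrozenSet[Tr] = frozenset({
    ("vis", "obj"), ("vis", "implic"), ("ac", "mpl"), ("ac", "implic"),
    ("obj", "prop"), ("obj", "lim"), ("prop", "obj"), ("prop", "mpl"),
    ("prop", "implic"), ("implic", "prop"), ("mpl", "prop"), ("mpl", "art"),
    ("bs", "art"), ("bs", "lim"), ("bs", "implic"),
})

SENSORY = {"vis", "ac", "bs"}   # subsystems fed from the external world


def is_named(t: Tr) -> bool:
    """Stand-in for the legality predicate: rejects pairs such as (obj, vis)."""
    return t in NAMED


def show(t: Tr) -> str:
    """Render a pair in the *src-dst: or :src-dst: notation."""
    prefix = "*" if src(t) in SENSORY else ":"
    return f"{prefix}{src(t)}-{dst(t)}:"
```

With this placeholder, `is_named(("obj", "vis"))` is false even though the pair is a well-typed member of `Tr`, which is exactly the gap between the type and the architecture that the extra predicate is meant to close.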
A set of transformations involved in some information processing task is called a configuration, while the “chain” of transformations involved in information processing for a particular task is called a flow. For example, Figure 4 shows a flow containing (amongst others) the following chain of transformations:

〈∗vis-obj:, :obj-prop:, :prop-obj:, :obj-lim:, :lim-hand∗〉

The corresponding configuration includes the set of transformations that appear in this sequence. In general, a flow consists of a subset of the transformations that make up a configuration, and a given transformation may occur more than once. Formally, we define the type ‘Config’ to be a set of transformations, and the type ‘Flow’ to be a sequence of transformations.

Config ≙ ℙ tr
Flow ≙ seq tr

5.2. The Architecture

The state of the ICS interactor captures the flows of information involved in processing activities, and the properties of specific transformations such as stability and coherence which define the quality of processing, or in other words, user competence at particular tasks. The source of data for each transformation is represented by a function ‘sources’ that takes each transformation ‘t’ to the set of transformations from which ‘t’ is taking input. In general only a subset of transformations are producing stable output, and this set is defined by the attribute ‘stable’. The function ‘input’ maps each transformation to the representation that is currently available to it as input. As we will see, this input representation may be derived by blending the output of several other processes.

interactor ICS
attributes
  sources : tr → ℙ tr
  stable : ℙ tr
  input : tr → repr

The representations being generated by a transformation are given by the relation ‘_on_’, where ‘p on t’ means that representation p is available as the output of t.
All representations arriving at a subsystem are copied to the image record, and the contents of these records are represented by the attribute ‘_@_’ where ‘p@s’ means that representation ‘p’ is part of the image record of subsystem ‘s’.

  _on_ : repr ↔ tr
  _@_ : repr ↔ sys

As not all representations are coherent, only certain subsets of the data streams arriving at a system can be employed by a process to generate stable output. The set ‘coherent’ contains those groups of transformations whose output in the current state can be blended. If the inputs to a process are coherent but unstable, the process can still generate a stable output by buffering the input flow via the image record and thereby operating on an extended representation. However, only one process in the configuration can be buffered at any time (this is actually a simplification for the purposes of this paper, since as explained in Section 4.5 oscillation of the buffer can occur) and this process is identified by the attribute ‘buffered’.

  coherent : tr × tr → 𝔹
  buffered : tr

The configuration itself is defined to be those processes whose output is stable and which are contributing to the current processing activity. This processing activity, in turn, consists of a set of flows carrying data through the architecture, and these are represented by the attribute called ‘flows’.

  config : Config
  flows : ℙ Flow

Four actions are addressed in this model. The first two, ‘engage’ and ‘disengage’, allow a process to modify the set of streams from which it is taking information, by adding or removing a stream. A process can enter buffered mode via the ‘buffer’ action. Lastly, the actual processing of information is represented by ‘trans’, which allows representations at one subsystem to be transferred by processing activity to another subsystem.

actions
  engage : tr × tr
  disengage : tr × tr
  buffer : tr
  trans

5.3. Information Processing in ICS

The principles of information processing embodied by ICS are expressed as axioms over the model defined above. Axiom 1 defines coherence of data streams in terms of coherence of the representations available on those streams.

axioms
  coherent(t1, t2) ⇔ dst(t1) = dst(t2) ∧ (∀ p, q : repr • p on t1 ∧ q on t2 ⇒ p ≈ q) (1)

  Streams t1 and t2 are coherent if and only if they have the same destination, and for any representation p available on t1 and q on stream t2, p and q are coherent.

The second axiom defines the concept of a stream’s stability. This requires that the inputs to the transformation generating the stream are at least coherent. However, coherent input doesn't guarantee stable output, as the input may only be a partial representation of the data that the process needs to generate output. If the input is unstable, then the process will need to be buffered. A configuration is then the set of processes that are generating output that is both stable and which is used elsewhere in the overall processing cycle.

  t ∈ stable ⇔ (∀s1, s2 : sources(t) • coherent(s1, s2)) ∧ (t = buffered ∨ sources(t) ⊆ stable) (2)

  A transformation ‘t’ is stable if and only if every pair of streams on which it operates are coherent, and either the transformation is buffered, or the input streams are themselves stable.

  t ∈ config ⇔ (t ∈ stable ∧ (src(t) ∉ {art, lim} ⇒ ∃ s : tr • t ∈ sources(s))) (3)

  A stream or process ‘t’ is part of the processing configuration if and only if it is stable and, unless it is part of an effector subsystem, there is some other transformation ‘s’ that is using the stream from t.

Axioms 4 and 5 concern flows. Any transformation that is part of a flow must be part of the configuration, and similarly if a transformation is in the configuration it must be part of some flow. This is expressed by axiom 4. Axiom 5 captures the ‘chaining’ property of flows.
If two transformations are adjacent in a flow, then the first transformation must be one of the sources used by the second transformation. The symbol ‘^’ is sequence concatenation.

  t ∈ config ⇔ (∃ f : flows • t ∈ ran f) (4)

  A transformation ‘t’ is in the configuration if and only if there exists some flow ‘f’ that contains t.

  ∀ s, u : Flow; t1, t2 : tr • s^〈t1, t2〉^u ∈ flows ⇔ t1 ∈ sources(t2) (5)

  For arbitrary flows ‘s’ and ‘u’, and transformations ‘t1’ and ‘t2’, there is a flow in the system containing t1 followed by t2 if and only if t1 is a source of t2.

A process will not (normally) engage an unstable stream, a constraint that is captured in axiom 6 via a deontic predicate. If the output of a process is unstable, it will either engage a stable stream, disengage an unstable stream, or try to enter buffered mode (axiom 7).

  per(engage(t, src)) ⇒ src ∈ stable (6)

  A process t is permitted to engage a stream ‘src’ only if ‘src’ is stable.

  t ∉ stable ⇒
      (∃s : tr • s ∈ stable ∧ s ∉ sources(t) ∧ obl(engage(t, s)))
    ∨ (∃s : tr • s ∉ stable ∧ s ∈ sources(t) ∧ obl(disengage(t, s)))
    ∨ obl(buffer(t)) (7)

  If the stream t is not stable, then either (i) there is a stable stream s that the process isn't using and which it is required to engage, or (ii) the process is using an unstable stream s and is required to disengage this stream, or (iii) the process should enter buffered processing mode.

The effects of the buffer, engage, and disengage actions are straightforward and are given by axioms 8-10.

  [buffer(t)] buffered = t (8)

  After the architecture buffers the process t, t is in buffered mode.

  sources(t) = S ⇒ [engage(t, s)] sources(t) = S ∪ {s} (9)

  If the sources of a transformation t are given by the set S of streams, then after t engages the stream ‘s’, its sources will be S extended with ‘s’.
    sources(t) = S ⇒ [disengage(t, s)] sources(t) = S – {s}    (10)

If the sources of a transformation t are given by the set S of streams, then after t disengages from the stream ‘s’, its sources will be the set S minus the element ‘s’.

The next two axioms define the effect of information transfer across the architecture. Axiom 11 describes how the input to a given transformation is related to the contents of the various data streams, while axiom 12 defines the action that changes the contents of the streams.

    src(t) ∉ {ac, vis, bs} ⇒ ∀ s : sources(t) • p on s ⇒ p E input(t)    (11)

If a transformation isn't part of a sensory system, then for any source s with which t is engaged, any representation p that is on s will form part of the representation that is input to the transformation.

    t ∈ stable ∧ input(t) = p ⇒ [trans] p on t    (12)

If a process ‘t’ is stable and has input ‘p’, then after transformation a representation derived from p will be on the output stream corresponding to ‘t’.

The constraint that the input rule in axiom 11 applies only to non-sensory subsystems is important, as it identifies a key point where a link must be made between the cognitive processes and the environment or system in which they operate. That is, the input to the sensory systems can only be defined after the architecture has been located in some context. It is the counterpart to the constraint in ICS axiom 3 concerning utilisation of output subsystems. Also, note the phrase ‘derived from p’ in the comment of axiom 12. Clearly, ICS processes produce output that is substantively different from the input representation(s). However, without describing the constituent structures, etc., of representations, we cannot describe the effect of processes on the data. The model presented here focuses on the flow of data through the system and abstracts away from representations.
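As an informal illustration, which is not part of the specification itself, axioms 1, 2, 9 and 10 can be read as executable checks over a small table of streams. The sketch below assumes that streams are named by strings, that the relation ≈ is supplied as a function `approx`, and that a stream reached a second time through a processing loop is not re-examined; the stream names `a`, `b`, `c` and the representations `x`, `y` are hypothetical.

```python
# Illustrative sketch only: streams are named by strings, and `approx` is a
# stand-in for the coherence relation on representations.

def coherent(t1, t2, dest, on, approx):
    """Axiom 1: same destination, and all representations pairwise coherent."""
    return dest[t1] == dest[t2] and all(
        approx(p, q) for p in on.get(t1, ()) for q in on.get(t2, ()))


def stable(t, sources, dest, on, approx, buffered=None, _seen=frozenset()):
    """Axiom 2: sources pairwise coherent, and t buffered or all sources stable."""
    srcs = sources.get(t, set())
    if not all(coherent(a, b, dest, on, approx) for a in srcs for b in srcs):
        return False
    if buffered == t or t in _seen:
        # Assumption: a stream revisited through a processing loop is not
        # re-examined, so that the recursion terminates.
        return True
    return all(stable(s, sources, dest, on, approx, buffered, _seen | {t})
               for s in srcs)


def engage(sources, t, s):
    """Axiom 9: after engage(t, s) the sources of t are S extended with s."""
    return {**sources, t: sources.get(t, set()) | {s}}


def disengage(sources, t, s):
    """Axiom 10: after disengage(t, s) the sources of t are S minus s."""
    return {**sources, t: sources.get(t, set()) - {s}}


same = lambda p, q: p == q                      # hypothetical choice of coherence
dest = {"a": "prop", "b": "prop", "c": "mpl"}   # hypothetical streams
on = {"a": {"x"}, "b": {"y"}}
both = engage(engage({}, "c", "a"), "c", "b")   # c draws on a and b
only_a = disengage(both, "c", "b")              # c draws on a alone
print(stable("c", both, dest, on, same))        # inputs carry conflicting data
print(stable("c", only_a, dest, on, same))      # a single consistent input
```

Treating the state as immutable dictionaries keeps each action close to the ‘before and after’ reading of the modal axioms; a fuller treatment would also need the deontic axioms 6 and 7, which this sketch does not attempt.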
The remaining axiom describes the COPY process that is a part of each subsystem; any representation carried on a data stream will be copied into the image record of the destination system.

    ([trans] p@s) ⇔ p@s ∨ ∃ t : sys • p on :t-s:    (13)

After transformation, a representation ‘p’ is in the image record at subsystem ‘s’ if and only if p was either in the record beforehand, or there is some subsystem ‘t’ such that p is on the stream from t to s.

More axioms have been given than will actually be required to model the two scenarios in this paper. The point of this model, however, is that it is not specific to the particular problems examined here, but can be applied to a range of interface techniques, and could be extended to address aspects of cognitive processing, i.e. utilisation of memory records, in greater detail.

5.4. Key Points

The mathematical model of ICS given above is obviously an approximation to the available body of psychological theory. We have not, for example, considered how processes access the contents of their system’s image record when in buffered mode. For the scenarios addressed here, the primary concern is with the cognitive resources and their utilisation within an overall configuration of information processing. The important point is that the mathematical model is general and can therefore be applied or specialised to a variety of domains. The axioms developed above are not specific to any one processing task, and indeed in the remainder of the paper we will utilise particular axioms in reasoning about patterns of interaction with two quite different systems.

That such a model could be developed is not in itself surprising. There was already evidence, in the form of two ‘expert’ systems (May, Barnard & Blandford, 1993), that significant principles underlying ICS could be represented within a formal (computable) framework.
In contrast, the model developed here utilises the general body of mathematics, rather than the restricted deduction apparatus that underlies an expert system shell. In the conclusion we will have more to say about future work on the mathematical foundations of human information processing.

In closing this section, it is worth noting that one further reason for recruiting ICS as the foundation for syndetic modelling is that the theory has a fundamentally declarative interpretation that maps well onto the methods that we had used previously for modelling computing components of interactive systems. In both cases, observables are used to characterise the intended behaviour of some system. Mathematical models are insensitive to whether their subject is computer software and hardware, or cognitive resources, information flow, and transformation. User and system components both impose constraints on the processing of information within the overall system.

6. A SYNDETIC MODEL OF MATIS

The specification in Section 3 of MATIS included the capability of the system to handle deictic input. Deixis is a feature of human-human interaction, so one can make an informal case that it represents a potentially useful tool for human-computer interaction. To explore whether this is in fact the case, we will construct a syndetic model by combining the MATIS specification with the model of ICS, and then make a conjecture about the conditions under which deixis will be possible to a user of the system. We will then show how we can reason within the mathematical theory about the validity of this conjecture.

The original specifications were developed independently of each other, and in bringing them together into a syndetic model we extend the original models with additional observables (in this case an action) that capture the interplay between the two agents.
For MATIS, we posit a ‘read’ action that allows the user to locate some lexical item, such as the name of a city, on the presentation. In a more substantial system model, this action would be bound to the contents of the query forms (see Figure 2) that were available at any time on the screen. This degree of detail, however, is not essential for illustrating the role played by syndesis in understanding deixis.

    interactor MATIS-User
        MATIS           - include the MATIS spec
        ICS             - and the ICS framework
      actions
        read : data     - observe the MATIS presentation

The conjoint behaviour of the two agents is captured by three axioms that span the two sets of observables, in MATIS and ICS. The first axiom defines the condition under which it is possible for the user to read an item of data from the presentation. On the system side, there must exist a field on a query such that the value of the field is the data item. On the user side, the configuration must include a data flow from the visual system, through the object and morphonolexical levels, to the propositional subsystem.

axioms

    per(read(d)) ⇒ d in MATIS ∧ 〈∗vis-obj:, :obj-mpl:, :mpl-prop:〉 ∈ flows    (1)

It is possible to read some data item ‘d’ only if d is part of a field of a query in the display and the cognitive configuration enables reading.

In this scenario we are concerned with the representations that are being processed within a flow. To capture this idea concisely, we define a relational symbol named ‘on-flow’ as an abbreviation for a condition involving components of the ICS interactor:

    on-flow : repr ↔ Flow
      ≙ r on-flow f ⇔ f ∈ flows ∧ ∀ t : ran f • r on t

A representation ‘r’ is on a flow ‘f’ if and only if the flow is part of the processing configuration, and for all transformations that are in the range of the sequence defining the flow, the corresponding representation is available as output of those transformations.
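The ‘on-flow’ abbreviation has a direct computational reading, sketched below under some simplifying assumptions: a flow is represented as a tuple of transformation names, the ASCII character `*` stands in for the ‘∗’ marker, and the contents of each stream are held in a dictionary. The city names are hypothetical data, not values taken from the MATIS specification.

```python
# Illustrative sketch: a flow is a tuple of transformation names, with the
# ASCII '*' standing in for the marker used on sensory and effector streams.

def on_flow(r, f, flows, on):
    """r on-flow f: f is one of the configured flows and r is available on
    every transformation in the range of f."""
    return f in flows and all(r in on.get(t, set()) for t in f)


READING = ("*vis-obj:", ":obj-mpl:", ":mpl-prop:")   # flow named in MATIS-User axiom 1
flows = {READING}
on = {t: {"Boston"} for t in READING}                # hypothetical stream contents

print(on_flow("Boston", READING, flows, on))      # item carried along the whole flow
print(on_flow("Denver", READING, flows, on))      # item not on the streams
print(on_flow("Boston", READING[:2], flows, on))  # prefix is not a configured flow
```

The third call shows why the definition carries the conjunct f ∈ flows: a representation that happens to be on some of the right streams is not thereby on a flow.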
Axioms 2 and 3 address the cognitive requirements associated with the action of selecting a data item with the mouse, and uttering some part of a query. As items on the MATIS display are lexical structures, the configuration for object search used in the AV scenario is not sufficient. The mpl and prop systems need to be recruited to find lexical objects (words) on the screen and compare them with the user’s goals. This will require that the representation of the word is on the flow defined by a search configuration suitable for lexicographical data derived from visual input. For speech, the data flow will begin within a PIP loop and then will be processed via the mpl and art subsystems to produce spoken words.

    word-search ≙ 〈∗vis-obj:, :obj-mpl:, :mpl-prop:, :prop-mpl:, :mpl-prop∗〉
    speech ≙ PIP ^ 〈:prop-mpl:, :mpl-art:, :art-speech∗〉

Note that the final three transformations in the ‘word-search’ flow define a processing cycle referred to as a PMP loop. That is, propositional information produced by mpl may be used by processes in prop to construct new mpl representations. The PMP loop and PIP loop needed for speech indicate a cyclic interchange of representations between two or more processes, as described in Section 4.3.

    per(select(d)) ⇒ d on-flow word-search ∧ d in MATIS    (2)

If it is possible to select an item (with the mouse) then the item must be part of the display, and a representation of the item must be processed within a flow configured for lexicographical search and comparison.

    per(speak(s)) ⇒ s on-flow speech    (3)

If it is possible to articulate part of a query then a representation of the phrase must be processed through a data flow that originates as a PIP loop and then results in the production of speech via the mpl system.
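The two flows overlap at the :prop-mpl: transformation, and ICS axiom 5 determines what that transformation must be drawing on. The following sketch reads axiom 5 from left to right only, collecting a source for each adjacent pair in a flow. It assumes the expansion PIP = 〈:implic-prop:, :prop-implic:, :implic-prop:〉, which is inferred from the way the speech flow is unfolded in the proof later in this section; the function name is hypothetical.

```python
# Illustrative sketch; '*' stands in for the sensory/effector marker.
PIP = (":implic-prop:", ":prop-implic:", ":implic-prop:")   # assumed expansion
WORD_SEARCH = ("*vis-obj:", ":obj-mpl:", ":mpl-prop:", ":prop-mpl:", ":mpl-prop*")
SPEECH = PIP + (":prop-mpl:", ":mpl-art:", ":art-speech*")


def sources_from_flows(flows):
    """ICS axiom 5, read left to right: if t1 is immediately followed by t2
    in some flow, then t1 is a source of t2."""
    sources = {}
    for flow in flows:
        for t1, t2 in zip(flow, flow[1:]):
            sources.setdefault(t2, set()).add(t1)
    return sources


sources = sources_from_flows([WORD_SEARCH, SPEECH])
print(sorted(sources[":prop-mpl:"]))   # the inputs that must be mutually coherent
```

With both flows in the configuration, :prop-mpl: acquires one source from each, which is the situation that the coherence requirement of ICS axiom 2 then constrains.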
Since deixis involves operating on two streams of potentially different representations (one dealing with data to be spoken, the other with data involved in lexical search), we conjecture that there might be a difficulty in using the interface if these streams conflict. We construct a hypothesis that in order for the user to speak a phrase ‘s’ and select a data item ‘d’ concurrently, the representations of ‘s’ and ‘d’ must be coherent. This can be expressed formally, as the sequent given below. It reads “From the list of axioms given in the MATIS-User specification, and the predicate per(speak(s)&select(d)), it can be shown that the predicate s ≈ d is true”. Note that the axioms of the MATIS-User specification include those of the interactors that it inherits, i.e. MATIS and ICS.

    MATIS-User ∧ per(speak(s)&select(d)) ≡〉 s ≈ d

Mathematical reasoning takes a variety of forms, depending on the structures involved and the level of rigour required. In this paper we have chosen to use a specification logic (MAL). MAL subsumes first order logic, and most of the reasoning that we will do with specifications here just utilises the inference rules of classical first order logic (Lemmon, 1993). For the first proof, the only additional rule needed relates to the deontic operator (per) in the conjecture. As an aid to readability and understanding, we have opted for a ‘calculational’ style of presentation, in which the proof is set out much like an algebraic calculation in ‘standard’ mathematics. The mathematical expressions are interleaved with text that explains how each step in the argument is supported; consequently, the argument has the following structure:

        Assumption (i.e. per(speak(s)&select(d)))    (1)
    Explanation of why assumption justifies statement 2.
    ≡〉 Statement 2    (2)
    Explanation of why statement 2 justifies statement 3.
    ≡〉 …
    Explanation of why statement n - 1 justifies conclusion.
    ≡〉 Conclusion (i.e. s ≈ d)    (n)

To see the analogy with conventional mathematical argument, simply replace the ‘therefore’ symbol (≡〉) in the outline with equality (=). In practice, the presentation of a proof tends to be terser, and requires rather more knowledge of mathematical reasoning than has been assumed here. There is no reason why this proof could not be expanded into a more structured and formal account, using the style of (Lemmon, 1993), or indeed expressed in a form suitable for an automated verification tool such as PVS (Owre et al., 1995).

The proof appears below.

        per(speak(s)&select(d))    (1)

The first step is the only step that relies on deontic logic; if it is permissible to do two actions concurrently, then it is permissible to do each of the actions.

    ≡〉 per(speak(s)) ∧ per(select(d))    (2)

The next two steps use Modus Ponens, the inference law that states that from P, and P ⇒ Q, one can deduce Q. This is applied first to axiom 3 of MATIS-User; here ‘P’ is the statement ‘per(speak(s))’ and ‘Q’ is the consequent part of axiom 3, ‘s on-flow speech’.

    ≡〉 s on-flow speech ∧ per(select(d))    (3)

The same process is repeated using axiom 2 of MATIS-User with the hypothesis per(select(d)). Effectively, we have expanded what it ‘means’ for the actions speak(s) and select(d) to be permitted.
    ≡〉 s on-flow speech ∧ d on-flow word-search ∧ d in MATIS    (4)

The definition of ‘on-flow’ is now expanded:

    ≡〉 speech ∈ flows ∧ ∀ t : ran speech • s on t
       ∧ word-search ∈ flows ∧ ∀ t : ran word-search • d on t    (5)

The flow ‘speech’ is a sequence of transformation processes representing a flow of representations, and one of the transformations within that flow is :prop-mpl:. Axiom 4 of ICS states that a transformation is part of a flow if and only if it is part of the overall configuration, and applying this rule by substituting ‘:prop-mpl:’ for the variable ‘t’ in axiom 4 gives:

    ≡〉 speech ∈ flows ∧ ∀ t : ran speech • s on t ∧ :prop-mpl: ∈ config
       ∧ word-search ∈ flows ∧ ∀ t : ran word-search • d on t    (6)

An important property of deduction is that we can ‘focus in’ on part of a problem and work on it in isolation. That is, if we have the statement ‘A ∧ B’, and an axiom or rule of inference allows us to deduce ‘C’ from ‘A’, then we can re-write the original statement as ‘C ∧ B’. Another way of stating this is that, if C follows from A, it also follows from A ∧ B, so if we wish we can focus in on one part of a statement. Here, we focus on the requirement related to the configuration:

    ≡〉 :prop-mpl: ∈ config ∧ ...    (7)

Necessary and sufficient conditions for a transformation to be a part of the configuration were given as axiom 3 of the ICS interactor. Substituting ‘:prop-mpl:’ for the variable ‘t’ in that axiom results in the following:

    ≡〉 :prop-mpl: ∈ stable
       ∧ src(:prop-mpl:) ∉ {art, lim} ⇒ ∃ s : tr • :prop-mpl: ∈ sources(s) ∧ ...    (8)

We now focus in on the need for stability:

    ≡〉 :prop-mpl: ∈ stable ∧ ...    (9)

The ICS theory provides an axiom (number 2) that gives conditions necessary for a stream to be stable. The following sequent results from substituting :prop-mpl: for t in that axiom.
    ≡〉 ∀ s1, s2 : sources(:prop-mpl:) • coherent(s1, s2)
       ∧ (:prop-mpl: = buffered ∨ sources(:prop-mpl:) ⊆ stable)    (10)

To make further progress, at this point we need to find appropriate transformations to replace the quantified variables s1 and s2 in statement (10); that is, we need to see if the model makes any statements about the input being used by :prop-mpl:. Here we make use of the flow rule (axiom 5 of ICS), using the predicates about flow introduced in step (5). For example, since

    speech = PIP ^ 〈:prop-mpl:, :mpl-art:, :art-speech∗〉
           = 〈:implic-prop:, :prop-implic:〉 ^ 〈:implic-prop:, :prop-mpl:〉 ^ 〈:mpl-art:, :art-speech∗〉

we can apply ICS axiom 5 by making the following substitutions:

    s  ↦ 〈:implic-prop:, :prop-implic:〉
    t1 ↦ :implic-prop:
    t2 ↦ :prop-mpl:
    u  ↦ 〈:mpl-art:, :art-speech∗〉

We can thus conclude that :implic-prop: ∈ sources(:prop-mpl:). Similarly, by using the ‘word-search’ flow, we can conclude that :mpl-prop: ∈ sources(:prop-mpl:). That is, to understand whether :prop-mpl: will be stable, we need to look at the processes :mpl-prop: and :implic-prop: that produce data for it. We now substitute s1 and s2 in (10) by :mpl-prop: and :implic-prop:, giving:

    ≡〉 coherent(:mpl-prop:, :implic-prop:)
       ∧ (:prop-mpl: = buffered ∨ sources(:prop-mpl:) ⊆ stable)    (11)

ICS axiom 1 gives the necessary and sufficient conditions for coherence, and so the conjunct coherent(:mpl-prop:, :implic-prop:) in (11) can be rewritten to:

    ≡〉 dest(:mpl-prop:) = dest(:implic-prop:)
       ∧ ∀ p, q : repr • p on :mpl-prop: ∧ q on :implic-prop: ⇒ p ≈ q    (12)

The first conjunct of this, dealing with the destinations of two processes, is trivially true from the architecture, and can be eliminated.
That leaves the following requirement: ≡〉 ∀p, q : repr • p on :mlp-prop: ∧ q on :implic-prop: ⇒ p ≈ q (13) Substituting the symbol ‘s’ (the phrase spoken by the user) for the variable ‘p’, and similarly putting ‘d’ in place of ‘q’, in the predicate gives: ≡〉 s on :mpl-prop: ∧ d on :implic-prop: ⇒ s ≈ d In statement (5), we have the predicates • ∀t : ran speech • s on t, and • ∀t : ran word-search • d on t, (14) 53 generated from the definition of ‘on-flow’. These are still available for use; we simply haven't been copying them through each of the steps. Since :mpl-prop: is in the speech flow, we can conclude that ‘s on :mpl-prop:’, and similarly, that ‘d on :implic-prop:’. We can therefore use modus ponens on statement (14) to eliminate the antecedent, and leave the following: ≡〉 s ≈ d (15) Although this presentation of the proof is fairly lengthy, it is quite straightforward. It demonstrates that it is possible to calculate properties of human-computer interaction in a systematic way. When expressed in a suitable form, a proof such as this can be carried out and checked using a theorem prover or proof assistant such as MURAL (Jones et al., 1991) or PVS (Owre et al., 1995). Indeed, the simplicity of the above derivation means that it could probably be discharged with little or no human intervention by most of the current generation of theorem-proving tools. This result shows that a user will not be able to articulate a phrase at the same time as they search for a different result on the display, since the two representations needed as input to the prop-mpl transformation are not the same. Thus the syndetic model shows that, in order to employ the resources defined in the system model, a user of MATIS may have to ‘interrupt’ a spoken request in order to locate a value for deictic reference. This need to switch processing mode will be distracting. 
In fact, the configuration for deixis requires that the user has a morphonolexical representation of the search target, and as the user is already articulating the ‘this ...’ part of the command, they will probably find it easier just to continue speaking the whole command, rather than switch modality. If the system also requires selection to occur within some temporal window around a deictic utterance (Nigay & Coutaz, 1995), the user may not be able to carry out the context switching and location of an appropriate value in time. Mathematical techniques exist that would allow the explicit representation and analysis of such constraints.

Let us summarise the process that led to this result. Starting with a model of a specific interface, and a general model of a cognitive architecture, we set out to explore a conjecture that the concurrent use of multiple data streams required to achieve deictic reference would place a strong requirement on the user. Reasoning within the syndetic model, we concluded that deictic reference required that the representations being processed on two data streams would need to be coherent; interpreting this result in the context of the models led us to claim that users would find deictic reference difficult.

In reviewing this process it is important to note the role of the mathematics. Describing MATIS and ICS as a set of axioms did not in itself lead to the conjecture about usability, or the subsequent proof. Nor would one expect it to. The use of mathematics here in HCI is no different from its use in any other scientific discipline; it is a tool for representing models of the world and for manipulating those models to test conjectures and carry out calculation. However, this is not to say that the mathematics played no role in discovering the result.
The mathematical model makes the role of data streams explicit, and by providing a concise vocabulary for describing the properties and behaviour of these streams, there is a sense in which the formulae afford exploration of properties related to stream-based processing. In this sense the mathematical representation enables discovery of these processes in the same way that powerful and expressive bodies of mathematical theory empower physicists to calculate properties of electromagnetic fields or quantum states. Indeed, the successful development of theoretical models to explain and predict the results of experiments was in part due to the existence of mathematics, such as vector spaces, operators, and differential equations, in which the observations could be expressed concisely and clearly. More recently, aspects of computing such as concurrency theory have benefited from the existence of simple mathematical theories such as CCS (Milner, 1989) that allowed the construction and manipulation of models that captured what otherwise are apparently complex interactions.

Identifying potential problems is only one aspect of design; the dual issue is how to address a problem once identified. Syndetic models are important here, because they make explicit both the chain of reasoning that leads to problem identification, and the fundamental principles or assumptions on which this chain is grounded. In contrast, purely empirical approaches to evaluation can identify that a problem exists, and may localise the context in which it occurs, but without an explicit theory base they lack authority to state the cause of the problem, and consequently do not in themselves provide help in identifying solutions. In the case of MATIS, the problem that we have identified is that if the user is to employ deixis, both the articulated phrase and the target object for gesture must have coherent propositional representations (see lines 14 and 15 of the argument).
Now, we know that people are able to employ deixis; indeed, it is because it is such an intrinsic part of human-human communication that interface designers are interested in recruiting it for human-computer interaction. So what has gone wrong? The requirement on coherence of representation follows from the rule that data streams contributing to a processing task be coherent, and therefore stable (this was followed through in lines 7-12). So the problem is that the interface is requiring the parallel use of two data streams in a fundamentally incompatible way. We can trace this back to line 5, where it was required that both speech and word-search are part of the configuration. This requirement, and the associated conditions that ‘s’ and ‘d’ must be on particular streams, leads to the coherence requirement.

Could we find different streams that would allow the user to perform effectively the same task, but which avoid the need for coherence between ‘s’ and ‘d’? Within the approximate model of ICS that we have available here, our principal freedom in this context is in selecting flows; a more sophisticated model might encompass assumptions about record content (e.g. user training) and properties of the dynamic oscillation of control, i.e. the use of short term memory and sub-vocal rehearsal in this case (Barnard & May, 1993). As the conflict involves an exchange between the propositional and morphonolexical levels, it is useful to consider whether the processes involved can be by-passed entirely. Now, in human-human communication, deixis typically involves pointing at objects; “... that person, ...”, “... this button, ...”, etc. Within ICS, the recognition of (general) visual objects, and the control of limbs necessary for gesturing at objects, is devolved to processes within the VIS, OBJ, and LIM subsystems, and would be a highly proceduralised skill.
We could recruit this ability if, rather than working with purely lexical targets, the targets for deictic reference with the mouse had some spatial or visual characteristic that allowed them to be detected visually. How this might be realised in an interface is of course a matter for design creativity, but one suggestion, given the domain of MATIS (air travel), would be to utilise a map, showing the location of cities served by the system. Assuming that the operator was familiar with the approximate location of cities, pointing to and selecting a city on the map could bypass the need for morphonolexical processing and thereby eliminate (or at least moderate) the conflict identified in the analysis.

To conclude this case study, we note that the design issue considered here is quite fundamental and extends beyond the specific example of MATIS: under what conditions might deixis be used as a component of any interface? We have demonstrated that resolution of this issue involves system considerations such as how and when deictic reference is accepted and resolved, and user issues, for example what mental resources are needed to interact with the system via particular modalities, and how those mental resources are constrained.

7. A SYNDETIC MODEL OF GESTURE

This section demonstrates that the core component of syndetic modelling (the relatively ‘fixed’ cognitive architecture) is generic, i.e. can be reused across applications. We do so by developing a syndetic model of a second interaction technique that is qualitatively different from the techniques that have been addressed by user or system models in isolation. Gestural interaction involves the use of a series of hand positions or ‘poses’ to control actions within a system. To illustrate some of the human issues related to this approach, we will use an example based on the Dynamic Gesture Language (DGL) presented by Bordegoni & Hemmje (1993). An initial syndetic model of this problem was presented in (Duke, 1995b).
Our analysis is not a critique; rather, we aim to show how a model of human-system interaction could be used to inform the development of gestural technology, identifying potential problems with interaction techniques before they are embedded into large-scale applications.

In DGL, a gesture is defined as a sequence of static postures (poses) characterised by the position and orientation of a user’s hand as measured through a data glove. The completion of a gesture is recognised by factors such as trajectory and posture. One application that could utilise this technology is an editor for 3-dimensional scenes constructed from visual objects. Four gestures that could be relevant in this context are given below:

FIGURE 6 ABOUT HERE

Visual feedback about gestures is provided to the user in the form of a ‘cursor’ that can either continuously model the pose of the hand or can be set to a specific shape within the gesture that is appropriate to the task the user is trying to perform. For example, a narrow pointer can be more useful than the ‘hand’ cursor for ‘picking’ objects as its rendering obscures less of the scene. Other types of feedback, for example that some object has been selected, depend on the surrounding application. The ‘system’ interactor models the cursor by a value drawn from a given type ‘Image’; all we assume about this type is that it includes a value ‘init’ representing some default cursor shape. In addition, the interactor records the history of poses received by the system through the attribute ‘history’, and provides a mapping that takes any sequence of poses to the corresponding feedback.

    interactor Gesture-Engine
      attributes
        vis cursor : Image
            history : seq Pose
            feedback : (seq Pose) → Image

Two actions are provided. The first allows the user to form a pose, while the second, ‘render’, represents the system updating the shape of the cursor to reflect the current sequence of poses.
      actions
        lim form : Pose
            render

The axioms that govern behaviour of the gesture engine are stated below.

axioms

    [] history = 〈 〉 ∧ cursor = init    (1)

Initially, the history is empty, and the cursor is set to the default initial shape.

    history = H ⇒ [form(p)] history = H^〈p〉    (2)

If the history has some value ‘H’, the effect of forming a pose ‘p’ is to make the history equal to H extended by p.

    [render] cursor = feedback(history)    (3)

The effect of the render action is to make the cursor equal to the feedback associated with the current history of poses.

The syndetic model is created by introducing both the user and system models into a new interactor and then defining the axioms that govern the conjoint behaviour of the two agents. Three attributes, ‘glove’, ‘goals’ and ‘interp’, are used to ‘contextualise’ the generic ICS model to the specific features of the gestural interface. The first, ‘glove’, represents the posture that the user is making with their hand at any point in time. A user’s intended behaviour is represented by ‘goals’; informally, this is the sequence of poses that the user wants to make, for example to effect a desired command. During interaction, both proprioceptive and visual feedback of pose formation will be available to the user. The latter takes the form of the cursor image; the attribute ‘interp’ is introduced to map the image of the cursor on the display to a user’s understanding of that shape as a hand pose. Here the mapping is expressed as a total function, i.e. all possible hand shapes have a unique interpretation as a pose. This is a simplifying assumption for the purposes of this paper, and could be altered in a more extensive model to accommodate ambiguity or ill-defined shapes.

    interactor Gesture-User
        Gesture-Engine  - include the system model
        ICS             - and the ICS framework
      attributes
        glove : Pose
        goals : seq Pose
        interp : Hand → Pose

The ‘form’ action defined in the gesture engine interactor is driven by the user’s limb subsystem.
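The three axioms of the Gesture-Engine interactor describe a very small state machine, and it may help to see them as one. The sketch below is an informal reading only: poses and images are represented as strings, and the feedback mapping `last_pose`, in which the cursor simply mirrors the most recent pose, is a hypothetical choice that the specification does not make.

```python
# Illustrative sketch of the Gesture-Engine axioms; poses and images are
# plain strings, which is an assumption of this example.

class GestureEngine:
    def __init__(self, feedback, init="init"):
        self.feedback = feedback   # maps a sequence of poses to an image
        self.history = ()          # axiom 1: the history starts empty
        self.cursor = init         # axiom 1: the cursor has the default shape

    def form(self, pose):
        """Axiom 2: forming a pose extends the history by that pose."""
        self.history = self.history + (pose,)

    def render(self):
        """Axiom 3: the cursor becomes the feedback for the current history."""
        self.cursor = self.feedback(self.history)


def last_pose(history):
    """Hypothetical feedback map: the cursor mirrors the most recent pose."""
    return history[-1] if history else "init"


engine = GestureEngine(last_pose)
engine.form("point")
stale = engine.cursor    # render has not yet run, so the cursor is unchanged
engine.render()
fresh = engine.cursor    # now reflects the pose just formed
print(stale, fresh)
```

Keeping ‘form’ and ‘render’ as separate steps makes visible the interval in which the display lags behind the hand, which is the interval the syndetic analysis is concerned with.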
In order for the user to consciously form a pose, the configuration must be set to transform a propositional representation into musculature control using the following data flow:

    PH ≙ 〈:prop-obj:, :obj-lim:, :lim-hand:〉

In order for the object to limb transformation to control movement of the hand, it will also need access to visually derived data concerning the current cursor (hand) orientation, and proprioceptive feedback from the body-state system concerning the current posture of the user’s own hand. These data will be carried on the following flows:

    VO ≙ 〈∗vis-obj:, :obj-lim:〉
    BH ≙ 〈∗bs-lim:, :lim-hand∗〉

Axiom 1 describes the constraint on pose formation. It requires that the data flows given above exist in the system, and that a representation of the current cursor shape is available on the display. Axiom 2 states that a user works sequentially through the sequence of poses that make up their current goal. For brevity, we write ‘hand’ for ‘interp(cursor)’.

axioms

    per(form(p)) ⇒ p on-flow PH ∧ hand on-flow VO
                   ∧ glove on-flow BH ∧ buffered = :prop-obj:    (1)

In order to form a pose ‘p’, the cognitive configuration must include flows for processing gesture formation and posture feedback (visual and proprioceptive). The user will also need to be buffering (thinking about) the pose they are intending to construct.

    goals = 〈p〉 ^ G ⇒ [form(p)] glove = p ∧ goals = G    (2)

If the user has a sequence of goals that begins with a pose ‘p’, then the effect of forming the pose ‘p’ is that the user’s hand/dataglove is in that pose, and the goals become the remaining poses.

As in the case of MATIS, a syndetic model can be used to test conjectures about how users might interact with a system. In the gesture user interactor, pose formation is permitted explicitly when a representation of some pose is available on a visually-derived stream. However, the underlying cognitive architecture imposes implicit constraints on the processing of information.
In this particular case, we wish to explore whether or not the representation shown on the display (cursor) needs to be coherent with the pose that is to be formed.

per(form(p))    (1)

The conditions under which a pose can (normally) be formed are set out in axiom 1 of the Gesture-User interactor, so the first step is a simple replacement:

≡〉 p on-flow PH ∧ hand on-flow VO ∧ glove on-flow BH ∧ buffered = :prop-obj:    (2)

As we are concerned with the requirement for coherence between the intended pose and the display, we focus on the streams involving these representations, and ignore the proprioceptive stream.

≡〉 p on-flow PH ∧ hand on-flow VO    (3)

We now expand definitions, here replacing ‘on-flow’ with its defining text.

≡〉 PH ∈ flows ∧ ∀t1 : ran PH • p on t1 ∧ VO ∈ flows ∧ ∀t2 : ran VO • hand on t2    (4)

The universal quantifiers (∀) can be eliminated by replacing the bound variables (t1 and t2) with specific transformations from the streams; specifically t1 is replaced by :prop-obj:, and t2 by ∗vis-obj:.

≡〉 PH ∈ flows ∧ p on :prop-obj: ∧ VO ∈ flows ∧ hand on ∗vis-obj:    (5)

Axiom 4 of ICS requires that any transformation that is in a flow must also be part of the current configuration. We now instantiate this axiom, with the transformation :obj-lim: replacing the variable ‘t’ in the axiom, and simplify the result by removing terms that will play no direct role in the subsequent parts of the calculation.

≡〉 :obj-lim: ∈ config ∧ p on :prop-obj: ∧ hand on ∗vis-obj:    (6)

We apply axiom 3 of ICS to expand on the meaning of a transformation being part of a configuration.

≡〉 :obj-lim: ∈ stable ∧ src(:obj-lim:) ∉ {art, lim} ⇒ ∃s : tr • :obj-lim: ∈ sources(s)    (7)

The focus of the proof now moves to the requirement that the data stream from :obj-lim: be stable, so we eliminate the other parts of the problem at this point.

≡〉 :obj-lim: ∈ stable    (8)

Necessary and sufficient conditions for stream stability are given in axiom 2 of the ICS interactor, which is substituted in at this point, with :obj-lim: replacing the variable ‘t’.

≡〉 ∀s1, s2 : sources(:obj-lim:) • coherent(s1, s2) ∧ (buffered = :obj-lim: ∨ sources(:obj-lim:) ⊆ stable)    (9)

The sources of :obj-lim: are defined by the architecture, so we choose the two sources that are significant in this problem (from the prop and vis subsystems) and use these to eliminate the universal quantifier by substituting for the bound variables s1 and s2.

≡〉 coherent(:prop-obj:, ∗vis-obj:) ∧ (buffered = :obj-lim: ∨ sources(:obj-lim:) ⊆ stable)    (10)

We again narrow the focus to a specific part of the requirement above, that the two streams (from prop and vis) to obj be coherent.

≡〉 coherent(:prop-obj:, ∗vis-obj:)    (11)

The definition for stream coherence is given by axiom 1 of ICS, which we use to expand the previous line, substituting :prop-obj: for t1 and ∗vis-obj: for t2.

≡〉 dest(:prop-obj:) = dest(∗vis-obj:) ∧ ∀r, q : repr • r on :prop-obj: ∧ q on ∗vis-obj: ⇒ r ≈ q    (12)

The requirement that the two streams have the same destination is trivially satisfied and can be eliminated. Earlier (step 4) we established that ‘p’ was on flow :prop-obj: and ‘hand’ was on the flow ∗vis-obj:, so we can use these constants to eliminate the remaining universal quantifier by substituting p for r and hand for q.

≡〉 p on :prop-obj: ∧ hand on ∗vis-obj: ⇒ p ≈ hand    (13)

Using the fact that ‘p’ is available on flow :prop-obj: and ‘hand’ is on the flow ∗vis-obj: again, we can satisfy the antecedent, leaving the conclusion:

≡〉 p ≈ hand    (14)

Thus in order to form a pose, the information about hand position interpreted from the display must be coherent with the pose that is to be formed. If the two data flows become unsynchronised, the result is that the :obj-lim: process cannot remain within the configuration while processing both visual and propositional input.
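The coherence requirement reached in the derivation can be checked mechanically on a toy encoding of the two ICS axioms it relies on. The sketch below is a hedged illustration, not the formal model: the dictionaries `sources`, `dest` and `carried`, the helper `pose_can_stabilise`, the use of plain equality for the similarity relation ≈, and the assumption that both input streams are already known to be stable are all introduced here for the example.

```python
from itertools import product


def coherent(t1, t2, dest, carried, similar):
    """Stream coherence (ICS axiom 1 as expanded in the derivation):
    same destination, and every pair of representations carried on
    the two streams is similar."""
    if dest[t1] != dest[t2]:
        return False
    return all(similar(r, q) for r, q in product(carried[t1], carried[t2]))


def stable(t, sources, dest, carried, similar, buffered, known_stable):
    """Stream stability (ICS axiom 2 as expanded in the derivation):
    all sources of t are pairwise coherent, and t is either buffered
    or fed only by stable sources."""
    srcs = sources[t]
    pairwise = all(coherent(s1, s2, dest, carried, similar)
                   for s1, s2 in product(srcs, srcs))
    return pairwise and (buffered == t or srcs <= known_stable)


def pose_can_stabilise(p, hand):
    """Hypothetical helper: is :obj-lim: stable when the intended pose p
    arrives from prop and the displayed hand arrives from vis?"""
    sources = {":obj-lim:": {":prop-obj:", "*vis-obj:"}}
    dest = {":prop-obj:": "obj", "*vis-obj:": "obj"}
    carried = {":prop-obj:": {p}, "*vis-obj:": {hand}}
    # Assumption: the similarity relation is modelled as equality.
    return stable(":obj-lim:", sources, dest, carried,
                  similar=lambda r, q: r == q,
                  buffered=":prop-obj:",
                  known_stable={":prop-obj:", "*vis-obj:"})
```

Under these assumptions, `pose_can_stabilise("point", "point")` holds while `pose_can_stabilise("point", "fist")` does not, which mirrors the conclusion that the displayed hand must be coherent with the intended pose.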
The :obj-lim: process will disengage from processing one or other of the streams, leading to a breakdown of processing. This situation is similar to the problems that arise when the sound track of a film becomes out of step with the image, and can be highly disruptive. In terms of the gesture engine, it is therefore important (1) that the rate of gesture formation is commensurate with that of the rendering action that updates the display, and (2) that the user is able to interpret the rendering of the hand as a sufficiently accurate model of their actual hand position. The required bounds of accuracy are properly a subject for experimental evaluation; the value of the syndetic model here is that it helps to establish a set of requirements for designing a suitable experiment, and explains why those requirements exist in the first place.

In the step from line 2 to 3 of the argument, we deliberately focused attention away from proprioceptive feedback via the stream BH. It is straightforward to see that a similar argument holds if we consider the streams PH and BH; both of these streams involve the process :lim-hand∗, and we would find, following the argument, that there is a requirement for coherence between the ‘glove’ state perceived by the user, and the pose which they are intending to form. Limb control is a highly proceduralised activity, which usually involves little if any conscious control (i.e. focal awareness), and so one would normally expect little difficulty with this coherence requirement. However, for the first-time user of a data glove, the physical sensation of gesturing while wearing a (comparatively) bulky unit may affect the quality of the proprioceptive stream originating in the body-state subsystem. If the quality of this stream degrades, i.e. it becomes unstable, then from axiom 7 of ICS we know that the transformation process (∗bs-lim:) will either try to find a stable source of data, decouple from the unstable source, or enter buffered mode.
In the context of this model, only the last of these options is available, and the process will attempt to operate on an extended representation of posture by accessing its image record, i.e. by buffering. Dually, from ICS axiom 2, we also know that a process that needs to appear in a configuration must either operate on stable input streams, or itself operate in buffered mode. Consequently, :lim-hand∗, which needs the data stream from ∗bs-lim:, will also attempt to enter buffered mode. The architecture only allows a single process to be buffered within a configuration, and the consequence is that utilisation of the buffer will probably oscillate between these processes. This is unfortunate for the would-be user, as axiom 1 of the Gesture-User model states that the :prop-obj: transformation will need to be buffered, as users will need to focus on the poses that they are forming. Consequently, for novice users of the interface, the need to concentrate on forming and controlling poses at the motor level will affect their ability to think and reason about the tasks that they are performing within the interface. However, over time, experience with the input device will lead to proceduralisation of the limb control, and hence the difficulty will be alleviated (see Barnard & May, 1993, for an account of how cognitive activity can be scoped by experience).

This analysis raises two general questions about the use of syndetic models. First, the analysis carried out here bears striking similarities to that done for MATIS, although the two technologies are different. Both involved the potential for breakdown of processing caused by incompatible representations being carried on data streams that are necessary to a particular cognitive task. One might ask whether this reasoning (which is, after all, comparatively simple in hindsight) might not be more easily captured in a simple model of data streams?
The answer, of course, is yes: given the architecture, one could derive a set of general principles regarding the specific issue of data stream utilisation. Indeed, such an approach might be one part of a useful “discount” method based on the approach. But it is important to understand that being able to abstract such a theory is a consequence of having a general purpose model in the first place. To give an example from mathematics, one can give a very simple rule for computing the definite integral of a polynomial function, once one has understood what the concept of an integral is in the first place; the value of theories such as the Riemann or Lebesgue integral is that they also allow you to apply the process to problems that are not just simple polynomials (Voxman & Goetschel, 1981). To return to the context of HCI, while we could write a set of rules or principles for dealing with data streams in ICS, in practice the rules that govern data stream operation are also bound into broader processing concerns such as record contents, timing, etc. The aim of syndetic modelling and the formal description of ICS is to provide a framework in which the principles and theory underlying these phenomena can be described, examined, and then, and only then, lifted into a set of discount methods and approximations for general use.

The second question relates to how much of the work of the analysis was done by the model, and how much was driven by general knowledge of ICS, and insight into human information processing. Obviously, any model will only provide insight up to the level of detail that it addresses. In this paper we have given an approximate model of ICS, focusing on aspects such as data stream stability and coherence, and ignoring other issues such as record contents and the structure of representations. Even given this restriction, we were still able to use the model to demonstrate requirements on coherence in two quite different examples.
Once that requirement was identified from the model, it was then a question of reflecting on the meaning of this requirement in terms of the problem domain and our broader knowledge about ICS that had not been encoded in the model. This in no way diminishes the value of the model; after all, exactly the same process occurs in any science when one builds models: the role of the model is to answer specific questions, or to explore particular aspects of a problem. Once that has been done, progress in understanding how to deal with an issue, or how to enrich a theory, takes place outside of the model. For example, an electronics engineer who uses a circuit model to explore a particular design does not expect the circuit model to tell her directly how to improve that design; hopefully, however, it will help her to identify the location (and cause) of any design problems. How those are then addressed depends on her background knowledge of theory, and if the result is a significant change in the knowledge that led to the model, it may well be incorporated as part of the modelling technique itself.

8. PROSPECTS AND CONCLUSIONS

The previous section showed that detailed and sometimes unexpected constraints on user performance can be deduced from this approach. In contrast, system models can only describe what the system should do. Any claims or assumptions made about user performance must be validated separately, either through appeal to cognitive theory or directly, through prototyping and experimental evaluation. User models likewise require assumptions about system behaviour. This is a key limitation for both. By expressing user and system constraints in equal terms, syndetic models allow direct (and formal) comparison between the capabilities and limitations of the two parties in an interactive system.
As the underlying cognitive and system theories are built into the model, the reason why some problem exists, such as difficulty in expressing deictic queries, can be found. Alternative design solutions can then be driven by theoretical insight, rather than through a potentially expensive ‘generate and test’ cycle of ad-hoc changes. In the case of a system that builds on the technology of MATIS, location of cities might be supported better through a graphical display, for example a map in which spatial location may reduce the need to invoke :prop-mpl: transformations.

One of the key advantages of syndetic modelling is that it operates with abstract axiomatic specifications of user and system. It therefore avoids a number of difficulties associated with models that require detailed specifications of user and system. Such detail was required with the model of Kieras and Polson (Kieras & Polson, 1985) discussed earlier. Over time, this kind of framework has undergone significant development, culminating recently in the development of the EPIC architecture (Kieras & Meyer, 1998; Meyer & Kieras, 1997). Like the approach adopted here, this architecture also has data flow properties which encompass multiple sensory and effector processors, together with a single cognitive processor. Its extended capability now supports the simulation of a far more extensive range of behaviours than its precursors, including multiple task performance and working memory phenomena (Kieras et al., 1998). This and other more recent approaches to integrating user and system representations also retain a commitment to detailed modelling. So for example, the work of Moher and Dirda (Moher et al., 1996) uses coloured Petri nets as a unifying representational formalism for device models, and users' mental models and task plans.
The resulting models of this latter type are again at a lower level of detail than ours, and seem to derive more from the tradition of programmable user models (see for example Blandford & Young, 1995) than from a specific cognitive theory. By avoiding detail, syndetic modelling should support design reasoning in advance of highly specific interface commitments.

In the absence of a comprehensive model that accounts for the behaviour of both user and device components of a system, questions such as those given above can only be addressed by obtaining a separate user analysis of the problem domain. The problem then is to connect this insight into the overall system perspective within some integrational framework. For example, design issues and options might be captured within a notation for design rationale such as QOC (MacLean et al., 1991). User and system assessments of these options can then be expressed as criteria for evaluating options (Bellotti, 1993). However, this approach is critically dependent upon the success of the intermediate representation. Unless this ‘mediating’ expression is at least as rich as the modelling formalism, the process of translation both from the system modellers to the user modellers, and back again, will lose information. Our experience with representing user and system analyses within QOC, as part of the experiment described by Bellotti et al. (1995), was that the organisation and approach imposed by a mediating representation like QOC was difficult to reconcile with the detailed constraints originating from modelling analyses. We argue that overcoming this problem requires an approach that brings user and system representations into direct contact. Syndetic modelling effects this contact by expressing the resources and constraints that describe cognitive behaviour in the same form as we express the observables that characterise or prescribe system behaviour.
From an observational viewpoint, both the user and the design artefact are just systems of observables whose behaviour is amenable to description within a suitable mathematical framework.

Syndesis begins to address two other significant problems in multi-disciplinary HCI. First, if user and system models are developed separately and only come into contact through their results, it is not possible to see directly whether the assumptions of each model concur, or whether changing aspects of one model would have consequences for the other. This means that potentially expensive effort may be spent in reasoning about (say) a user’s cognitive demands when in fact system constraints may support claims that these demands will not be imposed. Second, the separation of modelling theory from the modelling analysis actually recorded in a mediating representation makes it difficult to locate or investigate the theoretical reason or context behind a specific problem or recommendation. This is particularly acute if the insight derives from the interplay between user and system constraints. This means that re-design cannot readily draw on the modelling theory, but instead relies principally on the skills of the designer. While the element of craft skill is an important part of innovation, the direct expression of user and system requirements in a syndetic model means that application of that skill can be directed towards the specific problem that underlies the design issue.

Like syndetic modelling, Interaction Framework (Blandford, Harrison & Barnard, 1995) also works at a high level of abstraction. Both approaches can capitalise upon the re-use of specific interactional principles. However, whereas Interaction Framework operates with its own event-based representation, the syndetic approach involves commitment to specific cognitive and system models.
This provides another key benefit of syndetic modelling: it can cumulatively capitalise upon prior modelling work by making direct use of the apparatus of existing cognitive and system theories. The results of analysis are then expressed in terms that can be used to guide behavioural evaluation or system implementation. Of course, syndetic analyses will also be subject to the limitations of the cognitive and system models upon which they depend. Development of a formal account of ICS has reached the point where we can begin to discuss the details of data transformation, blending, and the ability of processes to extend their repertoire based on experience.

Our apparently disparate scenarios have shown some fundamentally similar characteristics. By using appropriate abstractions we are able to draw out the fundamental properties of each system, discarding superficial differences arising from the technologies involved. The structures and properties that we were able to model were clearly dependent on the notation used to express the models. Modal action logic was chosen for this paper because it is a simple, concise means of describing states and actions on states. However, syndetic modelling is not dependent on the use of modal action logic, or on expressing ICS in terms of states and actions. We have begun to investigate how techniques for describing real-time systems, for example the duration calculus (Chaochen, 1993), can be used to model and reason about temporal properties of ICS and interactive systems. We are also beginning to explore how stochastic modelling techniques might allow the development and exploration of significantly broader conjectures concerning cognitive performance within interaction.

In summary, syndetic modelling expresses models of systems, users, and their interaction, within a common mathematical framework.
Principles are expressed in an abstract form and, by avoiding detail, can be used directly to support design reasoning without requiring simulation of cognitive mechanisms or a fully elaborated system specification. The value of abstraction has been illustrated here by demonstrating how general principles, developed to deal with one design context, can find re-use to support reasoning about another design with markedly different surface and domain features. In the longer term, we expect to investigate how reasoning about syndetic models can best be supported through software tools for developing, maintaining, exercising and reasoning about a mathematical text, and how other forms of presentation such as graphics or animation can be used to capture and communicate the insight obtained from these models.

NOTES

Acknowledgements. We thank A.E. Blandford, T. Green and M.D. Harrison for their helpful comments. Particular thanks are due to the ‘owners’ of the systems used as the source of scenarios in this paper, both for making their systems available to the Amodeus project, and for their time in providing feedback on modelling work that formed the starting point for the material reported here. We would also like to thank the anonymous referees, whose detailed comments on the first draft were of significant help in improving the focus and clarity of the paper.

Support. This work was carried out as part of the Amodeus-2 project, ESPRIT Basic Research Action 7040 funded by the Commission of the European Communities. Technical reports from the Amodeus project are available via the World Wide Web at URL http://www.mrc-cbu.cam.ac.uk/amodeus/

Authors’ Addresses.
David Duke: Dept of Computer Science, University of York, Heslington, York, YO1 5DD, U.K. Email: duke@cs.york.ac.uk
Philip Barnard: MRC Cognition and Brain Sciences Unit, 15 Chaucer Road, Cambridge, CB2 2EF, U.K.
Email: philb@mrc-cbu.cam.ac.uk
David Duce: Rutherford Appleton Laboratory, Chilton, Didcot, OX11 0QX, U.K. Email: d.a.duce@rl.ac.uk
Jon May: Dept of Psychology, University of Sheffield, Western Bank, Sheffield, S10 2TP, U.K. Email: jon.may@sheffield.ac.uk

REFERENCES

Barnard, P. & May, J. (1993). Cognitive modelling for user requirements. In P.F. Byerley, P.J. Barnard & J. May (Eds.), Computers, Communication and Usability: Design Issues, Research and Methods for Integrated Services (pp. 101-145). Amsterdam: Elsevier.
Barnard, P. & May, J. (1995). Interactions with advanced graphical interfaces and the deployment of latent human knowledge. In F. Paternó (Ed.), Eurographics Workshop on Design, Specification and Verification of Interactive Systems (pp. 15-49). Berlin: Springer-Verlag.
Barnard, P. & May, J. (in press). Representing cognitive activity in complex tasks. Human Computer Interaction.
Bellotti, V. (1993). Integrating theoreticians' and practitioners' perspectives with design rationale. Proceedings of the INTERCHI'93 Conference on Human Factors in Computing Systems, 101-106. Addison-Wesley.
Bellotti, V., Blandford, A., Duke, D., MacLean, A., May, J. & Nigay, L. (1996). Interpersonal access control in computer-mediated communications: A systematic analysis of the design space. Human Computer Interaction, 6, 357-432.
Bellotti, V., Buckingham Shum, S., MacLean, A. & Hammond, N. (1995). Multidisciplinary modelling in HCI design ... in theory and in practice. Proceedings of the CHI '95 Conference on Human Factors in Computer Systems, 146-153. New York: ACM.
Blandford, A. & Duke, D. (1997). Integrating user and computer system concerns in the design of interactive systems. International Journal of Human-Computer Studies, 46, 653-679.
Blandford, A., Harrison, M. & Barnard, P. (1995). Using interaction framework to guide the design of interactive systems. International Journal of Human-Computer Studies, 43, 101-130.
Blandford, A. & Young, R. (1995). Separating user and device descriptions for modelling interactive problem solving. In K. Nordby, P. Helmersen, D.J. Gilmore & S. Arnsen (Eds.), Human-Computer Interaction: INTERACT'95 (pp. 91-96). London: Chapman and Hall.
Bordegoni, M. & Hemmje, M. (1993). A dynamic gesture language and graphical feedback for interaction in a 3d user interface. Computer Graphics Forum, 12(3), 1-11.
Buckingham Shum, S., Blandford, A., Duke, D., Good, J., May, J., Paternó, F. & Young, R. (1996). Multidisciplinary modelling for user-centred system design: An air-traffic control case study. Proceedings of HCI’96: 11th British Computer Society Conference on Human-Computer Interaction, 201-219. London: Springer-Verlag.
Chaochen, Z. (1993). Duration calculi: An overview. In D. Bjørner, M. Broy & I. Pottosin (Eds.), Formal Techniques in Programming and Their Applications, volume 735 of Lecture Notes in Computer Science (pp. 256-266). Springer-Verlag.
Dix, A. (1991). Formal Methods for Interactive Systems. Academic Press.
Duke, D. (1995a). Reasoning about gestural interaction. Computer Graphics Forum, 14(3), 55-66.
Duke, D. (1995b). Time and synchronisation in PREMO: A formal specification of the NNI proposal. Technical Report OME-116, ISO/IEC JTC1 SC24/WG6. ftp://ftp.cwi.nl/premo/RapporteurGroup/Miscellaneous/OME-116.ps.gz
Duke, D. & Harrison, M. (1993). Abstract Interaction Objects. Computer Graphics Forum, 12(3), C-25 - C-36.
Duke, D. & Harrison, M. (1994a). Matis: A case study in formal specification. Technical Report SM/WP17, ESPRIT BRA 7040 Amodeus-2. Available via http://www.mrc-cbu.cam.ac.uk/amodeus/.
Duke, D. & Harrison, M. (1994b). A theory of presentations. In M. Naftalin, T. Denvir & M. Bertran (Eds.), FME'94: Industrial Benefit of Formal Methods, volume 873 of Lecture Notes in Computer Science (pp. 271-290). Berlin: Springer-Verlag.
Duke, D. & Harrison, M. (1995a). From formal models to formal methods. In R.N. Taylor & J. Coutaz (Eds.), Software Engineering and Human-Computer Interaction: ICSE’94 Workshop on SE-HCI: Joint Research Issues, volume 896 of Lecture Notes in Computer Science (pp. 159-173). Springer-Verlag.
Duke, D. & Harrison, M. (1995b). Interaction and task requirements. In P. Palanque & R. Bastide (Eds.), DSV-IS'95: Eurographics Workshop on Design, Specification and Verification of Interactive Systems (pp. 54-75). Wien: Springer-Verlag.
Goldsack, S. (1988). Specification of an operating system kernel: FOREST and VDM compared. In R. Bloomfield, L. Marshall & R. Jones (Eds.), VDM'88: VDM - The Way Ahead, volume 328 of Lecture Notes in Computer Science (pp. 88-100). Springer-Verlag.
Harrison, M. & Torres, J. (Eds.). (1997). Design, Specification and Verification of Interactive Systems'97. Wien: Springer-Verlag.
Hoare, C. (1996). How did software get so reliable without proof? In M.-C. Gaudel & J. Woodcock (Eds.), FME'96: Industrial Benefit and Advances in Formal Methods, volume 1051 of Lecture Notes in Computer Science (pp. 1-17). Springer-Verlag.
Jones, C., Jones, K., Lindsay, P. & Moore, R. (1991). MURAL: A Formal Development Support System. Springer-Verlag.
Kent, S., Maibaum, T. & Quirk, W. (1993). Formally specifying temporal constraints and error recovery. Proceedings of the IEEE International Workshop on Requirements Engineering (pp. 208-215). IEEE Press.
Kieras, D. & Meyer, D. (1998). An overview of the EPIC architecture for cognition and performance with application to human-computer interaction. Human Computer Interaction. In press.
Kieras, D., Meyer, D., Mueller, S. & Seymour, T. (1998). Insights into working memory from the perspective of the EPIC architecture for modelling skilled perceptuo-motor and cognitive human performance. In A. Miyake & P. Shah (Eds.), Models of Working Memory: Mechanisms of Active Maintenance and Executive Control. Cambridge University Press. In press.
Kieras, D. & Polson, P. (1985). An approach to the formal analysis of user complexity. International Journal of Man-Machine Studies, 22, 365-394.
Lemmon, E. (1993). Beginning Logic (3rd ed.). Chapman and Hall.
MacLean, A., Young, R., Bellotti, V. & Moran, T. (1991). Questions, options, and criteria: Elements of design space analysis. Human-Computer Interaction, 6(3&4), 201-250.
May, J. & Barnard, P. (1995). The case for supportive evaluation during design. Interacting with Computers, 7, 115-144.
May, J., Barnard, P. & Blandford, A. (1993). Using structural descriptions of interfaces to automate the modelling of user cognition. User Modelling and User Adaptive Interfaces, 3, 27-64.
May, J., Scott, S. & Barnard, P. (1995). Structuring Displays: A Psychological Guide. Eurographics Tutorial Notes PS95 TN4, ISSN 1017-4656. Geneva: European Association for Computer Graphics.
Meyer, D. & Kieras, D. (1997). A computational theory of executive cognitive processes and multiple-task performance: Part 1. Psychological Review, 104, 3-65.
Milner, R. (1989). Communication and Concurrency. Series in Computer Science. London: Prentice Hall International.
Milner, R. (1993). Elements of interaction - 1993 Turing Award Lecture. Communications of the ACM, 36(1), 78-89.
Moher, T., Dirda, V., Bastide, R. & Palanque, P. (1996). Monolingual, articulated modeling of users, devices and interfaces. DSV-IS'96: Eurographics Workshop on Design, Specification and Verification of Interactive Systems (pp. 312-329). Wien: Springer-Verlag.
Nigay, L. (1994). Conception et modélisation logicielles des systèmes interactifs. Ph.D. Thèse de l'Université Joseph Fourier, Grenoble.
Nigay, L. & Coutaz, J. (1995). A generic platform for addressing the multimodal challenge. Proceedings of the CHI '95 Conference on Human Factors in Computing Systems, 98-105. Addison-Wesley.
Owre, S., Rushby, J., Shankar, N. & von Henke, F. (1995). Formal verification for fault-tolerant architectures: Prolegomena to the design of PVS. IEEE Transactions on Software Engineering, 21(2), 107-125.
Paternó, F. & Palanque, P. (Eds.). (1997). Formal Methods in Human Computer Interaction. Springer-Verlag.
Ryan, M., Fiadeiro, J. & Maibaum, T. (1991). Sharing actions and attributes in modal action logic. In T. Ito & A. Meyer (Eds.), Theoretical Aspects of Computer Software, volume 526 of Lecture Notes in Computer Science (pp. 569-593). Springer-Verlag.
Saiedian, H. (1996). An Invitation to Formal Methods. IEEE Computer, 29(4), 16-17.
Spivey, J. (1992). The Z Notation: A Reference Manual (2nd ed.). London: Prentice Hall International.
Teasdale, J.D. & Barnard, P. (1993). Affect, Cognition and Change: Remodelling Depressive Thought. Lawrence Erlbaum Associates.
Voxman, W. & Goetschel, Jr., R. (1981). Advanced Calculus: An Introduction to Modern Analysis. Marcel Dekker Inc.
Wharton, C., Rieman, J., Lewis, C. & Polson, P. (1994). The Cognitive Walkthrough method: A practitioner’s guide. In J. Nielsen & R. Mack (Eds.), Usability Inspection Methods (pp. 105-140). Wiley.
Young, R. & Abowd, G. (1994). Multi-perspective modelling of interface design issues: Undo in a collaborative editor. In G. Cockton, S.W. Draper & G.R.S. Weir (Eds.), People and Computers IX: Proceedings of HCI'94 (pp. 249-260). Cambridge, England: Cambridge University Press.

APPENDIX

GLOSSARY OF NOTATION

The data types and notation used in this paper are based on the mathematical notation of Z (Spivey, 1992) embedded within the structured ‘theory’ presentation of modal action logic (Goldsack, 1988; Ryan, Fiadeiro & Maibaum, 1991). We assume the existence of the following basic data types:

N    Natural numbers: {0, 1, 2, ...}
Z    Integers: {..., -1, 0, 1, ...}
B    Boolean values: {true, false}

Logic

Let P and Q be predicates, and x a variable.
P ∧ Q          Both P and Q hold
P ∨ Q          Either P or Q (or both) hold
P ⇒ Q          P implies Q: if P holds, so must Q
P ⇔ Q          P if and only if Q
∀x : S • P     For all values of x in S, P holds
∃x : S • P     There exists a value of x in S for which P holds

Modal Action Logic

MAL extends classical first order logic with action expressions and a modal operator. Let P be a predicate, and let A be an action:

[A] P          P must hold after performance of A
per(A)         Permission: the action may occur
obl(A)         Obligation: the action must occur

Sets

Let S and T be sets, P a predicate, E an expression, and ti terms. Let xi be variables, and let D be a declaration, e.g. x1:S, x2:T.

∅              Empty set
{t1, ..., tn}  Set enumeration: the set of t1 through to tn
ℙ S            Power set: the set of all subsets of S
E ∈ S          Membership: the value E is a member of the set S
{D | P • E}    Comprehension: the set of all values of E, such that P holds given D
{D | P}        The set of values for D such that P holds
S ∩ T          Set intersection: the set of values in both S and T
S ∪ T          Set union: the set of values in either S or T
S ⊆ T          Containment: S is a subset of T
S × T          Cartesian product: the set of pairs (x,y) s.t. x ∈ S and y ∈ T

Functions and Relations

Functions and relations are viewed as sets of pairs; a function is then a relation where every element in the domain of the relation is paired with exactly one element in the range of the relation. Let S and T be sets, and F and G be relations or functions:

{}             Empty function
S → T          The set of total functions from S to T; for f ∈ S → T, dom f = S
S ⇸ T          The set of partial functions from S to T; for f ∈ S ⇸ T, dom f ⊆ S
S ↔ T          The set of relations between S and T
{x ↦ y}        The function that maps x to y
dom F          Domain: the set {x | ∃y • (x,y) ∈ F}
ran F          Range: the set {y | ∃x • (x,y) ∈ F}

Sequences

A sequence is a function whose domain is either empty (the null sequence) or is a set of contiguous natural numbers, e.g. {1,2,...,n} for some n ∈ N. This means that operators like ‘ran’ can also be applied to sequences. Let X be a set, and S and T sequences:

〈〉             Empty (null) sequence
seq X          The set of sequences whose range is a subset of X
〈x, y, ..., z〉  Sequence enumeration: the sequence containing x, y, ..., z in that order
s ^ t          Concatenation: 〈s1, .., sn〉 ^ 〈t1, .., tm〉 = 〈s1, .., sn, t1, .., tm〉

Miscellaneous

P ≡〉 Q         From P it is possible to prove/derive Q
x =ˆ E         The name x is defined to be the expression E
p              The perceivable component of attribute p
p in q         The presentation of p is within that of q

FIGURE CAPTIONS

FIGURE 1  Three Levels of Description
FIGURE 2  Deictic blending of speech and gesture in MATIS
FIGURE 3  Generic structure of an ICS sub-system
FIGURE 4  ICS configured for locating visual object
FIGURE 5  Brief descriptions of representations processed by ICS Subsystems
FIGURE 6  The Dynamic Gesture Language (Bordegoni and Hemmje, 1993)

Fig 1 to Fig 4: [figures not recoverable from the extracted text]

Fig 5

Sensory subsystems
VIS      visual: hue, contour etc. from the eyes
AC       acoustic: pitch, rhythm etc. from the ears
BS       body-state: proprioceptive feedback

Meaning subsystems
PROP     propositional: semantic relationships
IMPLIC   implicational: holistic meaning

Structural subsystems
OBJ      object: mental imagery, shapes, etc.
MPL      morphonolexical: words, lexical forms

Effector subsystems
ART      articulatory: subvocal rehearsal and speech
LIM      limb: motion of limbs, eyes, etc.

Fig 6

Navigation   This allows the user to change their position within the scene
Picking      The 'pistol-like' pose is used to select some object
Gripping     A group of objects can be grabbed and moved with the 'fist' pose
Exit         Finish a session with the editor