Visual Correlates of Disfluency in Swedish
Department of Linguistics and Philology
Språkteknologiprogrammet (Language Technology Programme)
Master’s thesis in Computational Linguistics
May 11, 2006

Mojgan Seraji

Supervisor: Adj. Professor Bertil Lyberg, Linköping University

Abstract

Speech is usually accompanied by gestures such as head movements and movements of the eyebrows. These gestures appear to carry significant meaning in both human-human and human-machine communication, and such information is a key component in synthesizing natural-looking talking heads. This thesis examines how movements of the head and eyebrows relate to disfluency, and how these movements or gestures are related to the speech signal in spontaneous speech within a communicative situation. Speech and movements were analyzed with the program “Wavesurfer”. To study the correlation between disfluency and head movements under natural conditions, I recorded several minutes of conversation with two subjects, one man and one woman, and let them talk about a topic of their own choice. Although the strength and direction of the movements of the head and eyebrows varied widely from one speaker to another, the results showed that for both speakers disfluency was correlated with movements of the eyebrows. Disfluency also had an indirect influence on head movements.

Contents

Abstract
List of Figures
List of Tables
Preface
1 Introduction
  1.1 Aim
  1.2 Outline of the Thesis
2 Background
  2.1 Human Speech
  2.2 Cognitive Functions
    2.2.1 Broca’s Area
    2.2.2 M1-Mouth Area
    2.2.3 Wernicke’s Area
    2.2.4 Auditory Cortex
    2.2.5 Visual Cortex
  2.3 Spontaneous Speech
  2.4 Conversational Interaction
  2.5 Disfluency
    2.5.1 Tongue Tips and Slips
  2.6 Research Tools in Speech Science
    2.6.1 The Narrowband Spectrogram
    2.6.2 The Wideband Spectrogram
  2.7 Language and Machines
    2.7.1 Talking Heads
3 Experimental Design
  3.1 Recording
  3.2 Measurements
  3.3 Apparatus
    3.3.1 Wavesurfer
  3.4 Observations
4 Results of the Evaluation
  4.1 Unfilled Pauses (UPs)
    4.1.1 The Correlation of Unfilled Pauses to the Head Movements
    4.1.2 The Correlation of Unfilled Pauses to the Movements of the Eyebrows
  4.2 Filled Pauses (FPs)
    4.2.1 The Correlation of Filled Pauses to the Head Movements
    4.2.2 The Correlation of Filled Pauses to the Movements of the Eyebrows
  4.3 Prolongations (PRs)
    4.3.1 The Correlation of Prolongations to the Head Movements
    4.3.2 The Correlation of Prolongations to the Movements of the Eyebrows
  4.4 Truncations (TRs)
    4.4.1 The Correlation of Truncations to the Head Movements
    4.4.2 The Correlation of Truncations to the Movements of the Eyebrows
  4.5 Conclusions and Future Development
Bibliography
A Appendix 1
  A.1 Subject 1: files 1 to 5
  A.2 Subject 2: files 1 to 7
  A.3 Subject 3: files 1 to 5
B Appendix 2
List of Figures

2.1 Functions of the brain, adapted from the final project of the "Biotechnology and Its Social Impact (MOL427/WWS462)"
2.2 Narrowband spectrogram
2.3 Wideband spectrogram
2.4 Synface, adapted from the SYNFACE project at KTH
3.1 Spectacles
3.2 Wavesurfer
4.1 Unfilled pause and head movements in three dimensions; the scale of the y-, z- and x-axis is in mm
4.2 The movements of the eyebrows before an unfilled pause
4.3 Unfilled pause after the movements of the eyebrows
4.4 Filled pause and head movements in three dimensions
4.5 Filled pause and the movements of the eyebrows
4.6 Prolongation before head movements
4.7 Head movements after a prolongation
4.8 Prolongation and the movements of the eyebrows
4.9 Truncation and head movements in three dimensions
4.10 The movements of the eyebrows before a truncation
4.11 Truncation after the movements of the eyebrows
B.1 Unfilled pause and head movements in three dimensions, when one of the subjects uttered “... på arbetstid UP Å det var ju inte ...”
B.2 Unfilled pause and head movements in three dimensions, when one of the subjects uttered “... mitt öra ju. (UP) Och sen så vaknade jag ...”
B.3 Unfilled pause and head movements in three dimensions, when one of the subjects uttered “... på hela kudden (UP) Och då ...”
B.4 Unfilled pause and head movements in three dimensions, when one of the subjects uttered “... så fick jag då komma till överläkaren UP Schiratski ...”
B.5 Unfilled pause and head movements in three dimensions, when one of the subjects uttered “... fem doktorandpoäng (UP) Och då tänkte jag ...”
B.6 Unfilled pause and head movements in three dimensions, when one of the subjects uttered “... tju ett nu (UP) eh ...”
B.7 Unfilled pause and head movements in three dimensions, when one of the subjects uttered “... vad man vill så (UP) så får vi se vad det blir ...”
B.8 Unfilled pause and the movements of the eyebrows, when one of the subjects uttered “... högst upp på det här UP huset och där ...”
B.9 Filled pause and head movements in three dimensions, when one of the subjects uttered “... Och eh (FP) och så en dag ...”
B.10 Filled pause and the movements of the eyebrows, when one of the subjects uttered “... Eh (FP) går på högskolan ...”
B.11 Prolongations and head movements in three dimensions, when one of the subjects uttered “... Det e om (PR) ”marackesh” ...”
B.12 Prolongations and the movements of the eyebrows, when one of the subjects uttered “... när hon började i (PR) Malmö så ...”
B.13 Truncations and head movements in three dimensions, when one of the subjects uttered “... den ligger lili TR lite utanför stan inte långt ...”
B.14 Truncations and head movements in three dimensions, when one of the subjects uttered “... nej hal TR halvtid det jobbar hon bara ...”
B.15 Truncations and the movements of the eyebrows, when one of the subjects uttered “... den ligger lili TR lite utanför stan inte långt ...”
B.16 Truncations and the movements of the eyebrows, when one of the subjects uttered “... ah nä(TR)när hon kommer hem ...”
B.17 Truncations and the movements of the eyebrows, when one of the subjects uttered “... är ju som kyckli(TR)kyckling i sig smakar ju heller ...”

List of Tables

4.1 Utterances containing unfilled pauses
4.2 Utterances containing filled pauses
4.3 Utterances containing prolongations
4.4 Utterances containing truncations

Preface

First and foremost, I would sincerely like to thank my supervisor, Professor Bertil Lyberg, for his guidance, encouragement and support during this study. He has been an endless source of inspiration, and his valuable suggestions and knowledge in this area have been of great help in my thesis. I also want to express my special gratitude to his doctoral students Mustapha Skhiri and Sonia Sangari, who gave me a great deal of enthusiastic help with the interviews and the recordings for this pilot study.

1 Introduction

Whatever our age, culture or background, when we talk we usually move our head, show various facial expressions and use body language. These gestures aid the understanding of the communication, and they also convey a great deal of additional, visual information about the speaker, such as the speaker’s mood. These gestures, in other words nonverbal communication, are mostly related to the subject we are talking about and to the situation at hand. For example, some movements, such as body language, facilitate turn-taking in a conversation. Hence, gestures play an important role in language production and comprehension (Kelly, 2001). Many facial expressions and head movements are also linked to the semantic and prosodic structure of a text.
For example, stress on a word is often accompanied by a nod of the head, and a rising voice at the end of a phrase can be linked to a rise of the head, possibly combined with raised eyebrows (Hans Peter Graf, 2002). Much research has been done in this field to make animated talking agents (or talking heads) look more natural in their nonverbal communication.

1.1 Aim

The aim of this study is to better understand the correlation between disfluency and the movements of the head and eyebrows. A secondary aim is to investigate how these movements are related to speech signals in spontaneous speech within a communicative situation. Only two subjects participated in the interviews of this pilot project. I have also used data collected from one additional subject who had taken part in another study related to head movements. The statements in this thesis are therefore based on the three subjects I observed (two men and one woman). The data analysis in this research can be used in the construction of animated talking agents.

1.2 Outline of the Thesis

The thesis is organized as follows. Chapter 1 gives a brief introduction to the concept of nonverbal communication and states the purpose of this study. Chapter 2 presents the relevant background on human speech, cognitive functions, spontaneous speech, conversational interaction, disfluency, research tools in speech science, and language and machines. Chapter 3 describes the design of the experiment: the selection of the interviews, the measurements and how the program “Wavesurfer” was used. The last chapter, Chapter 4, presents the results of the evaluation, which consists of two main studies: the correlation of disfluency to the head movements and the correlation of disfluency to the movements of the eyebrows.

2 Background

2.1 Human Speech

One view of the origin of human speech is based on the concept of natural sounds.
Early humans imitated the natural sounds they heard around them. For example, the sound CAWCAW may have been used early in human history to refer to an object that flew by making that sound. Even modern languages contain words whose pronunciation closely resembles naturally occurring sounds. Examples in English are cuckoo, splash, bang, boom, ding-dong, rattle, buzz, hiss, screech and forms such as bow-wow, and this view has been called the “bow-wow theory”. A related proposal, the “yo-heave-ho theory”, links the origin of speech sounds to the noises a person makes under physical effort, especially when that effort involves several people (Yule, 1985). Other original sounds are thought to come from natural cries of emotion, such as pain, anger and joy. Expressive noises that people make as emotional reactions, such as Wow, Ugh and Oops, can contain sounds that are not otherwise used in the language but are counted as source sounds or natural sounds.

Humans accompany much of their speech with physical gestures, such as pointing, raising the arm or bending it at the elbow. All of these have to do with how humans use language to interact with each other, socially or emotionally: how they show friendliness, hostility, annoyance, pain or pleasure (Yule, 1985).

One idea about the origins of the sounds of language is that there is a link between physical gesture and the production of oral sounds. It seems entirely reasonable that physical gestures involving the whole body can indicate many emotional states and intentions. Physical gestures using the body, hands, face and head are means of nonverbal communication that are still used by modern humans with developed linguistic skills. Hence, it has been suggested that physical gestures developed as a method of communication. Gestures involving the mouth, such as movements of the tongue and lips, are by contrast referred to as oral gestures.
The movement of the tongue (an oral gesture) in a “goodbye” message and the waving of the hand or arm (a physical gesture) both convey a similar message (Yule, 1985). Indeed, we can use mime or specific gestures to convey information for a variety of communicative purposes.

2.2 Cognitive Functions

Speech is, without doubt, one of our most important abilities. We use various features of language to produce and understand linguistic messages. The ability to use language is associated mainly with two regions of the human brain that play important roles in speech production and processing: Broca’s area and Wernicke’s area. These areas were identified in the nineteenth century through the study of patients with brain damage. Patients with damage to Broca’s area are unable to produce grammatical sentences, although they can use single words properly. Patients with damage to Wernicke’s area, in contrast, produce well-formed but meaningless sentences (Geschwind, 1965). In most people, these areas are located in the left hemisphere of the brain.

Figure 2.1: Functions of the brain, adapted from the final project of the "Biotechnology and Its Social Impact (MOL427/WWS462)"

Hearing a word, understanding it and then saying it all involve brain activity, and these processes follow a definite pattern. When a word is heard and comprehended via Wernicke’s area, the signal is transferred to Broca’s area, where the production of the word is prepared. Finally, a signal is sent to the motor area to physically articulate the word.

2.2.1 Broca’s Area

Broca’s area was described in 1861 by Pierre Paul Broca and is responsible for delivering a list of words and parts of words to produce meanings. This area is not strictly a speech area, but it is associated with the articulation of speech. It controls not only spoken, but also written and signed language production.
Patients with lesions in this area can understand language and conversation, even complex concepts, but cannot talk coherently (Keith A. Johnson, 2000).

2.2.2 M1-Mouth Area

This area of the brain controls the physical movements of the mouth and articulators in speech production. It is part of the motor cortex and controls the muscles of the face and mouth, while the rest of the motor cortex controls the movements of other parts of the body. It is located near Broca’s area and works together with it in speech tasks (Keith A. Johnson, 2000).

2.2.3 Wernicke’s Area

Wernicke’s area is an area of semantic processing. It is associated with memory, hearing and object identification, as well as with the comprehension of written and spoken language. Wernicke’s area works together with Broca’s area: Wernicke’s area takes care of incoming speech, and Broca’s area controls outgoing speech (Keith A. Johnson, 2000).

2.2.4 Auditory Cortex

This area of the brain receives and recognizes sound. There is evidence that when people speak or read words aloud, they listen to themselves in order to make sure they are speaking correctly (Keith A. Johnson, 2000).

2.2.5 Visual Cortex

This area, also known as the striate cortex, is responsible for vision. The visual areas are the first parts of the brain to be activated in reading and object naming. Above the primary visual area there is another region, associated with object naming and word reading, which is thought of as supplementary to the primary visual cortex (Keith A. Johnson, 2000).

2.3 Spontaneous Speech

Spontaneous, or unprepared, speech is the most natural form of interactive communication. Unlike written text, it is produced online: when speakers say something that needs to be repaired, there is little opportunity to take it back.
Written text, in contrast, always offers the chance to be revised and rephrased, and this is one of the major differences between spontaneous speech and written text. In spontaneous speech, speakers produce unfilled (silent) pauses, filled pauses (such as err, eh, uh, uhm), truncated words, restarts, mispronunciations, and “editing terms” such as oops, sorry, no, I mean and so on. These phenomena are referred to as disfluency, and also as dysfluency, nonfluency, disturbance and discontinuity.

2.4 Conversational Interaction

Conversation is an activity in which two or more people speak by turns. Although speaking turns are not pre-allocated in conversation, turn distribution is clearly systematic. Typically, only one person speaks at a time and the others are expected to remain silent until it is their turn (although this does not hold in some cultures). If more than one speaker tries to talk at the same time, one of them usually stops. Participants therefore wait until the current speaker shows that he or she has finished by signaling a completion point, for example by pausing at the end of a phrase or a sentence. A speaker who wants to keep the turn should avoid pausing at the end of sentences, and can make sentences run on by using connectors such as and, and then, so, but, or by using hesitation markers or filled pauses such as er, em, uh, ah.

If other participants want to talk, they usually indicate it through various verbal and nonverbal behaviours, such as making short, often repeated sounds while the speaker is talking, shifting their body or using facial expressions to signal that they have something to say. These signals are called turn-taking signals. In this situation, participants may come across as “rude” (a speaker cuts off another speaker) or “shy” (a speaker keeps waiting for an opportunity to take a turn that never seems to come).
(Yule, 1985)

Many attempts to make natural-looking talking heads have focused on turn-taking signals in order to model natural language behaviour in conversational interaction.

2.5 Disfluency

Disfluency, or hesitation phenomena, in normal speech has been studied from several perspectives. Freud, for example, discussed disfluency from a psychological point of view, as something that reveals our inner character or individuality. Within stuttering research, speech therapists, psychologists and speech pathologists have described the difference between pathological speech, such as stuttering (or stammering), and the normal disfluencies of human language. Much research has also been done on disfluency from a gender perspective and in connection with gestures and body language.

When we talk about a topic, we always have a number of choices of words and of ways to express them. The more formal, structured and disciplined the situation, the fewer the options. How we talk, and how fluent we are, is thus partly a matter of choice.

Eklund (2004) noted that disfluency production is largely dependent on psychological and individual factors. He also noted that disfluency is to some degree under the speaker’s control. Some disfluencies, such as silent or unfilled pauses and filled pauses (uh, ah, er and um), are explained by the hypothesis that these breaks in the speech flow give the speaker time to search for the next word or phrase. Eklund (2004) has studied the following types of disfluency:

Unfilled Pauses (UPs): Silent parts in otherwise fluent speech, for example “I want a ..... go to school”. Unfilled pauses are not always disfluencies; pauses are sometimes used to mark sentence boundaries (Deese, 1978). Silences shorter than 250 ms are not counted as unfilled pauses; only longer silences are (Goldman-Eisler, 1968).

Filled Pauses (FPs): Also called “vocalized pauses”, such as “eh, em, er, um, ah”.
Prolongations (PRs): Phones or segments that are longer than in normal fluent speech, as in “I’m comiiiiiing”.

Truncations (TRs): Cut-off, i.e. interrupted, words or syllables, e.g. “li..li..little”.

Mispronunciations (MPs): Words pronounced wrongly; the term is close to “slip of the tongue” (see the next section), e.g. “black bloxes” (for “black boxes”).

Repairs (REPs): A kind of self-correction, including substitutions (I want to drink a cup of coffee tea), repetitions (can I can I have a cup of tea) and insertions (I want to have a cup of tea hot tea).

Explicit Editing Terms (EETs): Words or phrases such as “sorry”, “you know”, “oops” and so on.

2.5.1 Tongue Tips and Slips

As language users, we all occasionally have difficulty getting the brain and speech production to work together. The tip-of-the-tongue phenomenon occurs when we know a word but it just will not come out. Research has shown that speakers generally have a correct phonological outline of the word: they know its initial sound and even its number of syllables. This indicates that our “word storage” may be partly organized on the basis of phonological information, and that some words in it are more easily retrieved than others. For example, when asked to name a type of instrument, a sextant, speakers produced secant, sextet and sexton. A similar type of speech error is the slip of the tongue, also known as a Spoonerism after the Rev. William A. Spooner, an Oxford dean who was famous for his tongue slips (Yule, 1985). Such slips result when a sound is carried over from one word to another, so that an intended word is replaced by another word, as in the thine sing (for “the sign thing”), black bloxes (for “black boxes”) or noman numeral (for “roman numeral”). These examples give us clues to the normal workings of the human brain.
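The 250 ms criterion for unfilled pauses given in Section 2.5 can also be applied automatically once silent stretches have been detected. The sketch below is only an illustration under stated assumptions and is not the procedure of this thesis, where pauses were located by listening in Wavesurfer. It assumes a mono signal given as a list of sample values, and the 10 ms frame length and the RMS silence threshold of 0.01 are hypothetical values chosen for the example.

```python
import math

MIN_PAUSE_S = 0.250  # shorter silences are not unfilled pauses (Goldman-Eisler, 1968)


def frame_rms(samples, frame_len):
    """RMS energy of consecutive, non-overlapping frames (a trailing partial frame is dropped)."""
    return [
        math.sqrt(sum(x * x for x in samples[i:i + frame_len]) / frame_len)
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def unfilled_pauses(samples, sample_rate, frame_s=0.010, silence_rms=0.01,
                    min_pause_s=MIN_PAUSE_S):
    """Return (start_s, end_s) for silent stretches lasting at least min_pause_s.

    frame_s and silence_rms are illustrative assumptions, not calibrated values.
    """
    frame_len = max(1, int(round(frame_s * sample_rate)))
    pauses = []
    run_start = None  # index of the first silent frame in the current run
    # A sentinel "loud" frame at the end closes a run that reaches the end of the signal.
    for i, rms in enumerate(frame_rms(samples, frame_len) + [math.inf]):
        if rms < silence_rms:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            # Compare durations in samples to avoid rounding problems at the threshold.
            if (i - run_start) * frame_len >= min_pause_s * sample_rate:
                pauses.append((run_start * frame_len / sample_rate,
                               i * frame_len / sample_rate))
            run_start = None
    return pauses
```

For example, at a sampling rate of 16 kHz the 10 ms frame is 160 samples, so a run of 25 or more consecutive silent frames counts as an unfilled pause, while a 100 ms gap between words is ignored.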
2.6 Research Tools in Speech Science

One research tool that provides an analysis of the acoustic speech signal is the spectrogram. A spectrogram (or sonogram) is a visual representation of an acoustic signal that shows how the energy in its different frequency components changes over time. Besides the spectrogram, the pitch contour and the waveform of the acoustic signal are other ways of making sound waves visible. There are two kinds of spectrograms, the narrowband spectrogram and the wideband spectrogram, and they show different things.

2.6.1 The Narrowband Spectrogram

A narrowband spectrogram displays horizontal bands (see Figure 2.2) that represent the harmonics of the glottal source. The darker bands represent the harmonics closest to the resonance peaks of the vocal tract, and the lighter bands represent harmonics whose frequencies are far from the resonance peaks. The bandwidth of the filter used to generate narrowband spectrograms is usually between 30 and 50 Hz, which is narrow enough to separate the individual harmonics. Narrowband spectrograms have traditionally been used for measuring fundamental frequency and intonation (Gloria J. Borden, 1980).

Figure 2.2: Narrowband spectrogram

Figure 2.3: Wideband spectrogram

2.6.2 The Wideband Spectrogram

In a wideband spectrogram, broad bands of energy depict the formants, which are the actual resonance peaks of the vocal tract at a particular moment. They appear as dark bands, each corresponding to a vocal tract resonance. Different vocal tract shapes produce different formant patterns. Formants are crucial for identifying vowels; in consonants, formants are less clear and have lower intensity. Wideband spectrograms are used in spectrogram reading because they give more information about what is going on in the vocal tract, and speech scientists are more interested in the changing resonances than in the harmonics of the glottal source.
Therefore, the wideband spectrogram is more appropriate than the narrowband spectrogram for measuring formants. The wideband spectrogram in Figure 2.3 shows that the horizontal bands of energy representing the formants are composed of individual vertical lines, each corresponding to one glottal pulse. The blank spaces in a wideband spectrogram indicate silence, including pauses and the silent gaps produced by voiceless stop closures. The bandwidth of the filter used to generate wideband spectrograms is between 300 and 500 Hz. Researchers who want to measure the duration or timing of acoustic events use wideband spectrograms (Gloria J. Borden, 1980).

2.7 Language and Machines

In 1738, Jacques de Vaucanson produced an amazing mechanical duck that could perform the act of drinking water and eating grain. The food was digested by a mysterious chemical process in its stomach and then excreted (Yule, 1985). This is one example of the long-standing human ambition to create models of natural organisms, and the natural articulation of speech sounds has always been an interesting aspect to model with machines. Many talking machines were modeled on the function of the human vocal tract, covering the production of phrases and sentences, the pronunciation of individual words, intonation and pausing, as well as syntactic rules for the formation of natural language sentences. However, according to George Yule, the development of synthetic speech would only produce a model of speech articulation, not a model of “speaking”. Having something to say is an attribute of human mental processes, and attempting to model that attribute amounts to modeling intelligence.

Figure 2.4: Synface, adapted from the SYNFACE project at KTH

2.7.1 Talking Heads

Audio-visual speech synthesis is based on the automatic generation of voice and facial animation. Visual speech synthesis uses 3D (three-dimensional) polygon models that are parametrically articulated and deformed.
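One common way to articulate a polygon model parametrically is a linear scheme. Each control parameter stores a displacement for every vertex relative to a neutral face, and the deformed face is the neutral face plus the displacements weighted by the parameter values, v = v0 + sum_k w_k * d_k. The sketch below assumes this scheme purely for illustration. It does not describe SYNFACE or any particular talking head, and the parameter names jaw_open and brow_raise are hypothetical.

```python
from typing import Dict, List, Tuple

Vertex = Tuple[float, float, float]


def deform(neutral: List[Vertex],
           displacements: Dict[str, List[Vertex]],
           params: Dict[str, float]) -> List[Vertex]:
    """Linear parametric deformation: v_i = v0_i + sum_k w_k * d_k,i.

    neutral       -- vertex positions of the neutral (rest) face
    displacements -- for each parameter, one displacement vector per vertex
    params        -- parameter weights, e.g. 0.0 (rest) to 1.0 (full movement)
    """
    deformed = []
    for i, (x, y, z) in enumerate(neutral):
        for name, weight in params.items():
            dx, dy, dz = displacements[name][i]
            x, y, z = x + weight * dx, y + weight * dy, z + weight * dz
        deformed.append((x, y, z))
    return deformed


# Two-vertex toy "face" in mm: vertex 0 on the jaw, vertex 1 on an eyebrow.
neutral = [(0.0, 0.0, 0.0), (0.0, 10.0, 0.0)]
displacements = {
    "jaw_open": [(0.0, -5.0, 0.0), (0.0, 0.0, 0.0)],    # hypothetical parameter
    "brow_raise": [(0.0, 0.0, 0.0), (0.0, 2.0, 0.0)],   # hypothetical parameter
}
print(deform(neutral, displacements, {"jaw_open": 1.0, "brow_raise": 0.5}))
# [(0.0, -5.0, 0.0), (0.0, 11.0, 0.0)]
```

A linear model keeps every parameter independent and cheap to evaluate. Its drawback is that combinations of large parameter values can produce implausible shapes, which is one reason real systems may add constraints or non-linear components.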
Researchers’ main interest in head and facial movements is to synthesize talking heads that look as natural as possible. Munhall (2004) found that head movements help people understand an uttered message more accurately. Cassel (2000) also points out that users of an animated talking agent with nonverbal communication understand it better and are more cooperative than users of an agent without nonverbal communication.

One of the main applications of audio-visual speech synthesis is for the hearing impaired, in particular people who have difficulty communicating by telephone. These people usually rely on lip-reading during conversations, which does not work over the telephone. The SYNFACE project addresses this problem with a computer program: a computer attached to the user’s telephone shows an artificial face. When the user makes a phone call, she or he can listen and at the same time watch the artificial face, lip-read it and follow all the information. Figure 2.4 shows SYNFACE on a computer screen. Users can choose different faces by clicking on the buttons under the face, and another control under the face changes the volume of the sound. On the right, there is a phonebook and a keypad for dialing. The user can also attach a face to a telephone number in his or her phonebook, so that the same face is chosen every time that number is dialed.

To make communication easier still, SYNFACE can also be used on laptop computers and mobile phones.

3 Experimental Design

To analyze the movements of the head and eyebrows in a conversation, the experiment was carried out under natural circumstances. Two native Swedish speakers participated in the recording of an interview. In order to make the dialogue as spontaneous as possible, the subjects were allowed to talk about a topic of their own choice.
To obtain a wider analysis of spoken dialogue, I also used data belonging to one additional subject who had taken part in another study related to head movements. Several minutes of the interview with each subject were listened to, and various types of disfluency were located, measured and analyzed. In total, 22 minutes of recorded material were analyzed.

3.1 Recording

The participants were interviewed one at a time in a sound-isolated studio. They wore special spectacles with five hemispherical markers for recording the head movements. These markers have a diameter of 4 mm and are made of a reflecting material. Two markers were also attached to the subject’s eyebrows (one on each eyebrow) for recording the movements of the eyebrows. The spectacles lack the upper part of the frame so that the movements of the eyebrows can be recorded by the system (Figure 3.1).

Figure 3.1: Spectacles

The studio is equipped with a microphone that registers the sound, which is sent to a PC for storage and post-processing, and with four IR (infrared) cameras that capture the movements of the markers at a frequency of 60 Hz. The cameras flash infrared light¹ at the markers, and the light is reflected back into the cameras, which register the movements. The IR cameras register the light in three dimensions (y-, z- and x-axis). Each camera catches the light from a certain angle, which can be adjusted by moving the camera. All movement data are sent via four special video processors to a Mac computer for storage and post-processing. All equipment is from the manufacturer Qualisys, and both the software used on the Mac computer and the system are called “MacReflex”. The recording time of the movements is limited. The movement data are stored in TSV (Tab-Separated Values) format so that they can easily be read by Wavesurfer.
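Since the movement data are plain tab-separated text sampled at 60 Hz, each row can be mapped to a time stamp (frame index divided by 60) and read with standard tools. The column layout of the MacReflex export is not described in this thesis, so the sketch below assumes a hypothetical layout: one header row, then one row per camera frame, with columns such as head_y and brow_y holding coordinates in mm. It uses only Python’s standard csv module, and the example values are made up.

```python
import csv
import io

FRAME_RATE_HZ = 60  # marker sampling frequency given in Section 3.1


def read_movement_tsv(text, column):
    """Return (time_s, value_mm) pairs for one coordinate column of a TSV export.

    Assumes a hypothetical layout: a header row, then one row per frame.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [(frame / FRAME_RATE_HZ, float(row[column]))
            for frame, row in enumerate(reader)]


def frame_at(time_s):
    """Index of the movement frame closest to a time point, e.g. a disfluency onset."""
    return round(time_s * FRAME_RATE_HZ)


example = "head_y\tbrow_y\n12.0\t3.5\n12.3\t3.6\n12.1\t4.0\n"  # made-up values
print(read_movement_tsv(example, "brow_y"))
print(frame_at(1.25))  # 75
```

Keeping the time axis derived from the frame index, rather than stored per row, mirrors the fixed 60 Hz sampling. If a real export carries its own time column, that column should be preferred.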
The sound is recorded in two channels: a sound signal and a sync signal. The sync signal is used to synchronize the sound with the movements and can be filtered out in Wavesurfer. The movements are measured with an accuracy of about 0.1 mm in three dimensions.

¹ Infrared is the region of the electromagnetic spectrum that falls between radio waves and visible light. Infrared cannot be seen, but given a strong energy source it can be felt as heat. A camera equipped with infrared-sensitive film can take pictures of “warm” objects in low-light environments. Most people are familiar with infrared from remote control devices.

3.2 Measurements

The coordinates of the markers are measured to obtain the vertical (y), inclined (z) and horizontal (x) movements of the head, as well as the vertical (y) movements of the eyebrows. The vertical direction (y) captures “up or down” motions, the inclined direction (z) “front or back” motions and the horizontal direction (x) “left or right” motions (see Figure 3.2). The coordinates of the eyebrow markers are measured only in the vertical direction, “up or down” (y-axis), and are then transformed to a coordinate system. The acoustic signals, which are synchronized with the movements of the markers, are used to examine the correlation of disfluency to the movements of the head and eyebrows.

3.3 Apparatus

3.3.1 Wavesurfer

Wavesurfer is a signal processing program, used here as a tool for analyzing speech signals. It is very useful for tasks in speech research and education, such as speech analysis and speech transcription. In Wavesurfer, acoustic signals and movement signals that have already been synchronized can be studied in different windows at the same time. The sound is represented by a spectrogram, a waveform and a pitch contour, and the movements are represented by curves in three windows, one for each type of movement: vertical (y-axis), inclined (z-axis) and horizontal (x-axis). A rising curve indicates movement “up”, “front” or “right”, and a falling curve indicates movement “down”, “back” or “left”. Figure 3.2 shows an unfilled pause and the corresponding head movements.

Figure 3.2: Wavesurfer

3.4 Observations

In this pilot study I have looked at four types of disfluency in relation to the movements of the head and eyebrows: unfilled pauses, filled pauses, prolongations and truncations. I listened to all sound files to trace disfluencies and simultaneously studied the related TSV files in different Wavesurfer windows, in three directions (y-, z- and x-axis) for the head movements and in one direction (y-axis) for the movements of the eyebrows. These disfluencies were located and marked in Wavesurfer, and their durations were measured. Finally, the correlation between disfluency and the movements of the head and eyebrows was analyzed. For the eyebrow analysis I had access to 10 TSV files from one subject, and for the head-movement analysis to 22 TSV files from three subjects. The results of this analysis are presented in the following chapter.

4 Results of the Evaluation

The speech was transcribed orthographically, word by word. After listening to all the conversations, I analyzed the correlation of each disfluency category described below to the movements of the head and eyebrows. The y-, z- and x-axis scales of the head movements are given in mm, and the y-axis scale of the eyebrow movements is given in tenths of a mm.
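The correlations reported in this chapter were judged by visually inspecting the curves in Wavesurfer. Purely as an illustration of how the same observations could be quantified, the sketch below takes disfluencies marked as (label, start, end) intervals, drops silences shorter than 250 ms (the Goldman-Eisler threshold cited in Section 4.1), and measures the peak-to-peak displacement of a movement track in a window before, during and after each disfluency. The `Disfluency` class, the function names and the 0.5 s window are hypothetical choices, not part of the method used in this thesis.

```python
from dataclasses import dataclass

MIN_UNFILLED_PAUSE_S = 0.25  # 250 ms lower bound cited in Section 4.1


@dataclass
class Disfluency:
    label: str      # "UP", "FP", "PR" or "TR"
    start_s: float  # onset as marked in Wavesurfer
    end_s: float    # offset as marked in Wavesurfer

    @property
    def duration_s(self):
        return self.end_s - self.start_s


def count_by_type(marks):
    """Count disfluencies per label, dropping silences shorter than 250 ms."""
    counts = {}
    for d in marks:
        if d.label == "UP" and d.duration_s < MIN_UNFILLED_PAUSE_S:
            continue  # too short to count as an unfilled pause
        counts[d.label] = counts.get(d.label, 0) + 1
    return counts


def movement_range(times, values, start_s, end_s):
    """Peak-to-peak displacement of a track within [start_s, end_s)."""
    window = [v for t, v in zip(times, values) if start_s <= t < end_s]
    return max(window) - min(window) if window else 0.0


def before_during_after(times, values, d, margin_s=0.5):
    """Movement range in a window before, during and after one disfluency.

    margin_s is an arbitrary illustrative choice, not a value from the thesis.
    """
    return (
        movement_range(times, values, d.start_s - margin_s, d.start_s),
        movement_range(times, values, d.start_s, d.end_s),
        movement_range(times, values, d.end_s, d.end_s + margin_s),
    )


# Usage on a synthetic 60 Hz track: still until 1.5 s, then rising 10 mm/s.
times = [i / 60 for i in range(180)]
values = [0.0 if t < 1.5 else (t - 1.5) * 10 for t in times]
pause = Disfluency("UP", 1.0, 1.5)
print(before_during_after(times, values, pause))  # (0.0, 0.0, ~4.83)
```

A freeze followed by movement, as described below for unfilled pauses, prolongations and truncations, would show up as small "before" and "during" values and a larger "after" value; an eyebrow raise preceding an unfilled pause or truncation would instead show up in the "before" window.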
4.1 Unfilled Pauses (UPs)

An unfilled pause is a silent part of otherwise fluent speech, where a speaker falls silent for a shorter or longer period of time. Sometimes these silences can barely be perceived, but they can also be very long (Eklund, 2004). Unfilled pauses are the most problematic type of disfluency, because not all unfilled pauses are disfluencies: sometimes we use pauses to mark sentence boundaries (Deese, 1978). Moreover, silences shorter than 250 ms are not counted as unfilled pauses; only silences of longer duration are (Goldman-Eisler, 1968). Table 4.1 displays some examples of unfilled pauses produced by the subjects during the interview.

Table 4.1: Utterances containing unfilled pauses

Subject | Utterance | Translation | Duration
Subject 1 | ... högst upp på det här UP huset och ... | ... at the top of this UP house and ... | 0.551 s
Subject 1 | ... jag ska prata lite tydligare UP om mat ... | ... I will talk a little more clearly UP about food ... | 1.647 s
Subject 2 | ... dom har UP beställt tid ... | ... they have UP made an appointment ... | 1.26 s
Subject 2 | ... så fick jag då komma till överläkaren UP Schiratski ... | ... then I got to see the chief physician UP Schiratski ... | 1.022 s

Unfilled pauses were the most common type of disfluency in this pilot study: I found 121 unfilled pauses, compared to 71 filled pauses, 45 prolongations and 17 truncations. A common location for unfilled pauses was right before the head of a phrase; in other words, this hesitation phenomenon occurred mostly before the important items in a phrase. Unfilled pauses also occurred inside words, for example:

• “... studentUPlägenhet ...”, “... studentUPflat ...”.

Figure 4.1: Unfilled pause and head movements in three dimensions; the y-, z- and x-axis scales are in mm

Another type of unfilled pause occurred between grammatically complete forms, for example:

• “... så fick jag då komma till överläkaren, UP Schiratski ...”, “... then I got to see the chief physician, UP Schiratski ...”.

The last type of unfilled pause that I found in this experiment consisted of UPs that speakers employed to plan how to continue expressing their meaning, for example:

• “... hon bor hemma UP och vet inte vad hon vill göra ...”, “... she lives at home (or she lives with us) UP and she does not know what to do ...”.

Figure 4.2: The movements of the eyebrows before an unfilled pause

4.1.1 The Correlation of Unfilled Pauses to the Head Movements

The localized areas of unfilled pauses were studied to determine whether any head movements were related to them. In this experiment I found no clear relation between them. However, the speakers held back and froze their gestures when they became disfluent, and head movements occurred later on. In other words, first there was an unfilled pause while the speakers held still, and then the head movements occurred. Figure 4.1 displays the head movements and an unfilled pause for one of the speakers uttering the sentence “... Karlsson och Granström var UP assistenter och hade labbarna ...”, “... Karlsson and Granström were UP assistants and took care of the labs ...”. As this figure shows, first an unfilled pause takes place while the speaker freezes his gestures, and then strong head movements occur (in the y- and z-axis).

4.1.2 The Correlation of Unfilled Pauses to the Movements of the Eyebrows

The analysis of the correlation of unfilled pauses to the movements of the eyebrows showed strong eyebrow movements before unfilled pauses occurred; that is, first the eyebrows moved, and then they came down at the unfilled pause. The movements of the eyebrows in relation to unfilled pauses may reflect focal accent, i.e. the speaker may signal a focal accent in a sentence.
In this case, unfilled pauses can be completely dependent on the semantic content. Figure 4.2 illustrates the movements of the eyebrows when one of the subjects uttered the sentence “... hon ville ha en annan lägenhet som UP nu har hon fått ...”, “... she wanted to have another apartment, which UP she has now got ...”. In this figure, strong motions of the eyebrows appear first, which can signal a focal accent, and the movements are then followed by an unfilled pause, which is displayed in Figure 4.3.

Figure 4.3: Unfilled pause after the movements of the eyebrows

4.2 Filled Pauses (FPs)

Filled pauses, or vocalized pauses, are sounds such as “eh, em, er, ah, um”. Filled pauses were the second most common type of disfluency (71 filled pauses were found in this experiment). Table 4.2 shows some utterances with filled pauses that the subjects produced during the interview.

Table 4.2: Utterances containing filled pauses

Subject | Utterance | Translation | Duration
Subject 1 | ... hon eh FP hade tur för när ... | ... she eh FP was lucky because when ... | 0.239 s
Subject 1 | ... emigration och eh FP vad heter den kursen eller ... | ... emigration and eh FP what is that course called, or ... | 0.571 s
Subject 2 | ... det var lite annorlunda på den tiden eh FP och jag ... | ... it was a little different at that time eh FP and I ... | 1.943 s
Subject 2 | ... så ska vi beställa tid till eh FP överläkaren ... | ... so we are going to make an appointment with eh FP the chief physician ... | 1.258 s

Filled pauses mostly signal that many options are available to the speaker, that no commitment has yet been made, and that sooner or later he or she will choose one of the available alternatives.

4.2.1 The Correlation of Filled Pauses to the Head Movements

The filled pauses were analyzed to find out whether there was any relation between these pauses and head movements. I found no clear relation between filled pauses and head movements. Instead, filled pauses were employed as a sort of place-holder for the next word or for continuing on to the next sentence. In this way, filled pauses signaled that the speakers had many options available, of which they selected one to continue their speech. Speakers employed these hesitation sounds to indicate uncertainty or to maintain control of the conversation while thinking of what to say next. Consequently, filled pauses did not add any new information to the conversations. Figure 4.4 displays the head movements in three dimensions and a filled pause when one of the subjects uttered the sentence “... så att eh FP men jag kunde ...”, “... so eh FP but I could ...”. There is clearly no relation between the filled pause and the head movements.

Figure 4.4: Filled pause and head movements in three dimensions

4.2.2 The Correlation of Filled Pauses to the Movements of the Eyebrows

In this experiment I found no clear relation between filled pauses and the movements of the eyebrows. Figure 4.5 displays the movements of the eyebrows and a filled pause for one of the subjects uttering “... språket är ju mera eh FP det är ju inte just ...”, “... the language is more like eh FP it is not just ...”.

Figure 4.5: Filled pause and the movements of the eyebrows

4.3 Prolongations (PRs)

Prolongations are phones or syllables that are longer than in normal fluent speech. Prolongations were the third most common type of disfluency (45 prolongations were found in this study). Table 4.3 presents some examples of prolongations produced by the subjects during the interview.

Table 4.3: Utterances containing prolongations

Subject | Utterance | Translation | Duration
Subject 1 | ... köper man dem iii PR bitar ... | ... you can buy them iiin PR pieces ... | 0.385 s
Subject 1 | ... utan det äää PR sexton grader ... | ... but it iiis PR sixteen degrees ... | 0.508 s
Subject 2 | ... det var fullt iii PR väntrummet ... | ... it was full iiin PR the waiting room ... | 0.400 s
Subject 2 | ... Schiratski som dååå PR tittade på det här ... | ... Schiratski who theeen PR looked at this ... | 0.633 s

Prolongations can be seen as another way of hesitating without being silent: speakers hesitated by drawling phones so that they could continue their speech without falling silent.

4.3.1 The Correlation of Prolongations to the Head Movements

The analysis of the correlation of prolongations to the head movements showed no relation between them. Instead, the speakers held back and froze their motions when they became disfluent, and then a head movement took place. In other words, first a prolongation occurred while the speakers held still, and then head movements appeared. Figure 4.6 illustrates the head movements and a prolongation for one of the speakers uttering the sentence “... 40 poäng i alltsååå PR språkdelen i fonetik ...”, “... 40 points in that is to saaay PR the language part of phonetics ...”. This figure shows no correlation of prolongations to the head movements. As Figure 4.7 shows, head movements appear after the prolongation displayed in Figure 4.6.

4.3.2 The Correlation of Prolongations to the Movements of the Eyebrows

I found no clear relation between prolongations and the movements of the eyebrows. Figure 4.8 displays the movements of the eyebrows and a prolongation for one of the subjects uttering “... hon läser ju dååå PR det blir mycket ...”, “... she studies theeen PR it becomes too much ...”.

Figure 4.6: Prolongation before head movements

Figure 4.7: Head movements after a prolongation

Figure 4.8: Prolongation and the movements of the eyebrows

4.4 Truncations (TRs)

Truncations, also called cut-off words or syllables, are words or syllables that are not fully executed at first but are completed later, after an unfilled pause.
Some truncations are caused by interruptions from the interlocutor, but in this pilot study I have only observed truncations that speakers produced on their own, without any interruption from the interlocutor. The total number of truncations was 17. Table 4.4 shows some utterances with truncations that the subjects produced during the interview.

Table 4.4: Utterances containing truncations

Subject | Utterance | Translation | Duration
Subject 1 | ... den ligger liliTRlite utanför stan inte långt ... | ... it is located a liliTRlittle outside of the town, not far ... | 1.668 s
Subject 1 | ... det blir mycket att hon joTRjobbar mot ... | ... it becomes too much that she woTRworks against ... | 0.887 s
Subject 2 | ... Ericsson som då haTRhade hållt på med telekommunikation ... | ... Ericsson, which at that time haTRhad been working with telecommunications ... | 0.335 s
Subject 2 | ... jag fick ju ggTRgå alltså jag fick faktiskt gå på den här kursen ... | ... I got to ggTRgo, as a matter of fact I got to take this course ... | 0.875 s

4.4.1 The Correlation of Truncations to the Head Movements

To find out whether any head movements were related to the truncations, the localized truncation areas were studied, and I found no specific relation between them. The correlation of truncations to the head movements was quite similar to that of unfilled pauses and prolongations: the speakers froze and held back their gestures when they became disfluent, and head movements occurred at the end of the truncations. In other words, speakers held still at the beginning of the truncations, and head movements occurred at the end of the truncations and even after that. Figure 4.9 displays the head movements and a truncation for one of the speakers uttering the sentence “... den ligger liliTRlite utanför stan inte långt ...”, “... it is located a liliTRlittle outside of the town, not far ...”. As this figure shows, the speaker freezes her gestures at the beginning of the truncation, but quite strong head movements appear at its end.

Figure 4.9: Truncation and head movements in three dimensions

4.4.2 The Correlation of Truncations to the Movements of the Eyebrows

In contrast to the results for head movements, there were strong eyebrow movements before truncations occurred: first the eyebrows moved, and then they came down with the truncation. These movements in relation to truncations may reflect focal accent, i.e. the speaker may signal a focal accent in a sentence. In this case, truncations, like unfilled pauses, can be entirely dependent on the semantic content. Figure 4.10 illustrates strong movements of the eyebrows when one of the subjects uttered the sentence “... nej hal TR halvtid det jobbar hon bara ...”, “... no hal TR half time she just works ...”. First, strong motions of the eyebrows occur, which may signal a focal accent, and then the movements end with a truncation, which is displayed in Figure 4.11.

Figure 4.10: The movements of the eyebrows before a truncation

Figure 4.11: Truncation after the movements of the eyebrows

4.5 Conclusions and Future Development

To sum up, I have analyzed the correlation of four types of disfluency (unfilled pauses, filled pauses, prolongations and truncations) to the movements of the head and eyebrows in spontaneous speech. Unfilled pauses, prolongations and truncations did not seem to have a direct connection to the head movements. Instead, the speakers froze their gestures when they became disfluent, and head movements occurred later on. In the case of truncations, the head movements started at the end of the truncations and even after that.
Filled pauses were not directly related to the head movements either, but they served as place-holders for the next word or for continuing on to the next sentence. Filled pauses were also employed to maintain control of a conversation while thinking of what to say next. Filled pauses and prolongations did not have any relation to the movements of the eyebrows. However, unfilled pauses and truncations were clearly related to the movements of the eyebrows: there were strong movements of the eyebrows before unfilled pauses or truncations occurred. The movements of the eyebrows in relation to unfilled pauses and truncations can be linked to the concept of focal accent, i.e. the speaker may signal a focal accent in the conversation, and in this case unfilled pauses and truncations can be completely dependent on the semantic content.

The experiment described here was carried out in a limited domain. The next step of research in this field would be to include more subjects, to find out whether these results generalize to spontaneous speech, especially concerning the correlation of unfilled pauses and truncations to the movements of the eyebrows. Another interesting step would be to retrieve other visual information, for instance eye movements: is there any simultaneous correlation of disfluency to eye movements?

Bibliography

Borden, Gloria J., Katherine S. Harris and Lawrence J. Raphael. Speech Science Primer, 1980.
Cassel, Justine. Verbal communication: Using approximate sound propagation to design an inter-agents communication language, 2000.
Deese, James. Thought into speech. American Scientist, 1978.
Eklund, Robert. Disfluency in Swedish human-human and human-machine travel booking dialogues. Institute of Technology, Linköping University, 2004.
Geschwind, N. Disconnexion syndromes in animals and man. Brain, 1965.
Goldman-Eisler, Frieda. Psycholinguistics: Experiments in Spontaneous Speech, 1968.
Graf, Hans Peter, Eric Cosatto, Volker Storm and Fu Jie Huang. Visual prosody: Facial movements accompanying speech. http://www.cstr.ed.ac.uk/downloads/publications/2002/paper.vtts.pdf, 2002.
Johnson, Keith A. and J. Alex Becker. The Whole Brain Atlas. http://www.med.harvard.edu/AANLIB/home.html, 2000.
Kelly, Spencer D. Does gesture play a special role in the brain’s processing of language?, 2001.
Munhall, Kevin G. Visual prosody and speech intelligibility, 2004.
Yule, George. The Study of Language. Cambridge University Press, 1985.

A Appendix 1

When the interviewer has taken part in the conversation, this is marked with an I. The files for subject 3 belong to another study related to head movements, which I have also used in this thesis to obtain a wider analysis of spontaneous speech. Disfluencies are marked (UP), (FP), (PR) and (TR), and *skratt* (Swedish for “laughter”) marks laughter.

A.1 Subject 1

A.1.1 file 1

Så *skratt, I: SÅ VI PRATAR OM FRITIDEN. Hum (UP) ju (PR) men det här väl så då att min fritid (UP) går mycket då till den stugan em (FP) om (UP) jaha vad sa jag berätta mera? Jag kan berätta om mina barn. I: Ja. Eh (FP) dem är två stycken och eh (FP) min äldsta är ju (PR) tjugofy(TR)tjugofy ah hon fyllde tjugo fyra åring (UP) bor i malmö, (UP) I: OKEJ. Em (FP) går på högskolan där och har just fått lägenhet. I: AH BRA. Ah ja jätte bra. Hon eh (FP) hade tur för när hon började i (PR) Malmö så eh (FP) så hade dem byggt om ett stort sjukhus till student(UP)lägenheten eh (FP) men så hade hon fått den lägenhet dä(TR)där högst upp på det här (UP) huset och där (UP) läckte det in vatten. I: JASÅ. Så att där har hon kört på nu när (UP) hon vill ha en annan lägenhet. Som (UP) nu har hon fått nytt i stan, den här (UP) lägenheten den ligger lili(TR)lite utanför stan inte långt eller utanför stan den (UP) låg inte mitt i centrum. I: VAD LÄSER HON?
Men (PR) eh (FP) nu då så fick hon en ny lägenhet det är ju med att (UP) det läckte lite då (UP) hon körde mycket på det. *skratt. I: LÄSER HON ALLTSÅ PLUGGAR HON? Ah precis. I: VAD PLUGGAR HON? På eh eh (FP) emigration emigration och eh (FP) vad heter den kursen eller program? eh (FP) den heter eh (FP) sve det var språk och emigration. I: SPRÅK OCH EMIGRATION? DET HÄR HAR JAG INTE HÖRT FÖRUT MEN DET ÄR LIKSOM SPÄNNANDE. Ah precis så att eh (FP) hon läser ju då eh (FP) det blir mycket att hon jo(TR)jobbar mot...

A.1.2 file 2

Eh det är det eh (FP) språket är ju mera eh (FP) det är ju inte just att du läser engelska eller så utan det är ju eh (FP) kommunikativ språk har dem hum. I: JA, DET HÄR LÅTER SPÄNNANDE. Hum, hon trivs jätte bra, nu som hon har läst till (PR) filkand då men hon ska fortsätta till magister. I: DU HAR JU ETT BARN TILL. Ah jag har ett barn till. Hon bor hemma (UP) och vet inte vad hon vill göra *skratt. Hon är tju hon fyller tju ett nu. (UP) Em (FP) och hon lä(TR)läser eh (FP) på halvfart i norrköping (UP) på (PR) eh (FP) I: VAD LÄSER HON? religionsvetenskap. I: KUL. och sen har hon fått en anställning på posten som (UP) brevbärare. I: I NORRKÖPING? Nej här i stan. Ja sen jobbar (PR) borta vid ”gekholmen”. (UP) *skratt. I: MEN HON BOR HEMMA. Hon bor hemma, hum, hon tycker att det är jätte bra jätte bra. I: VISST. Sa du visst I: JA VISST VARFÖR INTE? *skratt. Ah ja så att jag vet inte men nu har jag sagt åt att hon måste tag i livet. (UP) Efter sommarn (UP) det går inte å gå omkring å (PR) liksom inte vet vad man vill så (UP) så får vi se vad det blir med det ah. I: DET ÄR SVÅRT NU FÖRTIDEN FÖR UNGDOMAR. Ja det är det. (UP) hon haft jätte tur vet du som har (UP) hon jobbat eh (FP) jag tror att hon jobbar mer än 50 procent på posten. I: VA BRA. Hum och få just att få en anställning utan å annars tar dem i bara in där (UP) när det behövs så som extra men nu har hon fått en anställning, så att det e det e jätte bra. I: HELTIDSANSTÄLLNING KANSKE.
Nej hal(TR)halvtid det jobbar hon bara um (FP) men det är ju ah ...

A.1.3 file 3

Anställning i (UP) posten eh (FP) I: HON TRIVS PÅ JOBBET? Ja det är klart det är bra. I: CYKLAR HON MYCKET? Cyklar mycket? Hum det gör hon mycket ramlar mycket omkull varje dag *skratt. Drar hon upp sina knäna. Antingen cyklar omkull eller annars har hon ramlat genom trapp. I: JAG HAR SETT ALLTSÅ DEM DET FINNS VÄL SMÅ BILAR LIKSOM SOM DE KÖR KANSKE DET E SMIDIGARE Å HA EN SÅN I STÄLLET FÖR CYKEL. Ja precis. Hon (PR) jobbar ju mera (PR) i (PR) (UP) vad heter det i (PR) ”gekholmen” å (UP) ”berga” heter det (UP) så de mycket såna hus då (PR) springer dem upp igenom trappan då får dem bara cykel. I: JA VISST. (UP) Okej (UP) ska vi prata om något annat. Ja (PR) I: OM JOBBET. Jag ska prata lite tydligare (UP). Om mat (UP) *skratt. Vi ska prata om mat. Jaha. *skratt. Eh (FP) I: ÄR DU, ÄTER DU KÖTT? Ja jag e kött, jag äter kött, däremot min dotter då min äldsta dotter hon e vegetarian. Mycket svårt å jag trodde att det skulle gå över när hon var stor men det har inte gått över. Så hon äter aldrig kött. I: DET E SVÅRT ATT LAGA MAT, HITTA PÅ NÅGOT. Precis det e ju mycket (PR) vi köper som (UP) färdig korn (UP) såna du vet svampar ser ju ut som köttfärs egentligen som man köper fryst men det är svamp å så köper man dem i (PR) bitar det är också korn ...

A.1.4 file 4

Det finns i (PR) bland det vegitariska man måste krydda jätte mycket I: ANNARS SMAKAR DET INGENTING. Nej nej precis men som kött om eh (FP) köttfärs sås till exempel med korn då (UP) så är det (UP) helt okej (UP), men man få ta e lök å svamp å så I: ÄTER NI SAMMA MAT MED HENNE? Ah nä(TR)när hon kommer hem så gör vi faktiskt det. I: ALLTSÅ NI, INGEN KLAGAR PÅ? *skratt Nej nej det e jag tycker att det smakar ungefär som som kött. Ja, det gör det och just (UP) dem kött bi eller de bitarna är ju som kyckli(TR)kyckling i sig smakar ju heller inte så mycket om man tar bara kycki(TR)kyckling om man inte krydda.
I: MEN HON VET ATT HON MÅSTE VARA FÖRSIKTIG LIKSOM ATT TA PROTEIN BEHÖVER HON. Ah precis men det e något lite si å så med det. Jah det är det är sånt som man har tjatat om (UP) mycket. Det finns många som ja det finns mer folk som vegetariska och sen finns det mera alternativ alltså mer eh att välja på. Ju det finns jätte det är jätte stor skillnad hon började när (UP) det var väl mest att det var mode när hon var 15, 16 år. Hum, men nu är det egentligen eh oj nu kommer ju massa folk å tittar massa student. I: GYMNASISTER FRÅN NYKÖPING DE HAR STUDIEBESÖK, DE BARA SER LIKSOM Å BERTIL KOMMER ATT BERÄTTA OM LABBEN. Okej. Ja. Ha. *skratt. I: JAG VET INTE HUR LÄNGE SKA VI PRATA ...

A.1.5 file 5

Vi ska fortsätta å prata ett tag lite. I: JA PRECIS VI SKULLE VARA HÄR OCH PRATA EN LITE STUND TILL. En lite stund var det faktia. *skratt. (UP) Vad ska vi prata om igen? Ju vi kan prata om att jag ska till ”marocko” . Det ska jag göra om 14 dar. I: VA SPÄNNANDE. Så jag åker o(TR)om två vecker idag. Hum, eh (FP) I: JAG HAR ALDRIG VARIT DÄR MEN JAG HAR SETT PÅ TV. Ikväll det är i i packat och klart. Det e om (PR) ”marackesh” (UP) å det ska du åka. I: HUR LÄNGE SKA DU STANNA DÄR? En vecka. I: ÄR DET ”MAROCKO” SOM DU SKA TILL ELLER VILKEN STAD? Ti ”agadir” ja (UP) så vi trodde att det skulle bli sol och mycket värme (UP) så och vi tittar på karta varje dag min kompis (UP) vi är 4 styck eller vi är 8 stycken som åker. I: Å VA HÄRLIGT. Ah (UP) Och det är inte sol utan det är (PR) sexton grader och kallt och regn har det värt 14 dar nu (UP). I: MEN DET KAN MAN INTE, DU SKA ÅKA OM TVÅ VECKER. Ja precis. Ah, men vi säger d(TR)det får bli det kolturella den här gången (UP). Vi bor precis vid havet I: OJOJOJ. Fast det e väl inte värmt men (PR) vi har som temprerad bas(TR)bassäng. I: MAN KAN HITTA PÅ. Ja. I: EN VECKA ÄR INTE SÅ LÅNG. Nej, precis. Nej. Egentligen så skulle man väl åka längre men. I: OM TVÅ VECKER BLIR DET ALLTS, VI NÄRMAR OSS VÅREN. Ja. I: 21 MARS ÄR DET, FÖRSTA VÅREN.
Ja vår dag jämning ja. I: DET NÄRMAR SIG BÄTTRE SÅ DET HOPPAS. Vi får hoppas på det bästa. I: JAG HÅLLER TUMMARNA. Ja. *skratt.

A.2 Subject 2

A.2.1 file 1

Okej då kör vi igång här ifrån Ericsson igen som då ha(TR)hade hållt på med telekommunikation. Jo, jag fick ju gg(TR)gå alltså jag fick faktiskt gå på den här kursen på arbetstid. (UP) Å det var ju inte, ja det var väl en eller två gånger i veckan som jag stack på eftermiddagen Och det var Gunnar Fant som höll kursen (UP) i huvudsak och Karlsson och Granström var (UP) assistenter och hade labbarna. (UP) Så att eh (FP) och eh (FP) så hade han ju lite gästföreläsare också då Gunnar Fant. Men på den tiden fanns ju inte syntes och igenkänning, utan (UP) det handlade om (UP) hur tillämpar man fyrpolteori eller hur allt sitt (UP) matematiska arsenal på att räkna på käften. (UP) Det var liksom det som (UP) kursen. Sen så var det då perception och en av han som (UP) pratade om perception det var Å-G Möller och han var på fysiologen på Karolinska. (UP) Och det är rätt kul med den här killen, han han börja som (UP) fonetiker (UP) och då tyckte han att han kunde för lite...

A.2.2 file 2

..så tyckte han kunde för lite om det medicinska, så nu är han medicinare (UP), ja *skratt*. Och sen så (PR) höll han på me med med och följde (UP) talet från (UP) det perifera hörselorganet alltså öronen och så upp till hjärnan. Och så mä(TR)mätte signalen me(TR)med elektroder på vägen alltså små kapilär elektroder. Dom hade dom på katter. Man lyfte på locket till hjärnan och så körde ma in och så följde man (UP) hela vägen upp alltså. Och det var (UP) otroligt spännande tyckte jag, han var ju lika bra svänga upp ån eh (FP) ekvationer som aldrig. jag var otroligt imponerad av den där karln alltså. Så det var lite annorlunda på den tiden. eh, (FP) och jag kommer ihåg att att jag missa eller vad säger jag jag missa tentamen i det här, eller jag körde inte, jag missa den. Därför att (UP) jag fick problem med mitt öra.

A.2.3 file 3

..öra.
Och eh eh (FP) och så en dag, men sen när jag kom till Stockholm då försökte jag ringa, ja det var ju omöjligt att få nån tid för att kolla mitt öra ju. (UP) Och sen så vaknade jag en natt och då var det blod på hela kudden. (UP) Och då eh (FP) då ringde jag (UP) sjukvårdsupplysningen och fick ju naturligtvis ett annat nummer till (PR) eh (FP) ja. Och då kommer jag fram till en tant då som sa: (UP) jaha, har ni försökt att sätta bomull i öronen? *skratt* Det blödde ju ut. (UP) Så iallafall så fick jag en tid att komma direkt på morgonen alltså klockan åtta. Och eh (FP) den här tanten, eller vad säger jag den här sköterskan då som när jag kom dit där så var det fullt i (PR) väntrummet på öronkliniken där ju så sa hon: (UP) Dom människorna där, dom har (UP) beställt tid, du har inte beställt tid du får vänta hela dan *skratt* och så fick jag, så fick jag.

A.2.4 file 4

Det roligaste var då, då tittade den här läkaren: va fan har hänt med det här örat är det eh (FP) är det eh (FP), ja vi får nog sätta in pencillin, ja det rann ju inte blod då va. Vi får nog sätta in pencillin för det kanske är någon infektion och så vidare. Och vi ska nog, eh (FP) vi får sätta in pencillin så får vi se hur det blir och sen så ska vi beställa tid till eh (FP) överläkaren. (UP) Och så fick jag då komma till överläkaren, (UP) Schiratski, som då (PR) titta på det här: såna här öraon kan man inte hålla på med och gå med. (UP) Det här måste vi operera. Han var säkert, han höll på med sin doktorsavhandling *skratt* och eh (FP) då eh (FP) så att eh (FP) men jag kunde ju själv bestämma ändå va. men det var detta som gjorde att jag kunde inte gå upp i tentamen i talöverföring. Så jag jag fick upp på (UP) omtentan, så vi var två stycken som tenterade talöverföring.

A.2.5 file 5

...på det här. Och det är ett stort steg när man jobbar på industrin faktiskt.
(UP) Så då eh (FP) gick jag och fundera och sånt eh (FP) om man kunde gå andra kurser och det visade sig att man kunde, läste man (UP) 40 poäng i alltså (PR) eh (FP) språkdelen i fonetik så fick man fem doktorandpoäng. (UP) Och då tänkte jag att då kan jag börja där va. Och eh (FP) och då var det så va eh (FP) efter operationen, jag fick ju operera ett öra då jo. ja det är ju också en speciell seans. Han ville jag skulle operera, det här måste man operera och så vidare och rekonstruera och greja va. Och sp kom ju då (PR) dan innan dom opererar så kollar dom ju (PR) eh (FP) att man kan överleva operationen. Lyssnar på hjärtat och allt sånt där och tittar dom i örat och då säger då den här underläkaren: varför ska vi operera det här, det ser ju fint konserverat ut (UP) och då säger överläkaren: jag känner fallet sen tidigare.

A.2.6 file 6

Och det var ju faktiskt så , det är så snyggt gjort att (UP) vanliga eh (FP) läkare tror ju inte att det är opererat. Alltså han har ju ändå (UP) bytt eh (FP) hörselben och hela skiten alltså. Ah (FP) ah (FP) så det ja eh (FP) och det var ju (UP) precis i den skarv då man börja göra detta, tidigare så gjorde man bara en radikal håla. Så att det inget skulle kunna bli infektioner, alltså ing. alltså ingen trumhinna. Bara låt liksom det läka ihop som ett hål in va. (UP) Men här ser det ut som med en riktig trumhinna och så var där en stigbygelplattan var kvar och så var det eh (FP) (UP) hammaren var borta. Städet fanns kvar å hängde och dingla. Så tog han och satte städet (UP) direkt mot trumhinnan och så mot stigbygelplattan och så fixerade det här. (UP) i alla fall och då när jag var konvalecent då tänkte jag ja jag mådde ju jättebra jag varju sjukskriven en hel månad, då gick jag på fonetik (UP) i Stockholm.

A.2.7 file 7

...Som man gjorde på den tiden. Utan att anmäla sig utan att anmäla, sa han då att (UP) sa han: ja dom håller på (PR) tentera idag, (UP) första kursen. jaha sa ja, när är det omtentamen?
ja, det var veckan efter. Ja då anmäler jag mig till den, sa jag. (UP) Och så gick jag upp och skrev omtentamen, alltså den gången (UP) eh (FP) ja. Och så började jag. Så jag läste fu(TR)full hastighet i fonetik (UP) vid sidan om jobbet så att säga. (UP) Men på den tiden kunde man läsa på kvällstid. (UP) Dom hade alla kurser på, alltså många ämnen hade dom parallellt hela kursen på kvällstid. (UP) Och dom hade då både halvfart och helfart. (UP) I: VAR DU DOKTORAND DÅ? Nej nej, jag var tvungen att snickra ihop min filkand ju. Annars så annars så skulle jag bli doktorand på fonetik då hade man vart tvungen ta dispans, börja läst nåt annat. Men (PR) utan jag hade min (PR) ryska va.

A.3 Subject 3

A.3.1 file 1

I: DU KAN JU.. Nu ska vi prata på *skratt*, jag kan berätta om (PR) eh (FP) när vi åkte till Österrike här nu senast i (PR) eh (FP) vecka sex. En resa som jag i vanliga fall ordnar lite senare under våren. eh (FP) Vi hyr in en buss och så åker vi ner ett gäng, förhoppningsvis upp till 30-40 personer, kompisar, släktingar såna saker. Nu eh (FP) valde vi att lägga det lite tidigare på året för att det har varit lite dåligt med snö sådär sent i april. Så att för att få lite bättre snö nere i dalen la vi det tidigare. Vi blev inte riktigt så många som vi räknat med det här året, utan vi fick åka tåg istället från Linköping. I: GAMLA BEKANTA ELLER? Ja jamensan, nej men det är väldigt mycket kompisar, vi försöker samla ihop, det är jag och en kompis som ordnar det här eh (FP). Eftersom vi var så få så bokade vi ingen egen buss utan vi körde tåg då ner till Malmö. Det är väl dina hemtrakter va skulle jag gissa. I: HELSINGBORG. ja i närheten *skratt*. Nej men Och i Malmö så bytte vi då till och hoppade på en buss som gick ner reguljärt istället ner till (UP) Sel Am See, där vi bor.

A.3.2 file 2

Ja (PR) nu var vi bara tolv så då var det inte så. Men (UP) i vanliga fall hyr vi in hela ja, det stämmer. eh (FP) Så kom vi ner och det var en (PR) solig och vacker dag.
en fantastisk fin dag, vi kom ner på eftermiddagen. Och då visade det sig att det var bara den andra fina dagen man haft på flera veckor egentligen. Så (UP) det kom lagom till vi kom ner (UP). Och sen åkte vi skidor dan efter då och det va precis lika fint väder både den dan och även (UP) dom samtliga kommande dagar faktiskt som följde. (UP) så att vi varierade oss så mycket som möjligt i dom olika (UP) skidområdena som finns där nere. det finns ju ganska mycket i Sell Am See trakten. Har du vart där själv? I: NEJ ALDRIG. Aldrig. Har du åkt skidor då? (UP) I: JA DET HAR GJORT. mm I: MEN DET ÄR MÅNGA ÅR SEDAN. Nej men då får du testa det här, för det är en riktig höjdare. Svenska fjällen är inte helt fel. Nej men här nere så finns det ju många olika områden, det finns Sell Am See, du har (UP) Kitsbühl, det ligger bara bara (UP) ett par mil därifrån, ja kanske bara en mil. Det är ju en klassisk..

A.3.3 file 3

Saalbach där har till exempel har varit VM-91 är, en jättetrevlig ort att var i och åka skidor. Saalbach hinterglenn. MarieAlm är ett stort område som också ligger inom (PR) eh (FP) väldigt nära avstånd från detta. Och så finns det en glaciär som heter (UP) Kithssteinhohl som ligger ovanför den lilla byn Kaprun som alltid är snösäkert. Väldigt trevligt område. Så att eh (FP) vi varierade oss ganska så bra där nere. Och (PR) höll oss till dom här olika områdena, å trivdes bra med det. Badderstein ligger också där i närheten. Och där var vi också. I: FINNS DET NÅN BRA BAR? va sa du? I: FINNS DET NÅN BRA BAR OCKSÅ? Nån bra bar. ja, det brukar många *skratt* det brukar va många som efterfrågar det, nej men vi vi fokuserar faktiskt på skidåkningen. Det är skidåkningen som är det viktigaste i det hela och (UP) vi hoppas att eh (FP) dom som följer med är riktiga skidentusiaster också. Att dom eh (FP) delar det stora intresset som vi har för den biten av det hela också. Se skadar det aldrig med en öl heller efter en (PR) eh (FP) hel dags skidåkning.
Utan det blir ganska bra. Men det är också, det finns ju också väldigt mycket olika goda sorter också.

A.3.4 file 4

Det var vecka sex. Det var en (PR) fantastisk vecka. eh (FP) Härlig skidåkning. å (PR) I: LÅNGT FÖRE SPORTLOVET. Ja, ett par veckor före sportlovet blev det. Vi hade faktiskt en kille med (UP) som (PR) ska åka Vasaloppet här nu i helgen som kommer. Så att han hade me sig längdskidorna också då. Gav sig upp på 3000 meter. Vid ett ett höghöjdsspår som ligger där uppe. Och föresatte sig att han skulle åka 50 varv i den här slingan som gick, det blev väl 8 till slut men å andra sidan var slingan betydligt längre än vad han trodde så att han klarade av en hel del mil däruppe. så vi får se här nu hur det går för honom nästa vecka. Å sen hann ja va hemma en (PR) fyra dar, sen åkte jag iväg igen. (UP) Till Alperna till samma ställe faktiskt. En ny resa till (PR) eh (FP) precis samma hotell och allting sånt också. Med ett gäng nya människor. (UP) Och det här var en rätt så annorlunda upplevelse den veckan. Det var ungefär lika bra väder, vi åkte till ungefär samma områden men för första gången på väldigt många år där nere så fick jag uppleva en riktigt allvarlig skada. Inte jag själv utan en...

A.3.5 file 5

Vi hade väl (UP) eh (FP) tyckt att det var väldigt väldigt bra, förutsättningar för skidåkning den här dan så vi föresatte oss att vi skulle köra (UP) stenhårt. Och inte vänta eller pausa på nån. Utan det var jag och den här killen som åkte upp och ner i en och samma backe som var väldigt bra just för tillfället och (PR) eh (FP) vi hade väl precis hoppat av liften och gett oss av neråt. Å jag var före, han låg strax efter mig. När jag kom ner till (PR) eh (FP) liftstationen å skulle åka upp så vände jag mig om bara för att kolla att han var med, men det var han inte utan 200 meter upp i backen var det två stycken åkare och skidor spridda överallt som låg runt där. I: HADE DOM KÖRT PÅ HAN? Dom hade krockat i en eh mycket kraftig kollision.
För att (UP) dom åkte fort båda två (UP). Så jag åkte upp och kollade hur det var, han Per som han heter han låg och hade väldigt ont i knät (UP) visade det sig. Och han ville inte flytta på det och den andra killen var orörlig. Jag var orolig för att han var medvetslös först men han (PR) var (UP) vid medvetande men hade, visade det sig, brutit le. lårbenet.

B Appendix 2

Figure B.1: Unfilled pause and head movements in three dimensions while one of the subjects uttered “... på arbetstid (UP) Å det var ju inte ...”

Figure B.2: Unfilled pause and head movements in three dimensions while one of the subjects uttered “... mitt öra ju. (UP) Och sen så vaknade jag ...”

Figure B.3: Unfilled pause and head movements in three dimensions while one of the subjects uttered “... på hela kudden (UP) Och då ...”

Figure B.4: Unfilled pause and head movements in three dimensions while one of the subjects uttered “... så fick jag då komma till överläkaren (UP) Schiratski ...”

Figure B.5: Unfilled pause and head movements in three dimensions while one of the subjects uttered “... fem doktorandpoäng (UP) Och då tänkte jag ...”

Figure B.6: Unfilled pause and head movements in three dimensions while one of the subjects uttered “... tju ett nu (UP) eh ...”

Figure B.7: Unfilled pause and head movements in three dimensions while one of the subjects uttered “... vad man vill så (UP) så får vi se vad det blir ...”

Figure B.8: Unfilled pause and eyebrow movements while one of the subjects uttered “... högst upp på det här (UP) huset och där ...”

Figure B.9: Filled pause and head movements in three dimensions while one of the subjects uttered “... Och eh (FP) och så en dag ...”

Figure B.10: Filled pause and eyebrow movements while one of the subjects uttered “... Eh (FP) går på högskolan ...”

Figure B.11: Prolongation and head movements in three dimensions while one of the subjects uttered “... Det e om (PR) ‘marackesh’ ...”

Figure B.12: Prolongation and eyebrow movements while one of the subjects uttered “... när hon började i (PR) Malmö så ...”

Figure B.13: Truncation and head movements in three dimensions while one of the subjects uttered “... den ligger lili (TR) lite utanför stan inte långt ...”

Figure B.14: Truncation and head movements in three dimensions while one of the subjects uttered “... nej hal (TR) halvtid det jobbar hon bara ...”

Figure B.15: Truncation and eyebrow movements while one of the subjects uttered “... den ligger lili (TR) lite utanför stan inte långt ...”

Figure B.16: Truncation and eyebrow movements while one of the subjects uttered “... ah nä(TR)när hon kommer hem ...”

Figure B.17: Truncation and eyebrow movements while one of the subjects uttered “... är ju som kyckli(TR)kyckling i sig smakar ju heller ...”