Linguistic analysis of keystroke logging data
Transcription
Presentation reference: Van Waes, L., & Leijten, M. (2016). Linguistic analysis: Analyzing keystroke logging from a linguistic perspective. Presentation at Workshop: Using Keystroke Logging in Writing Research, Boston, MIT. http://www.inputlog.net/MIT_workshop.html

Training School on Keystroke Logging | Antwerp

Linguistic analysis: from character level analyses to word level analyses
INPUTLOG 7.1: a research tool for logging and analyzing writing process data

Linguistic analysis
Analyzing keystroke logging data from a linguistic perspective: Dutch versus English expository texts
Mariëlle Leijten & Luuk Van Waes

Mariëlle Leijten, Flanders Research Foundation (FWO), University of Antwerp – marielle.leijten@uantwerpen.be
Luuk Van Waes, University of Antwerp – Luuk.vanwaes@uantwerpen.be

Linguistic Analyses: the concept explained

Flow of the linguistic analyses
  Aggregate letter to word level
  Parsing the S-notation
  Enriching process data with linguistic information

Aggregate letter to word level
  Extract words, word groups and sentences
  Tokenize sentences

Part of speech tagging and chunking 1
  There is a man sleeping in an easy chair.
  PoS tags: EX V DT NN V IN DT JJ NN
  [Slide figure: an NP chunk is marked above the PoS tags]

Part of speech tagging and chunking 2
  There is a man sleeping in an easy chair.
  PoS tags: EX V DT NN V IN DT JJ NN

Enrichment with process data 1: Before Word Pause (-1, -2)
  Thre<<ere is a_man sleapp<<ping in an easy chair.
  Pause values shown on the slide: 140 ms and 593 ms (labelled -1 and -2)
  -1: the first pause before a word
  -2: the second pause before a word
  Aggregated before word pause: 733 ms
  [Slide figure residue, Part of speech tagging and chunking 2: chunk tags B-NP I-NP I-NP shown under DT JJ NN]

Enrichment with process data 2: Word production
  Thre<<ere is a man sleapp<<ping in an easy chair.
  Production time of word = [EndTime of last character of word – StartTime of first character of word]

Enrichment with process data 3: Within Word Pause
  Thre<<ere is a man sleapp<<ping in an easy chair.
  The sum of the pauses within a word = [WithinWordPause 1 + WithinWordPause 2 + ... + WithinWordPause N]

  Values annotated on the example sentences of these two slides: 7207, 7145, 546, 499

Enrichment with process data 4: After Word Pause (+1)
  Thre<<ere is a_man_sleapp<<ping in an easy ch...
  Pause values shown on the slide: 140 and 234 (both labelled +1)
  +1: the first pause after a word

Read more
  Leijten, M., Van Horenbeeck, E., & Van Waes, L. (2015). Analyzing writing process data: A linguistic perspective. In G. Cislaru (Ed.), Writing(s) at the crossroads: The process-product interface (pp. 277-302). Amsterdam/Philadelphia: John Benjamins Publishing Company. ISBN: 978 90 272 5802 1. DOI: 10.1075/Z.194
  Macken, L., Hoste, V., Leijten, M., & Van Waes, L. (2012). From keystrokes to annotated process data: Enriching the output of Inputlog with linguistic information. Paper presented at the Eighth International Conference on Language Resources and Evaluation (LREC'12), Istanbul, Turkey.
  Leijten, M., Macken, L., Hoste, V., Van Horenbeeck, E., & Van Waes, L. (2012). From Character to Word Level: Enabling the Linguistic Analyses of Inputlog Process Data. Paper presented at the European Association for Computational Linguistics, EACL - Computational Linguistics and Writing (CL&W 2012): Linguistic and Cognitive Aspects of Document Creation and Document Engineering, Avignon.
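The word-level pause measures defined on the enrichment slides above (aggregated before word pause, production time, within word pause) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Keystroke` record, its field names, and the definition of a pause as the interval between two successive key-down times are hypothetical and do not reflect Inputlog's actual data format or implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Keystroke:
    """One logged key event (hypothetical structure, not Inputlog's format)."""
    char: str
    start_ms: int  # key-down time in milliseconds
    end_ms: int    # key-up time in milliseconds


def aggregated_before_word_pause(pauses_ms: List[int]) -> int:
    """Sum the pauses that precede a word (positions -1, -2, ...)."""
    return sum(pauses_ms)


def production_time(word_keys: List[Keystroke]) -> int:
    """EndTime of the last character minus StartTime of the first character."""
    return word_keys[-1].end_ms - word_keys[0].start_ms


def within_word_pause(word_keys: List[Keystroke]) -> int:
    """Sum of the pauses between consecutive keystrokes inside one word.

    Assumption: a pause is the interval between two successive key-down times.
    """
    return sum(b.start_ms - a.start_ms for a, b in zip(word_keys, word_keys[1:]))


# The two before-word pauses from the slide add up to the aggregated value.
print(aggregated_before_word_pause([140, 593]))  # 733

# Invented timings for the word "man", for illustration only.
man = [Keystroke("m", 0, 80), Keystroke("a", 200, 270), Keystroke("n", 350, 430)]
print(production_time(man))    # 430
print(within_word_pause(man))  # 350
```

Revisions inside a word (such as the backspaces in "Thre<<ere") would add keystrokes to the list; how those are attributed to a word is not specified on the slides and is left out of this sketch.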
Linguistic perspective in L1 and L2 writing

Introduction: Analyzing keystroke logging data from a linguistic perspective
  Linguistic proficiency is an important factor in writing
  Describe the cognitive costs of the formulation process
    Inter-word pausing dynamics
    Word patterns
  Research technique: linguistic analysis
    Semi-automatic analysis in Inputlog 7.1
    Combination of linguistics and processes

Method
  Quasi-experiment ~ within-subjects design
  48 students of the Master in Multilingual Professional Communication
  2 expository writing tasks
    Description of last holiday (2' planning + max 8' writing)
    Distraction task
    Description of last weekend (2' planning + max 8' writing)

Method
  Data collection: Inputlog 5
  Data preparation and analysis: Inputlog 7
  Relevant analyses:
    Summary analysis (threshold 30 & 2000 ms)
    Pause analysis (threshold 30 & 2000 ms)
    S-notation
    Linguistic analysis

Method
  S-notation
  Linguistic analysis (manually corrected)
  Error rate: 14% Dutch and 12% English

Final text (product)
  Both groups needed about 8:00 minutes to describe their holiday/weekend.
  Example texts: Dutch, followed by English

Dutch:
Mijn laatste vakantie was naar Tenerife. Dit was van 16 t.e.m. 21 september. Ik ben hier met mijn vriend XXXX naartoe gegaan. We zijn bijna 3 jaar samen en vonden het dus wel eens tijd wordne om samen op reis te gaan. Op Tenerife was het prachtig weer, in tegenstelling tot België. in ons kleine landje hebben we deze zomer vooral wolken en regen gezien. Onze vlucht vertrok heel vroeg. Als ik me goed herinner, zijn we opgestaan om 3u om op tijd op het vliegveld te geraken.
We waren dus nogal moe toen we aankwamen op het eiland. De warmte die ons tegemoet kwam toen we van het vliegtuig stapten, veranderde dit meteen. We waren ongelooflijk blij dat we eindelijk een week zouden kunnen genieten van de zon, de zee en het strand. We hebben slechts twee uitstpajes gedaan, omdat ons budget beperkt was. We moesten als student alles zelf betalen en de reis zelf kostte al redelijk wat. uiteindelijk hebben we ervoor gekozen om de vulkaan, de Teide, te bezichtigen en om een boottochtje te maken om dolfijnen en walvissen te spotten. Een bijzondere gebeurtenis is er niet echt geweest, behalve dat we in de zee waren aan het zwemmen en Hans plots een rog onder ons zag zwemmen. We waren beide erg verschoten en liepen zeer snel het water uit.

English:
This weekend was a long weekend, because Friday was a holiday and we didn't have to got to school. On Friday I haven't really done anything. I went to a party on Thursdaynight, so I was tired and all I've done that day, was watching television with my sisters. She had just downloaded the film Pocahontas and it was such a long time since I had since this film. On saturday I realised I had to do my homework, otherwise I wouldn't get it all done in time. That night I went to my boyfriend's. It was really cosy at his place because he had put on the fireplace. He recently got a new cat and it's so little, so I played with the kitten for a very long time. It wasn't planned, but I stayed over, because my boyfriend din't want to drive me home. Sunday, I had to get up at 8 o'clock, because my boyfriend had to go to Brussels with his familiy. That afternoon, I went to the swimming pool because I had to be there to assisist during a competition. I give swimming lessons on Friday and this was 'my children's' first competition. They were all very nervous, but everything worked out well.
Text length and fluency
  Dutch: average 256 words, 36.5 words per minute
  English: average 239 words, 32.9 words per minute

Fragmentation
  Pause threshold > 2000 ms
  Example of a Dutch text: "Mijn laatste vakantie was naar Tenerife. Dit was van 16 t.e.m. 21 september..."
    P-burst: 21 p-bursts, 112 characters, 26.1 seconds
  Example of an English text: "This weekend was a long weekend , because Friday was a holiday and"
    P-burst: 25 p-bursts, 81 characters, 22.8 seconds

General pause results
  Probability of a pause longer than 30 ms within and between words
  [Chart: L1 versus L2, with comparisons marked ≠ and =]
  In general, students pause for a shorter time in L1 than in L2. This confirms previous findings.

Linguistic Analysis: the concept explained
  Read more: Inputlog manual ~ article(s)
  Part of speech tagging and chunking
  Aggregating letter to word level
  Parsing the S-notation
  Enriching process data with linguistic information (PoS tags, lemmas, chunks, frequencies, ...)
  There is a man sleeping in an easy chair.
  PoS tags: EX V DT NN V IN DT JJ NN
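Chunks are one of the linguistic annotations listed above. As a minimal sketch, assuming chunk tags in the B-/I-/O style (B-NP, I-NP) that appears earlier in the presentation, the words of the example sentence could be grouped into chunks as follows. The function name `group_chunks` and the chunk tags for the first six words are assumptions made for illustration; only B-NP I-NP I-NP for "an easy chair" is taken from the slides, and this is not Inputlog's own chunker.

```python
from typing import List, Tuple


def group_chunks(tokens: List[str], chunk_tags: List[str]) -> List[Tuple[str, List[str]]]:
    """Group tokens into chunks based on B-/I-/O style chunk tags."""
    chunks: List[Tuple[str, List[str]]] = []
    label = None            # label of the chunk currently being built
    words: List[str] = []   # tokens collected for that chunk
    for token, tag in zip(tokens, chunk_tags):
        if tag.startswith("B-"):
            # A B- tag always opens a new chunk; close the previous one first.
            if words:
                chunks.append((label, words))
            label, words = tag[2:], [token]
        elif tag.startswith("I-") and label == tag[2:]:
            # An I- tag with the same label continues the open chunk.
            words.append(token)
        else:
            # "O" (or an I- tag without a matching open chunk) closes the chunk;
            # the token itself is treated as outside any chunk.
            if words:
                chunks.append((label, words))
            label, words = None, []
    if words:
        chunks.append((label, words))
    return chunks


tokens = ["There", "is", "a", "man", "sleeping", "in", "an", "easy", "chair"]
# Simplified, assumed tags: only the noun phrases are marked as chunks.
tags = ["O", "O", "B-NP", "I-NP", "O", "O", "B-NP", "I-NP", "I-NP"]
print(group_chunks(tokens, tags))
# [('NP', ['a', 'man']), ('NP', ['an', 'easy', 'chair'])]
```

Once words are grouped this way, pause measures can be compared by position within a chunk, which is the kind of pattern analysis the presentation turns to next.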
Word classes (Part-of-Speech)
  Mean number of words per class (based on two tasks)
  [Chart: L1 versus L2]

Word classes (Part-of-Speech)
  Mean pause duration before word classes
  [Chart: L1 versus L2]
  In general, pauses increase by 26% when writing in L2.

Word classes (Part-of-Speech)
  Proportional increase of initial word pause for each word class (English versus Dutch)

Word classes (Part-of-Speech)
  Mean pause duration before prepositions, pronouns and conjunctions
  [Chart: L1 versus L2, three differences marked *]
  Students have significantly longer pauses before prepositions, pronouns and conjunctions in English than in Dutch.

Patterns
  "The house"
  [Chart: L1 versus L2]

Patterns
  "My house"
  [Chart: L1 versus L2]

Patterns | Alzheimer project
  "My house"
  [Chart: L1 versus L2]

Evidence for parallel processing
  Taken from a presentation by Thierry Olive (Antwerp training school on keystroke logging, March 2016)
  [Chart: H and CI for children and adults, 1st clause versus 2nd clause]

Patterns
  "In the house"
  [Chart: L1 versus L2]

Conclusions
  1. Students in L2 produce shorter texts, write in shorter bursts and pause longer within and between words.
  2. Students in L2 pause longer especially before pronouns, prepositions and conjunctions.
  3. Constituents follow a different pause distribution in different linguistic contexts (e.g. PREP-ART-N ≠ ART-N).
  Take home message: Diversification of pauses between words is necessary to fully understand the cognitive effort of text production.
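Conclusion 1 refers to bursts; on the Fragmentation slide a P-burst is delimited with a pause threshold of > 2000 ms. A minimal sketch of that segmentation is given below, assuming a simplified log of (key-down time in ms, character) pairs. The function name `split_p_bursts` and the log layout are hypothetical, and revisions (deletions) are ignored, so this is not a reproduction of Inputlog's analysis.

```python
from typing import List, Tuple


def split_p_bursts(log: List[Tuple[int, str]], threshold_ms: int = 2000) -> List[str]:
    """Split a keystroke log into P-bursts.

    A new burst starts whenever the interval since the previous key-down
    exceeds threshold_ms (strictly greater, matching "> 2000 ms").
    """
    bursts: List[str] = []
    current: List[str] = []
    previous_time = None
    for time_ms, char in log:
        if previous_time is not None and time_ms - previous_time > threshold_ms:
            # The pause is longer than the threshold: close the running burst.
            bursts.append("".join(current))
            current = []
        current.append(char)
        previous_time = time_ms
    if current:
        bursts.append("".join(current))
    return bursts


# Invented timings, for illustration only.
log = [(0, "T"), (100, "h"), (200, "e"), (2500, " "), (2600, "h")]
bursts = split_p_bursts(log)
print(bursts)       # ['The', ' h']
print(len(bursts))  # 2
```

From such a list, the number of P-bursts and their length in characters follow directly; durations would need the key times kept alongside the characters.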
More information (@uantwerpen.be)
  Mariëlle Leijten, University of Antwerp, Research Foundation – Flanders
  Luuk Van Waes, University of Antwerp
  www.inputlog.net