Lecture 2nd May - AG Wissensbasierte Systeme

Collaborative Intelligence
- Lecture SS 2016 -
Prof. Dr. Andreas Dengel
WM/04.02 S. 92
WM/04-05 S. 92
Collaborative Intelligence focuses on the support of
knowledge workers within socio-technical networks
Chapter 1:
Search & Classification
Chapter 2:
Attention-based Collaborative Intelligence
Chapter 3:
Recommender Systems
Chapter 4:
Proactive Multi-Channel Information Extraction
Chapter 5:
Usability in Collaborative Systems
Chapter 6:
Social Media Monitoring, Discovery & Forecast
This allows a general description of an index system
An index system I maps a set of terms T onto a set of documents D
The assumption is a homogeneous representation via an inverted index
$I: T \to D$
As a result of the mapping, we expect a set of documents whose contents correlate with the terms of the query
For the evaluation we may use the measures recall and precision
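A minimal Python sketch of the mapping $I: T \to D$ as an inverted index, together with recall and precision for a retrieved set. The toy documents, the OR semantics of `retrieve`, and all function names are illustrative assumptions, not part of the lecture.

```python
from collections import defaultdict


def build_inverted_index(docs: dict[str, set[str]]) -> dict[str, set[str]]:
    """I: T -> D, mapping each term to the set of documents that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, terms in docs.items():
        for term in terms:
            index[term].add(doc_id)
    return dict(index)


def retrieve(index: dict[str, set[str]], query_terms: set[str]) -> set[str]:
    """Documents containing at least one query term (OR semantics, an assumption)."""
    result: set[str] = set()
    for term in query_terms:
        result |= index.get(term, set())
    return result


def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Precision = hits / |retrieved|, recall = hits / |relevant|."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


docs = {
    "d1": {"data", "retrieval"},
    "d2": {"multimedia", "retrieval"},
    "d3": {"data", "computer"},
}
index = build_inverted_index(docs)
found = retrieve(index, {"retrieval"})
print(sorted(found))                                   # ['d1', 'd2']
print(precision_recall(found, relevant={"d1", "d3"}))  # (0.5, 0.5)
```

Here half of the retrieved documents are relevant (precision 0.5) and half of the relevant documents were found (recall 0.5).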
Basic anatomy of a search system*
[Figure: components of a search system: user, query, search interface, search engine, content and results, annotated with query languages, query builders, metadata, controlled vocabulary, ranking and clustering algorithms, and interface design]
Users ask, browse, or search again until they succeed or give up
There is room for improvement in evaluating the relevance of documents
* Source: Rosenfeld & Morville, 2002
All methods aim at modifying the index, the term vector, or the query
Each method serves one or more of three aims: index term consolidation, term vector expressiveness, and query support
Methods: stemming, weighting the terms, thesaurus, generalization/specialization, removing stop words, distances, grammar and dictionary, quorum-level search, relevance feedback, contextual search
Note that most of the methods are implicit
How can we weight terms
for the indexing
of large document collections?
Remember:
"It is here proposed that the frequency of word occurrence in an article furnishes a useful measurement of word significance."
Luhn, 1958
Let's start with a first attempt to define the term weights of a document corpus
FIRST APPROACH
The weight $w_{t,d}$ of a term t for a document d is defined as the quotient of its frequency $tf_{t,d}$ in d and the number $n_D$ of documents in the entire document collection D in which the term occurs:
$w_{t,d} = \frac{tf_{t,d}}{n_D}$
The term weight $w_{t,d}$ is high if only a few articles contain the term but the term occurs frequently in document d
Although such a weighting carries an inherent danger, we use it for a first experiment
Note that, as a result, typical stop words have a low weight
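The first approach can be checked with a few lines of Python. The function name is an illustrative assumption; the $(tf_{t,d}, n_D)$ pairs are taken from the lecture's ranking table for the newspaper example.

```python
def first_approach_weight(tf_td: int, n_D: int) -> float:
    """First approach: w_{t,d} = tf_{t,d} / n_D."""
    if n_D < 1:
        raise ValueError("the term must occur in at least one document")
    return tf_td / n_D


# (tf_{t,d}, n_D) pairs from the ranking table of the newspaper example
examples = {
    "Bundeskulturbeauftragter": (5, 1),
    "Kulturbeauftragte": (14, 3),
    "Beauftragter": (5, 4),
    "alle": (1, 191),  # stop word
    "dass": (1, 200),  # stop word
}

for term, (tf, n_D) in examples.items():
    print(f"{term:26s} {first_approach_weight(tf, n_D):.4f}")
```

Stop words occur in almost every one of the 200 articles, so $n_D$ is close to 200 and their weight stays near 1/200.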
Assume we have a large set of documents and want to index them: how do we proceed?
Example: document d (translated from German)
A Tendency Toward Nuisance (Lästigkeit)
Michael Naumann's institutional lack of authority (Kompetenzschwäche) and how he can use it. What the Commissioner for Culture (Kulturbeauftragte) may and may not do.
Statesmanlike reflections by Elke Gurlit
No one will deny that Gerhard Schröder pulled off a coup by establishing the Federal Commissioner for Culture (Bundeskulturbeauftragter). Not only state cultural policy (Kulturpolitik) but also the public debate about culture has gained enormously in importance in recent months. The daily Naumann report has become an indispensable part of the arts pages' repertoire. One almost gets the impression that Michael Naumann acts as the commissioner (Beauftragter) of underemployed arts desks. To better understand the position of the Commissioner for Culture, it is worth taking a look at the system of commissioners, which has spread across the entire Federal Republic. We know [...]
Corpus: 200 German daily newspaper articles
First of all, the index system has to count the occurrences of terms in documents
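Counting is the first step for every weighting that follows. A minimal sketch, assuming a naive regular-expression tokenizer; the three short sentences are hypothetical stand-ins for the 200-article corpus, and the function names are illustrative.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Naive tokenizer (assumption): lowercase runs of word characters."""
    return re.findall(r"\w+", text.lower())


def count_terms(docs: list[str]) -> tuple[list[Counter], Counter]:
    """Per-document term frequencies tf_{t,d} and, per term, the number n_D
    of documents in which it occurs."""
    tfs = [Counter(tokenize(doc)) for doc in docs]
    n_D: Counter = Counter()
    for tf in tfs:
        n_D.update(tf.keys())  # a document counts at most once per term
    return tfs, n_D


docs = [
    "Naumann meets the commissioner. Naumann speaks.",
    "The commissioner for culture.",
    "Culture and the state.",
]
tfs, n_D = count_terms(docs)
print(tfs[0]["naumann"], n_D["naumann"], n_D["the"])  # 2 1 3
```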
Ranking based on term weight, $w_{t,d} = \frac{tf_{t,d}}{n_D}$

Term | English gloss | tf_{t,d} | n_D | w_{t,d}
Bundeskulturbeauftragter | Federal Commissioner for Culture | 5 | 1 | 5.0000
Kulturbeauftragte | commissioner for culture | 14 | 3 | 4.6667
Kompetenzschwäche | lack of authority | 3 | 1 | 3.0000
Naumann | (proper name) | 11 | 5 | 2.2000
Kulturhoheit | cultural sovereignty | 2 | 1 | 2.0000
Bundesbeauftragte | federal commissioner | 2 | 1 | 2.0000
parlamentarisch | parliamentary | 12 | 6 | 2.0000
Lästigkeit | nuisance | 2 | 1 | 2.0000
Staatssekretär | state secretary | 9 | 5 | 1.8000
Kulturpolitik | cultural policy | 5 | 3 | 1.6667
Beauftragter | commissioner | 5 | 4 | 1.2500
[...]
---------------- Threshold ----------------
alle | all | 1 | 191 | 0.0052
oder | or | 1 | 191 | 0.0052
nach | after, to | 1 | 197 | 0.0051
so | so | 1 | 195 | 0.0051
wie | how, as | 1 | 199 | 0.0050
dass | that | 1 | 200 | 0.0050
For all terms with $w_{t,d} > 1$, we can index the example text d
But when is a term an index term?
In fact, terms should be selected by their weights
However, using a threshold is only one of our options
Index terms may be selected:
on the basis of their rank (e.g., the first five)
by a comparison, e.g., of relative frequency
e.g., terms such as "Bundeskulturbeauftragter", "Kompetenzschwäche" or "Lästigkeit" have a relative document frequency of 1/200 = 0.005 (each occurs in only one of the 200 articles)
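The three selection options can be sketched side by side. Function names and the chosen cut-offs (threshold 1, top 2, maximum relative document frequency 0.005) are illustrative assumptions; the weights and $n_D$ values come from the lecture's ranking table with N = 200.

```python
def by_threshold(weights: dict[str, float], threshold: float) -> list[str]:
    """Keep every term whose weight exceeds the threshold."""
    return [t for t, w in weights.items() if w > threshold]


def by_rank(weights: dict[str, float], k: int) -> list[str]:
    """Keep the k terms with the highest weights."""
    return sorted(weights, key=weights.get, reverse=True)[:k]


def by_relative_df(n_D: dict[str, int], N: int, max_rel: float) -> list[str]:
    """Keep terms whose relative document frequency n_D / N is at most max_rel."""
    return [t for t, n in n_D.items() if n / N <= max_rel]


weights = {"Bundeskulturbeauftragter": 5.0, "Kompetenzschwäche": 3.0,
           "Naumann": 2.2, "Beauftragter": 1.25, "alle": 1 / 191}
n_D = {"Bundeskulturbeauftragter": 1, "Kompetenzschwäche": 1,
       "Naumann": 5, "Beauftragter": 4, "alle": 191}

print(by_threshold(weights, 1.0))       # every term except "alle"
print(by_rank(weights, 2))              # the two highest-weighted terms
print(by_relative_df(n_D, 200, 0.005))  # terms found in only 1 of 200 articles
```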
What do we have
to consider when we deal
with a corpus of documents?
There are different ways to calculate weights, each with different effects
SECOND APPROACH
The documents used to create an index system may vary in size
If the weight is based only on the frequency of a term in the document, the evaluation will be distorted
Problem with our first approach to weighting: in large documents, index terms appear more often, so these documents are overrated
Hence, it is necessary to normalize the term frequency $tf_{t,d}$ of the term t with respect to the number $n_{t,d}$ of terms in the document d:
$ntf_{t,d} = \frac{tf_{t,d}}{n_{t,d}}$
Note: the measure now relates to a single document
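A minimal sketch of the normalization, using two hypothetical token lists: the long document contains "culture" more often in absolute terms (4 versus 2), but after dividing by the document length the short document scores higher.

```python
from collections import Counter


def normalized_tf(tokens: list[str]) -> dict[str, float]:
    """ntf_{t,d} = tf_{t,d} / n_{t,d}, with n_{t,d} the number of terms in d."""
    if not tokens:
        return {}
    n_td = len(tokens)
    return {term: count / n_td for term, count in Counter(tokens).items()}


short_doc = ["culture", "policy", "culture"]   # 3 terms
long_doc = ["culture"] * 4 + ["other"] * 36    # 40 terms
print(normalized_tf(short_doc)["culture"])  # 0.6666666666666666
print(normalized_tf(long_doc)["culture"])   # 0.1
```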
However, frequency alone does not help to find the best matches in a document collection
The normalized term frequencies $ntf_{t,d}$ are comparable for the same term across different documents
However, they do not help in distinguishing between more relevant and less relevant documents
Note: terms often correlate thematically with a document but are usually not specific enough for a precise differentiation of the document content
Using pure frequency measures as the sole basis for weights leads to high recall and low precision
This is because the documents used to create an index system may be very specific (some terms are very frequent in a collection but concentrate on only a few documents)
Taking this concentration into account may improve the precision
Low-frequency terms define and differentiate document content more significantly than high-frequency terms
With respect to the document content, an index term is the more distinctive, the more often it appears within the document and the less often it appears in general (inverse document frequency)
The inverse document frequency is the logarithm of the quotient of the total number of documents N and the document frequency $df_{t,D}$ of the term t (the number of documents in the document set D in which t appears):
$idf_{t,D} = \log \frac{N}{df_{t,D}}$
The logarithm is used to compress the range of the relative frequency values
The inverse document frequency increases accuracy by reweighting the normalized term frequency of a term
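The lecture does not state the base of the logarithm; this sketch assumes the natural logarithm because it reproduces the lecture's worked TF-IDF example (0.0077). It shows how a 200-fold range of document frequencies is compressed into values between 0 and about 5.3.

```python
import math


def idf(N: int, df_tD: int) -> float:
    """idf_{t,D} = log(N / df_{t,D}), using the natural logarithm (assumption)."""
    if not 1 <= df_tD <= N:
        raise ValueError("need 1 <= df_tD <= N")
    return math.log(N / df_tD)


N = 200
for df_tD in (1, 5, 50, 200):
    # a term occurring in every document gets idf 0
    print(df_tD, round(idf(N, df_tD), 3))
```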
There are two assumptions on which the determination of useful descriptors for a text is based
Observation 1: the weight of an index term relates to a single document
The best descriptors are frequent relative to the total length of the document
Observation 2: the weight of an index term relates to the document corpus
Good descriptors occur only rarely in the document collection, which produces a differentiation effect
We may combine the normalized (relative) term frequency and the inverse document frequency into a new weighting
This results in a final measure
THIRD APPROACH
The weight $w_{t,d,D}$ of a term t is defined as the product of its normalized frequency $ntf_{t,d}$ in d and the inverse document frequency $idf_{t,D}$ based on a document collection D:
$w_{t,d,D} = ntf_{t,d} \cdot idf_{t,D}$
Example: the newspaper article d introduced above
Assumptions:
d contains $n_{t,d} = 1427$ terms
"Staatssekretär" occurs 3 times in the document and in 5 different articles
The collection has 200 articles
This results in a weight of
$w_{\text{Staatssekretär},d,D} = \frac{3}{1427} \cdot \ln\frac{200}{5} \approx 0.0077$ (with the natural logarithm)
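The worked example can be reproduced directly. A minimal sketch, assuming the natural logarithm (with base 10 the result would be about 0.0034, not 0.0077); the function name is illustrative. The lecture's 0.0077 is 0.00776 truncated to four decimals, and the same function also reproduces the Kulturbeauftragte row of the table that follows (14 occurrences, 3 of 200 articles, giving 0.0412).

```python
import math


def tf_idf(tf_td: int, n_td: int, N: int, df_tD: int) -> float:
    """Third approach: w_{t,d,D} = ntf_{t,d} * idf_{t,D}."""
    ntf = tf_td / n_td           # normalized term frequency
    idf = math.log(N / df_tD)    # natural logarithm (assumption)
    return ntf * idf


# "Staatssekretär": 3 occurrences, 1427 terms in d, 5 of 200 articles
w = tf_idf(tf_td=3, n_td=1427, N=200, df_tD=5)
print(f"{w:.5f}")  # 0.00776
```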
Ranking of the terms of document d based on $w_{t,d,D} = ntf_{t,d} \cdot idf_{t,D}$

Term | English gloss | w_{t,d,D}
Kulturbeauftragte | commissioner for culture | 0.0412
parlamentarisch | parliamentary | 0.0295
Naumann | (proper name) | 0.0284
Bundeskulturbeauftragter | Federal Commissioner for Culture | 0.0186
Kulturpolitik | cultural policy | 0.0147
staatlich | state, governmental | 0.0144
Beauftragte | commissioner | 0.0137
Kompetenzschwäche | lack of authority | 0.0111
kulturell | cultural | 0.0100
institutionell | institutional | 0.0098
[...]
Staatssekretär | state secretary | 0.0077
But how may we express
the relevance of a term
for a document?
The relevance of a term depends on its frequency
The relevance of a term describes its capability to find appropriate documents within a collection and to disregard inappropriate ones
The higher the IDF value, the higher the relevance of a term for a document
Accordingly, the frequency of a term can have a distinct influence on the differentiation of documents
Relevant terms are those in the medium frequency range
How useful are terms
to differentiate documents?
It is possible to determine how useful a term is for differentiation
Define a similarity coefficient SC over all document vectors of an index system:
$SC = \frac{1}{N(N-1)} \sum_{j=1}^{N} \sum_{k=1, k \neq j}^{N} S(d_j, d_k)$
Note: S denotes the similarity of two documents (more on this later)
After excluding the term t from the document vectors, another similarity coefficient $SC_{/t}$ may be calculated
The differentiation coefficient $DC_t$ describes the change between the two similarity coefficients caused by the index term t:
$DC_t = SC - SC_{/t}$
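A minimal sketch of SC and DC, assuming cosine similarity as S (the measure the lecture introduces later) and three hypothetical document vectors. Under the lecture's sign convention, "data", which occurs in every document and makes them look alike, gets a positive $DC_t$, while the differentiating term "retrieval" gets a negative one. Salton's term-discrimination value takes the difference the other way round ($SC_{/t} - SC$) so that good differentiators come out positive; when $DC_t$ is multiplied into a term weight, the sign convention therefore matters.

```python
import math
from itertools import permutations


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity of two sparse term vectors (0.0 if one is empty)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similarity_coefficient(docs: list[dict[str, float]]) -> float:
    """SC = 1/(N(N-1)) * sum of S(d_j, d_k) over all ordered pairs j != k."""
    N = len(docs)
    total = sum(cosine(a, b) for a, b in permutations(docs, 2))
    return total / (N * (N - 1))


def differentiation_coefficient(docs: list[dict[str, float]], term: str) -> float:
    """DC_t = SC - SC_{/t}, with the sign convention used in the lecture."""
    without_t = [{t: w for t, w in d.items() if t != term} for d in docs]
    return similarity_coefficient(docs) - similarity_coefficient(without_t)


docs = [
    {"data": 1.0, "retrieval": 1.0},
    {"data": 1.0, "multimedia": 1.0},
    {"data": 1.0, "computer": 1.0},
]
print(round(differentiation_coefficient(docs, "data"), 4))       # 0.5
print(round(differentiation_coefficient(docs, "retrieval"), 4))  # -0.1381
```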
The differentiation coefficient can be incorporated into the weighting of the terms
Its particular impact on the index system can take effect through a novel weighting:
$w_{t,d,D} = ntf_{t,d} \cdot DC_t$
or
$w_{t,d,D} = ntf_{t,d} \cdot idf_{t,D} \cdot DC_t$
In this case, term weights are evaluated on the basis of the differentiation potential of the vectors
The novel weighting allows both recall and precision of the results to be optimized
So how does
an index system work?
Working with an index system can be divided into two subtasks
The first subtask deals with building the inverted index:
1. Identification of every word in a set of documents
2. Elimination of all stop words, which have no value for differentiating documents
3. Generation of the root word of each index term (stemming)
4. Calculation of the weight of each root word
5. Representation of each document by all root words and their associated weights
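The five steps can be sketched as a small indexing pipeline. The stop-word list, the crude suffix-stripping `stem` function (a hypothetical stand-in for a real stemmer such as Porter's), and the choice of TF-IDF as the weight in step 4 are all illustrative assumptions.

```python
import math
import re
from collections import Counter, defaultdict

# tiny illustrative stop-word list (assumption; real lists are much longer)
STOP_WORDS = {"a", "and", "are", "for", "in", "is", "of", "the"}


def stem(word: str) -> str:
    """Crude suffix stripping, a hypothetical stand-in for a real stemmer."""
    for suffix in ("ing", "ers", "er", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def build_index(docs: list[str]) -> dict[str, dict[int, float]]:
    """Inverted index: root word -> {document id: weight}."""
    roots_per_doc = []
    for text in docs:
        words = re.findall(r"\w+", text.lower())           # 1. identify every word
        words = [w for w in words if w not in STOP_WORDS]   # 2. drop stop words
        roots_per_doc.append([stem(w) for w in words])      # 3. stemming
    N = len(docs)
    df = Counter(root for roots in roots_per_doc for root in set(roots))
    index: dict[str, dict[int, float]] = defaultdict(dict)
    for doc_id, roots in enumerate(roots_per_doc):
        for root, tf in Counter(roots).items():             # 4. weight (TF-IDF)
            index[root][doc_id] = tf / len(roots) * math.log(N / df[root])
    return dict(index)                                      # 5. docs as weighted roots


docs = [
    "Computers are used in retrieval",
    "Users of retrieval systems",
    "Computer systems for users",
]
index = build_index(docs)
print(sorted(index["comput"]))  # [0, 2]
```

"Computers" and "Computer" are mapped to the same root, so both documents end up under a single index entry.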
The second subtask of an index system deals with query processing:
Precondition: an existing inverted index, which is used by the individual processing steps
1. Manual or automated formulation of a query
2. Generation of the root words of the query terms (stemming)
3. Detection of the set of appropriate documents
4. Output of the result ranked by relevance
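A minimal sketch of query processing against a pre-built inverted index. The index contents, the hypothetical `stem` function, and the scoring rule (summing the stored weights of all matching root words) are assumptions for illustration; a cosine-based score would be an equally valid choice.

```python
from collections import defaultdict

# a pre-built inverted index: root word -> {document id: weight} (illustrative values)
INDEX = {
    "retriev": {1: 0.12, 2: 0.05, 4: 0.09},
    "system": {2: 0.10, 3: 0.07},
    "comput": {1: 0.20, 3: 0.04},
}


def stem(word: str) -> str:
    """Hypothetical crude stemmer; it must match the one used to build the index."""
    for suffix in ("ers", "er", "al", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def process_query(query: str, index: dict[str, dict[int, float]]) -> list[tuple[int, float]]:
    roots = [stem(w) for w in query.lower().split()]      # 2. root words of the query
    scores: dict[int, float] = defaultdict(float)
    for root in roots:                                     # 3. matching documents
        for doc_id, weight in index.get(root, {}).items():
            scores[doc_id] += weight                       # summed weights as score
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)  # 4. rank


print([doc for doc, _ in process_query("Retrieval systems", INDEX)])  # [2, 1, 4, 3]
```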
A homogeneous domain is the basis for a reasonable term weighting
Example (documents translated from German):
d1: Computers are used in information retrieval. There are procedures on computers for automatic retrieval. Modern computers enable efficient retrieval.
d2: Users of systems for information retrieval were surveyed. Many users were satisfied with the functionality of the retrieval. The existing systems for information retrieval meet the requirements of the users. There is a series of systems on computers.
d3: The development of new systems for information retrieval is welcomed by many users. The development aims at new methods of retrieval with computers. Systems for efficient retrieval of information are currently under development.
d4: Information retrieval is carried out in databases. Various databases have a surface (user interface) for the user that enables targeted retrieval in information spaces. Various systems for retrieval in databases are currently available to the user.
d5: The development of systems for retrieval in information spaces is of interest to many users of databases. One can navigate in information spaces and thus support information retrieval. The information space is visualized three-dimensionally on computers.
Task: automated indexing of these five documents
In the first step, high-frequency words are eliminated
The stop-word list contains all kinds of words except nouns; the nouns procedure, requirement, series, method, availability, functionality and surface are exceptions and are treated as stop words as well
Analyzed nouns are reduced to the nominative singular
In the second step, the term weight is calculated on the basis of TF-IDF
In the third step, the quality of an index term is evaluated by defining threshold values
In the given example, we can do without an upper threshold value because the stop-word list is so extensive
(... optional homework exercise)
Are there alternative approaches
to deal with the length
of a document when evaluating
the captured text?
Let us consider an example
Document d1: {data, multimedia, computer, retrieval, retrieval}
Document d2: {similarity, multimedia, retrieval}
Document d3: {data, computer, data}
Vector components in the order (data, multimedia, retrieval, similarity, computer):
d1 = (1/5, 1/5, 2/5, 0, 1/5)
d2 = (0, 1/3, 1/3, 1/3, 0)
d3 = (2/3, 0, 0, 0, 1/3)
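The vectors above divide each term count by the document length. A short sketch using exact fractions so the output matches the notation of the example; the function name is illustrative.

```python
from collections import Counter
from fractions import Fraction

# term order used for the vector components in the example
TERMS = ["data", "multimedia", "retrieval", "similarity", "computer"]


def relative_tf_vector(doc: list[str]) -> list[Fraction]:
    """Each component is tf_{t,d} divided by the length of d."""
    counts = Counter(doc)
    return [Fraction(counts[t], len(doc)) for t in TERMS]


d1 = ["data", "multimedia", "computer", "retrieval", "retrieval"]
d2 = ["similarity", "multimedia", "retrieval"]
d3 = ["data", "computer", "data"]

for name, doc in (("d1", d1), ("d2", d2), ("d3", d3)):
    print(name, [str(x) for x in relative_tf_vector(doc)])
```

Because every term of these documents appears in `TERMS`, the components of each vector sum to 1, independent of the document length.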
Vectors span multi-dimensional spaces for representing documents and queries
Based on their term weights, queries and documents are represented as vectors in the vector space, e.g., q = {data, retrieval}
[Figure: query vector q and document vectors d1, d2, d3 in the plane spanned by the axes $w_{data,d}$ and $w_{retrieval,d}$]
Interpretation: the smaller the angle between the query vector and the document vector, the higher the relevance of the document
Note: a cosine value of zero means that the query vector and the document vector are orthogonal and have no match
The so-called vector space model is the most commonly used model today
The angle between two vectors is determined by the so-called cosine measure
Instead of the angle itself, its cosine is easier to calculate:
The similarity of a query and a document is expressed by the cosine measure, i.e., as a correlation of the vectors quantified by the cosine of the enclosed angle $\theta$
Using the scalar product, it is possible to calculate both the length of a vector and the angle between two vectors
When determining the angle, the length of a vector can be neglected
$S(d_i, q) = \cos\theta = \frac{d_i \cdot q}{\|d_i\| \, \|q\|}$
Note: $\|q\| = \sqrt{\sum_{i=1}^{n} q_i^2}$
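Applying the cosine measure to the example vectors d1, d2, d3 with the query q = {data, retrieval}. Giving both query terms the weight 1 is an assumption; the lecture does not specify query weights.

```python
import math


def cosine(d: list[float], q: list[float]) -> float:
    """S(d, q) = (d . q) / (|d| * |q|); returns 0.0 for a zero vector."""
    dot = sum(x * y for x, y in zip(d, q))
    norm_d = math.sqrt(sum(x * x for x in d))
    norm_q = math.sqrt(sum(y * y for y in q))
    return dot / (norm_d * norm_q) if norm_d and norm_q else 0.0


# components: (data, multimedia, retrieval, similarity, computer)
docs = {
    "d1": [1/5, 1/5, 2/5, 0, 1/5],
    "d2": [0, 1/3, 1/3, 1/3, 0],
    "d3": [2/3, 0, 0, 0, 1/3],
}
q = [1, 0, 1, 0, 0]  # query "data, retrieval" with unit weights (assumption)

ranking = sorted(docs, key=lambda name: cosine(docs[name], q), reverse=True)
for name in ranking:
    print(name, round(cosine(docs[name], q), 4))
```

d1 ranks first because it contains both query terms; d3 beats d2 because "data" makes up two thirds of d3.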
The vector space model has some obvious strengths and weaknesses
Positive:
Its calculation is straightforward because the model is simple and based on linear algebra
It uses term weights instead of Boolean values
It allows computing a continuous degree of similarity between queries and documents
It allows ranking documents according to their likely relevance
Negative:
Long documents are poorly represented because they have poor similarity values
Search keywords must precisely match document terms, i.e., problems with substrings (→ false positives) or missing contextually relevant documents because of different terminology (→ false negatives)
Hence, not only a query but also an entire document can function as a query
The similarity $S(d_1, d_2)$ of two documents $d_1$ and $d_2$ is the scalar product of their term weights $w_{t,d_1,D}$ and $w_{t,d_2,D}$ divided by the product of the vector lengths, i.e., the cosine of the angle between the two vectors:
$S(d_1, d_2) = \frac{\sum_{i=1}^{n} w_{t_i,d_1} \cdot w_{t_i,d_2}}{\sqrt{\sum_{i=1}^{n} w_{t_i,d_1}^2 \cdot \sum_{i=1}^{n} w_{t_i,d_2}^2}}$
Interpretation of the method:
Low-frequency terms reduce the similarity of documents since they appear rarely
High-frequency terms increase the similarity since they appear in many documents
Terms with a medium-range frequency tend to organize a collection into clusters differentiated by content
How are these concepts
used in web search?
Web search poses unique challenges to information retrieval
Methods that work well on well-controlled document collections often do not produce good results on the web
The vector space model returns the documents that most closely approximate the query
On the web, this strategy often returns very short documents that consist of little more than the query plus a few words (see the weaknesses of the VSM)
Major challenges for web search engines include
Ultra-large scale (60+ billion web pages to index)
High throughput (several hundred million queries per day)
Extreme variation in document contents (language, vocabulary, format, ...)
Companies deliberately manipulating search engines for profit
Use of external meta-information
The Original Google Paper
PageRank: Bringing Order to the Web
PageRank makes use of the link structure of the web to calculate a quality ranking for each page, called its PageRank
It models a probability distribution that represents the likelihood that a person randomly clicking on links will arrive at any particular page
It considers the importance of each page that casts a vote: votes from some pages are considered to have greater value, thus giving the linked page greater value
$PR(A) = (1 - d) + d \sum_{T_i \in L(A)} \frac{PR(T_i)}{C(T_i)}$
$PR(A)$: PageRank of a web page A
$PR(T_i)$: PageRank of a web page $T_i$ pointing to A
$C(T_i)$: number of outbound links of web page $T_i$
$L(A)$: set of web pages linking to A
$d$: damping factor, a value between 0 and 1; in this formula it is the probability that the random surfer keeps following links, while $1 - d$ is the probability of stopping and jumping to a random page
Note: the PageRanks form a probability distribution over web pages (summing to 1) in the normalized variant $PR(A) = \frac{1 - d}{N} + d \sum_{T_i \in L(A)} \frac{PR(T_i)}{C(T_i)}$, where N is the number of web pages; with the formula above, the PageRanks sum to N instead (assuming every page has outbound links)
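A minimal power-iteration sketch of the PageRank formula as written above, with the non-normalized $(1 - d)$ term. The three-page link graph is hypothetical, d = 0.85 is the value suggested in the original Google paper, and the sketch assumes there are no pages without outbound links.

```python
def pagerank(links: dict[str, list[str]], d: float = 0.85, iterations: int = 100) -> dict[str, float]:
    """Iterate PR(A) = (1 - d) + d * sum(PR(T) / C(T) for every page T linking to A).

    Assumes every page has at least one outbound link (no dangling pages).
    """
    pages = list(links)
    incoming = {p: [q for q in pages if p in links[q]] for p in pages}  # L(A)
    pr = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # the comprehension reads the previous values before pr is rebound
        pr = {p: (1 - d) + d * sum(pr[q] / len(links[q]) for q in incoming[p]) for p in pages}
    return pr


links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
pr = pagerank(links)
print({p: round(v, 4) for p, v in pr.items()})
print(round(sum(pr.values()), 4))  # 3.0, i.e. N; divide by N for a probability distribution
```

C receives links from both A and B and ends up with the highest rank; B, linked only by half of A's vote, ranks lowest.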
How to apply these techniques
for document classification?
Classification means the assignment of objects to known categories
In information retrieval, classification addresses the assignment of documents to document classes defined by their content
The purpose of document classification is the automated routing of documents and their organized filing
[Figure: unknown documents pass through a categorization system that assigns them to classes K1, K2, K3]
How is the set of classes $K = \{K_1, \dots, K_n\}$ formed?
The difference from the previous approach lies in an altered mapping function
The domain is formed by a set T of terms; the range of values is now reduced to a defined number of classes K
Now the terms refer to a set of classes instead of documents as before:
$I': T \to K$ (mapping of an index system for classification)
The challenge is to extract all terms relevant for splitting the set of documents into classes
If a document is to be assigned to a class, its term vector is given to the system I'', which identifies the respective class:
$I'': d \to K$ (mapping of an index system for the classification of documents)
The difference in function requires an altered design of the inverted matrix
The basis is a set of documents that has already been classified
The term vectors created on this basis are no longer stored separately for each document but are combined into class vectors
The index terms of all document vectors belonging to a class are combined into one term vector (the class vector)
Example: combining document vectors into class vectors

Document vectors of class 1:
   | d1 | d2 | d3
t1 | 3  | 5  | 0
t2 | 1  | 0  | 4
t3 | 0  | 9  | 2

Document vectors of class N:
   | dM-2 | dM-1 | dM
t1 | 0    | 0    | 2
t2 | 7    | 2    | 4
t3 | 0    | 3    | 4

Inverted index of the class vectors:
   | k1 | kN
t1 | 8  | 2
t2 | 5  | 13
t3 | 11 | 7

Note: during a query, the class of a document is determined by comparing the document vector with the class vectors
In contrast to common index systems, which can only evaluate documents that have already been indexed, it is now also possible to examine documents unknown to the system
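The numbers in the example show that a class vector is the term-by-term sum of the document vectors of its class (e.g., 3 + 5 + 0 = 8 for t1 in k1). A short sketch reproducing both class vectors; the function name is illustrative.

```python
def class_vector(doc_vectors: list[list[int]]) -> list[int]:
    """Combine the term vectors of one class by adding them term by term."""
    return [sum(column) for column in zip(*doc_vectors)]


# document vectors as (t1, t2, t3), taken from the example
class_1 = [[3, 1, 0], [5, 0, 9], [0, 4, 2]]  # d1, d2, d3
class_N = [[0, 7, 0], [0, 2, 3], [2, 4, 4]]  # dM-2, dM-1, dM

print(class_vector(class_1))  # [8, 5, 11] -> k1
print(class_vector(class_N))  # [2, 13, 7] -> kN
```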
Here again, similarity measures determine the assignment of documents to classes
The assignment of a document to a class is determined by the similarity of its document vector d and the respective class vector k
The more similar these vectors are, the more likely it is that the document belongs to this class
Different evaluation functions S are available for determining the similarity of two vectors; they measure the distances between vectors
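A minimal sketch of assigning an unknown document to the most similar class vector. It uses plain cosine similarity as S (an assumption; the lecture mentions several possible evaluation functions). The class vectors k1 and kN are those of the example; the document vector is hypothetical.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_classes(doc: list[float], class_vectors: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Return (class, similarity) pairs, most similar class first."""
    scores = {name: cosine(doc, k) for name, k in class_vectors.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# class vectors (t1, t2, t3) from the example
classes = {"k1": [8, 5, 11], "kN": [2, 13, 7]}

new_doc = [1, 0, 2]  # hypothetical unknown document
ranked = rank_classes(new_doc, classes)
print(ranked[0][0])  # k1
```

Returning the whole sorted list rather than only its first entry leaves open whether a single best class or a ranked list is reported.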
Normalizing the evaluation with respect to the size of the vectors ensures the comparability of the results
The evaluation is normalized by determining the angle between two vectors in the vector space
To do so, the vectors are considered part of a high-dimensional vector space whose dimensions are given by all terms within the system
The angle between the document vector and the class vector in this space determines their similarity:
$S(d, k) = 1 - \frac{\sum_{t \in d} \sum_{s \in k} tf_{t,d} \cdot w_{s,k} \cdot a_{t,s}}{\sqrt{\sum_{t \in d} tf_{t,d}^2 \cdot \sum_{s \in k} w_{s,k}^2}}$
The disadvantage of this function is its complexity
The categorization results can be expressed in different ways
Singular categorization ("best match"): only the best class is returned
Sorted categorization ("ranked match"): the classes are returned in rank order 1., 2., 3.
Validated categorization ("measured match"): the classes are returned with scores, e.g., 0.6, 0.2, 0.1
The different ways of expressing the results complicate the comparability of the systems
Some more
sophisticated methods
also include the consideration
of word context!
Terminological conceptualization of information objects based on term similarity (syntactic distance)
REMEMBER
Term similarity is determined by a pre-computed statistical analysis of the strings captured in a document
Association matrices quantify term correlations based on how frequently the terms co-occur (term co-occurrence matrix)
The correlation $c_{ij}$ between the terms $t_i$ and $t_j$ is expressed by their joint occurrence within a document
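A minimal sketch of a term co-occurrence matrix in which $c_{ij}$ is the number of documents containing both $t_i$ and $t_j$. This is one reading of "joint occurrence within a document"; another common variant sums the products $tf_{t_i,d} \cdot tf_{t_j,d}$ over all documents. The documents are hypothetical and borrow vocabulary from the vacation example that follows.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(docs: list[list[str]]) -> Counter:
    """c[(t_i, t_j)] = number of documents in which t_i and t_j occur together."""
    c: Counter = Counter()
    for doc in docs:
        # each unordered pair of distinct terms counts once per document
        for ti, tj in combinations(sorted(set(doc)), 2):
            c[(ti, tj)] += 1
    return c


docs = [
    ["ocean", "sand", "wave", "wave"],
    ["ocean", "wave", "wind"],
    ["sand", "palm tree"],
]
c = cooccurrence(docs)
print(c[("ocean", "wave")], c[("ocean", "sand")], c[("ocean", "wind")])  # 2 1 1
```

Pairs are stored in alphabetical order, so a lookup must use the same order, e.g. `("palm tree", "sand")`.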
However, what is the situation at the workspace of a knowledge worker?
In office environments, people classify documents according to their preferences, i.e., they create folders as categories and name them
The resulting taxonomies correspond to subjective concepts of the world, but they ...
... have no unique meaning
[Figure: two folders, both named "vacation"]
The perception of documents is subjective
The resulting taxonomies also do not allow perspective considerations
[Figure: the questions How?, What?, Where? and Who?]
Lecture
Knowledge Management
TU KL
Dengel
The resulting taxonomies are also not integrative
[Figure: separate classifications in the file system (local files, keynotes), in email folders (inbox, outbox) and in bookmarks (personal favorites), contrasted with a hybrid classification as a personal memory (Semantic Desktop); sample items: contacts Miles and Zhang, Novotel Melbourne, Springer homepage, HCM 2006, KSEM07, EDOC 2006]
Using term co-occurrence, the documents stored in folders allow content profiles to be learned dynamically
[Figure: profile of the folder "vacation": clouds, water, wind, blue sky, wave, ocean, sand, snorkeling, palm tree, coral reef, shell, Barbados]
Profiles represent subjective perceptions of content and have a descriptive character for the content of a folder
The profiles of newly created or incoming documents are compared with the folder profiles, and the documents are categorized accordingly
Storing a document in a folder causes a dynamic adaptation of the profile over time
Profiles can be used for categorizing new documents and for query expansion while searching
WM/04.02 S. 142
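A minimal sketch of folder profiles, under simplifying assumptions: profiles are plain term-frequency vectors (not the co-occurrence statistics mentioned above), similarity is cosine similarity, and the names `FolderProfile`, `categorize` and `expand_query` are hypothetical. It only illustrates categorizing by profile similarity, adapting a profile when a document is stored, and expanding a query with profile terms.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case word tokens; a deliberately simple tokenizer."""
    return re.findall(r"[a-z]+", text.lower())


class FolderProfile:
    """Hypothetical folder profile: accumulated term frequencies of stored documents."""

    def __init__(self, name):
        self.name = name
        self.terms = Counter()

    def add_document(self, text):
        # Storing a document in the folder adapts the profile over time.
        self.terms.update(tokenize(text))


def cosine(p, q):
    """Cosine similarity of two sparse term-frequency vectors."""
    dot = sum(w * q[t] for t, w in p.items() if t in q)
    norm_p = math.sqrt(sum(w * w for w in p.values()))
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


def categorize(text, profiles):
    """Return the name of the folder whose profile is most similar to the document."""
    doc = Counter(tokenize(text))
    return max(profiles, key=lambda f: cosine(doc, f.terms)).name


def expand_query(query, profile, k=3):
    """Append the k most frequent profile terms that are not yet in the query."""
    terms = tokenize(query)
    extra = [t for t, _ in profile.terms.most_common() if t not in terms][:k]
    return terms + extra
```

For example, a "vacation" profile built from "ocean sand palm tree coral reef snorkeling Barbados" attracts the new document "snorkeling at the coral reef", and `expand_query("coral", vacation)` adds frequent vacation terms to the query.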
Multi-dimensional management may generate “on-the-fly” metadata to describe context
The application of several taxonomies enables the multi-dimensional management of content
So, for example, content can be administered by virtual folders on an overlying level, synchronously with Explorer, email system and browser
Content is simultaneously assigned to different folders (criteria or categories)
Folder names (categories) and views (super categories) provide metadata for indexing content
[Figure callout: The “Email” folder is just part of the view addressing document types]
WM/04.02 S. 143
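The idea that folder names and views double as indexing metadata can be sketched with a small inverted index. The data layout, the example assignments and the function name `build_metadata_index` are hypothetical; they only show how one document that sits in several virtual folders becomes retrievable under each of them.

```python
from collections import defaultdict


def build_metadata_index(assignments):
    """Map every view (super category) and folder name (category) to its documents.

    assignments: document -> list of (view, folder) pairs; a document may sit in
    several virtual folders at once.
    """
    index = defaultdict(set)
    for doc, folders in assignments.items():
        for view, folder in folders:
            index[view].add(doc)    # super category as metadata
            index[folder].add(doc)  # category as metadata
    return index


# Hypothetical example: the "Email" folder is just one entry in the "Document types" view.
assignments = {
    "report.pdf": [("Projects", "KSEM07"), ("Document types", "PDF")],
    "mail_042.eml": [("Projects", "KSEM07"), ("Document types", "Email")],
    "beach.jpg": [("My personal views", "vacation")],
}
index = build_metadata_index(assignments)
print(sorted(index["KSEM07"]))  # ['mail_042.eml', 'report.pdf']
```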
Views may be individually centered or group centered
[Figure: organizational view of the business structure; individual views (my personal views); view of running projects]
WM/04.02 S. 144
How to evaluate
document classification systems?
WM/04.02 S. 145
Confusion matrices or contingency charts summarize correct and incorrect class assignments
A confusion matrix can be used for comparing desired results with classification results
Example: single binary classification
Classifier \ Ground Truth | K | ¬K
K | a | b
¬K | c | d
The matrix represents the four possible outcomes of a binary classification
WM/04.02 S. 146
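The four cells can be counted directly from paired classifier and ground-truth labels. This sketch follows the table's convention (rows: classifier, columns: ground truth); the function name `binary_confusion` and the label values are illustrative assumptions.

```python
def binary_confusion(predicted, truth, k):
    """Return (a, b, c, d) for class k.

    Rows of the matrix are classifier decisions, columns are ground truth:
    a: assigned to K, belongs to K      b: assigned to K, belongs to not-K
    c: assigned to not-K, belongs to K  d: assigned to not-K, belongs to not-K
    """
    a = b = c = d = 0
    for p, t in zip(predicted, truth):
        if p == k and t == k:
            a += 1
        elif p == k:
            b += 1
        elif t == k:
            c += 1
        else:
            d += 1
    return a, b, c, d


predicted = ["K", "K", "other", "other", "K"]
truth = ["K", "other", "K", "other", "K"]
print(binary_confusion(predicted, truth, "K"))  # (2, 1, 1, 1)
```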
Extension to n classes in order to evaluate classifiers with more than two classes
Since a document may not be assignable to any class, the new class R (reject) is introduced
Ai,j indicates the number of documents assigned to class Ki which, according to the ground-truth information, belong to class Kj
Classifier \ Ground Truth | K1 | K2 | K3 | ... | Kn
K1 | A1,1 | A1,2 | A1,3 | ... | A1,n
K2 | A2,1 | A2,2 | A2,3 | ... | A2,n
K3 | A3,1 | A3,2 | A3,3 | ... | A3,n
Kn | An,1 | An,2 | An,3 | ... | An,n
R | R1 | R2 | R3 | ... | Rn
WM/04.02 S. 147
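A sketch of building the n-class matrix with the additional reject row R from paired labels. The function name `confusion_matrix` and the use of the string "R" as reject label are assumptions for illustration.

```python
REJECT = "R"  # label used here for the reject class (an assumption)


def confusion_matrix(predicted, truth, classes):
    """n-class confusion matrix with an extra reject row.

    matrix[i][j] counts documents assigned to classes[i] that belong to classes[j];
    the last row holds R_j, the rejected documents of ground-truth class classes[j].
    """
    rows = list(classes) + [REJECT]
    row_of = {label: i for i, label in enumerate(rows)}
    col_of = {label: j for j, label in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in rows]
    for p, t in zip(predicted, truth):
        matrix[row_of[p]][col_of[t]] += 1
    return matrix


classes = ["K1", "K2", "K3"]
predicted = ["K1", "K2", "R", "K1", "K3"]
truth = ["K1", "K1", "K2", "K1", "K3"]
for label, row in zip(classes + [REJECT], confusion_matrix(predicted, truth, classes)):
    print(label, row)
```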
On the basis of clear yes/no decisions, the results can be divided into four types
1. True Positives: the number Ai of documents that are correctly assigned to class Ki
2. False Positives: the number Bi of documents that are falsely assigned to class Ki
3. False Negatives: the number Ci of documents that belong to class Ki but have not been assigned to it
4. True Negatives: the number Di of documents that do not belong to class Ki and have not been assigned to it
[Table: the n-class confusion matrix with rows K1 ... Kn and R (classifier) and columns K1 ... Kn (ground truth), entries Ai,j and Rj]
The values Ai, Bi, Ci, and Di correspond to the values a, b, c, d of binary classification, assuming that Ki is the class K and ¬K is the union of all classes Kj with i ≠ j and the reject class R
WM/04.02 S. 148
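Given such a matrix, the four counts for a class Ki follow from row and column sums: Ai is the diagonal entry, Bi the rest of row i, Ci the rest of column i (including the reject row), and Di everything else. A minimal sketch under the same conventions; `class_counts` is a hypothetical name.

```python
def class_counts(matrix, i):
    """Derive (A_i, B_i, C_i, D_i) for class K_i.

    matrix: rows are classifier outputs K_1..K_n followed by the reject row R,
    columns are ground-truth classes K_1..K_n.
    """
    total = sum(sum(row) for row in matrix)
    a = matrix[i][i]                       # correctly assigned to K_i
    b = sum(matrix[i]) - a                 # assigned to K_i, belonging elsewhere
    c = sum(row[i] for row in matrix) - a  # belonging to K_i, assigned elsewhere or rejected
    d = total - a - b - c                  # neither assigned to nor belonging to K_i
    return a, b, c, d


matrix = [[2, 0, 0], [1, 0, 0], [0, 0, 1], [0, 1, 0]]  # last row: reject class R
print(class_counts(matrix, 0))  # (2, 0, 1, 2)
```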
Contingency charts are well-suited data structures for dealing with classification results
Simple calculation of recall and precision:
$$\mathrm{Recall}(K_i) = \frac{A_i}{A_i + C_i} \qquad \mathrm{Precision}(K_i) = \frac{A_i}{A_i + B_i}$$
Recall and precision values are often combined using an additional parameter β
Calculation of the so-called F-measure F_β:
$$F_\beta = \frac{(\beta^2 + 1)\,\mathrm{Precision}(K_i)\,\mathrm{Recall}(K_i)}{\beta^2\,\mathrm{Precision}(K_i) + \mathrm{Recall}(K_i)} = \frac{(\beta^2 + 1)\,A_i}{(\beta^2 + 1)\,A_i + B_i + \beta^2\,C_i}$$
β lies in [0, ∞[ and indicates the relative weight of the respective measures in the evaluation
WM/04.02 S. 149
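Recall, precision and F_β can be computed directly from the counts. The sketch below (function names are illustrative) also lets one check that both forms of the F_β formula agree; from the first form, β = 0 reduces F_β to precision and a growing β drives it toward recall.

```python
def precision(a, b):
    return a / (a + b)


def recall(a, c):
    return a / (a + c)


def f_beta_from_pr(p, r, beta=1.0):
    """F_beta = (beta^2 + 1) * P * R / (beta^2 * P + R)."""
    b2 = beta ** 2
    return (b2 + 1) * p * r / (b2 * p + r)


def f_beta_from_counts(a, b, c, beta=1.0):
    """Equivalent form: (beta^2 + 1) * A / ((beta^2 + 1) * A + B + beta^2 * C)."""
    b2 = beta ** 2
    return (b2 + 1) * a / ((b2 + 1) * a + b + b2 * c)


a, b, c = 8, 2, 4  # A_i, B_i, C_i for one class
p, r = precision(a, b), recall(a, c)
for beta in (0.5, 1.0, 2.0):
    print(beta, round(f_beta_from_pr(p, r, beta), 4), round(f_beta_from_counts(a, b, c, beta), 4))
```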
Example of low recall and high precision
[Figure labels: Classes, Classification task, Classified]
WM/04.02 S. 150
Example of high recall and high precision
WM/04.02 S. 151
In addition, cost-benefit measures can be applied
They make it possible to weight the benefit of each correct class assignment differently in order to give a higher priority to particular document types
They represent user preferences (e.g. the costs of a false classification in the incoming mail scenario)
In the most common setting, the benefit ben is defined for a correct classification and the cost for an incorrect classification:
$$\text{c/b-Measure}_{ben,cost} = \sum_{i=1}^{n} \left( A_i \cdot ben - B_i \cdot cost \right)$$
WM/04.02 S. 152
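The c/b-measure is a single sum over the classes. A minimal sketch; the function name and the example numbers are illustrative, not taken from the lecture.

```python
def cb_measure(counts, ben, cost):
    """c/b-measure: sum over classes of A_i * ben - B_i * cost.

    counts: one (A_i, B_i) pair per class, i.e. correct and false assignments.
    """
    return sum(a * ben - b * cost for a, b in counts)


# Hypothetical incoming-mail setting: a false assignment costs three times
# as much as a correct one gains.
print(cb_measure([(8, 2), (5, 1)], ben=1, cost=3))  # 4
```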
Charts can be used to administer the costs and benefits of many classes simultaneously
Differentiated evaluation of individual cases:
Ground Truth \ Classifier | K1 | K2 | K3 | ... | Kn
K1 | Ben1 | Cost1,2 | Cost1,3 | ... | Cost1,n
K2 | Cost2,1 | Ben2 | Cost2,3 | ... | Cost2,n
K3 | Cost3,1 | Cost3,2 | Ben3 | ... | Cost3,n
Kn | Costn,1 | Costn,2 | Costn,3 | ... | Benn
For every single class Ki, the benefit Beni is defined for a correct classification
For every possible misclassification of a document of ground-truth class Ki into class Kj with i ≠ j, the costs Costi,j are defined
WM/04.02 S. 153
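With per-class benefits and per-pair costs, the evaluation weights every cell of the matrix individually. In this sketch rows are ground-truth classes and columns assigned classes, following the definition of Costi,j; rejected documents are ignored for simplicity, which is an assumption, and `cost_benefit` is a hypothetical name.

```python
def cost_benefit(confusion, ben, cost):
    """Differentiated cost/benefit evaluation.

    confusion[i][j]: documents of ground-truth class K_i assigned to class K_j.
    ben[i]: benefit Ben_i of a correct classification of K_i.
    cost[i][j]: cost Cost_i,j of assigning a K_i document to K_j (i != j).
    """
    total = 0
    for i, row in enumerate(confusion):
        for j, count in enumerate(row):
            if i == j:
                total += count * ben[i]
            else:
                total -= count * cost[i][j]
    return total


confusion = [[5, 1], [2, 4]]
print(cost_benefit(confusion, ben=[1, 2], cost=[[0, 3], [1, 0]]))  # 8
```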
... some additional techniques?
WM/04.02 S. 154
User profiles are used for information filtering
[Figure: unknown documents pass through an information retrieval system that uses the user profile to separate relevant from not relevant documents]
WM/04.02 S. 155
Passage retrieval solves a typical problem of information retrieval
Search for a number, a name, a result, a place, etc.
How many employees are working at Facebook?
When is the next plenary meeting of Telekom AG?
Who is the chairman of the board of Deutsche Bahn?
Where ... ?
What is a ... ?
While conventional document retrieval delivers a whole document as the answer to a query, passage retrieval identifies relevant sentences or passages in the document collection
This can be realized, for example, with the aid of the vector space model
WM/04.02 S. 156
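A minimal sketch of passage retrieval with the vector space model: every sentence is treated as a passage, represented as a raw term-frequency vector (no idf weighting, an assumption for brevity) and ranked by cosine similarity to the query. `best_passages` and the example documents are hypothetical.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Raw term-frequency vector of a text."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(p, q):
    dot = sum(w * q[t] for t, w in p.items() if t in q)
    norm = math.sqrt(sum(w * w for w in p.values())) * math.sqrt(sum(w * w for w in q.values()))
    return dot / norm if norm else 0.0


def best_passages(documents, query, top_k=1):
    """Rank the sentences of all documents by cosine similarity to the query."""
    q = vectorize(query)
    scored = []
    for doc in documents:
        for sentence in re.split(r"(?<=[.!?])\s+", doc.strip()):
            if sentence:
                scored.append((cosine(vectorize(sentence), q), sentence))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [sentence for _, sentence in scored[:top_k]]


docs = [
    "Example Corp is a software company. Example Corp has 250 employees working in Kaiserslautern.",
    "The plenary meeting of Example AG takes place in May.",
]
print(best_passages(docs, "How many employees are working at Example Corp?"))
```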
Passage retrieval aims at providing the most relevant passages from a document collection
[Figure: the weights of the query terms in a document are combined with a window function f(x) of window size W into a density distribution; the answer passage lies around its maximal value]
WM/04.02 S. 157
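The density idea from the figure can be sketched as follows. Both the term weighting (1 for a query term, 0 otherwise) and the triangular shape of the window function f(x) are assumptions chosen for illustration; `density_passage` is a hypothetical name.

```python
def density_passage(tokens, query_terms, window=5):
    """Locate an answer passage via a window function over term weights.

    Assumptions for illustration: a token weighs 1.0 if it is a query term and
    0.0 otherwise, and f(x) is a triangular window of size W = window.
    Returns the passage around the maximum of the density and its position.
    """
    half = window // 2
    weights = [1.0 if t.lower() in query_terms else 0.0 for t in tokens]

    def f(x):
        # Triangular window: 1 at the centre, decreasing linearly to the edges.
        return 1.0 - abs(x) / (half + 1)

    density = [
        sum(weights[j] * f(i - j) for j in range(max(0, i - half), min(len(tokens), i + half + 1)))
        for i in range(len(tokens))
    ]
    peak = max(range(len(tokens)), key=density.__getitem__)
    return tokens[max(0, peak - half): peak + half + 1], peak


print(density_passage("a b c x y z q r s".split(), {"x", "y", "z"}, window=3))  # (['x', 'y', 'z'], 4)
```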
Chapter 2
Attention-based
Collaborative Intelligence
WM/04.02 S. 158
Augmenting our mind strongly depends on intrinsic motivation and on cognitive attention to understand and learn
WM/04.02 S. 159
Eye tracking is one option for gaining better insight into contextual behavior
In many cases, eye tracking experiments are still conducted with a special head-mounted device
WM/04.02 S. 160
Sources: www.cure.at, www.egr.vcu.edu
The new generation of eye trackers is lightweight and unobtrusive
WM/04.02 S. 161
Up to now, gaze data from eye trackers has been widely used for usability applications or as a “passive” tool for behavior analysis
WM/04.02 S. 162
Sources: www.andreas.com, www.agencytimes.net, www.stz-medienforschung.de, www.usibilty.at
However, what is the difficulty
in attention recognition?
WM/04.02 S. 163
Your Daily Illusion!
WM/04.02 S. 164
Eye Movements
Saccadic suppression!
WM/04.02 S. 165
So how can we understand more about textual relevance in context?
WM/04.02 S. 166
Source: www.childrebookblock.com
Reading is based on ocular movements divided into fixations and saccades
Some fundamental facts:
[Figure: fixations recorded on the sample text "When a person is reading a sentence silently, the eye movements show that not every word is fixated. Every once in a while, a regression (an eye movement that goes back in the text) is made to re-examine a word that may have not been fully understood the first time. This only happens with about 10% of the fixations depending on how difficult the text is. The more difficult the higher the likelihood that regressions are made."]
Fixations occur when the gaze pauses at a certain position, normally lasting between 200 and 400 ms
Saccades are the jumps of the gaze between fixations, taking 10-20 ms
Reading does not happen in exactly linear saccadic movements; sometimes control fixations are needed that move back against the reading direction (regressions)
The regression rate depends on the subjective difficulty of a text
WM/04.02 S. 167
Some fundamental facts:
The distance between two fixation points is about 8 characters
The perceptual span, i.e. the size of the visual window in which reading occurs, is asymmetric (Moving Window Technique)
Humans also read some text to the left and right of a fixation point
For reading a text:
3 to 4 letters to the left
up to 15 letters to the right
(alignment depends on the language)
The asymmetry is caused by the fact that the information to be captured lies in the region to the right of the fixation point (for readers of Hebrew it is vice versa)
WM/04.02 S. 168
While reading, we follow the text in order to understand the captured message
Some fundamental facts:
[Figure: horizontal eye position (left to right) over time while reading a book; time scale 5 sec.]
WM/04.02 S. 169
The perceptual sensitivity during and after a fixation is different
Some fundamental facts:
There is a kind of blindness while moving the eye (called saccadic suppression)
Almost all information from the eye is made available during a fixation
Rare and unexpected words have to be fixated longer than common and known terms (spillover effect)
Humans read with expectations, i.e. expected words are often not fixated at all
[Figure: results from two studies*: perceptual sensitivity (up to 100%) over time (ms) before and after the saccade starts]
* Source: Goldstein, 2002
WM/04.02 S. 170
However, reading behavior differs and may be categorized depending on purpose, form and psychological process
Reading categorization:
Form: silent, oral
Purpose: recreatory, motivated
Psychological process: observatory (noting), assimilative (understanding/remembering), reflective (evaluating), creative (employing)
Distinguishing all four modes via eye tracking is very hard since there are different reader types
Skimming is a quick movement of the eyes across the page, picking up the occasional observation or idea
→ when the assignment is not too important (not that relevant)
Reading processes every sentence and then tries to make use of the salient arguments
→ when we know we will later profit from the material (relevant)
WM/04.02 S. 171
We use eye tracking to distinguish reading and skimming behavior!
WM/04.02 S. 172
For reading mode detection we build on an eye tracker from Tobii with integrated sensing
Setting:
The eye tracker emits infrared light (invisible to humans)
The eye ground reflects the light back to the eye tracker
A camera sensitive to infrared light detects these reflections and computes the focus point of attention
Unobtrusive, relatively precise (1° of visual angle)
Can be adjusted in a short time and even works for users with glasses
WM/04.02 S. 173
Sources: www.tobii.com, www.dfki.de
For reading mode detection we use various filters along a processing chain
[Diagram: Eye Tracker → Point of Regard Computation Filter (computation of point of regard) → Fixation Detection Filter (memory for points of regard; fixation ended? no/yes) → Saccade Classification Filter (classification of a saccade based on the last two fixations) → Reading and Skimming Detection Filter (memory for saccade features; characteristic scanpath detected? no/yes; reading and skimming differentiation) → Line-Averaging Filter (horizontal averaging, fixation clustering) → Line Matching Filter (OCR-based bounding box detection and line matching)]
WM/04.02 S. 174
[Diagram: noisy gaze data from the eye tracker → Point of Regard Computation Filter → Fixation Detection Filter → Saccade Classification Filter → Reading and Skimming Detection Filter → Line-Averaging Filter → Line Matching Filter]
WM/04.02 S. 175
For recognizing fixations we apply a so-called dispersion approach
Gaze locations produced by the eye tracker (50 Hz, real time):
[Figure: gaze points 1-14 illustrating slow motion, outliers, a fixation with drift and a new fixation, with circles of 30 and 50 pixels]
A new fixation is detected if 4 successive nearby gaze locations are accumulated; this corresponds to a duration between 80 and 100 ms, which is the minimum fixation duration according to the literature
Gaze points are considered nearby when they fit together in a circle of 1° diameter (30 pixels)
To be robust against drift, the circle is grown to 50 pixels
Fixations are determined based on gaze location and gaze order
WM/04.02 S. 176
G. Buscher, A. Dengel and L. van Elst, High Level Eye Movement Measures for Relevance Assessments of Information Items, Proceedings CHI 2008, Florence, Italy (Apr. 2008).
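A simplified sketch of the dispersion approach with the parameters named above (4 samples, a 30-pixel circle to start a fixation, grown to 50 pixels against drift). Assumptions: "fitting into a circle" is approximated as all points lying within half the diameter of their centroid, and isolated outliers are not treated specially, so any sample outside the drift circle ends the current fixation. `detect_fixations` is a hypothetical name.

```python
import math


def centroid(points):
    xs, ys = zip(*points)
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def detect_fixations(gaze, min_points=4, start_diameter=30.0, drift_diameter=50.0):
    """Dispersion-based fixation detection on ordered gaze samples (x, y) in pixels.

    A fixation starts once min_points successive samples fit into a circle of
    start_diameter around their centroid; it keeps absorbing samples while they
    stay inside a circle of drift_diameter around the fixation centroid.
    Returns a list of (centroid, number_of_samples).
    """
    fixations, candidate, current = [], [], None
    for p in gaze:
        if current is not None:
            if math.dist(p, centroid(current)) <= drift_diameter / 2:
                current.append(p)          # fixation with drift
                continue
            fixations.append((centroid(current), len(current)))
            current = None                 # p may start the next fixation
        candidate = (candidate + [p])[-min_points:]
        c = centroid(candidate)
        if len(candidate) == min_points and all(math.dist(q, c) <= start_diameter / 2 for q in candidate):
            current, candidate = candidate, []
    if current is not None:
        fixations.append((centroid(current), len(current)))
    return fixations


gaze = [(100, 100), (102, 101), (99, 98), (101, 100), (104, 103),
        (300, 300), (301, 302), (299, 301), (300, 299)]
for center, n in detect_fixations(gaze):
    print(center, n * 20, "ms")   # 50 Hz: one sample every 20 ms
```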
[Diagram: noisy gaze data from the eye tracker → Point of Regard Computation Filter → Fixation Detection Filter → Saccade Classification Filter → Reading and Skimming Detection Filter → Line-Averaging Filter → Line Matching Filter; highlighted: fixation detection and saccade classification]
WM/04.02 S. 177
In use cases we were able to distinguish different saccade features that may be used to characterize the reading mode
Move forward: read forward movements, skim forward movements, long skim jumps; distinguished via the distances between fixation points along the text orientation (number of characters)
Move backward: short regressions, long regressions; distinguished via the distances between fixation points against the text orientation (number of characters)
Reset jump: go to a new line
Move elsewhere: unrelated moves (all other movements)
WM/04.02 S. 178
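The saccade classes above can be sketched as a simple decision on the horizontal distance (in characters) and the vertical distance (in lines) between the last two fixations. All thresholds are hypothetical placeholders, since the lecture notes that the values depend on the reader; `classify_saccade` is an illustrative name.

```python
def classify_saccade(dx_chars, dy_lines, read_max=11, skim_max=30, short_regression=6):
    """Classify the saccade between the last two fixations.

    dx_chars: horizontal distance in characters (positive = along the reading direction)
    dy_lines: vertical distance in text lines (positive = downwards)
    All thresholds are hypothetical and would have to be tuned per reader.
    """
    if dy_lines == 1 and dx_chars < 0:
        return "reset jump"              # go to the start of the next line
    if dy_lines != 0 or dx_chars == 0:
        return "unrelated move"          # all other movements
    if dx_chars > 0:                     # move forward
        if dx_chars <= read_max:
            return "read forward"
        if dx_chars <= skim_max:
            return "skim forward"
        return "long skim jump"
    if -dx_chars <= short_regression:    # move backward
        return "short regression"
    return "long regression"


print([classify_saccade(dx, dy) for dx, dy in [(8, 0), (20, 0), (-3, 0), (-60, 1)]])
```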
Based on this observation, saccade features may be classified and appropriate scores may be associated with them
Note that these values may differ depending on the type of reader
WM/04.02 S. 179
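One possible way to turn classified saccades into the plausibility scores sr and ss is to accumulate a per-class score for each mode and decide once a threshold is reached. The score table, the threshold and the decision rule below are assumptions for illustration only; the lecture's actual values are reader-dependent and not reproduced here.

```python
# Hypothetical scores (reading, skimming) per saccade class.
SCORES = {
    "read forward": (10, 5),
    "skim forward": (5, 10),
    "long skim jump": (0, 8),
    "short regression": (3, 0),
    "long regression": (0, 3),
    "reset jump": (5, 5),
    "unrelated move": (0, 0),
}


def detect_mode(saccade_classes, threshold=30):
    """Accumulate the plausibility scores s_r (reading) and s_s (skimming).

    As soon as one score reaches the threshold, the mode with the higher score
    is reported; otherwise the mode stays undecided (None).
    """
    s_r = s_s = 0
    for cls in saccade_classes:
        d_r, d_s = SCORES[cls]
        s_r, s_s = s_r + d_r, s_s + d_s
        if max(s_r, s_s) >= threshold:
            return ("reading" if s_r >= s_s else "skimming"), s_r, s_s
    return None, s_r, s_s


print(detect_mode(["read forward", "short regression", "read forward", "read forward"]))
```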
Reading Detection – Example
[Figure: gaze trace recorded with a Tobii 1750 eye tracker, comparing reading and skimming]
Plausibility of reading: sr = 62
Plausibility of skimming: ss = 51
→ Reading behavior detected
WM/04.02 S. 180
[Diagram: noisy gaze data from the eye tracker → Point of Regard Computation Filter → Fixation Detection Filter → Saccade Classification Filter → Reading and Skimming Detection Filter → Line-Averaging Filter → Line Matching Filter; highlighted: fixation detection and saccade classification, reading identification and saccade sequence alignment]
WM/04.02 S. 181
Results can be represented by gaze-based document metadata*
Line matching by mapping with line segmentation results (plus OCR)
Reading information is stored as document annotations in a semantic wiki
[Example: a passage annotated as read]
"An enormous amount of research has been done during last one hundred years concerning eye movements while reading. When reading silently, as summed up in [Rayner 1998], the eye shows a very characteristic behavior composed of fixations and saccades. A fixation is a time of about 250ms on average when the eye is steadily gazing at one point. A saccade is a rapid, ballistic eye movement from one fixation to the next. The mean left-to-right saccade size is 7-9 letter spaces. It depends on the font size and is relatively invariant concerning the distance between the eyes and the text."
Annotation (Read):
author: Georg
start date: 07.12.2009 10:46:08
end date: 07.12.2009 10:46:12
length: 226 chars
mean fixation duration: 217 ms
mean saccade length: 9.4 chars
regression ratio: 13.9%
task: write report
WM/04.02 S. 182
* G. Buscher, A. Dengel, L. van Elst, F. Mittag, Generating and Using Gaze-Based Document Annotations, Proceedings CHI 2008, Florence, Italy (Apr. 2008).
G. Buscher, A. Dengel and L. van Elst, Query Expansion Using Gaze-Based Feedback on the Subdocument Level, Proceedings SIGIR '08, 31st Annual Int'l ACM SIGIR Conference, Singapore (July 2008), accepted for publication.
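Annotation values such as mean fixation duration, mean saccade length and regression ratio can be derived from the fixations matched to a passage. This sketch assumes each fixation is given as a character offset in the passage plus a duration; averaging saccade length over forward saccades only and defining the regression ratio as the share of backward saccades are assumptions, and `reading_annotation` is a hypothetical name.

```python
from statistics import mean


def reading_annotation(fixations):
    """Compute annotation measures from ordered (character offset, duration in ms) fixations.

    Assumptions: saccade length is averaged over forward saccades only, and the
    regression ratio is the share of backward saccades among all saccades.
    """
    durations = [d for _, d in fixations]
    jumps = [b[0] - a[0] for a, b in zip(fixations, fixations[1:])]
    forward = [j for j in jumps if j > 0]
    backward = [j for j in jumps if j < 0]
    return {
        "mean_fixation_duration_ms": mean(durations),
        "mean_saccade_length_chars": mean(forward) if forward else 0.0,
        "regression_ratio": len(backward) / len(jumps) if jumps else 0.0,
    }


print(reading_annotation([(0, 200), (9, 250), (18, 200), (12, 150), (27, 200)]))
```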
The various measures may be used to determine the relevance of read text!
WM/04.02 S. 183
Not all of the features are valid for measuring text relevance
[Table: fixation duration, fixation count, average saccade length, regression rate, viewing time, reading vs. skimming behavior and length of coherently read text, each rated +, - or vs with respect to its validity as a relevance measure]
WM/04.02 S. 184
There is high variability in most eye movement measures, both within and between readers
Since it is difficult to build methods that estimate the relevance of read text based on absolute values, gaze measures are personalized individually
Procedure:
→ Determine the distribution of a measure for an individual user by analyzing all of her/his recorded eye movement data during reading (forward saccade lengths)
→ Compute the upper and lower whiskers (limits) of the measure's value distribution, e.g. lower whisker = max(min, lq - 1.5 * iqr)
→ Normalize the absolute values of the eye movement measures with respect to the individual whisker intervals
Upper and lower whiskers define a user-specific interval from which outliers are excluded
WM/04.02 S. 185
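A sketch of the personalization procedure for one measure. The lower whisker follows the formula above (lq: lower quartile, iqr: interquartile range); the upper whisker is assumed to be its symmetric counterpart, quartiles come from `statistics.quantiles` with its default method, and mapping values linearly into [0, 1] with clipping is an assumption about how the normalization is done.

```python
from statistics import quantiles


def whiskers(values):
    """User-specific interval for one gaze measure.

    lower = max(min, lq - 1.5 * iqr) as on the slide; the upper whisker is assumed
    to be the symmetric counterpart min(max, uq + 1.5 * iqr).
    """
    lq, _, uq = quantiles(values, n=4)
    iqr = uq - lq
    return max(min(values), lq - 1.5 * iqr), min(max(values), uq + 1.5 * iqr)


def normalize(value, lower, upper):
    """Map a raw measure into [0, 1] relative to the whisker interval, clipping outliers."""
    if upper == lower:
        return 0.5
    return min(1.0, max(0.0, (value - lower) / (upper - lower)))


saccades = [8, 9, 9, 10, 10, 11, 12, 40]   # one user's forward saccade lengths (chars)
low, high = whiskers(saccades)
print(low, high, [round(normalize(v, low, high), 2) for v in (8, 10, 40)])
```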
In an experiment we could show that the more intensively a given text is read, the more useful it is for the reader
[Figure: percentage of read text for documents, broken down by relevance judgments]
WM/04.02 S. 186
Based on the relevance measure, we may attach so-called attention paths to best practices
Assuming we are in the task context of a knowledge worker:
[Figure: attention path of Task X across Document 1, Document 2, Document 3 and Document 4, linked by next/prev transitions]
WM/04.02 S. 187
Using eye tracking for text-based information retrieval
WM/04.02 S. 188
The user creates a query and has to view and filter the result list in order to find the relevant documents
There are various options that allow for the consideration of user-centric (subjective) retrieval
[Diagram: objective method: query → retrieval engine (similarity measure and background retrieval knowledge) over the document corpus and non-personalized data → ranked result list; user-centered method: user observation by an eye tracker, implicit relevance feedback, implicit/explicit relevance feedback and user context feed a user model, which drives a query expansion/reformulation engine and a re-ranking mechanism over personalized data (“island” documents, personalized documents, annotated documents)]
WM/04.02 S. 189
Reading and attention data allow the implementation of implicit relevance feedback
Explicit relevance feedback based on Rocchio is effective but requires extra effort
Implicit feedback is an alternative for the automatic recognition of relevance
WM/04.02 S. 190
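A minimal sketch of the Rocchio update on sparse term-weight vectors. The defaults alpha = 1.0, beta = 0.75, gamma = 0.15 are commonly used textbook values, chosen here as an assumption; so is the implicit variant in the example, which treats passages detected as read as relevant and merely skimmed passages as non-relevant.

```python
from collections import Counter


def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio update q' = alpha*q + beta*centroid(relevant) - gamma*centroid(non_relevant).

    Vectors are term -> weight dicts; terms ending up with weight <= 0 are dropped.
    """
    updated = Counter({t: alpha * w for t, w in query.items()})
    for docs, factor in ((relevant, beta), (non_relevant, -gamma)):
        for doc in docs:
            for t, w in doc.items():
                updated[t] += factor * w / len(docs)
    return {t: w for t, w in updated.items() if w > 0}


# Implicit variant (assumption): read passages count as relevant,
# passages that were only skimmed as non-relevant.
read = [{"eye": 1, "tracking": 1}]
skimmed = [{"football": 1}]
print(rocchio({"eye": 1.0}, read, skimmed))  # {'eye': 1.75, 'tracking': 0.75}
```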
Gaze-based methods provide a remarkable improvement of 20% in information gain compared to classical approaches
[Diagram: term extraction on the basis of content vs. on the basis of reading data; query expansion yields an individualized result list]
WM/04.02 S. 191
G. Buscher, A. Dengel and L. van Elst, Query Expansion Using Gaze-Based Feedback on the Subdocument Level, Proceedings SIGIR '08, 31st Annual Int'l ACM SIGIR Conference, Singapore (July 2008).
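Term extraction "on the basis of reading data" can be sketched by restricting candidate expansion terms to the passages the user actually read. The tf times smoothed-idf weighting and the name `expansion_terms` are assumptions for illustration, not the method of the cited paper.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"\w+", text.lower())


def expansion_terms(read_passages, collection, query, k=3):
    """Pick expansion terms from the passages the user actually read.

    Assumed weighting: term frequency in the read passages times a smoothed idf
    over the collection; terms already in the query are skipped.
    """
    tf = Counter(t for passage in read_passages for t in tokenize(passage))
    df = Counter(t for doc in collection for t in set(tokenize(doc)))
    n = len(collection)
    q = set(tokenize(query))
    score = {t: f * math.log((n + 1) / (df[t] + 1)) for t, f in tf.items() if t not in q}
    return sorted(score, key=lambda t: (-score[t], t))[:k]


collection = ["saccade fixation reading", "reading news", "eye tracker saccade"]
read_passages = ["saccade saccade regression"]
print(expansion_terms(read_passages, collection, "eye tracking"))  # ['regression', 'saccade']
```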
This approach may also be used for improving classifier learning
WM/04.02 S. 192
Application Example: Document Classification
Classification is done by manually moving documents into a folder
Classification is based on subjective perception of content
When making her/his categorization decision, a user ...
... reads some passages
... skims over others
... skips parts that are not interesting or relevant
according to her/his familiarity with the sources, her/his interest, ...
All documents within one folder contain terminology which is characteristic for a class
Only consider those parts of the document for classifier learning that are read by the user
[Figure: class terms such as Europe, UK, IWF, London, Risk, Johnson, Brussels, Cameron, Brexit, Euro, Jobs, Independence, Referendum; 45% improvement]
WM/04.02 S. 193
G. Buscher and A. Dengel, Attention-Based Document Classifier Learning, Proceedings DAS08, Int'l IAPR Workshop on Document Analysis Systems, Nara, Japan, IEEE Comp. Society Press (Sep. 2008), pp. 87-94.
→ but there are also new applications in infotainment
WM/04.02 S. 194
Imagine there were input devices that could allow text to know whether and how it is read
Text 2.0 is an innovative interaction mode between humans and computers
It is built on the idea that the computer knows which text line, sentence, or word a person is looking at
It supplements the text with hidden “attentive mark-ups” that are activated during reading, i.e. when a specific reading mode is recognized
It reveals new business options, e.g. in online marketing and advertisement
WM/04.02 S. 195
Text 2.0 provides a simple-to-use framework for constructing gaze-attentive and gaze-responsive applications
[Diagram components: reading the text via an eye tracker; data filtering and normalization; data clustering for fixation recognition; determination of saccade lengths; matching with hidden mark-ups; effect generation; real-time export of fixation recognition data and reading mode into HTML]
WM/04.02 S. 196
Text 2.0 is one of 32 recent megatrends selected by TrendOne
WM/04.02 S. 197
We applied the idea to mobile eye trackers with a head-mounted display
WM/04.02 S. 198
T. Toyama, D. Sonntag, A. Dengel, T. Matsuda, M. Iwamura, and K. Kise, A Mixed Reality Head-Mounted Text Translation System Using Eye Gaze Input, Proceedings IUI 2014, 19th Int'l Conf. on Intelligent User Interfaces, Haifa, Israel (Feb. 2014), pp. 329-334.
We applied the idea to mobile eye trackers with a head-mounted display
WM/04.02 S. 200
T. Toyama, A. Dengel, W. Suzuki, and K. Kise, User Attention Oriented Augmented Reality on Documents Using a See-through HMD and a Wearable Eye Tracker, submitted to ISMAR 2013, Symp. on Mixed and Augmented Reality, Adelaide, Australia (Oct. 2013), pp. 299-300.
We applied the idea to mobile eye trackers with a head-mounted display
WM/04.02 S. 201
The relationship between authors/readers, the real world, and a document can be described via the Semiotic Triangle
[Figure: Semiotic Triangle with the corners Imagination, Document and Reality]
Reality: Our environment consists of items, facts and events that are “real” and determine our lives (“What is going on”)
Document: To express thoughts, we use symbols or characters that may be understood by others (“What is couched or explicated”, e.g. via a document)
Imagination: Reading a document puts contents together and creates very individual imaginations (“What I have in mind”)
WM/04.02 S. 202
Why do some documents work better than others at holding our attention and conveying information?
Typography usage and design are essential for text understanding (kerning, tracking, and leading)
The additional use of graphics simplifies the understanding of a text message
The legibility of text influences reluctance and often decides whether we pursue reading it
Some research supports the view that we are naturally drawn to images with proportions approaching the golden ratio
WM/04.02 S. 203
By using the Golden Ratio in text line spacing, we investigated the influence of text line quality on the fluency of reading
[Figure: Document Type 1 (DT1), with Golden Ratio line spacing; Document Type 2 (DT2), without Golden Ratio line spacing]
WM/04.02 S. 204
For our experiments we combined document retrieval with gaze feature extraction while reading
[Diagram components: record eye tracking trials; document retrieval; fixations, saccades, regressions; data analysis; gaze feature extraction]
WM/04.02 S. 205
R+-E09-"=*4F+)"9H,9"]E,0<98"4I"9+M9"0,84E9"-9*4(/08"<(\E+(;+-"9H+"
=+*;+=9E,0",K<0<9<+-"4I","*+,)+*"
&68561123.+50923+
'D6+0@65086+<`0923.1+34+9520;1+
3000
0,3
2000
0,2
1000
F']
F'Q
0
0,1
F']
F'Q
0
'D6+0@65086+34+56072.8+92>6++
&696.923.+6@0;?0923.+
100
300
200
50
100
F']
F'Q
0
F']
F'Q
0
S. S. Mozaffari, S. Bukhari, and A. Dengel, Using The Wearable Eye Trackers to measure of reading performance by applying the golden ratio parameter for line spacing, Proceedings WeSAX15, IEEE International Conference on Multimedia and Expo Workshops, Torino, Italy (July 2015)
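The results chart reports a regression ratio. One plausible definition, assumed here because the slides do not give one, is the fraction of saccades between consecutive fixations that jump backwards (leftwards) on the same line; large leftward jumps to another line are treated as line returns, not regressions. The thresholds and the name `regression_ratio` are illustrative.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]  # fixation centroid (x_px, y_px), in temporal order


def regression_ratio(fixations: Sequence[Point],
                     line_height_px: float = 20.0,
                     min_back_px: float = 5.0) -> float:
    """Share of saccades that move backwards (leftwards) within the same text line.

    Assumptions: left-to-right text; a saccade stays on the same line if its
    vertical jump is below half a line height; leftward jumps to another line
    (line returns) are not counted as regressions.
    """
    if len(fixations) < 2:
        return 0.0
    regressions = 0
    for (x0, y0), (x1, y1) in zip(fixations, fixations[1:]):
        same_line = abs(y1 - y0) < line_height_px / 2
        if same_line and (x1 - x0) < -min_back_px:
            regressions += 1
    # Every pair of consecutive fixations is one saccade.
    return regressions / (len(fixations) - 1)
```

For example, the sequence (10, 100), (60, 100), (40, 100), (90, 100), (10, 130) has four saccades, one of which is a regression, giving a ratio of 0.25.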
WM/04-05 S. 206
But what about text legibility?
WM/04-05 S. 207
Making out letters does not mean that words are necessarily easy to read or comprehend
Readability is the ease with which text can be read
Comprehension is a key factor in readability, as is being able to quickly look at lettering and understand it
But how can we measure the legibility of text?
Eye-tracking accuracy, different reading styles, adequate features, …
WM/04-05 S. 208
To manage these challenges, we did not consider single words but the environment of words
We partitioned the entire page into a grid structure and applied a window technique
[Figure: Actual Area; Consideration Area]
Tile size (actual area) roughly equals the word size
The consideration area extends beyond each side of the tile by the estimated tracking error
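A minimal sketch of the grid-and-window idea, assuming page coordinates in pixels and a single, uniform tracking error: each tile's consideration area is its actual area grown by that error on every side, so one fixation can count for several neighboring tiles. The class and function names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Tile:
    row: int
    col: int
    # Actual area of the tile (page coordinates in px).
    x0: float
    y0: float
    x1: float
    y1: float

    def in_consideration_area(self, x: float, y: float, error_px: float) -> bool:
        """True if (x, y) lies in the actual area grown by error_px on each side."""
        return (self.x0 - error_px <= x <= self.x1 + error_px
                and self.y0 - error_px <= y <= self.y1 + error_px)


def make_grid(page_w: float, page_h: float, tile_w: float, tile_h: float) -> List[Tile]:
    """Partition the page into a row-major grid of tiles (edge tiles may be smaller)."""
    rows, cols = math.ceil(page_h / tile_h), math.ceil(page_w / tile_w)
    return [
        Tile(r, c, c * tile_w, r * tile_h,
             min((c + 1) * tile_w, page_w), min((r + 1) * tile_h, page_h))
        for r in range(rows)
        for c in range(cols)
    ]


def tiles_hit(tiles: List[Tile], x: float, y: float, error_px: float) -> List[Tuple[int, int]]:
    """All tiles whose consideration area contains the fixation at (x, y)."""
    return [(t.row, t.col) for t in tiles if t.in_consideration_area(x, y, error_px)]
```

A fixation in the middle of a tile hits only that tile, while one near a corner also counts for its neighbors; this overlap is what absorbs the tracking error.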
WM/04-05 S. 212
In our experiments, we investigated a number of potentially relevant features
Average Fixation Duration (AFD)
Duration of First Fixation Point (FFD)
Reading & Skimming Classification (RS)
Reading Count (RC)*
* The number of passes classified as reading that intersect the consideration area of the tile.
WM/04-05 S. 214
In our experiments, we investigated a number of potentially relevant features
Average Fixation Duration (AFD)
Duration of First Fixation Point (FFD)
Reading & Skimming Classification (RS)
Reading Count (RC)
Number of regressions starting in the area (RS)
Number of regressions ending in the area (RT)
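A minimal sketch of how some of these per-tile features could be computed from a time-ordered fixation list, assuming each fixation has already been assigned to one tile. It covers AFD, FFD and the regressions starting and ending in a tile, using a same-line leftward jump as the regression definition (an assumption); RS and RC need a reading/skimming classifier and are left out. The `Fix` type and the tile labels are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Fix(NamedTuple):
    tile: str          # tile the fixation was assigned to (hypothetical labels such as "T1")
    x: float           # fixation centroid in px
    y: float
    duration_ms: float


def tile_features(fixations: List[Fix],
                  line_height_px: float = 20.0,
                  min_back_px: float = 5.0) -> Dict[str, Dict[str, float]]:
    """AFD, FFD and regressions starting/ending per tile from time-ordered fixations."""
    durations: Dict[str, List[float]] = defaultdict(list)
    reg_start: Dict[str, int] = defaultdict(int)
    reg_end: Dict[str, int] = defaultdict(int)
    for f in fixations:
        durations[f.tile].append(f.duration_ms)  # kept in temporal order
    for a, b in zip(fixations, fixations[1:]):
        # Regression (assumed definition): leftward jump on the same line.
        if abs(b.y - a.y) < line_height_px / 2 and (b.x - a.x) < -min_back_px:
            reg_start[a.tile] += 1
            reg_end[b.tile] += 1
    return {
        tile: {
            "AFD": sum(ds) / len(ds),   # average fixation duration
            "FFD": ds[0],               # duration of the first fixation in the tile
            "reg_start": reg_start[tile],
            "reg_end": reg_end[tile],
        }
        for tile, ds in durations.items()
    }
```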
WM/04-05 S. 215
In our experiments, we investigated a number of potentially relevant features
Feature | T1 | T2 | T3
Average Fixation Duration (AFD) | 230 | 313 | 135
Duration of First Fixation Point (FFD) | 170 | 70 | 120
Reading & Skimming Classification (RS) | 0.6 | -4.0 | 2.1
Reading Count (RC) | - | 1 | 2
Number of regressions starting in the area (RS) | - | 1 | -
Number of regressions ending in the area (RT) | 2 | - | -
…
WM/04-05 S. 216
In our experiments, we investigated a number of potentially relevant features (values normalized to 0…1)
Feature | T1 | T2 | T3
Average Fixation Duration (AFD) | 0.6 | 0.8 | 0.…
Duration of First Fixation Point (FFD) | 0.9 | 0.1 | 0.4
Reading & Skimming Classification (RS) | 0.3 | 0.1 | 0.5
Reading Count (RC) | - | 0.3 | 0.6
Number of regressions starting in the area (RS) | - | 0.5 | -
Number of regressions ending in the area (RT) | 0.6 | - | -
…
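The slides do not say how raw values were mapped into 0…1, and the shown numbers (e.g. AFD 230 becoming 0.6) do not follow from a min-max over these three tiles alone, so the normalization range probably spans more data. As an assumption, the sketch applies min-max scaling over whatever values are passed in and keeps missing entries ("-") as `None`.

```python
from typing import List, Optional


def min_max_normalize(values: List[Optional[float]]) -> List[Optional[float]]:
    """Scale the present values linearly into [0, 1]; keep None for missing entries."""
    present = [v for v in values if v is not None]
    if not present:
        return list(values)
    lo, hi = min(present), max(present)
    span = hi - lo
    return [
        None if v is None else (0.0 if span == 0 else (v - lo) / span)
        for v in values
    ]


# Reading Count row from the raw table (T1 missing, T2 = 1, T3 = 2):
print(min_max_normalize([None, 1, 2]))  # [None, 0.0, 1.0]
```

Normalizing over a whole page (or a whole corpus) rather than a handful of tiles keeps values comparable between documents.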
WM/04-05 S. 217
In our experiments, we investigated a number of potentially relevant features
[Figure: normalized feature tables (AFD, FFD, RS, RC, regressions starting/ending in the area) for tiles T1, T2, T3, …, for all users]
WM/04-05 S. 218
In our experiments, we investigated a number of potentially relevant features
[Figure: the same per-user feature tables for tiles T1, T2, T3, …, combined into an average (Ø)]
WM/04-05 S. 219
Aggregating reading data from various readers gives insights into the readability of a text
[Figure: sample output for a document read by seven users; panels: Average Fixation Duration (AFD), Duration of First Fixation Point (FFD), Reading & Skimming Classification (RS), Reading Count (RC), Number of regressions starting in the area (RS), Number of regressions ending in the area (RT)]
R. Biedert, M. El Hosseiny, A. Dengel, and G. Buscher, Towards Robust Gaze-Based Objective Quality Measures for Text, Proceedings 7th Biennial Symposium on Eye Tracking Research & Applications, Santa Barbara, CA, USA (March 2012), pp. 201-204
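Aggregating across readers can be sketched as averaging each tile's normalized feature values over all users who have a value for it. This assumes the average (Ø) on the earlier slide is a plain arithmetic mean and that missing entries are skipped rather than counted as zero; the table layout is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# tile id -> feature name -> normalized value (None where the feature is undefined)
UserTable = Dict[str, Dict[str, Optional[float]]]


def aggregate_users(tables: List[UserTable]) -> Dict[str, Dict[str, float]]:
    """Average each (tile, feature) value over all users that have a value for it."""
    collected: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for table in tables:
        for tile, features in table.items():
            for name, value in features.items():
                if value is not None:
                    collected[(tile, name)].append(value)
    result: Dict[str, Dict[str, float]] = defaultdict(dict)
    for (tile, name), vals in collected.items():
        result[tile][name] = sum(vals) / len(vals)
    return dict(result)
```

Skipping missing values means a tile read by only one user still gets a feature value, just from fewer observations.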
WM/04-05 S. 220
For testing, users had to write short texts explaining some facts, without proofreading them
Italian pupils first attend the scuola elementare for five years, comparable to the German Grundschule (primary school). This is followed by the scuola media, which pupils attend for three years. Before they can move on to a secondary school (liceo). This type of school exists in various forms (e.g. liceo classico, liceo linguistico), each focusing on a different group of subjects. Instead of a liceo, pupils can also continue school at an Istituto tecnico (technical college), which prepares them for commercial professions, or at an Instituto professionale, a vocational school with the branches commerce, tourism, industry and agriculture,
During my internship at DFKI, the German Research Center for Artificial Intelligence, in Kaiserslautern, I have to use various means of transport every morning to get to my beloved workplace. Either the Smart or a bicycle, plus a Deutsche Bahn regional train, serve as transport. The journey from Pirmasens to Kaiserslautern starts at home at a quarter past seven. Once the laptop and a backpack with food, drinks, paper and pens have been loaded into the huge, oversized trunk of the Smart, off we go. Since in the morning one has to get through what is, for Pirmasens, relatively
In music there are various genres, e.g. Dubstep, Indie, Electro, House, Pop, Rock, Metal, Classical, Core, Punk, Evil Disco, Blues, Alternative, Hip-Hop, Rap, Soul, Soundtrack, Jazz, Dance, Hardstyle, Shuffle, Jumpstyle, etc. The genres differ from one another: some focus more on the instrumental part, others deal primarily with vocals, and others with electronically generated rhythms and melodies. Dubstep, Electro, House, Hardstyle, Shuffle and Jumpstyle belong to the electronic genres; the electronic genre mostly focuses on bass & fast beats. But Core is also influenced by electronic melodies, e.g. Breakcore. As for Jumpstyle, Shuffle
I am passionate about playing handball. However, I only took up this sport at the age of 15. Training is on Tuesdays from 20:00 to 22:00 and on Thursdays from 18:30 to 20:00. Last season, my club, TV Dahn, finished (only) 3rd in the Pfalzliga, because the first half of the season went very badly and we lost almost every game. In the second half, however, we scored points again and won every game. Still, in the second half we had to leave 2 points here in Kaiserslautern due to a shortage of players, since we arrived with only 6 people and the goalkeeper was missing as well.
My first car is a VW Golf Caddy I 1.4d. This car is a pickup based on the VW Golf I and is powered by a 4-speed gearbox and a 1.4 l diesel engine. The front of the car was taken over one-to-one from a Golf I 5-door. It is a 2-seater whose rear part consists of a load bed about 1.80 m long. The restoration of the vehicle began at the start of the second half of the 2011 Easter holidays and ended on 7 June 2011. During this period, quite a lot of work was done on the vehicle. The condition the car was in before the repairs looked as follows.
WM/04-05 S. 221
Other users read and rated each passage individually with grades from 1 to 5
[Figure: the same texts, colored by their ratings. Color reflects the mark: Green = 1 (very good legibility), Red = 5 (very bad legibility)]
WM/04-05 S. 222
Then we used the tiles to generate ground truth
[Figure: the same texts overlaid with tiles; rotated labels largely illegible; reported result: 62% accuracy]
R. Biedert, M. El Hosseiny, A. Dengel, and G. Buscher, Towards Robust Gaze-Based Objective Quality Measures for Text, Proceedings 7th Biennial Symposium on Eye Tracking Research & Applications, Santa Barbara, CA, USA (March 2012), pp. 201-204
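The ground-truth step turns the readers' grades into labels for the tiles, and the slide reports 62% accuracy without naming the model. Purely as an illustration, the sketch below thresholds the mean grade into good vs. poor legibility (the threshold 3.0 is an assumption) and uses a nearest-centroid classifier on aggregated feature vectors; it is not claimed to be the method behind the reported figure.

```python
import math
from typing import Dict, List, Sequence


def grade_to_label(mean_grade: float, threshold: float = 3.0) -> int:
    """1 = good legibility (low grade, 1 is best), 0 = poor. Threshold is assumed."""
    return 1 if mean_grade <= threshold else 0


def fit_centroids(X: Sequence[Sequence[float]], y: Sequence[int]) -> Dict[int, List[float]]:
    """Mean feature vector per class label."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for xi, yi in zip(X, y):
        acc = sums.setdefault(yi, [0.0] * len(xi))
        for k, v in enumerate(xi):
            acc[k] += v
        counts[yi] = counts.get(yi, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(centroids: Dict[int, List[float]], x: Sequence[float]) -> int:
    """Label of the closest centroid (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(x, centroids[label]))


def accuracy(centroids: Dict[int, List[float]],
             X: Sequence[Sequence[float]], y: Sequence[int]) -> float:
    """Fraction of samples whose predicted label matches the ground truth."""
    correct = sum(predict(centroids, xi) == yi for xi, yi in zip(X, y))
    return correct / len(y)
```

A meaningful accuracy figure requires fitting the centroids on some tiles and evaluating on held-out ones; scoring on the training tiles overstates it.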
WM/04-05 S. 223
What types of displayed data are better suited to solving specific types of problems?
WM/04-05 S. 224
To get a first impression, we ran a pilot study with physics students
(Not only) in physics education, it is highly important for educators and instructors to know how to prepare representations, such as vectors, tables, and diagrams, appropriately for solving specific types of problems
Experimental Setup
We conducted an eye-tracking experiment using a lightweight, low-cost, portable eye tracker paired with a tablet
WM/04-05 S. 225
Students were shown three coherent representations of a phenomenon and were instructed to solve a physics problem
[Figure: Vectors; Table; Diagram]
WM/04-05 S. 226
Example 1: Raw gaze of a single participant for a particular question
A body initially at rest is accelerated irregularly. During the experiment, various data on the body's movement are collected. Please verify whether the following statement is correct:
“The body reaches its maximum speed at time t1.”
WM/04-05 S. 227
Accumulating and visualizing the results for a single question reveals insights into relevance and preference
WM/04-05 S. 228
Example 2: Raw gaze of a single participant for a particular question
An object falls from the roof of a building at time t = 0. After some time, air resistance leads to a constant fall velocity. z is the position, v the speed, and a the acceleration of the object. Please verify whether the following statement is correct:
“The acceleration at t = 0 is maximal.”
WM/04-05 S. 229
For each participant, we captured the raw gaze for a particular question
WM/04-05 S. 230
The effectiveness of each representation was assessed for three levels of student expertise: experts, intermediates, and novices
Level | Vector | Table | Diagram
Expert | 17.8% | 40.8% | 41.4%
Intermediate | 23.0% | 37.9% | 39.0%
Novice | 22.8% | (illegible) | 36.3%
Experts give the least preference to vectors and high, roughly equal preference to tables and diagrams
Intermediates give slightly more preference to vectors than experts do, and similar but slightly lower preference to tables and diagrams
Novices, unlike experts and intermediates, give the highest preference to tables among all representations
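The slides do not define how the preference percentages were obtained. If they are read as each representation's share of total fixation time (an assumption), they could be computed per participant from fixations labelled with the area of interest (AOI) they fall in, then averaged per expertise group. The AOI names mirror the three representations; the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def dwell_shares(fixations: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Percentage of total fixation time spent on each AOI for one participant."""
    totals: Dict[str, float] = defaultdict(float)
    for aoi, duration_ms in fixations:
        totals[aoi] += duration_ms
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {aoi: 100.0 * t / grand_total for aoi, t in totals.items()}


def group_mean(shares: List[Dict[str, float]]) -> Dict[str, float]:
    """Average per-participant shares within one expertise group (missing AOI = 0%)."""
    if not shares:
        return {}
    aois = {aoi for s in shares for aoi in s}
    return {aoi: sum(s.get(aoi, 0.0) for s in shares) / len(shares) for aoi in aois}
```

Averaging per-participant shares (rather than pooling all fixations) keeps a single long-viewing participant from dominating the group result.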
WM/04-05 S. 231
We also recorded the gaze behavior while participants answered
Please answer the question now:
Is the following statement correct?
“The body is initially accelerated evenly.”
The answer is:
Correct
Incorrect
How certain are you?
Please answer the question now:
Is the following statement correct?
“For short periods … with the final acceleration …”
The answer is:
Correct
Incorrect
How certain are you?
We calculated a confidence score (CS) and found a difference between the groups: 95% (Expert), 88% (Intermediate), 84% (Novice)
WM/04-05 S. 232