Does Regularity Make Reading a Foreign Language Easier?

University of Groningen
Studying the power of entropy measures when predicting written mutual intelligibility among five Germanic languages.
Anne Kingma
S2405792
anne.s.kingma@gmail.com
March 6, 2015
Program: Language & Cognition
1st supervisor: Charlotte Gooskens
2nd supervisor: Wilbert Heeringa
Acknowledgments
First of all, I would like to thank my first supervisor,
Charlotte Gooskens, for her help and guidance and for
always being willing to discuss things. She carefully read
everything I wrote and gave me both positive and
negative feedback, discussed the results with me and
gave me suggestions for new directions to explore or
literature to read. Secondly, I am grateful to my second
supervisor, Wilbert Heeringa, who allowed me (and
taught me how) to use his scripts for calculating the
linguistic measures in this thesis. When I had questions
he always answered them quickly and he helped me
improve the technical and statistical parts of the thesis. I
would like to thank Femke Swarte for her help and
feedback and for allowing me to use some of the
intelligibility data she collected for her PhD thesis, even
though it has not been published yet. Without the results
from her intelligibility experiments, this thesis would not
have been as interesting. Finally, I would like to thank
Mark Härtl for his help with the layout, the images and
the graphs. I could never have made this thesis look so
pretty myself.
Table of Contents
Acknowledgments
1. Introduction
2. Background
2.1 Intelligibility
2.2 Previous research on intelligibility
2.3 Measuring linguistic distance
2.4 The MICReLa project
3. Research questions
4. Languages
4.1 Overview
4.2 Germanic languages
4.2.1 North Germanic: Swedish and Danish
4.2.2 West Germanic: German and Dutch
4.2.3 West Germanic: English
4.3 Germanic orthography
4.3.1 English
4.3.2 Dutch
4.3.3 German
4.3.4 Danish
4.3.5 Swedish
4.3.6 Summary
4.3.7 The North Wind and the Sun
5. Data
6. Methods
6.1 Methods for measuring linguistic distance
6.1.1 Lexical distance
6.1.2 Orthographic Levenshtein distance
6.1.3 Measuring conditional entropy
6.2 Measuring intelligibility
6.2.1 Participants
6.2.2 Cloze test
6.2.3 Word translation task
7. Results
7.1 Linguistic measures
7.1.1 Lexical distance
7.1.2 Orthographic Levenshtein distance
7.1.3 Entropy measures
7.1.4 Correlations of the different linguistic measures
7.2 Using linguistic measures to predict intelligibility
7.2.1 Cloze test
7.2.2 Word translation task
8. Discussion
8.1 Linguistic measures
8.2 Research questions
8.3 Future research
9. Conclusion
10. References
Etymological dictionaries
Appendix
Appendix A: Excluded Words
Appendix B: Word List
1. Introduction
The Scandinavian languages Swedish, Danish and Norwegian belong to the North
Germanic branch of the Indo-European language family. They are so similar to each
other that their speakers can (and do), to some extent, communicate with each other
while each using their own language, a practice known as receptive multilingualism.
The success of this type of communication depends on the level of mutual
intelligibility between the languages: how well each speaker understands the other's
language. Three factors are thought to determine this (Gooskens 2007a:446):
1. The listener’s attitude towards the speaker’s language
2. The listener’s contact with the speaker’s language and other language
experience
3. Linguistic distance of the speaker’s language to the listener’s
language
In this thesis, I will focus on the third factor: linguistic distance.
The linguistic relations among five Germanic languages (English, Dutch, German,
Danish and Swedish) are calculated on the orthographic level in three ways: lexical
distance, Levenshtein distance (Heeringa 2004) and conditional entropy (Moberg et
al. 2007). These results are correlated with the results of two written intelligibility
tasks carried out by Femke Swarte as part of the MICReLa project of the University of
Groningen (see e.g. Heeringa et al. 2013): a cloze test and a word translation task. The
main purpose of this study is to determine the added value of entropy measures over
lexical and Levenshtein distances. Unlike these two distances, the conditional entropy
between two languages is inherently asymmetrical, so it could be a useful way to
capture the asymmetry found in mutual intelligibility (Moberg et al. 2007), for
example between Swedish and Danish (Schüppert 2011). More on this asymmetry can
be found in Chapter 2.
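To illustrate why conditional entropy can express asymmetry while Levenshtein distance cannot, the following is a minimal Python sketch. It assumes that character correspondences have already been extracted from aligned cognate pairs; the correspondence list and the string pair are hypothetical toy data, not data from this thesis. The formula used, H(B|A) = −Σ p(a,b) log₂ p(b|a), is the textbook definition of conditional entropy and may differ in detail from the procedure of Moberg et al. (2007); the Levenshtein function is a plain unit-cost version.

```python
import math
from collections import Counter


def conditional_entropy(pairs):
    """H(target | source) in bits, computed from (source, target) correspondence pairs.

    A low value means the target-language symbol is easy to predict
    from the source-language symbol the reader already knows.
    """
    joint = Counter(pairs)
    source_counts = Counter(src for src, _ in pairs)
    total = len(pairs)
    h = 0.0
    for (src, tgt), n in joint.items():
        p_joint = n / total                # p(a, b)
        p_cond = n / source_counts[src]    # p(b | a)
        h -= p_joint * math.log2(p_cond)
    return h


def levenshtein(a, b):
    """Unit-cost edit distance (insertion, deletion, substitution)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


# Hypothetical correspondences: A's 'k' maps to 'k' or 'g' in B,
# but B's 'k' and 'g' both map back to A's 'k'.
pairs_a_to_b = [("k", "k"), ("k", "g"), ("t", "t"), ("t", "t")]
pairs_b_to_a = [(b, a) for a, b in pairs_a_to_b]

print(conditional_entropy(pairs_a_to_b))  # H(B|A) → 0.5
print(conditional_entropy(pairs_b_to_a))  # H(A|B) → 0.0
print(levenshtein("kage", "kake"), levenshtein("kake", "kage"))  # → 1 1
```

In this toy case, a reader of A faces uncertainty when reading B, but not the other way round, whereas the edit distance between the two illustrative strings is the same in both directions.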
In Chapter 2, the theoretical background of the topic will be provided, leading
to the research questions in Chapter 3. Chapter 4 will give some background on the
languages included in this research and the relations among them. Chapter 5
describes the process of building the word lists used for the linguistic measures, and
Chapter 6 explains the procedures of these measures and the intelligibility
experiments. Chapter 7 presents the results, Chapter 8 discusses them and Chapter 9
concludes the thesis.
2. Background
The first part of this chapter will outline the background and history of intelligibility
research. In the second part, the issue of measuring linguistic distance will be
elaborated upon. Finally, in the third part, a recent project on linguistic factors
influencing intelligibility will be described: the MICReLa project.
2.1 Intelligibility
When two people with different native languages want to communicate, there are
several ways in which they can go about this. Firstly, one of the speakers can learn to
speak the other’s native language. This commonly happens when one language is
clearly dominant over the other. In individual cases this could be an immigrant
learning the language of his or her new country of residence. It is easy to see the
many advantages that learning this new language would bring, making it worth the
effort required. The same strategy is also applied on a larger, structural scale,
however, in the case of minority languages. There are many such situations all over
the world: Gaelic languages in Ireland and Great Britain, Frisian in the Netherlands,
Swedish in Finland, Catalan in Spain. The native speakers of the dominant language
do not bother to learn the minority language, forcing the speakers of the minority
language to learn the dominant language in order to advance in life. This situation is
somewhat unequal, as only one of the speakers has to invest time and effort in
learning another language, whereas the other can enjoy the ease of being able to
express their thoughts in their native language. Moreover, it potentially leads to
endangerment of the minority language. The ultimate goal of successful
communication is reached, however.
A second possibility is for both speakers to learn a third language, native to
neither of them. This frequently happens with lingua francas such as English, Latin,
Hindi and Modern Standard Arabic.¹ It could be argued that certain dialect situations
fall into this category as well, depending on whether learning the standard language
is considered learning a new language. The advantage of this strategy is that both
speakers are equal: they both have to learn a new language and they both have to
struggle with using a language that is not native to them. At the same time, of course,
this is also its major disadvantage. More people having to learn a new language takes
more time, effort, and money. More languages run the risk of being endangered, as in
the previous situation. Also, the risk of miscommunication is higher when neither
speaker is able to use their native language. In addition, even in this situation some
speakers might have an advantage over others, for example if their native language is
close to the lingua franca or if they have an aptitude for language learning.
There is a third option, however: receptive multilingualism. In this case, each
speaker simply speaks his or her own language, so that two languages are used
simultaneously to communicate. The great advantage of this strategy is of course that
both speakers have the comfort and ease of being able to express themselves in their
native language. Neither of them has to invest the time and effort required to learn to
speak another language: they only need to learn to understand it. If the languages are
closely related, even this may hardly be necessary, or not at all. In that case, the
languages are inherently mutually intelligible.
There are several known situations in which this strategy is applied. Serbian and
Croatian, for example, are very similar to each other and for the most part mutually
intelligible; the distinction between them is more political than linguistic. Another
example can be found in Scandinavia. The three main languages spoken there,
Norwegian, Swedish and Danish, are so similar to each other that their speakers can
understand the other languages to a considerable extent without any previous
experience or training. When they need to communicate with a speaker of a different
language, rather than resorting to a third language as many other Europeans in that
situation would, they often use the strategy of receptive multilingualism.
¹ Of course, in many situations involving a lingua franca, one of the speakers is in fact a native speaker of
the language in question. These situations belong to the first category.
What factors contribute to making receptive multilingualism possible?
Gooskens (2007a:446) mentions three factors that could contribute to the level of
mutual intelligibility:
1. The listener’s attitude towards the speaker’s language
2. The listener’s contact with the speaker’s language and other language
experience
3. Linguistic distance of the speaker’s language to the listener’s
language
The first factor, attitude, refers to a listener’s opinion of or feeling towards a
certain language or language variety. If a listener dislikes a language and/or its
speakers, he will probably be less willing to put effort into trying to understand it.
This might result in less successful communication. If the listener likes the language
variety he is listening to, however, he might understand more, just by trying harder.
Schüppert, Hilton and Gooskens (accepted), for example, find a low but significant
positive correlation (r = .19) between attitude and word intelligibility for Danish and
Swedish.
The second factor, contact, concerns previous experience the listener has with
the speaker’s language and other, possibly related, languages. Having learned to
speak a language naturally improves a person’s ability to understand it. But even if
there has only been passive contact, for example by hearing radio programmes or
hearing tourists speak, the listener might start to recognize certain sound
correspondences. In addition, knowing other languages can aid understanding, for
example by providing vocabulary. If a native speaker of Dutch encounters Danish for
the first time, having experience with a related language like German will improve his
understanding of this new language (Swarte, Schüppert and Gooskens (accepted)).
The Danish word kartoffel ‘potato’, for instance, does not have a cognate in Dutch (the
Dutch translation is aardappel). A Dutch reader who is familiar with German,
8
however, will immediately recognize the German word (Kartoffel) and translate it to
Dutch correctly.
The third factor, linguistic distance, is independent of the specific speaker and
listener involved, but refers only to how distant the two languages are from each
other. What exactly this means and how it can be measured has a whole history of its
own, which will be outlined in section 2.3. First, I will discuss the history of
intelligibility research in general.
2.2 Previous research on intelligibility
Research on intelligibility amongst the Scandinavian languages has a long history.
Schüppert (2011) summarizes this history in her introduction. One of the first studies
on this topic was carried out by Haugen and published in 1953 (in Norwegian,
published in English as Haugen 1966). He sent out a questionnaire in Norway,
Sweden and Denmark, asking people primarily about their personal
experiences with receptive multilingualism: had they ever communicated with people
speaking one of the other two languages, how well had they understood each other,
and which problems had occurred. These were very important questions, as no
research into this had been done up to that point for the Scandinavian languages.
Furthermore, the questionnaire asked about the informant’s opinion on the other
languages (i.e. attitude) and amount of contact with them:
“He was further asked to indicate the approximate amount of instruction
he had received in the languages, how much he read in each, whether he
enjoyed inter-Scandinavian radio programs, and whether he listened to
broadcasts from the neighbouring countries.”
(Haugen 1966:283)
He found that of these three languages, Norwegian and Swedish seemed to be
the most intelligible to each other, and the combination of Swedish and Danish was
the most problematic.
Haugen’s (1966) study was based on a questionnaire. All of the data was
indirect; it consisted of reports by the respondents of their personal experiences.
That is, he used the method of ‘asking the informant’ (Voegelin and Harris, 1951):
asking speakers how well they think they can understand a certain language variety,
or how well they think they can be understood by speakers of a certain language
variety. A more extensive version of essentially the same method is to present
informants with a sample of the particular language variety, instead of naming it. This
helps them to focus more on the actual linguistic information, instead of the nonlinguistic connotations they may have with the variety. Tang and Van Heuven (2009)
observe that "[l]isteners appear to have reliable (i.e. reproducible) ideas about how
much language B differs from their own, even if they know the stimulus language
from past exposure, and even if the recording quality of the speech samples may
differ substantially" (p. 710). However, these judgments do not necessarily match
actual intelligibility.
A more direct way of determining intelligibility between two language
varieties is ‘testing the informant’ (Voegelin and Harris, 1951): have the informants
listen to a certain language variety and determine how much they actually
understand, for example by asking questions about the text or asking them to
translate parts of it. This gives more objective results than asking informants for their
perception of a variety, but it is more complicated to carry out. Instead of asking
questions, an experiment needs to be designed. In addition, as Tang and Van Heuven
(2009:711) point out, the number of speaker-listener combinations to be tested
grows rapidly with the number of language varieties included: n varieties yield
n(n−1) ordered combinations, so the growth is quadratic. We can ask one
informant the same questions about different varieties, but we cannot ask him to
translate the same text presented in different varieties, because of obvious priming effects. Tang
and Van Heuven (2009) correlate ‘test the informant’ results with their ‘ask the
informant’ results from their previous study (Tang and Van Heuven 2007), and
although they find reasonable correlations (between .74 and .82), they conclude that
objective intelligibility testing cannot be completely replaced by asking for people’s
opinions.
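To make the growth in speaker-listener combinations concrete, the short Python sketch below enumerates every ordered combination for the five languages studied in this thesis and then counts the combinations for a few larger, purely illustrative numbers of varieties. The function name is hypothetical; it simply wraps the standard library's permutations.

```python
from itertools import permutations


def speaker_listener_pairs(varieties):
    """Every ordered (speaker variety, listener variety) pair of distinct varieties."""
    return list(permutations(varieties, 2))


languages = ["English", "Dutch", "German", "Danish", "Swedish"]
pairs = speaker_listener_pairs(languages)
print(len(pairs))  # → 20

# The count is n * (n - 1): it roughly quadruples each time n doubles.
for n in (5, 10, 20):
    print(n, len(speaker_listener_pairs(range(n))))
```

Danish-Swedish and Swedish-Danish are counted separately because intelligibility need not be symmetrical, as the studies discussed in this section show.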
Maurud (1976) carried out such an experimental study of the same three
Scandinavian languages: Danish, Swedish and Norwegian. He had informants who
lived in the capital cities (respectively Copenhagen, Stockholm and Oslo) translate
texts from both other languages to their own language. His results agreed with
Haugen’s (1966) results in that the combination of Swedish and Norwegian is the
most successful and the combination of Swedish and Danish the most problematic,
but unlike in Haugen’s study, the scores were not symmetrical. This was especially
true for the spoken texts: Swedes understood much less Danish (about 23%) than
Danes understood Swedish (43%; Schüppert 2011). Maurud
himself seems to attribute this result to non-linguistic factors:
“Swedes’ low understanding of the neighbour languages is a sign that the
habit of hearing them and the attitude towards the need for
understanding them are of major importance for the Scandinavians’
ability to communicate with each other in their respective languages.”
(Maurud 1976:71, translated in Schüppert 2011:5)
One problem with Maurud’s (1976) study is the fact that all his informants
lived in the capital cities of each country (Schüppert 2011). The capital of Denmark,
Copenhagen, is very close to Sweden. There is likely to be some contact between
Danes and Swedes there, and the people living in that region have access to TV and
radio programmes in Swedish. Sweden’s capital Stockholm, on the other hand, is
quite far away from both Denmark and Norway. The advantage that Danes seem to
have over Swedes, then, could be simply due to the geographical location of the
particular participants in this study and the amount of contact with the other
languages that that implies.
Bø (1978) addressed this issue by testing two groups of informants from each
country: a group of people who lived in the border region, and a group of people who
lived more inland. The people living in the border region did indeed perform better in
the intelligibility tasks, indicating that Maurud’s (1976) results should be interpreted
with care. The asymmetry between Danish and Swedish persisted,
however.
Although this research established that both previous contact with and the
listener’s attitude to the speaker’s language influenced the level of intelligibility, it
was still unclear to what extent intelligibility is determined by purely linguistic factors. An
attempt to fill this gap was made by Gooskens (2007a). She used the results from an
extensive set of intelligibility experiments carried out some years earlier (Delsing and
Lundin Åkesson 2005). This study included only background questions on attitude
and contact (numbers 1 and 2 of the list in 2.1); no attempt was made to study the
influence of linguistic factors (number 3). Gooskens correlated their results with
objective measures of linguistic distance, both lexical and phonological, and found
that phonetic distance was indeed the best predictor for intelligibility between
Swedish, Danish and Norwegian (r = -.80).
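Correlating a distance measure with intelligibility scores, as Gooskens (2007a) did and as this thesis does, comes down to computing a correlation coefficient over language pairs. The sketch below assumes the Pearson product-moment coefficient; the distance and intelligibility values are hypothetical and serve only to show the sign convention: a negative r, like Gooskens's r = −.80, means that greater distance goes with lower intelligibility.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with at least two values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Hypothetical language pairs: linguistic distance (%) and intelligibility score (%).
distance = [10, 20, 30, 40]
intelligibility = [80, 70, 45, 40]
print(round(pearson_r(distance, intelligibility), 2))
```

With only a handful of language pairs, a single outlying pair can shift r considerably, which is one reason to interpret such coefficients with care.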
Probably part of the reason why linguistic factors have been neglected in
intelligibility research is the fact that objectively measuring the distance between two
languages, like Gooskens (2007a) did, is not a straightforward matter. In the
next section, I will elaborate on this issue.
2.3 Measuring linguistic distance
The history of measuring linguistic distance is closely tied to the history of
dialectology. This is not surprising. A core issue in dialectology is determining how
different dialects are related and should be grouped together, and at which point two
varieties should be considered no longer dialects of the same language, but two
different languages altogether. To do that, the researcher needs to determine which
criteria are used to make the distinction between language and dialect. An early
approach, taken for example by Haugen (1966) as discussed in the previous section,
was simply to ask speakers how different they thought a certain variety was from
their own. This merely shifts the problem of criteria onto the informants, however.
There is no way to know what the informants base their answers on: for example, to
what extent non-linguistic factors, such as a general dislike of speakers of a certain
dialect, influence their answers. Haugen’s questionnaire measured self-reported
overall intelligibility, not specific linguistic distance. A more objective way of
determining distance between dialects was needed.
An early criterion to determine language distance was in fact intelligibility
itself: if two people can understand each other, they must be speaking the same
language (see e.g. Voegelin and Harris, 1951). A problem for this strategy is posed by
dialect continua. Going from the west of the Netherlands to the east of Germany, for
example, every dialect is mutually intelligible with its neighbouring dialects. Does this
mean that Dutch and German are one language? By this criterion it would seem so; yet
the dialects at the two ends of the continuum are completely unintelligible to one
another. Going from north to south, the
situation seems similar: a speaker of a West Flemish dialect (spoken in Belgium in the
south of the Dutch language area) will have a hard time communicating with a
speaker of a Groningen dialect (spoken in the north) without switching to a standard
language. Yet both varieties are considered dialects of the same language: Dutch.
As Wolff (1959) pointed out, intelligibility is not a reliable measure of linguistic
distance, because too many other factors play a role. Two of these are attitude and
previous contact, also mentioned by Gooskens (2007a; see section 2.1). Languages
are not isolated systems; they are used in the context of a certain culture. An
objective, computational method would be a more reliable way to
measure only linguistic distances. The problem with this is formulated by Tang and
Van Heuven (2009) as follows:
“In spite of its apparent success and conceptual simplicity, the notion of
linguistic distance, i.e. the inverse of similarity shared between
languages, has persistently eluded quantification. The problem is that
languages do not differ along just one dimension. Languages may differ
formally in their lexicon, phonetics and phonology, morphology, and in
their syntax. And again, at each of these linguistic levels, the ways in
which languages may vary are further subdivided along many different
parameters.”
(Tang & Van Heuven 2009:710)
A group of dialects (a dialect continuum) is characterized by many small
changes from one dialect to the next (Heeringa 2004). When looking at the dialects
up close, it is often not very clear which differences should be given the most
weight when classifying them. A commonly used method is to draw lines on a
map, called isoglosses, each marking the border between two variants of a linguistic
item. A group of isoglosses that run together forms a bundle and signals a possible
dialect border (after all, the varieties on either side of the bundle differ on several
points). A problem with this is that isoglosses do not always group together nicely
into bundles. And even if they do, it is not always clear when a bundle should be
considered a border between dialects. Chambers and Trudgill (1980) put the problem
as follows:
“It is undeniable that some isoglosses are of greater significance than
others (…). It is equally obvious that some bundles are more significant
than others (…). Yet, in the entire history of dialectology, no one has
succeeded in successfully devising a satisfactory procedure or a set of
principles to determine which isoglosses or which bundles would
outrank some others. The lack of a theory or even a heuristic that would
make this possible constitutes a notable weakness in dialect geography.”
(Chambers and Trudgill 1980:112, quoted in Tang and Van Heuven 2009:710)
Two languages being similar in one respect does not entail their being close to
each other on the other levels. Moreover, both the method of measurement and the
criterion for closeness differ from level to level. We need to determine, then, on
which level(s) the distance should be measured, and how to measure it.
A basic way to measure distance is on the lexical level: counting the number of
cognates two languages share (Séguy 1973). This approach was often used as a way
to determine how languages are related to each other, and it is the methodology
behind the Swadesh list (Swadesh, 1971): a list of 100 basic words that are not likely
to be borrowed from other languages. When measuring objective distance, the words
must be chosen randomly, but the principle is the same. This approach has been used
in intelligibility research, for example by Gooskens (2007a). Lexical distance has been
shown to correlate with intelligibility, but it is not a very reliable predictor. When
trying to understand a different language, it matters not only how many words are
different from those in your own language, but also which words are different. If a
few keywords of a text are unintelligible to the listener, he will not be able to
understand the text as a whole, even if most of the function words are clear
to him (Gooskens, Heeringa and Beijering 2008).
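Lexical distance as described here can be sketched as the percentage of concepts in a word list whose translations are not cognates. The minimal Python example below assumes the cognate judgments have already been made by hand; apart from 'potato' (Danish kartoffel versus Dutch aardappel, from section 2.1), the concept labels and judgments are hypothetical placeholders.

```python
def lexical_distance(cognate_flags):
    """Percentage of concepts whose two translations are NOT cognates.

    cognate_flags maps each concept to True (cognate) or False (non-cognate).
    """
    if not cognate_flags:
        raise ValueError("word list is empty")
    non_cognates = sum(1 for is_cognate in cognate_flags.values() if not is_cognate)
    return 100 * non_cognates / len(cognate_flags)


# Hypothetical judgments for a five-concept Danish-Dutch list.
judgments = {
    "potato": False,   # Danish kartoffel vs. Dutch aardappel
    "concept_2": True,
    "concept_3": True,
    "concept_4": True,
    "concept_5": False,
}
print(lexical_distance(judgments))  # → 40.0
```

Every concept carries the same weight here, so two word lists with the same score may differ in whether their non-cognates are keywords or function words, which is exactly the limitation pointed out above.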
Heeringa (2004) presents a history of computational methods used in
dialectology. According to him, the first to use a computational strategy to
determine dialect distance were Séguy and his associates, in creating their Atlas
linguistique de la Gascogne (published in six volumes between 1954 and 1973). Séguy
mapped many different features of the dialects of Gascony and calculated distances by
counting the number of items on which two neighbouring dialects disagreed. These
items were taken from all linguistic levels: lexicon, pronunciation, phonology,
morphology and syntax. The higher the percentage of differing items, the more
distant the two dialects are. When these distances are visualized in a map, separate
dialect areas can be distinguished.
Goebl (1982, 1993) took an approach similar to Séguy's, although he developed it
independently (Heeringa 2004): he compared individual items across dialects. Rather than
counting the items that differed, however, he counted the items that were the same. His
scores therefore do not reflect dialect distance, but its opposite, dialect similarity.
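Séguy's distance and Goebl's similarity can be sketched as two complementary percentages over the same set of compared items. This is a minimal illustration under simplifying assumptions: each dialect is a hypothetical dictionary mapping an atlas item to the variant recorded for it, only items recorded for both dialects are compared, and every item counts equally (Goebl also developed weighted variants, which are not modelled here).

```python
def shared_items(a: dict, b: dict) -> list:
    """Items (e.g. atlas questions) recorded for both dialects."""
    return [item for item in a if item in b]


def seguy_distance(a: dict, b: dict) -> float:
    """Percentage of shared items on which the two dialects differ."""
    items = shared_items(a, b)
    differing = sum(1 for item in items if a[item] != b[item])
    return 100 * differing / len(items)


def goebl_similarity(a: dict, b: dict) -> float:
    """Percentage of shared items on which the two dialects agree (unweighted)."""
    items = shared_items(a, b)
    same = sum(1 for item in items if a[item] == b[item])
    return 100 * same / len(items)


# Hypothetical atlas data: item -> recorded variant. Only the vowel item differs.
dialect_a = {"word for 'horse'": "A", "vowel in 'night'": "i",
             "plural ending": "-s", "past-tense ending": "-ed"}
dialect_b = {"word for 'horse'": "A", "vowel in 'night'": "e",
             "plural ending": "-s", "past-tense ending": "-ed"}

print(seguy_distance(dialect_a, dialect_b))    # 25.0
print(goebl_similarity(dialect_a, dialect_b))  # 75.0
```

In this unweighted form the two scores always add up to 100, so the choice between them is a matter of perspective rather than of information.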
Hoppenbrouwers and Hoppenbrouwers (1988, 2001) developed the corpus
frequency method in order to calculate dialect distances (Heeringa 2002, 2004). In
essence, this method compares two languages based on text corpora. It started with
the letter frequency method, in which the frequencies of individual letters in the
corpora are compared. An issue with this method is the fact that different languages’
orthographies do not represent those languages in the same way – the same sound
can be spelled in different ways, or the same spelling used for different sounds. A
more accurate comparison, then, would be on the phone level: the phone frequency
method. This is essentially the same as the letter frequency method, but instead of
text corpora, phonetic transcriptions of texts are compared. This method still has a
disadvantage, however: it gives every difference the same weight, although some sounds
are clearly closer to each other than others: the difference between [e]
and [ɪ] is much smaller than that between [e] and [u], for example. The phone
frequency method does not take this into account (Heeringa, 2004).
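The letter frequency method can be sketched as a comparison of the relative frequency of each letter in two corpora. The distance used below, the sum of absolute differences between relative frequencies, is an assumption chosen for simplicity; the measure actually used by Hoppenbrouwers and Hoppenbrouwers may differ. Fed with phonetic transcriptions instead of spelling, the same code would implement the phone frequency method.

```python
from collections import Counter


def relative_frequencies(corpus: str) -> dict:
    """Relative frequency of each alphabetic character in a corpus (case-insensitive)."""
    letters = [ch for ch in corpus.lower() if ch.isalpha()]
    counts = Counter(letters)
    total = len(letters)
    return {ch: n / total for ch, n in counts.items()}


def frequency_distance(corpus_a: str, corpus_b: str) -> float:
    """Sum of absolute differences between two letter-frequency profiles (0 to 2)."""
    fa = relative_frequencies(corpus_a)
    fb = relative_frequencies(corpus_b)
    letters = set(fa) | set(fb)
    return sum(abs(fa.get(ch, 0.0) - fb.get(ch, 0.0)) for ch in letters)


# Tiny hypothetical "corpora" of three German and three Dutch words.
german = "dünn tot zu"
dutch = "dun dood toe"
print(round(frequency_distance(german, dutch), 3))
```

Note that every letter counts as simply "same" or "different": a <u> that turns into an <o> adds as much distance as one that turns into a <z>, which is exactly the weakness described above.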
15
A more refined version of this method, then, is the feature frequency method. It
breaks down the individual phones into phonological features (front/back,
rounded/unrounded, plosive/fricative, et cetera). Calculating the frequencies of these
features in texts in the different languages results in a more reliable measure for
dialect distance. Using this method, Hoppenbrouwers and Hoppenbrouwers (2001)
mapped and classified 156 varieties of Dutch as spoken in the Netherlands and
Belgium.
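To see why features help, consider a hypothetical, heavily simplified feature set of three binary features for three vowels. The feature values are illustrative assumptions, not a real phonological feature system (treating [ɪ] as "high" is itself a simplification); the point is only that a feature-based profile places [e] closer to [ɪ] than to [u], which a plain phone-frequency comparison cannot do.

```python
# Hypothetical binary features; real feature systems are much richer.
FEATURES = {
    "e": {"front": 1, "high": 0, "rounded": 0},
    "ɪ": {"front": 1, "high": 1, "rounded": 0},  # simplified: near-high treated as high
    "u": {"front": 0, "high": 1, "rounded": 1},
}
FEATURE_NAMES = ("front", "high", "rounded")


def feature_profile(phones: list) -> dict:
    """Average value of each feature over a transcription (a list of phones)."""
    return {
        name: sum(FEATURES[p][name] for p in phones) / len(phones)
        for name in FEATURE_NAMES
    }


def profile_distance(p: dict, q: dict) -> float:
    """Sum of absolute differences between two feature profiles."""
    return sum(abs(p[name] - q[name]) for name in FEATURE_NAMES)


print(profile_distance(feature_profile(["e"]), feature_profile(["ɪ"])))  # 1.0
print(profile_distance(feature_profile(["e"]), feature_profile(["u"])))  # 3.0
```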
A disadvantage of the feature frequency method is that it does not take the
order of speech segments into account (Heeringa, 2004). If two corresponding words
in two languages contain exactly the same features, but in a different order, the
feature frequency method cannot detect this difference. A
simplified example, using letters instead of features, is English wart and its Dutch
translation wrat, or Dutch drie ‘three’ and its German equivalent drei.
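The order problem is easy to demonstrate with letters: a frequency profile stays the same whatever order the letters come in. The check below uses Python's `collections.Counter` as a stand-in for a frequency profile.

```python
from collections import Counter

# A frequency profile ignores order: both pairs have identical letter counts.
pairs = [("wart", "wrat"), ("drie", "drei")]
for a, b in pairs:
    same_profile = Counter(a) == Counter(b)
    print(f"{a} / {b}: identical frequency profile = {same_profile}")
```

A purely frequency-based comparison therefore assigns a distance of zero to both pairs, even though the words are clearly different forms.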
Kessler (1995) applied the Levenshtein distance, a string edit distance, to the
measurement of dialect distances, a more accurate method than those described above.
Heeringa (2004) refined his method and applied it to Norwegian and Dutch dialects. The
method aligns corresponding words of both languages and counts how many individual
elements (e.g. phonemes or graphemes) need to be substituted, deleted or inserted to
get from the word in one language to the word in the other. The method is described in more detail in Chapter 6.
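The core of the Levenshtein distance can be written as a short dynamic-programming routine. This is a minimal sketch with unit costs for substitutions, insertions and deletions, operating on single characters; it leaves out refinements such as normalising the distance by the length of the alignment. It is applied here to three German and Dutch cognates.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character substitutions, insertions and
    deletions needed to turn string a into string b (unit costs)."""
    # previous[j] holds the distance between the first i-1 chars of a and the first j chars of b.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into "" takes i deletions
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


for german, dutch in [("dünn", "dun"), ("tot", "dood"), ("zu", "toe")]:
    print(german, dutch, levenshtein(german, dutch))
```

Because every operation costs the same in both directions, `levenshtein(a, b)` always equals `levenshtein(b, a)`; the same routine also separates wart from wrat (distance 2), which a frequency profile cannot do.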
Measuring distance with the Levenshtein algorithm has been done in
intelligibility research (amongst others by Gooskens (2007a), described
above), and it has been shown to be an accurate measure of linguistic distance. In many
cases, it predicted intelligibility better than lexical distance (Gooskens 2007a,
2007b; Beijering, Gooskens and Heeringa 2008; Kürschner, Gooskens and Van
Bezooijen 2008).
The Levenshtein algorithm thus seems to be the best method of measuring
language distance available so far. It has one disadvantage, however: it is
symmetrical. It does not take into account which of the two languages measured is
the speaker’s language and which is the listener’s language. It simply measures the
objective distance between two languages. Asymmetry between languages has,
16
however, clearly been established in past research. Spoken Danish, for example, is
harder to understand for Swedes than spoken Swedish is for Danes (Maurud 1976; Bø
1978; Börestam 1987; Delsing and Lundin Åkesson 2005; Gooskens et al. 2010;
Schüppert 2011; Gooskens and Van Bezooijen 2013). Gooskens, Van Bezooijen and
Van Heuven (accepted) show a similar asymmetry between German and Dutch: Dutch
is harder to understand for Germans than German is for Dutch listeners (while
controlling for non-linguistic factors such as previous contact, which would otherwise
be the more likely cause for asymmetry). The existence of asymmetry, even when all
non-linguistic factors have been accounted for, indicates that the Levenshtein
distance cannot be the only explanatory factor of the level of intelligibility. There has
to be something that explains the difference, something that takes into account the
direction of the communication.
Moberg et al. (2007) attempt to explain asymmetrical intelligibility by
measuring the amount of entropy in each language combination. As the languages involved in
these intelligibility studies are related and share a history, the differences between
them are not completely random; there is a certain regularity to them. Because sound
changes tend to be regular, a certain sound in one language can systematically
correspond with a certain different sound in another language. This systematicity can
help the listener understand the language. The entropy calculations are a way
to measure this regularity: given a certain sound (or character) in language A, how
predictable is the corresponding sound (or character) in language B? The more
predictable this sound is, the lower the entropy. Higher predictability aids
intelligibility, therefore the hypothesis is that a low entropy measure corresponds
with a high intelligibility score.
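The quantity described here is the conditional entropy of the character in language B given the character in language A. In its standard information-theoretic form, measured in bits, with X the character in language A and Y the corresponding character in language B, it can be written as below; the exact alignment and counting procedure of Moberg et al. (2007) is not reproduced here.

```latex
H(Y \mid X) = -\sum_{x}\sum_{y} p(x, y)\,\log_2 p(y \mid x)
            = \sum_{x} p(x)\, H(Y \mid X = x)
```

H(Y | X) is 0 when every character in A always corresponds to the same character in B, and grows as the correspondences become less predictable. Swapping the roles of X and Y generally gives a different value, which is what allows the measure to capture asymmetry.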
Moberg et al. (2007) calculated phonetic entropies between Danish, Swedish
and Norwegian and generally found relatively low entropy for combinations where
previous research has found high intelligibility and vice versa, supporting the
hypothesis. Moreover, they found asymmetric entropy between Danish and Swedish,
where asymmetric intelligibility is well-established. Their study did not include
enough languages to calculate correlations, however. Tang and Van Heuven (2009)
used a measure similar to conditional entropy, the phonological correspondence index
(Cheng 1997), and found that it correlated well with the results
from their intelligibility experiments on 15 Chinese dialects (r = .772 and r = .769).
One of the strengths of the entropy measurement is the fact that it is naturally
asymmetrical. I will demonstrate this using the correspondence between German and
Dutch in Table 1. In this particular set, there is no entropy for the vowels. In other
words, German <ü> always corresponds to Dutch <u>, German <o> always
corresponds to Dutch <oo> and German <u> always corresponds to Dutch <oe>; and
vice versa. A speaker of either language reading the words in the other language
should, in theory, have no doubt about which vowel to look for in his or her own
vocabulary.2
Looking at the initial consonants, however, a different story unfolds. German
<d> always corresponds to Dutch <d>, German <t> always corresponds to Dutch <d>,
and German <z> always corresponds to Dutch <t>. So far so good. Dutch people
reading German can predict the corresponding characters in their own language with 100% certainty.
The other way around, however, this is not the case. Dutch <d> corresponds to
German <d> in 50% of the cases and to German <t> in the other 50% of the cases. A
German reader encountering a word containing <d> in a Dutch text cannot be sure of
which character to map this onto in his or her own language. There is thus some
entropy in the direction from Dutch to German, but no entropy from German to
Dutch.
Table 1: A mini corpus consisting of three cognate words in three languages.
German | Dutch | English
dünn | dun | thin
tot | dood | dead
zu | toe | to
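The reasoning above can be checked numerically. The sketch below estimates conditional entropy, in bits, from hand-aligned German and Dutch correspondences in Table 1. Splitting the words into initial consonants and vowels by hand is an assumption of this illustration; real studies derive correspondences from alignments of much larger word lists.

```python
from collections import Counter
from math import log2


def conditional_entropy(pairs: list) -> float:
    """H(target | source) in bits, estimated from (source, target) correspondences."""
    joint = Counter(pairs)
    source_totals = Counter(source for source, _ in pairs)
    n = len(pairs)
    entropy = 0.0
    for (source, target), count in joint.items():
        p_joint = count / n                     # p(x, y)
        p_cond = count / source_totals[source]  # p(y | x)
        entropy -= p_joint * log2(p_cond)
    return entropy


def reverse(pairs: list) -> list:
    """Swap the reading direction: (German, Dutch) -> (Dutch, German)."""
    return [(target, source) for source, target in pairs]


# Hand-aligned (German, Dutch) correspondences from Table 1.
initial_consonants = [("d", "d"), ("t", "d"), ("z", "t")]
vowels = [("ü", "u"), ("o", "oo"), ("u", "oe")]

print("consonants, German -> Dutch:", conditional_entropy(initial_consonants))
print("consonants, Dutch -> German:", round(conditional_entropy(reverse(initial_consonants)), 3))
print("vowels, both directions:", conditional_entropy(vowels), conditional_entropy(reverse(vowels)))
```

The Dutch-to-German value of about 0.667 bits is the 1 bit of uncertainty attached to Dutch <d> (two equally likely German counterparts), weighted by the two thirds of the correspondences in which Dutch <d> occurs; in the other direction, and for the vowels, the entropy is 0.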
Entropy is thus naturally an asymmetrical measure. This is an advantage
compared to the Levenshtein distance, which is completely symmetrical. As explained
above, Levenshtein distance has already been shown to be a good predictor of
intelligibility. However, because the entropy measure, like intelligibility, is
asymmetrical, it might add predictive power to that of the Levenshtein distance.
The main purpose of this thesis is to find evidence for this.
2
In order to be aware of this, the reader needs to have some prior experience with the other language: if
he has never encountered it before, he cannot know that, for example, German <u> corresponds to Dutch <oe>.
2.4 The MICReLa project
MICReLa stands for Mutual Intelligibility of Closely Related Languages. It is an
extensive project at the Center for Language and Cognition Groningen at the
University of Groningen. It is funded by the Netherlands Organization for Scientific
Research (NWO). This thesis originated in this project and draws on its materials and
preliminary results. In this section, a description of the project will be given, in order
to show how this thesis fits into the bigger picture. For more information, see the
Micrela project description3 and Heeringa et al. (2013).
The project was started in 2011 and is scheduled to last for five years, until
2016. The project leader is Charlotte Gooskens. The project originated from the
intelligibility research described in section 2.2 (such as Gooskens, 2007a). This
research focused mostly on Scandinavian languages and showed promising results
for these languages. In the Micrela project, the research is extended to the three major
groups of closely related languages in Europe: Germanic languages, Romance
languages and Slavic languages. The main aim is to “develop a model of intelligibility
of closely related languages” (Micrela project description, p. 6). This thesis research
takes place exclusively within the Germanic languages group.
One of the things this project focuses on is how to explain the asymmetrical
mutual intelligibility found by previous research. One of the research questions is:
“What explanations can be found for asymmetric intelligibility?” (Micrela project
description, p. 7). This thesis hopes to contribute to finding an answer to this
question by determining the effect of the amount of entropy.
The Germanic part of the project includes five languages, divided over the two
main sub-branches within the Germanic family: English, Dutch and German as West
Germanic languages, and Danish and Swedish as North Germanic languages.
Intelligibility is tested by means of three experiments: a word translation task, a cloze
test and a picture task. An effort is made to find data for every language combination
in both directions. The methodology of the experiments included in this thesis, the
word translation task and the cloze test for the Germanic languages, can be found in
Chapter 6.
3
http://www.let.rug.nl/gooskens/project/pdf/Gooskens_Vrije_Competitie.pdf
3. Research questions
The history described in Chapter 2 has led to the first research question:
Are orthographic entropy measures a useful predictor of written intelligibility in
addition to Levenshtein distance?
As is clear from this question, this thesis will be concerned with written intelligibility
only, and correspondingly, the orthographic distances between the languages (as
opposed to distances based on phonetic transcriptions of the words). Five Germanic
languages are included: English, Dutch, German, Danish and Swedish.
Levenshtein distance has in previous research been shown to be a reliable
predictor of intelligibility (Gooskens 2007a, 2007b; Beijering, Gooskens and Heeringa
2008; Gooskens, Heeringa and Beijering 2008; Kürschner, Gooskens and Van
Bezooijen 2008). However, it does not automatically capture the asymmetry present
in mutual intelligibility situations (Maurud 1976; Bø 1978; Börestam 1987; Delsing
and Lundin Åkesson 2005; Gooskens et al. 2010; Gooskens and Van Bezooijen 2013).
Entropy calculations (Moberg et al. 2007), on the other hand, are asymmetrical by default.
Therefore, they are expected to contribute to predicting intelligibility, in combination with the
Levenshtein distances. The correlation is expected to be negative: the lower the entropy for a
certain combination, the higher the intelligibility score. If the entropy for a certain
language combination is high, the orthographic correspondences between the two
languages are irregular and unpredictable. This is likely to make it harder for the
reader to decipher the language, as he cannot rely on regular correspondences.
Therefore, intelligibility should be lower when the entropy is high. The results from Moberg
et al. (2007) support this hypothesis, but as that study included only three
languages, no correlation between entropy and intelligibility could be calculated. In
this study, five languages are included: not only the Scandinavian
languages Danish and Swedish, but also the West Germanic languages German, Dutch and English.
Can lexical distance accurately predict written intelligibility?
This second question focuses only on the relationship between lexical distance and
intelligibility. When the lexical distance between two language varieties is high, this
means that they share relatively few cognates. Non-cognates are incomprehensible
for a reader who has not learned the language in question; therefore, a high number of
non-cognates means low intelligibility. A negative correlation between lexical
distance and intelligibility is thus expected. Previous research has often found
this negative correlation: the higher the lexical distance between two language
varieties, the lower the intelligibility. Tang and Van Heuven (2009), for example, found
correlations of .78 and .75 for 15 Chinese dialects, and Gooskens, Heeringa and
Beijering (2008) found a correlation of -.64 for 18 Scandinavian language varieties. In
Gooskens (2007a), investigating six Germanic languages, the correlation with lexical
distance was not significant (p = .11), but the tendency was in the same direction. The
results of the present study should be in line with previous research, and show a
negative correlation between lexical distance and intelligibility.
Can orthographic Levenshtein distance accurately predict written intelligibility?
Levenshtein distance is a way to measure the amount of difference between two
languages. As with lexical distance, a negative correlation is expected: a greater
orthographic distance between two languages should hamper written intelligibility.
This is in fact what has been found in previous research, and often, the correlation of
Levenshtein distance with intelligibility was higher than that of lexical distance with
intelligibility (Gooskens 2007a, 2007b; Beijering, Gooskens and Heeringa 2008;
Gooskens, Heeringa and Beijering 2008; Kürschner, Gooskens and Van Bezooijen
2008). In Gooskens, Heeringa and Beijering (2008), for example, the correlation
between intelligibility and Levenshtein distance was -.86, whereas the correlation
between intelligibility and lexical distance was -.64. Gooskens (2007a) found no
significant correlation between lexical distance and intelligibility, but she did find a
correlation of -.64 between Levenshtein distance and intelligibility. The results of the
present study should be in line with previous research, and show a negative
correlation between orthographic Levenshtein distance and intelligibility. In addition,
this correlation should be stronger (i.e. more negative) than that of lexical distance
with intelligibility.
4. Languages
4.1 Overview
The languages included in this thesis are five Germanic languages spoken in the
northern and western parts of Europe. The map in Figure 1 shows where these languages
are spoken.4 These languages represent the two different branches of the Germanic
languages that still exist today: West Germanic and North Germanic (see below). First,
a short characterization of each language will be given, based on information from the
Ethnologue (Gordon 2005). Following this, a brief history of these languages will be
given, showing how they are related to each other and how they were influenced by
each other and by other languages outside of the Germanic group.
Figure 1: Map of Northwestern Europe. The standard languages of the five marked
countries are included in this study. Starting from the left, counterclockwise: the
United Kingdom (English), the Netherlands (Dutch), Germany (German), Denmark
(Danish) and Sweden (Swedish).
4
That is, the countries with which the standard varieties included in this study are associated. All of these
languages are spoken in more than one country.
English
English is spoken all over the world by some 335 million people as a native language,
and by many more as a second language. In this project, standard British English is
used, as spoken in the United Kingdom.
Dutch
Dutch has some 20 million speakers, most of whom (about 16 million) live in the
Netherlands. In this thesis, standard Netherlandic Dutch is used.
German
The German language has almost 80 million speakers, the majority of whom (70
million) live in Germany. In this thesis, standard High German is used.
Danish
Danish is spoken by over 5.5 million people, almost all of whom live in Denmark. In
this thesis, standard Danish as spoken in Denmark is used.
Swedish
Swedish is spoken by over 9 million people, 8.8 million of whom live in Sweden. In
this project, standard Swedish as spoken in Sweden is used.
23
4.2 Germanic languages
The Germanic languages are a part of the Indo-European language family, to which
most European languages belong. They descended from one common ancestor
language, Proto-Germanic. This language split into two branches: East Germanic and
Northwest Germanic (Harbert 2007). The East Germanic branch has gone extinct and
is of no interest to the current study. The Northwest Germanic group further split into
two branches: North Germanic and West Germanic. West Germanic is the ancestor of,
for example, English, German, Dutch and Frisian. The North Germanic branch consists
of the Scandinavian languages Danish, Swedish and Norwegian, as well as Icelandic
and Faroese. The split between West and North is thus the first and biggest division
within the language group included in this study, grouping Danish and Swedish
together on one side, and German, Dutch and English on the other. In the following
these two branches will be discussed separately, concluding with a discussion on
English only, because its development has been considerably different from the other
West Germanic languages.
24
Figure 2: The Germanic family tree (adapted from Harbert 2007:8).
25
4.2.1 North Germanic: Swedish and Danish
Around 500 AD, North Germanic, in turn, split into two varieties as well: east and
west (Vikør 2001). From the western dialect developed Norwegian, Icelandic and
Faroese, whereas the languages concerned in this study, Danish and Swedish, are
both descendants of the eastern branch. The distinction is not as clear-cut as that
between North and West Germanic, however. It is more of a continuum with two
extremes. Icelandic and Faroese have, through their conservatism, grown apart from the
others, quite possibly because of their location on islands, but the mainland
Scandinavian languages are still very close to each other:
“Rather than viewing Norwegian, Swedish and Danish as units, we should
think of these names as loose designations for groups of dialects,
arbitrarily distinguished on the basis of linguistic characteristics selected
by modern language historians.”
(Vikør 2001:34)
In the Middle Ages, however, a new split occurred, this time between the north and
the south (Vikør 2001). This essentially separates Danish from the two other
(standard) languages (some dialects in the south of Norway and Sweden show
characteristics of the southern group). The main changes separating Danish from the
other languages are phonological. First of all, vowels in unstressed inflectional
endings were merged into a schwa, just like in the West Germanic languages. Thus
Swedish timmar ‘hours’ corresponds to Danish timer (the <e> in this case is
pronounced as a schwa), and Swedish stjärnor ‘stars’ corresponds to Danish stjerner.
Secondly, unvoiced plosives following long vowels were weakened, leading to
correspondences like Swedish gripa ‘to seize’ and bita ‘to bite’ with Danish gribe and
bide. Finally, Danish developed the phenomenon of stød, a kind of creaky voice
present in some words. There are many minimal pairs differing only on this point, but
it is not present in the spelling. These sound changes can of course be expected to
cause problems with mutual intelligibility on the spoken level, and to the extent to
which they are represented in the spelling, on the orthographic level, too.
26
4.2.2 West Germanic: German and Dutch
West Germanic split into several different varieties as well, but as they were spoken
in one area with many possibilities for contact between groups of people, these
language varieties kept influencing each other continuously (Harbert 2007). This has
resulted in a dialect continuum covering a large area (stretching from the Alps in
Austria and Switzerland to the North Sea coast), and classifications into groups can be
hard. Newer contact-induced changes have blurred the earlier distinctions caused by
dialect splits.
In classifications of language varieties in these areas, the terms ‘High’ and
‘Low’ occur frequently. These refer to geographical locations: ‘High’ varieties
originated in the relatively mountainous south of the area, whereas the ‘Low’
varieties originate from the flat, lower lying north (Harbert 2007). In the Middle Ages,
one of the low varieties, Middle Low German (Harbert 2007), became the lingua
franca of the Hanseatic League, heavily influencing the mainland Scandinavian
languages. Nowadays, the status of the descendants of this variety has been reduced
to being considered dialects of the standard language of the country in which they are
spoken (either Dutch or German), despite their separate origin (Harbert 2007, see
also Figure 2).
Currently, two national standard languages are dominant in this area: Dutch
and German.5 They can be considered part of one dialect continuum, together with all
the other dialects that are still in use.
Standard German, the official language of Germany today, is based mostly on
the higher and middle varieties. In many parts of the country, however, dialects are
still in common use, and their speakers can be considered bilingual, even if their
native language is generally considered a mere dialect.
Standard Dutch, on the other hand, developed from Low Franconian varieties,
which were spoken along the western coast of Belgium and the Netherlands.
Although the standard languages of Germany and the Netherlands thus developed
from very different varieties, several varieties are still present in both countries,
where they are considered dialects or regional languages.
5
Both of these languages have more than one local standard. Belgian Dutch and Netherlandic Dutch, for
example, are considered different standards of the same language. In this study, Netherlandic Dutch and
German as spoken in Germany are used.
One of the most salient differences between German and Dutch is the High
German Consonant Shift (Figure 3). It occurred around 500 AD (Van Gelderen 2006)
and involved the transformation of voiceless plosives [p, t, k] into, depending on
position, an affricate or a fricative (see Table 2). The consonant shift is absent in the
lower varieties, including Dutch, and complete in the southernmost (i.e. ‘highest’)
varieties of German. Several varieties in between have partially completed the shift
(see Figure 3). One of these is standard German, which includes all changes except for
the shift of [k] to [kχ] (hence the unexpected unaffricated [k] in Kopf ‘head’ and
backen ‘to bake’, see the third column of Table 2).
This consonant shift may cause confusion when a speaker of either language
encounters the other for the first time, as it affects many words and the
correspondence is not immediately clear. It is a very regular correspondence,
however, and the words involved have not changed beyond recognition.
28
Figure 3: The Rhenish Fan, showing the partial completion of the High German Consonant Shift
in the southwest of Germany (Van Gelderen 2006: 39).
Table 2: Some cognates between Dutch (left) and German (right) that demonstrate the effects
of the High German Consonant Shift.
p > pf/f | t > z/s (<z> is pronounced [ts]) | k > ch (<ch> is pronounced [χ])
peper – Pfeffer 'pepper' | tien – zehn 'ten' | maken – machen 'to make'
dapper – tapfer 'brave' | tuin 'garden' – Zaun 'fence' | boek – Buch 'book'
kop – Kopf 'head' | zitten – sitzen 'to sit' | zoeken – suchen 'to search, seek'
schaap – Schaf 'sheep' | laten – lassen 'to let, leave' | kop – Kopf 'head'
(none) | heet – heiß 'hot' | bakken – backen 'to bake'
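The regularity of the shift can also be expressed with entropy. The sketch below hand-labels each Dutch plosive in Table 2 by whether its German counterpart shows the shift; these binary labels are a simplification made for illustration (they ignore whether the result is an affricate or a fricative, and where in the word it occurs). Given the Dutch plosive, the German reflex is fully predictable for p and t, while the unshifted k in Kopf and backen introduces some uncertainty.

```python
from collections import Counter
from math import log2

# Hand-labelled reflexes of Dutch plosives in the Table 2 pairs
# ("shifted" = German has an affricate or fricative, "unshifted" = plosive kept).
REFLEXES = {
    "p": ["shifted"] * 4,                      # Pfeffer, tapfer, Kopf, Schaf
    "t": ["shifted"] * 5,                      # zehn, Zaun, sitzen, lassen, heiß
    "k": ["shifted"] * 3 + ["unshifted"] * 2,  # machen, Buch, suchen; Kopf, backen
}


def entropy(outcomes: list) -> float:
    """Shannon entropy in bits of a list of categorical outcomes."""
    counts = Counter(outcomes)
    n = len(outcomes)
    return sum((c / n) * log2(n / c) for c in counts.values())


for plosive, outcomes in REFLEXES.items():
    print(f"Dutch {plosive}: {entropy(outcomes):.3f} bits")
```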
29
4.2.3 West Germanic: English
English originates from one particular sub-group of West Germanic: North Sea
Germanic (Harbert 2007, see Figure 2). These varieties were spoken along the North
Sea shore. This group still has descendants on the mainland in the north of the
Netherlands and Germany, but contact with the other varieties spoken there has
influenced them so strongly that they are generally considered mere dialects of the
standard language of the country in which they are spoken. The exception to this is
formed by the Frisian languages, but even these have been heavily influenced by
Dutch and German.
Some groups of speakers of a North Sea Germanic language, however, crossed
the North Sea and landed in England around 450 AD (Van Gelderen 2006). Over the
following centuries they expanded and their languages gradually replaced the Celtic
languages spoken on the British Isles before that time. Some of these Celtic languages are
still very much alive (such as Irish, Welsh and Scottish Gaelic), but English is the
dominant language almost everywhere in the area. As it was relatively cut off from
the other West Germanic languages, it has had its own independent developments.
First of all, English has been influenced more by Celtic than the other West
Germanic languages have, as it developed so close to the area where Celtic languages were spoken.
This influence shows mainly in loan words and names, although it is argued that the
syntax was influenced as well (Van Gelderen 2006). With the spread of Christianity
came some Latin words, as in all of the other Germanic languages, but it was nothing
compared to the later influence of Latin during the Renaissance.
In the 8th century, speakers of Old Norse (‘the Vikings’) came from Norway and
Denmark to the north of Britain and settled there. Their language has had a
considerable influence on English (Van Gelderen 2006). For one, English borrowed a
lot of words. This often resulted in Scandinavian words replacing their English
cognates: Old Norse egg, for example, replaced its Middle English
cognate ey. In some cases, both words coexist with slightly different meanings, such
as shirt (West Germanic) and skirt (North Germanic). In addition to words, however,
Scandinavian has influenced English grammar as well. In this time period, a
30
simplification of word endings spread from the north to the rest of the island. This is
probably caused or enhanced by contact with Scandinavian (Van Gelderen, 2006).
Even today, English morphology is less extensive than it is in the other West
Germanic languages.
In 1066, the Normans, speaking a variety of French, defeated the English king.
The English nobility was replaced by Normans and French became the dominant
language, although English remained the language of the masses. Because this
situation lasted for a few hundred years, French had an extensive influence on
English, mainly in the vocabulary: possibly up to 10 000 words (Van Gelderen 2006).
Unlike in the case of the Scandinavian influence, native English words were now
replaced not by Germanic words but by Romance words, setting English apart
from the other Germanic languages. Some of the many words borrowed in this period
are royal, tax, judge, grammar, art, poet, dinner, confess, mercy, age, damage. In
addition to whole words, affixes were borrowed as well. Most of these only attach to words
of Romance origin (disinterest, solemnity), but there are some hybrids of Germanic
words with Romance affixes (disbelief, oddity) or Romance words with Germanic
affixes (useless, apprenticeship).
In the Renaissance, English further borrowed many words directly from Latin,
as did the other Germanic languages. The same is true for the new words needed for
technological advancements in the 19th and 20th centuries (Van Gelderen 2006). As
these words were borrowed relatively recently, they are still quite similar in all these
languages, especially in their written forms.
One of the biggest developments setting English apart from the other
Germanic languages, however, did not come from the outside, but from within the
English language. It is a sound change known as the Great Vowel Shift. This was a
chain shift in which the long vowels were raised and the ones that could not be raised
any further, /i/ and /u/, were diphthongized. Some examples of this are lane /leɪn/
(Dutch laan /la:n/), wine /waɪn/ (Swedish vin /vi:n/), mouse /maʊs/ (Danish mus
/mu:s/) and sea /si:/ (German See /se:/ ‘lake’).
31
4.3 Germanic orthography
As this study concerns only the written version of these languages, it is important to
know the background of their orthographies. All Germanic languages are written
using the Roman alphabet. This alphabet was originally developed for Latin, the
language of writing in the (early) Middle Ages (Molewijk 1992, Scheuringer and Stang
2004). When in the later Middle Ages it became more customary to write in the
common languages of the people instead of, or in addition to, Latin, there was no
universal spelling standard the writers could adhere to. They had to invent their own
way to write these languages, using the alphabet they already knew. The Latin
alphabet, however, is not perfectly suited for Germanic languages. These languages
contain sounds that are not present in Latin. For example, there were no letters for
the sounds /j/ and /w/; the current letters developed from Latin I and V (= /u/),
respectively (Scheuringer and Stang 2004), which is why English still calls the letter
‘w’ ‘double u’. For other sounds, digraphs were established (such as <ng> for /ŋ/) or
letters from other alphabets were introduced (such as <þ> (thorn) from the Runic
alphabet). Also, there was no universal way to distinguish between short and long
vowels, a very important distinction for these languages (Scheuringer and Stang
2004). Every writer had to come up with his own solution to the problems this
caused. This, in addition to the fact that every writer based his spelling on his own
dialect as there were no standard languages yet, resulted in a wide range of variation.
Some of the spellings included in the Oxford English Dictionary (OED) for book, for
example, are: boocke, bouke, boock, beuk, buik, bewk, bouck, bouk, bowyk, buike,
buk, buyk, bvik, bwck, bwik, bwike, bwk, booke, buick, book, buke, beuck.
When the ability to write and read became more widespread, and the
introduction of the printing press in England in 1476 made it easier to produce and copy books for a
larger audience (Van Gelderen 2006), standards started to be developed. This was
true for the languages as a whole, but especially for their writing systems.
Standardization initially happened mostly unofficially (Van Gelderen 2006): spellings
emerged by convention. This means that the spellings that originated in the most
prestigious regions of this time had the most influence on the standardized versions.
These regions produced the most books and other writings, thus their spelling
32
conventions became the most widespread. This is similar to how the dialect of the
most prestigious region ends up being the basis of the standard language. Only after
an initial standard had been established did people start to influence it consciously.
When and how this happened and what the current attitude to the spelling of a
language is, differs for each of the languages in this study. Therefore, I will discuss
their recent history and current situation one by one below.
4.3.1 English
The spelling of English is notoriously irregular (Van Gelderen 2006). Although the development of its writing system started out similarly to those of the other Germanic languages, several circumstances have made it irregular today. Its standardization started in the early 15th century, relatively early (Van Gelderen 2006). Although many attempts have been made, no real spelling reform has taken place since this standard was established. The spelling therefore essentially reflects the pronunciation of the language in the 15th and 16th centuries. The Great Vowel Shift (described in section 4.2.3), which changed almost all long vowels in the language, largely happened after this time. Because of this, the pronunciation of many letters in English no longer matches the way the same letters are pronounced in the other languages (see Table 3).
Table 3: Some words showing how the pronunciation of certain vowel graphemes differs between English and other Germanic languages.
English          Dutch            Swedish
state /stejt/    staat /sta:t/    stat /stɑ:t/
cook /kʊk/       kook /ko:k/      -
week /wi:k/      week /we:k/      -
wine /wain/      -                vin /vi:n/
Other contributing factors are etymological respelling and borrowing words
from other languages without changing the spelling (Van Gelderen 2006). This
happens in all languages in this study to some extent, but it is more widespread in
English.
Etymological respelling happens when the spelling of a word is changed, not according to its pronunciation, but according to its (supposed) origin. The word debt, for instance, was borrowed from French without the b (as French had already lost it at that point). Learned writers, however, recognizing its connection with the Latin words it derived from, added the b to the written form of the word to show this connection more clearly. Doing so moved the spelling away from the pronunciation. In addition, this was not done consistently: for some words the respelling became standard, and for others it did not. The word receipt, for example, has a silent p, but conceit does not. In some cases, these respellings were based on a mistaken etymology. The s in island, for example, comes from a supposed connection with the French loan word isle, when in fact the first part of the word is a Germanic root that never contained an s (Old English íg, íeg; OED).
Loan words whose pronunciation has been adapted to English but whose original spelling has been retained (Van Gelderen 2006) cause further irregularities, since their spelling does not match their pronunciation in English. Examples are suite, glacier and phoenix. For words like these, spelling and pronunciation need to be learned separately. The other Germanic languages have the same problem, but to a lesser extent: they adapt the spelling of loan words to their own spelling system more readily. Dutch, for example, has words like foto 'photo', orthografie 'orthography' and kwarts 'quartz', and Swedish has byrå 'bureau' and buljong 'bouillon'.
Another issue, specific to English, is related to the Norman Conquest. For almost 500 years, the administrative language in England was French, and many of the people who knew how to write had first learned this skill in French. Naturally, they applied the conventions of French spelling to English (Van Gelderen 2006). This explains the many cases where <qu> is used for the sound sequence /kw/, even in Germanic words (such as queen), and the spelling <ou> for what was at that time /u/ (as in mouse).
4.3.2 Dutch
The first official bible translation into Dutch was published in 1637: the Statenvertaling ('States' Translation', so called because it was commissioned and funded by the state). As this translation was meant to be used throughout the Dutch language area, an effort was made to use a somewhat 'neutral' Dutch, with elements from different dialects (Molewijk 1992). Because of the wide influence of this bible translation, its language became the basis of modern Dutch. Like the spoken language, the spelling is based mainly on the (south)western varieties of Dutch, because this region was the economic centre. The last major spelling reform took place in 1946-1947 and unified the spelling of Netherlandic and Belgian Dutch (Molewijk 1992).
Spelling changes made since then have been minor, mostly concerning the spelling of foreign loan words and of compound words. Proposals for more phonetic spellings have been made, but met with so much public opposition that they were never carried through. The spelling, then, is not completely regular and phonetic: words of foreign origin in particular are exceptions. In loans from French (crèche 'day care', garage 'garage', comité 'committee'), German (überhaupt 'at all, anyway', sowieso 'anyway', föhn 'hair dryer') and, more recently, English (computer, race, cake, poster), the original spelling is retained, even when it does not match the pronunciation in Dutch.
Dutch retains diacritics when they are present in loan words (unlike English, where the diacritic is usually dropped), but has not made any additions to the basic 26-letter alphabet for the spelling of native words.
4.3.3 German
The German spelling standard was established relatively late: at the end of the 19th century, during German unification under Prussia (Scheuringer & Stang 2004). Before that, there were some regional standards, but nothing covering the whole area of present-day Germany. Scheuringer and Stang mention this as a reason for the relatively good 1:1 correspondence between graphemes and pronunciation in German. As in Dutch, some minor changes were made in the 20th century, but proposals for bigger changes meet with heavy resistance (Scheuringer & Stang 2004). At the end of the 20th century, proposals were made that would in effect have made the spelling more regular, such as spelling all <ai> as <ei> (the two are pronounced the same). This would have affected very many words, and public opposition was so strong that it was never carried through. A proposal in the 1990s to write all common nouns in lower case (instead of the current practice of capitalizing them) was resisted as well. Eventually, only minor changes came into effect, concerning punctuation, word separation, and the spelling of foreign loan words.
As mentioned, German spelling is characterized by the practice of capitalizing all nouns. In addition, it has four extra letters compared to the English 26-letter alphabet: ä, ö, ü and ß (pronounced /s/).
4.3.4 Danish
After World War II, an idealistic movement stressing the need for Scandinavian unity grew strong in the Scandinavian countries (Vikør 2001). In Denmark, Germany's close neighbour, so shortly after a war in which Germany had been 'the enemy', this movement was strongest and showed itself in a turn away from German. A spelling reform was adopted in 1948 (Vikør 2001). Among other things, it involved the decapitalization of nouns (which up to that point had been written with a capital, as is still the case in German) and the spelling of <aa> as <å>, in line with Swedish and Norwegian. This reform initially met with opposition, but was eventually accepted everywhere. Since then, however, no serious reforms have been made or even attempted, apart from some notes on how to handle foreign loan words.
The movement towards Scandinavian unity sparked a spelling reform, but the same movement makes further reforms unattractive (Vikør 2001). The reason lies in the sound changes Danish has undergone, which have separated it from the other Scandinavian languages (see section 4.2.1). Like English spelling, Danish spelling essentially reflects an older stage of the language rather than accurately representing the current pronunciation. This older stage, however, is closer to the other Scandinavian languages than the spoken language is. Moreover, as in English, a more phonological orthography would involve so many changes that it is hardly feasible:
“[A] completely phonological orthography would have to be so totally
different from the present one that it would be unreadable for the entire
Danish population – to learn it would be almost like learning a new
language. […] By such drastic reform, the Danes would exclude
themselves from their own literary heritage as well as from inter-Nordic
written communication.”
(Vikør 2001:190)
Apart from the <å>, which it shares with Swedish (and Norwegian), Danish has
two more letters not present in the other languages included in this study: <æ> and
<ø>.
4.3.5 Swedish
Like Danish, Swedish strives towards Scandinavian unity. The last major spelling reform for Swedish took place in 1906, over a hundred years ago (Vikør 2001). This reform mostly made the spelling more phonetic, that is, more representative of how words are actually pronounced. For example, haf /ha:v/ 'ocean' became hav, and rödt /røt/ 'red (adv.)' became rött. Some of these changes made Swedish spelling more similar to that of the other Scandinavian languages, and some made it more distinct. Later attempts to make the spelling even more phonetic have not been adopted into the standard. Swedish is somewhat more inclined than the other languages, however, to adapt foreign loan words to its own spelling system: English hike became hajk, and French directeur became direktör (Vikør 2001).
Another factor that makes Swedish spelling relatively phonetic is that the standard language spread through schools mostly in written form (Vikør 2001). Neither teachers nor students were personally familiar with the spoken standard, and in an attempt to speak correctly, they used spelling pronunciations that differed from how the words had been pronounced until then. These spelling pronunciations eventually became the standard language. Thus, for example, drottning 'queen' went from /drɔniŋ/ to /drɔtniŋ/, and till 'to' went from /te/ to /til/.
The Swedish alphabet has the additional letters å, ä and ö, where <ä>
corresponds to Danish <æ> and <ö> corresponds to Danish <ø>.
4.3.6 Summary
All of these languages, then, developed a standard spelling that solved the problems arising from the use of the Latin alphabet. Some problems were solved in different ways in the different languages, however. In English, for example, a long vowel is often signalled by a silent <e> after the following consonant: cape, make, cake, duke, grape, rope, etc. In Dutch, by contrast, the vowel letter is doubled: kaap 'cape', maak 'make', leek 'layman', meen 'mean', vuur 'fire', rood 'red'. In German, a common strategy is to add an <h> after the vowel: Zahn 'tooth', zehn 'ten', Lohn 'wage' (in Dutch: loon), mehr 'more' (in Dutch: meer), lehren 'teach'. In some cases, these differences may hinder intelligibility between languages, as they make words look more different than they are. In other cases, however, the orthography can aid understanding. Some sound changes that have made the spoken languages more different from each other are not reflected in the spelling, so that the written languages are more similar to each other than the spoken ones. This is especially true for English and Danish, where the written language reflects an older stage of the spoken language.
4.3.7 The North Wind and the Sun
Below, the short fable of The North Wind and the Sun is printed in all five languages, in
order to give an impression of the orthographies of these languages.
English
The North Wind and the Sun were disputing which was the stronger, when a traveler
came along wrapped in a warm cloak. They agreed that the one who first succeeded
in making the traveler take his cloak off should be considered stronger than the other.
Then the North Wind blew as hard as he could, but the more he blew the more closely
did the traveler fold his cloak around him; and at last the North Wind gave up the
attempt. Then the Sun shone out warmly, and immediately the traveler took off his
cloak. And so the North Wind was obliged to confess that the Sun was the stronger of
the two.
(Ladefoged 1999)
Dutch
De noordenwind en de zon hadden een discussie over de vraag wie van hun tweeën
de sterkste was, toen er juist iemand voorbij kwam die een dikke, warme jas aanhad.
Ze spraken af dat wie de voorbijganger ertoe zou krijgen zijn jas uit te trekken de
sterkste zou zijn. De noordenwind begon uit alle macht te blazen, maar hoe harder hij
blies, des te dichter de voorbijganger zijn jas om zich heen trok. Tenslotte gaf de
noordenwind het maar op. Vervolgens begon de zon krachtig te stralen, en
onmiddellijk daarop trok de voorbijganger zijn jas uit. De noordenwind kon toen
slechts beamen dat de zon de sterkste was.
(Gussenhoven 1999)
German
Einst stritten sich Nordwind und Sonne, wer von ihnen beiden wohl der Stärkere
wäre, als ein Wanderer, der in einen warmen Mantel gehüllt war, des Weges
daherkam. Sie wurden einig, daß derjenige für den Stärkeren gelten sollte, der den
Wanderer zwingen würde, seinen Mantel abzunehmen. Der Nordwind blies mit aller
Macht, aber je mehr er blies, desto fester hüllte sich der Wanderer in seinen Mantel
ein. Endlich gab der Nordwind den Kampf auf. Nun erwärmte die Sonne die Luft mit
ihren freundlichen Strahlen, und schon nach wenigen Augenblicken zog der
Wanderer seinen Mantel aus. Da mußte der Nordwind zugeben, daß die Sonne von
ihnen beiden der Stärkere war.
(Kohler 1999)
Danish
Nordenvinden og solen kom engang i strid om, hvem af dem der var den stærkeste.
Da så de en vandringsmand, der kom gående, svøbt i en varm kappe. Og de enedes
om, at den der først kunne få kappen af ham skulle anses for den stærkeste. Først tog
nordenvinden fat, og han blæste og blæste, men jo mere han blæste, des tættere holdt
manden kappen sammen om sig. Til sidst måtte nordenvinden give fortabt. Så tog
solen fat. Og han skinnede og skinnede, og til sidst fik manden det for varmt og måtte
tage kappen af. Da måtte nordenvinden indrømme, at solen var den stærkeste af de
to.
(Grønnum 1998)
Swedish
Nordanvinden och solen tvistade en gång om vem av dom som var starkast. Just då
kom en vandrare vägen fram, insvept i en varm kappa. Dom kom då överens om, att
den som först kunde få vandraren att ta av sig kappan, han skulle anses vara starkare
än den andra. Då blåste nordanvinden så hårt han nånsin kunde, men ju hårdare han
blåste desto tätare svepte vandraren kappan om sig, och till sist gav nordanvinden
upp försöket. Då lät solen sina strålar skina helt varmt och genast tog vandraren av
sig kappan, och så var nordanvinden tvungen att erkänna att solen var den starkaste
av dom två.
(Engstrand 1999)
5. Data
In order to calculate the linguistic distances, a corpus of parallel word lists in the five languages included in the study is needed. In the MICReLa project, a word list of a hundred nouns is used to collect data on the mutual intelligibility of the languages in the project. These nouns are taken from a list of all the words contained in the British National Corpus6 (BNC), ordered by frequency. Roughly, they are simply the 100 most frequent nouns in the corpus. These words were translated into the other languages in the project, creating parallel word lists for all languages (see e.g. Heeringa et al. (2013) for more details on the creation of these lists). In publications of the project, these lists are used to calculate lexical and Levenshtein distances (Heeringa et al. 2013). Many other publications involving lexical and Levenshtein distance have likewise used relatively short word lists. Gooskens, Heeringa and Beijering (2008), for example, used the words from the text The North Wind and the Sun, which they used in their experiment: about 100 words, depending on the language variety. Gooskens (2007a) likewise used the words from the text in her experiment: a news item of about 250 to 290 words, depending on the language. These lists are too short to calculate entropy measures reliably, however (Moberg et al. 2007; see also Chapter 6). Therefore, I created new word lists of 1500 words and used them to calculate not only the entropy, but also the lexical and Levenshtein distances. The size of the word list should not make a significant difference to these distance calculations, but to verify this, I will correlate these results with the lexical and Levenshtein distances calculated by Heeringa et al. (2013) on the much smaller set of 100 words.
This new word list was, again, based on the British National Corpus. This time, however, not only nouns but words from all parts of speech were included. This makes for a better representation of the languages, as nouns might behave differently from other word classes when it comes to linguistic similarity; loan words, for example, are very often nouns. The words were translated into Dutch, German, Danish and Swedish with the help of internet sources, dictionaries and native speakers of Dutch and German. During this process, some words were removed from the list because they proved too hard to translate reliably. These were usually words from the original English list that simply do not exist, or at least not in the same form, in one or more of the other languages. A word like 'whatever', for example, does not have a clear translation in any of the other languages, and where it does, it can only be translated by a multi-word expression. In Dutch, for example, the 'translation' consists of three words that are separated by other words in the sentence (see the example sentence below). Another example of a problematic word is the verb 'to face', which does not have a clear translation covering its meaning in the other four languages. It can be translated by many different verbs and expressions, depending on subtle differences in context.
6
http://www.natcorp.ox.ac.uk/
(1)
EN    Paint your house in whatever colour you like
DU    Verf   je    huis  in  wat   voor  kleur   je   maar  wilt
      paint  your  house in  what  for   colour  you  just  want
Another translation issue is caused by certain function words that may not even exist in the other languages. The translation of English modals (such as 'could', 'might' and 'should'), for example, depends heavily on the context, and translating them as separate words, as is necessary for this list, is difficult. These were therefore removed as well. In total, 51 words were removed from the list. To replace them, new words were added at the end of the list (simply the next words in the BNC frequency list). Because a margin was included in anticipation of further exclusions at a later stage, the final list contained 1510 words.
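The replacement procedure described above (skip untranslatable items, keep walking down the frequency list, and add a safety margin) can be sketched as follows. This is a minimal illustration, not the procedure's actual implementation: the function name build_word_list and the toy word list are hypothetical, and the real input would be the BNC frequency list.

```python
def build_word_list(frequency_ranked, excluded, target_size, margin=0):
    """Collect target_size + margin words from a frequency-ranked list,
    skipping words judged untranslatable (hypothetical helper)."""
    selected = []
    for word in frequency_ranked:
        if word in excluded:
            continue  # e.g. 'whatever' or modal verbs are skipped
        selected.append(word)
        if len(selected) == target_size + margin:
            break
    return selected  # may be shorter if the ranked list runs out


# Toy frequency list (invented for illustration, not real BNC ranks)
ranked = ["time", "could", "year", "whatever", "people", "way", "day"]
print(build_word_list(ranked, {"could", "whatever"}, target_size=3, margin=1))
# ['time', 'year', 'people', 'way']
```

With the real data, a call such as build_word_list(bnc_ranked, excluded, 1500, margin=10) would correspond to the 1510-word list described above.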
In many cases, the original English word is ambiguous, and its meanings are covered by several different words in one or more of the other languages. In such cases, one of these meanings was chosen and used consistently for all the other languages. The noun practice, for example, can mean (among other things) the following:
1. The carrying out or exercise of a profession, esp. that of medicine or law
2. The actual application or use of an idea, belief, or method, as opposed to the theory or principles of it
3. The habitual doing or carrying on of something
4. Repeated exercise in or performance of an activity so as to acquire, improve, or maintain proficiency in it
(Oxford English Dictionary online edition,7 entry for 'practice', meanings 1-4)
In the other languages, different words are used to express these meanings. In Dutch, for example, meanings 1 and 2 are covered by praktijk, meaning 3 translates as gewoonte, and meaning 4 is expressed by oefening. In this case, the fourth meaning was chosen for all languages. The choice of meaning was not made systematically: in many cases it was simply the meaning that first came to the translator's mind, or the first translation given by the dictionary used. Care was taken, however, always to choose one of the most common meanings of the word, not one of the more obscure ones.
Once a meaning of the English word had been decided upon, it was translated by the most common word in the target language that accurately represents this meaning. If there were two or more alternatives that were all common (i.e. not considered jargon), and one of them was a cognate of the word in English or in one or more of the other languages, that word was chosen.
Because the basis of the word list was taken from an English-language corpus, the final word list is inevitably somewhat centred on English. Had it been based on a word list from a Swedish corpus, for example, it would have contained different words. In all languages other than English, some words are included more than once because they correspond to multiple lemmas in English. The Dutch word leren, for example, means both 'to learn' and 'to teach' and is therefore included in the list twice. Other frequent words, on the other hand, might not be included at all, because their English equivalent has more than one possible translation and another one was chosen for the word list, as is the case with 'practice' above. This should not be considered a problem, however. The goal of this part of the research is to create a list of words with corresponding meanings in the five languages included. Exactly which words these are is not important, as long as they are randomly chosen and represent the languages well. I believe that these conditions have been met here.
7
http://www.oed.com
A list of the excluded words and the full word list in the five languages can be found in appendices A and B, respectively.
6. Methods
As elaborated on in Chapter 2, there are several ways to measure the linguistic similarity between two language varieties. In this study, three methods were used: lexical distance, Levenshtein distance, and conditional entropy. As this thesis focuses on written language, these were applied only at the orthographic level, to the data described in Chapter 5. In the first part of this chapter, I will describe these methods in detail. In the second part, I will describe the methods used to measure intelligibility. The experiments mentioned there were carried out as part of the MICReLa project described in section 2.4.
6.1 Methods for measuring linguistic distance
6.1.1 Lexical distance
A computationally simple way to measure linguistic distance is to measure lexical distance. This approach has been used many times in the past. A well-known example is the Swadesh list (Swadesh 1971): a list of 100 words representing very basic concepts, constructed for the purpose of comparing languages to establish their relationships to each other. Lexical distance, as its name suggests, measures distance at the lexical level.
When two language varieties are related, they usually share many cognates, but part of the lexicon will also consist of non-cognates. This happens, for example, when one language has borrowed a word from a third language, whereas the other language has retained the inherited word. English has many examples of this phenomenon, in which mainly Latin and French words have replaced the Germanic ones. Compare, for example, to contribute with its translations in the four other languages in this study: bijdragen, beitragen, bidrage, bidra. Non-cognates can also be the result of semantic shift, however: the cognate word is in fact still present in both languages, but no longer has the same meaning. This results in false friends, such as English queen and Swedish kvinna 'woman', or English town, Dutch tuin 'garden' and German Zaun 'fence'.
The idea behind measuring lexical distance is this: the more cognates two language varieties share, the closer they are to each other.8 Lexical distance is then simply the percentage of non-cognates in a given language pair. In this study, it is measured on the 1500-word lists of the languages involved.
To measure this, it has to be determined whether two corresponding words are cognates. The traditional definition of cognate words stresses the shared origin of the words in an older form of the languages, as in this definition from the Oxford English Dictionary: "Coming naturally from the same root, or representing the same original word, with differences due to subsequent separate phonetic development". For this research, however, a broader definition was used. A speaker of one language who is trying to understand the words of another language does not see the etymological history of a word. The only thing that matters to the reader is whether there is some kind of similarity to the corresponding word in their own language. Therefore, any two words whose stems are related were considered cognates. This of course includes cognates in the traditional sense, but also loan words with a common source, such as German Party and the English word party from which it is derived, and words such as information, which occurs in all five languages and has a common source outside the Germanic family. Words that share a base form but have different affixes were considered cognates as well, such as Dutch betalen ('pay', be- + talen) and German zahlen ('pay', lacking the be- prefix). When a word consists of multiple lexical items, however, and one of them is not related, the words as a whole were not considered cognates. Take, for example, the compounds buitenlands (Dutch) and udenlandsk (Danish), both meaning 'foreign' (literally, roughly, 'out-landish'). The second parts of these words, lands and landsk, are cognates, but the first parts derive from different roots. The word pair as a whole is therefore not considered a cognate pair.
8
This is valid only when the cognates still have the same meaning: as many language learners know, false friends tend to impair intelligibility rather than help it.
When it was unclear whether two words shared the same origin, etymological dictionaries were consulted. In addition, because the data consisted of parallel word lists, only word pairs with the same meaning in both languages were considered; false friends play no part in this study.
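Once the cognate judgements have been made by hand as described above, the lexical distance itself is simple arithmetic. The sketch below assumes a hypothetical input format of (word, word, is_cognate) triples; the judgements themselves are not computed, since they rely on etymological knowledge. The toy English-Dutch pairs are examples from this chapter, not the study's data.

```python
def lexical_distance(pairs):
    """Percentage of non-cognates among aligned translation pairs.

    pairs: list of (word_a, word_b, is_cognate), where is_cognate is a
    manual judgement (hypothetical input format).
    """
    if not pairs:
        raise ValueError("need at least one word pair")
    non_cognates = sum(1 for _, _, is_cognate in pairs if not is_cognate)
    return 100 * non_cognates / len(pairs)


english_dutch = [
    ("long", "lang", True),
    ("word", "woord", True),
    ("dead", "dood", True),
    ("contribute", "bijdragen", False),  # Latin loan in English vs. inherited Dutch word
]
print(lexical_distance(english_dutch))  # 25.0
```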
6.1.2 Orthographic Levenshtein distance
Lexical distance captures the proportion of non-cognates in a language pair, but it says nothing about how similar the cognates are to each other. When two language varieties began to diverge a long time ago, sound changes may have altered both words in a pair beyond recognition, even if they stem from the same root. Levenshtein distance (Heeringa 2004) is a computational way of measuring the distance between two cognates at the phonetic or orthographic level.9 In this study, the distance was calculated on the orthography only, as the focus is on written intelligibility.
9
Although it is technically possible to calculate the distance between two non-cognates, it does not make a
lot of sense. Two things need to have some common ground before we can sensibly consider how much
they differ. What is for example the distance between the colour red and a tree? In the same way, it makes
no sense to consider the distance between two non-cognate words, even though the algorithm can be
applied to any word pair. When using a computer, one should never forget to use common sense, because
that is something a computer does not have.
The Levenshtein algorithm (Heeringa 2004) calculates the distance between two strings (orthographic or phonetic transcriptions of words) by counting the minimum number of characters or phonetic segments that need to be changed to get from one string to the other. These changes can be insertions (adding a character), deletions (removing a character) or substitutions (changing one character into another). The number of changes is then divided by the length of the total alignment, to normalize the calculation over words of different lengths. Without this normalization, longer words would contribute more to the total average than shorter words, as longer words consist of more items. An example of such a calculation, for the English word long and its Dutch counterpart lang, is given in (2). There is one substitution (a for o) in a total length of four characters, resulting in a Levenshtein distance of .25 for this word pair.
(2)
EN    l   o   n   g
DU    l   a   n   g
      0   1   0   0      1/4 = .25
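The calculation in (2) can be sketched with the standard dynamic-programming formulation of Levenshtein distance. This is a minimal sketch, not Heeringa's (2004) implementation: it assumes that, when several alignments share the minimum cost, the longest one is used for normalization, and it does not yet restrict which characters may be aligned with each other.

```python
def levenshtein_normalized(a, b):
    """Levenshtein distance between a and b divided by the alignment length.

    Each DP cell holds (cost, -length); taking the minimum keeps the cheapest
    alignment and, among equally cheap ones, the longest (assumed tie-break).
    """
    n, m = len(a), len(b)
    dp = [[(0, 0)] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = (i, -i)          # a[:i] against nothing: i deletions
    for j in range(1, m + 1):
        dp[0][j] = (j, -j)          # nothing against b[:j]: j insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if a[i - 1] == b[j - 1] else 1
            c_sub, l_sub = dp[i - 1][j - 1]
            c_del, l_del = dp[i - 1][j]
            c_ins, l_ins = dp[i][j - 1]
            dp[i][j] = min(
                (c_sub + sub, l_sub - 1),   # match or substitution
                (c_del + 1, l_del - 1),     # deletion
                (c_ins + 1, l_ins - 1),     # insertion
            )
    cost, neg_length = dp[n][m]
    return cost / -neg_length if neg_length else 0.0


print(levenshtein_normalized("long", "lang"))  # 0.25
```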
As can be seen here, the two words are aligned so that each character in one word corresponds to one character in the other. For this combination, the alignment is straightforward, but this is not the case for all words. In many cases, for example, the two corresponding words differ in length, and some characters therefore have to be aligned with an empty position. Putting these empty positions at the end of the shorter word, that is, aligning from left to right, does not always give the desired results. Take, for example, English word and Dutch woord in (3):
(3)
EN    w   o   r   d
DU    w   o   o   r   d
      0   0   1   1   1      3/5 = .60
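The naive left-to-right alignment in (3) amounts to comparing the words position by position and padding the shorter one at the end. The sketch below (function name hypothetical) reproduces the .60 from (3), which shows why a proper alignment step is needed.

```python
from itertools import zip_longest


def naive_positional_distance(a, b):
    """Compare a and b position by position, padding the shorter word at
    the end; every mismatch or padded slot counts as one change."""
    if not a and not b:
        return 0.0
    changes = sum(1 for x, y in zip_longest(a, b) if x != y)
    return changes / max(len(a), len(b))


print(naive_positional_distance("word", "woord"))  # 0.6
```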
To a naïve reader, it is obvious that the r in the English word should correspond to the r in the Dutch word, and likewise for the d's. In the alignment shown here, however, this is not the case, so that three changes (two substitutions and an insertion) are needed to change word into woord. With a more sensible alignment, this can be reduced to only one change (see (4a) below). The algorithm takes this into account: it aligns the words in such a way that letters representing similar sounds are mapped onto each other (e.g. consonants to consonants and vowels to vowels) and as few changes as possible are needed to get from one word to the other. Some examples of this process are shown in (4).
(4)
(a)
EN    w   o       r   d
DU    w   o   o   r   d
      0   0   1   0   0      1/5 = .20

(b)
DU    w   o   o   r   d
SW        o       r   d
      1   0   1   0   0      2/5 = .40

(c)
EN    w   o   r   d
GE    w   o   r   t
      0   0   0   1      1/4 = .25

(d)
SW        o   r   d
GE    w   o   r   t
      1   0   0   1      2/4 = .50
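The alignments in (4) can be approximated by adding one restriction to a Levenshtein sketch: a vowel may only be substituted for a vowel and a consonant only for a consonant; otherwise the two characters must be handled as a deletion plus an insertion. This is a simplified sketch under stated assumptions, not Heeringa's (2004) algorithm: the set of vowel letters is an assumption, and ties between minimum-cost alignments are broken in favour of the longest alignment. Under these assumptions it reproduces the four values in (4).

```python
VOWELS = set("aeiouyäöüåæø")  # assumed vowel letters for the five languages


def is_vowel(ch):
    return ch.lower() in VOWELS


def aligned_distance(a, b):
    """Normalized Levenshtein distance in which vowels are never
    substituted for consonants (or vice versa)."""
    INF = float("inf")
    n, m = len(a), len(b)
    dp = [[(INF, 0)] * (m + 1) for _ in range(n + 1)]  # cells: (cost, -length)
    dp[0][0] = (0, 0)
    for i in range(1, n + 1):
        dp[i][0] = (i, -i)
    for j in range(1, m + 1):
        dp[0][j] = (j, -j)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            x, y = a[i - 1], b[j - 1]
            if x == y:
                sub = 0
            elif is_vowel(x) == is_vowel(y):
                sub = 1
            else:
                sub = INF  # vowel/consonant substitution is not allowed
            c_sub, l_sub = dp[i - 1][j - 1]
            c_del, l_del = dp[i - 1][j]
            c_ins, l_ins = dp[i][j - 1]
            dp[i][j] = min((c_sub + sub, l_sub - 1),
                           (c_del + 1, l_del - 1),
                           (c_ins + 1, l_ins - 1))
    cost, neg_length = dp[n][m]
    return cost / -neg_length if neg_length else 0.0


for pair in [("word", "woord"), ("woord", "ord"), ("word", "wort"), ("ord", "wort")]:
    print(pair, aligned_distance(*pair))  # .20, .40, .25, .50 as in (4)
```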
The Levenshtein distances are calculated for each cognate word pair in a
language combination and then averaged to get a distance for the language pair as a
whole. With five languages, this results in ten distances, as the Levenshtein distance is
a symmetrical distance measure.10
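Averaging the per-word distances into a language-pair distance, and counting the language pairs, can be illustrated as follows. The function name is hypothetical and the two input values are the distances from (2) and (4a); a real calculation would average over all cognate pairs in the 1500-word lists. For comparison, the permutation count shows how many values an asymmetrical measure such as conditional entropy yields for the same five languages.

```python
from itertools import combinations, permutations

LANGUAGES = ["English", "Dutch", "German", "Danish", "Swedish"]


def language_distance(word_distances):
    """Mean of the per-word normalized Levenshtein distances for one
    language pair; only cognate pairs should be passed in."""
    if not word_distances:
        raise ValueError("no cognate pairs")
    return sum(word_distances) / len(word_distances)


# A symmetrical measure needs one value per unordered language pair ...
unordered_pairs = list(combinations(LANGUAGES, 2))
# ... an asymmetrical measure needs one value per ordered pair.
ordered_pairs = list(permutations(LANGUAGES, 2))

print(len(unordered_pairs))  # 10
print(len(ordered_pairs))    # 20

# Illustrative English-Dutch input: long/lang (.25) and word/woord (.20)
print(language_distance([0.25, 0.20]))
```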
6.1.3 Measuring conditional entropy
Moberg et al. (2007) explored the possibility of using conditional entropy as a way to measure language similarity. Entropy measures the regularity, or predictability, of the correspondences between two language varieties. As such, it is not a measure of distance per se: it does not measure how similar the two parts of a correspondence are, but only how predictable the correspondence is in a given language pair. Like Levenshtein distance, it can be calculated at the phonetic or the orthographic level of a language (in this study, only at the orthographic level). The major advantage of conditional entropy over lexical and Levenshtein distance is that it is inherently asymmetrical.11
10
The Levenshtein distance can be made asymmetrical, for example by giving different weights to different replacements instead of the binary 1 and 0, or by taking into account other translations of a word than the one included in the word list, as in Heeringa et al. (2013). In its basic form as used here, however, it is symmetrical.
(5)
$$H(X \mid Y) = -\sum_{x \in X,\, y \in Y} p(x, y)\, \log_2 p(x \mid y)$$
(Moberg et al. (2007), p. 4)
Formula (5) is used to calculate conditional entropy. H(X|Y) is the entropy of X given Y, that is, the amount of uncertainty about the value of X when the value of Y is known. Applied to languages, Y is the stimulus language (whose value is known: it is the text the participant is reading) and X is the reader's native language (whose value is unknown: the reader is trying to guess which units in his native language correspond to what he is reading in language Y). p(x,y) is the probability that a given combination of x and y occurs, and p(x|y) is the probability that x occurs given y. The units x and y can be anything, but in this study they represent letters or combinations of letters.12 The use of this formula will be illustrated with the data set in Table 4, which contains three words in three languages. English is added merely for reference; this illustration focuses on Dutch and German only, and specifically on the initial consonants of each word.
Table 4: A (tiny) corpus of three words in three languages.
German   Dutch   English
tot      dood    dead
dünn     dun     thin
zu       toe     to
11 The Levenshtein distance can be made asymmetrical (see footnote 10), but unlike conditional entropy it is not inherently asymmetrical. The implementation of the Levenshtein distance used in this study is symmetrical.
12 In phonetic entropies, as used by Moberg et al. (2007), they represent phonemes.
Consider first a native speaker of Dutch reading these German words. He encounters three different initial consonants: t, d and z (see Table 5). Each t he encounters corresponds to a d in his own language, each d corresponds to a d and each z corresponds to a t. All correspondences are thus absolute and completely predictable. To put it in the terms of the formula above: p(d|t) is 1 (each t corresponds to a d), its log2 is therefore 0, and the whole term is 0. The same holds for the other two correspondences, so the total entropy for a Dutch reader of German is 0 (that is, H(Dutch|German) = 0).
The other way around, however, paints a different picture. A German reader of Dutch sees only two different letters: d and t (see Table 6). The d occurs twice: once it corresponds with a t in the reader's native language, and once with a d. The t corresponds in each case to a z. p(t|d) is then .5 (when a d occurs, there is a 50% chance that it corresponds to a t), the log2 of which is -1. p(t,d) is .33 (the correspondence is present in one out of three word pairs, i.e. 33% of the data), so the entropy for this correspondence is -(.33 × -1) = .33. The entropy of the correspondence of d and d is calculated in the same way, and the correspondence of t and z contributes 0, resulting in a total entropy for a German reader of Dutch of 0.67 (H(German|Dutch) = 0.67).
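Written out in full with formula (5), with x the German letter and y the Dutch letter, the three correspondences d:t, d:d and t:z (each found in one of the three word pairs) give:

$$
\begin{aligned}
H(\text{German} \mid \text{Dutch}) &= -\Big[p(t,d)\log_2 p(t \mid d) + p(d,d)\log_2 p(d \mid d) + p(z,t)\log_2 p(z \mid t)\Big] \\
&= -\Big[\tfrac{1}{3}\log_2\tfrac{1}{2} + \tfrac{1}{3}\log_2\tfrac{1}{2} + \tfrac{1}{3}\log_2 1\Big] \\
&= -\Big[-\tfrac{1}{3} - \tfrac{1}{3} + 0\Big] = \tfrac{2}{3} \approx 0.67
\end{aligned}
$$

In the other direction, every conditional probability p(x|y) equals 1, so each of the three terms of H(Dutch|German) contains log2 1 = 0 and the sum is 0.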
Table 5: The entropy of German for a native speaker of Dutch.
German   Dutch   Correspondence
t        d       1:1
d        d       1:1
z        t       1:1
Entropy German → Dutch: 0
Table 6: The entropy of Dutch for a native speaker of German.
Dutch   German   Correspondence
d       t, d     1:2
t       z        1:1
Entropy Dutch → German: 0.67
We can see here, then, that H(Dutch|German) is not the same as
H(German|Dutch). Therefore, we have the asymmetry that we were looking for. Note
that for the calculation of entropy, it does not matter whether the two sounds or
graphemes of the correspondence are the same or in any way similar. The only thing
that matters is the regularity of the correspondence.
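The calculations in Table 5 and Table 6 can be reproduced with a short sketch of formula (5). It assumes the correspondences have already been extracted as (reader unit, stimulus unit) pairs, here the initial letters of the three cognates in Table 4. The function name is hypothetical; this is not the implementation of Moberg et al. (2007) or the scripts used in this study.

```python
from collections import Counter
from math import log2


def conditional_entropy(pairs):
    """H(X|Y) = -sum over (x, y) of p(x, y) * log2 p(x|y).

    Each pair is (x, y): x is the unit in the reader's native language,
    y the unit in the stimulus language.
    """
    joint = Counter(pairs)                   # counts of each (x, y) combination
    y_counts = Counter(y for _, y in pairs)  # counts of each stimulus unit y
    total = len(pairs)
    h = 0.0
    for (x, y), count in joint.items():
        p_xy = count / total                 # p(x, y)
        p_x_given_y = count / y_counts[y]    # p(x | y)
        h -= p_xy * log2(p_x_given_y)
    return h


# Initial letters of the cognates in Table 4, as (German, Dutch): tot/dood, dünn/dun, zu/toe.
german_dutch = [("t", "d"), ("d", "d"), ("z", "t")]

# Dutch reader of German: X = Dutch, Y = German.
h_dutch_given_german = conditional_entropy([(nl, de) for de, nl in german_dutch])
# German reader of Dutch: X = German, Y = Dutch.
h_german_given_dutch = conditional_entropy(german_dutch)

print(f"H(Dutch|German) = {h_dutch_given_german:.2f}")  # 0.00
print(f"H(German|Dutch) = {h_german_given_dutch:.2f}")  # 0.67
```

The two results differ only because the roles of X and Y are swapped, which is exactly the asymmetry discussed above; the letters themselves never need to be similar.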
The entropy was calculated in this way for each language pair in both directions; for five languages, this results in 20 measures. According to Moberg et al. (2007), at least 800 words are needed to reach stable entropy measures, but calculations based on fewer words already show the relative differences among language pairs accurately. This is illustrated in Figure 4: the entropy (vertical axis) stabilizes at around 800 words (horizontal axis), but even before that point, the difference between the two entropies remains constant. Although the word lists used in this study consisted of over 1500 words, this number includes the non-cognates. Grapheme correspondences, however, were calculated only for cognate words, as was the case with the Levenshtein distance. The number of cognate words in each language combination is shown in Table 7. Some of these numbers are below 800; the combinations involving English are especially problematic. Figure 4 shows that even with these numbers, asymmetry can be reliably found. Some caution should nevertheless be taken when interpreting the results for the language pairs involving English.
Figure 4: Entropy calculations for Danish and Swedish (vertical axis) using various word list sizes
(horizontal axis). Moberg et al. 2007:7.
Table 7: The number of cognates for each language combination in the word list of 1510 words.
Language pair           Number of cognates (abs.)
Danish and Dutch        806
Danish and English      632
Danish and German       824
Danish and Swedish      1175
Dutch and English       687
Dutch and German        961
Dutch and Swedish       784
English and German      601
English and Swedish     632
German and Swedish      829
Entropy can be calculated for correspondences of one character each, as in the
example above, but it is also possible to use larger units of language. All languages in
this study have certain combinations of characters in their orthography that occur
often and possibly correspond to a specific combination in the other languages
(bigrams or trigrams). Examples of this for English are sh, ng or th. In this study, the
entropy for each language pair was calculated three times: taking one letter, two
letters and three letters as a unit. When the units consist of two (bigrams) or three
letters (trigrams), the algorithm can take correspondences of unequal lengths (such
as Dutch oe with German u, or English th with Dutch and German d in Table 4) into
account by making for example a bigram consisting of one letter and ‘nothing’.
In addition to analyzing each of these three measures individually, the sum of
the three results for each language pair was included as well, combining the results of
the unigram, bigram and trigram entropy. All four of these entropy results are
presented in Chapter 7.
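One way to obtain bigram or trigram correspondences of unequal length is to slide a window over the aligned columns of a word pair and drop the 'nothing' symbols inside each window. The sketch below assumes alignments are given as equal-length strings with '-' for 'nothing' (as in example (4)); it illustrates the idea only and is not the algorithm actually implemented in the scripts used here. The function name is hypothetical.

```python
GAP = "-"  # assumption: symbol marking 'nothing' in an alignment


def ngram_correspondences(aligned_a: str, aligned_b: str, n: int):
    """Pair up windows of n alignment columns from two aligned words, removing gaps.

    Both strings must have the same length (one character per alignment column).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned strings must have equal length")
    pairs = []
    for start in range(len(aligned_a) - n + 1):
        left = aligned_a[start:start + n].replace(GAP, "")
        right = aligned_b[start:start + n].replace(GAP, "")
        pairs.append((left, right))
    return pairs


# One possible alignment of Dutch 'toe' with German 'zu' (final e aligned with nothing).
print(ngram_correspondences("toe", "zu-", 1))  # [('t', 'z'), ('o', 'u'), ('e', '')]
print(ngram_correspondences("toe", "zu-", 2))  # [('to', 'zu'), ('oe', 'u')]
```

The bigram window pairs Dutch oe with German u, one of the unequal-length correspondences mentioned above; lists of such pairs could then be fed into a conditional-entropy calculation in the same way as single letters.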
6.2 Measuring intelligibility
In order to measure intelligibility, preliminary results from a written cloze test and a
written word translation task carried out by Femke Swarte in the Micrela project (see
section 2.4) were used. The tests are available on-line at www.micrela.nl. They
included all language combinations in both directions. As there were five languages
(English, Dutch, German, Danish, Swedish), there were twenty combinations.
6.2.1 Participants
The experiments were carried out on-line and presented as a game; the participants were therefore not paid for their participation. Only people who spoke one of the five languages in the project natively (Danish, Dutch, English, German or Swedish) could participate. Each participant was randomly assigned one of the tasks (picture task (not included in this thesis), cloze test or word translation task) in either the written or the spoken form (only the written tasks are included in this thesis) in one of the four languages that were not their native language. From the 18,108 people who participated in the experiments, the criteria described below were used to arrive at the final participant group of 2,976 people.
All participants had only one native language and no languages other than this
language were spoken in their homes when growing up. They grew up in one of the
countries included in the project (see Chapter 4), corresponding to their native
language. They had not previously learned the language they were tested in, with the
exception of German and English, which are widely taught in schools throughout the
Germanic language area. For German, participants who had learned German for more
than seven years (meaning they must have spent time learning German outside of
school) were excluded, and for English none were excluded. For more information on
the selection of the participants and the procedure of the experiments, see Swarte (in
preparation).
In this thesis, the results from two of the six tasks are included: the written
cloze test and the written word translation task. The final number of participants of
the written cloze test is 528 and the number of participants of the written word
translation task is 495. A breakdown of the number of participants per language
combination can be found in Table 8 for the cloze test and Table 9 for the word
translation task.
Table 8: Number of participants for each language combination of the written cloze test.
Stimulus \ Reader's native   Danish   Dutch   English   German   Swedish   Total
Danish                       0        39      25        27       15        106
Dutch                        22       0       27        26       15        90
English                      30       36      0         41       15        122
German                       34       32      18        0        15        99
Swedish                      30       36      25        20       0         111
Total                        116      143     95        114      60        528
Table 9: Number of participants for each language combination of the written word translation task.
Stimulus \ Reader's native   Danish   Dutch   English   German   Swedish   Total
Danish                       0        36      16        21       15        88
Dutch                        25       0       27        19       15        86
English                      31       38      0         35       15        119
German                       26       29      25        0        16        96
Swedish                      15       31      30        30       0         106
Total                        97       134     98        105      61        495
6.2.2 Cloze test
In the written cloze test, the participants read a text which contained twelve gaps. They were given a list of the twelve words belonging in these gaps in the target language, with a translation into their own language: four adjectives, four nouns and four verbs. Their task was to fill the gaps with the words given. To do this, they needed to understand the text to a certain extent in order to know which word belonged in which gap.
The texts were taken from the Cambridge English Preliminary English Test (PET) and translated into the other languages by three translators (one translating, two checking). They are about 200 words and 16-17 sentences long. The topics are everyday things: catching a cold, riding a bike, driving in winter, child athletes.
6.2.3 Word translation task
For the word translation task, participants were presented with single isolated nouns
and were required to translate them. They were encouraged to provide an answer
even if they had no idea what the word could possibly mean. The words used in this
task were the words from the word list used in the Micrela project (Heeringa et al.
2013, see also section 2.4). This list consists of the 100 most frequent nouns in the
British National Corpus (BNC) and their translations into the four other languages,
again by three translators, one translating and two checking. Each participant was
randomly assigned 50 words from this list. With this word translation task, the
participants cannot use context to derive the meaning of a word: they only have that
single word. Therefore, the influence of linguistic factors should be more clearly visible.
7. Results
In this chapter, the results and statistical analyses of the data are presented. In the first section, the results of the three linguistic measures (lexical distance, Levenshtein distance and entropy) are described. In the second section, their contribution to intelligibility, measured by two different tests (a cloze test and a word translation task), is investigated. All correlation coefficients are Pearson's r, and all significance levels were calculated using the Mantel test (Mantel 1967).
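The text does not specify which implementation of the Mantel test was used. As an illustration only, here is a minimal sketch of one common permutation-based variant. The correlation is computed over the off-diagonal cells (every language pair in both directions). Its significance is assessed by re-ordering the languages of one matrix, rows and columns together, so that cells sharing a language stay dependent as in the real data. With only a handful of languages, all orderings can be enumerated exactly. The function names and the toy matrices are hypothetical.

```python
from itertools import permutations

import numpy as np


def off_diagonal(matrix: np.ndarray) -> np.ndarray:
    """All cells except the diagonal, i.e. every language pair in both directions."""
    return matrix[~np.eye(matrix.shape[0], dtype=bool)]


def mantel(a: np.ndarray, b: np.ndarray) -> tuple[float, float]:
    """Pearson r between two square language-by-language matrices and an exact
    two-sided permutation p-value (rows and columns of b permuted together)."""
    x = off_diagonal(a)
    r_observed = np.corrcoef(x, off_diagonal(b))[0, 1]
    all_orders = list(permutations(range(a.shape[0])))
    as_extreme = 0
    for order in all_orders:
        idx = np.array(order)
        r = np.corrcoef(x, off_diagonal(b[np.ix_(idx, idx)]))[0, 1]
        if abs(r) >= abs(r_observed) - 1e-12:  # small tolerance for floating-point ties
            as_extreme += 1
    return float(r_observed), as_extreme / len(all_orders)


# Hypothetical 4-language example: a distance matrix and scores that fall with distance.
distance = np.array([[0, 1, 2, 3], [1, 0, 4, 5], [2, 4, 0, 6], [3, 5, 6, 0]], dtype=float)
score = 100 - 10 * distance
print(mantel(distance, score))
```

Permuting whole languages rather than individual cells is the design choice that distinguishes the Mantel test from an ordinary correlation test, which would treat the non-independent cells of the matrix as independent observations.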
7.1 Linguistic measures
7.1.1 Lexical distance
Table 10 shows the results of the lexical distance measurements. The numbers
represent the percentage of non-cognates in each language pair. As mentioned before,
lexical distance in this study is a symmetrical measure, therefore the distance
between a language pair is the same in both directions.
There are a few things to note about this table. Firstly, the distance between
Swedish and Danish is by far the smallest, at 22%. The distance between Dutch and
German is quite small as well, with 36%. English is the most distant from all the other
languages, with percentages of over 50. This is not surprising, as English has
borrowed many words from Romance languages over the course of its history (Van
Gelderen 2007, see Chapter 4 of this thesis).
In Figure 5, a graphical representation of the distances is shown. The darker a line is, the closer the two languages it connects. The separation of English from the other languages is clearly visible, as are the two clusters formed by Danish/Swedish and Dutch/German. These groupings correspond to what would be expected from the history of these languages as described in Chapter 4.
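Because the lexical distance is the percentage of non-cognates in the same 1510-word list, Table 10 can be derived directly from the cognate counts in Table 7. The small sketch below does this (the names are illustrative). Rounded to whole percentages, its output matches Table 10 for all ten pairs.

```python
WORD_LIST_SIZE = 1510  # size of the word list (Table 7)

# Number of cognates per language pair, from Table 7.
cognates = {
    ("Danish", "Dutch"): 806,
    ("Danish", "English"): 632,
    ("Danish", "German"): 824,
    ("Danish", "Swedish"): 1175,
    ("Dutch", "English"): 687,
    ("Dutch", "German"): 961,
    ("Dutch", "Swedish"): 784,
    ("English", "German"): 601,
    ("English", "Swedish"): 632,
    ("German", "Swedish"): 829,
}


def lexical_distance(n_cognates: int, n_words: int = WORD_LIST_SIZE) -> float:
    """Percentage of non-cognates in the word list."""
    return 100 * (n_words - n_cognates) / n_words


for (lang_a, lang_b), n in cognates.items():
    print(f"{lang_a}-{lang_b}: {lexical_distance(n):.0f}%")
```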
Table 10: Lexical distances between the language pairs. Each number corresponds to the percentage of non-cognates between the two languages; thus the higher the number, the greater the distance. As this distance is symmetrical, half of the table is greyed out (marked X).
Stimulus \ Reader   Danish   Dutch   English   German   Swedish
Danish              X        X       X         X        X
Dutch               47       X       X         X        X
English             58       55      X         X        X
German              45       36      60        X        X
Swedish             22       48      58        45       X
Figure 5: Graphical representation of the lexical distances in Table
10. The darker the line between two languages, the lower the
lexical distance is between them.
7.1.2 Orthographic Levenshtein distance
The orthographic Levenshtein distances were calculated for every cognate word pair in each language combination. Table 7 shows the number of cognates in each language pair, and thus how many word pairs each distance calculation was based on. Note that this is essentially the complement of the lexical distance (as lexical distance is the percentage of non-cognates). The Levenshtein distances are shown in Table 11 and a graphical representation can be found in Figure 6. Like the lexical distance, the Levenshtein distance is symmetrical and every language combination is represented only once.
In Figure 6, the same clusters show up as with the lexical distance: Swedish and Danish are closest together, followed by a cluster of Dutch and German. English is again the most distant from the others, although its separation is less pronounced than with the lexical distance: there are similar distances between Dutch/German on the one hand and Swedish/Danish on the other.
Table 11: Orthographic Levenshtein distances between the language pairs. The higher the number, the greater the distance. For an explanation of how these distances were calculated, see Chapter 6. As these distances are symmetrical, half of the table is greyed out (marked X).
Stimulus \ Reader   Danish   Dutch   English   German   Swedish
Danish              X        X       X         X        X
Dutch               35       X       X         X        X
English             34       35      X         X        X
German              34       31      36        X        X
Swedish             23       36      35        36       X
Figure 6: Graphical representation of the orthographic Levenshtein distances in Table 11. The darker the line between two languages, the lower the Levenshtein distance between them.
7.1.3 Entropy measures
Table 12 shows the entropies based on unigrams. Unlike the lexical and Levenshtein distances, entropy is inherently asymmetrical; therefore this table shows 20 different values. A higher entropy value means that there is more irregularity in the orthographic correspondences in that language pair. Figure 7 is a graphical representation of the unigram entropies. For each language pair, the entropy in one direction is plotted on the horizontal axis and the entropy in the other direction on the vertical axis. The line marks where the entropies in both directions are the same, i.e. where there is no asymmetry. The further a language pair is located from the line, the higher the asymmetry in that pair. Table 13 and Figure 8 show the bigram entropies and Table 14 and Figure 9 show the trigram entropies.
The entropy data clearly show the close relationship between Swedish and Danish. In every case, the entropies between Swedish and Danish in both directions are the two lowest entropies, often with a clear gap between these two and the next lowest value. Only in the case of trigrams is the entropy for English given Swedish very close (0.34, where H(Swedish|Danish) is 0.32), but this pair is very asymmetrical: the entropy in the other direction is much higher (H(Swedish|English) = 0.42).
In all cases, the entropy of Swedish given Danish (that is, with Danish as stimulus language) is higher than the entropy of Danish given Swedish (with Swedish as stimulus language). The difference is small, however, relative to the asymmetries among the other language pairs. For the bigrams, the asymmetry is only 0.01, the lowest of all pairs (shared with Danish-German and German-Swedish). For unigrams, the largest asymmetries are in Danish-German (0.10) and Danish-Dutch (0.09). For bigrams, the largest asymmetries are English-Swedish (0.19) and English-Danish (0.17). For trigrams, the largest asymmetries are English-Swedish (0.08) and English-German (0.06).
Figure 7: Unigram entropies for each language pair in both directions plotted against each
other. The line is where Entropy (A|B) = Entropy (B|A), i.e. where a pair is symmetrical.
Table 12: Entropy for each language pair based on unigrams. These measures are not symmetrical. A higher number (marked by lighter colours) means a greater distance.
Stimulus \ Reader   Danish   Dutch   English   German   Swedish
Danish              X        1.09    1.06      1.22     0.89
Dutch               1.00     X       0.99      0.99     0.99
English             1.02     1.06    X         1.14     1.05
German              1.12     1.03    1.16      X        1.13
Swedish             0.86     0.99    1.12      1.15     X
Figure 8: Bigram entropies for each language pair in both directions plotted against each other.
The line is where Entropy (A|B) = Entropy (B|A), i.e. where a pair is symmetrical.
Table 13: Entropy for each language pair based on bigrams. These measures are not symmetrical. A higher number (marked by lighter colours) means a greater distance.
Stimulus \ Reader   Danish   Dutch   English   German   Swedish
Danish              X        0.74    0.74      0.83     0.64
Dutch               0.81     X       0.82      0.82     0.79
English             0.91     0.89    X         0.92     0.94
German              0.82     0.80    0.80      X        0.86
Swedish             0.63     0.72    0.75      0.85     X
Figure 9: Trigram entropies for each language pair in both directions plotted against each other.
The line is where Entropy (A|B) = Entropy (B|A), i.e. where a pair is symmetrical.
Table 14: Entropy for each language pair based on trigrams. These measures are not symmetrical. A higher number (marked by lighter colours) means a greater distance.
Stimulus \ Reader   Danish   Dutch   English   German   Swedish
Danish              X        0.37    0.35      0.37     0.32
Dutch               0.41     X       0.40      0.41     0.40
English             0.40     0.42    X         0.43     0.42
German              0.39     0.39    0.37      X        0.43
Swedish             0.28     0.36    0.34      0.38     X
Table 15 shows the correlations of the results for unigrams, bigrams and trigrams. All three measures correlate significantly with the two others. The correlation between the bigrams and trigrams is very high (r = .87), suggesting a large overlap between these two measures. The correlations of unigrams with each of the others are lower (r = .54 with bigrams and r = .41 with trigrams). This suggests that the unigrams are the most distinct of the three measures.
Table 15: Correlations of the different entropies.
                                    r     p        n
Entropy 1-gram – entropy 2-gram     .54   < .01    20
Entropy 1-gram – entropy 3-gram     .41   < .05    20
Entropy 2-gram – entropy 3-gram     .87   < .001   20
Table 16 shows the sum of the unigram, bigram and trigram entropies for each language pair. As expected from the distributions of the separate entropy measures, the entropies for Danish and Swedish in both directions are clearly the lowest. The entropies with English as the stimulus language are consistently high. Figure 10 represents the summed entropies in the same way as lexical and Levenshtein distance were shown in the previous sections (see Figure 5 and Figure 6). As asymmetry cannot be shown in this figure, it is based on the average of the two directions for each language combination. For example, the distance between English and Dutch was calculated as the average of H(Dutch|English), 2.36, and H(English|Dutch), 2.20: 2.28.13 As with the previous two linguistic measures, this graph again shows a close relationship between Danish and Swedish: the orthographic correspondences between these languages are relatively regular. The other patterns from the previous two measures are not present, however. The closest combination after Danish and Swedish is Dutch and Swedish, a pairing that did not show up in the lexical distance and Levenshtein distance results. In addition, German appears to be the most separate from the other languages, whereas with the lexical and Levenshtein distances, this was English.
13 Remember also that what entropy measures (and thus what this graph shows) is not actual distance, but rather predictability or regularity.
Table 16: Sum of the entropies calculated using unigrams (Table 12), bigrams (Table 13) and trigrams (Table 14). These measures are not symmetrical. A higher number (marked by lighter colours) means a greater distance.
Stimulus \ Reader   Danish   Dutch   English   German   Swedish
Danish              X        2.21    2.15      2.42     1.85
Dutch               2.21     X       2.20      2.22     2.18
English             2.33     2.36    X         2.49     2.41
German              2.33     2.22    2.34      X        2.42
Swedish             1.78     2.07    2.21      2.37     X
Figure 10: Graphical representation of the sum of the unigram, bigram and trigram entropies within each language pair. The numbers in Table 16 for both directions of each pair were averaged. A darker line means lower entropy for that pair.
Table 17 shows the asymmetry within each language pair for the summed entropies. The higher the absolute value, the higher the asymmetry. A negative number means that the entropy for that direction of the language combination is lower than the entropy in the other direction. There is some asymmetry between Danish and Swedish (0.07, where the entropy for a Danish person reading Swedish is lower than for a Swedish person reading Danish), but it is one of the lowest asymmetries in the table. (Note, however, that there are two pairs with no asymmetry at all: Dutch-German and Dutch-Danish.) The highest asymmetries are found in the language combinations involving English, where the entropies for pairs with English as the stimulus language are always higher than those with English as the reader's native language. This suggests that written English is intrinsically harder to understand for speakers of the other four languages than their languages are for English speakers.
Table 17: The asymmetry in each language pair, based on the sum of the entropies as presented in Table 16. The number represents the difference between the entropies of both directions. E.g., H(Danish|English) = 2.33 and H(English|Danish) = 2.15. The difference is 0.18 or -0.18. A negative number in the table means that the entropy in that direction is lower than the entropy in the other direction.
Stimulus \ Reader   Danish   Dutch   English   German   Swedish
Danish              X        0       -0.18     0.09     0.07
Dutch               0        X       -0.16     0        0.11
English             0.18     0.16    X         0.15     0.20
German              -0.09    0       -0.15     X        0.05
Swedish             -0.07    -0.11   -0.20     -0.05    X
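Table 17 is Table 16 minus its transpose, and the values behind Figure 10 are the mean of Table 16 and its transpose. The numpy sketch below reproduces Table 17 and the English-Dutch average of 2.28. The variable names are illustrative, and the matrix is laid out exactly as Table 16, with NaN for the X cells.

```python
import numpy as np

languages = ["Danish", "Dutch", "English", "German", "Swedish"]
nan = np.nan

# Table 16 (summed entropies), rows and columns in the order of the table; X -> NaN.
summed = np.array([
    [nan, 2.21, 2.15, 2.42, 1.85],
    [2.21, nan, 2.20, 2.22, 2.18],
    [2.33, 2.36, nan, 2.49, 2.41],
    [2.33, 2.22, 2.34, nan, 2.42],
    [1.78, 2.07, 2.21, 2.37, nan],
])

# Table 17: each cell minus its mirror cell (negative = lower than the other direction).
asymmetry = np.round(summed - summed.T, 2)

# Figure 10: one value per language pair, the mean of both directions.
average = np.round((summed + summed.T) / 2, 3)

i, j = languages.index("English"), languages.index("Dutch")
print(asymmetry[i, j], average[i, j])
```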
7.1.4 Correlations of the different linguistic measures
For the linguistic measures presented above, correlations were calculated. The results can be seen in Table 18. The lexical distances (section 7.1.1) and the Levenshtein distances (section 7.1.2) correlate with each other very highly, Pearson's r = .85 (p < .01). This can also be seen in Figure 11: the higher the lexical distance, the higher the Levenshtein distance. Note that, although they are both measures of distance, they need not necessarily correlate, as they measure two different things: lexical distance measures the proportion of non-cognates in a language pair, whereas Levenshtein distance measures how similar these cognates are. It is possible for two languages to have many cognates that are very different, or few cognates that are very similar whenever two words are in fact cognate. That is not the case for these data, however: language pairs that share few cognates (i.e. have a high lexical distance) differ greatly in these cognates (i.e. have a high Levenshtein distance).
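The lexical-Levenshtein correlation can be recomputed from the rounded values in Table 10 and Table 11, using the ten unique pairs (half matrices) as in Table 18. With these rounded inputs, the sketch gives r ≈ .84, in line with the .85 reported in Table 18; the small difference is presumably due to rounding of the tabulated distances. The variable names are illustrative.

```python
import numpy as np

# Lower-triangle values from Table 10 (lexical, % non-cognates) and
# Table 11 (orthographic Levenshtein), in the same pair order.
pairs = ["Da-Du", "Da-En", "Da-Ge", "Da-Sw", "Du-En",
         "Du-Ge", "Du-Sw", "En-Ge", "En-Sw", "Ge-Sw"]
lexical = np.array([47, 58, 45, 22, 55, 36, 48, 60, 58, 45], dtype=float)
levenshtein = np.array([35, 34, 34, 23, 35, 31, 36, 36, 35, 36], dtype=float)

r = np.corrcoef(lexical, levenshtein)[0, 1]
print(f"r = {r:.2f} (n = {len(pairs)})")  # r = 0.84 (n = 10)
```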
Table 18: Correlations among the three linguistic measures: lexical distance with orthographic Levenshtein distance, lexical distance with the entropy measures and Levenshtein distance with the entropy measures. For the lexical-Levenshtein comparison, half the matrices are used, as these are symmetrical. For comparisons with entropy, the full matrices are used (each language pair in both directions), where for the lexical and Levenshtein distances, the values for each language pair in the two directions are the same.
                                 r     p        n
Lexical – Levenshtein            .85   < .01    10
Lexical – entropy 1-gram         .59   < .01    20
Lexical – entropy 2-gram         .65   < .01    20
Lexical – entropy 3-gram         .54   < .05    20
Levenshtein – entropy 1-gram     .71   < .001   20
Levenshtein – entropy 2-gram     .67   < .01    20
Levenshtein – entropy 3-gram     .69   < .01    20
The correlations of both lexical distance and Levenshtein distance with the different entropy measures are lower (ranging from r = .54 to r = .71), though all of them are still significant. Entropy, again, measures something different from the other two: it does not measure how similar or different two languages are, but how regular these similarities and differences are. A language pair can have a low lexical and Levenshtein distance yet a high entropy, or vice versa. These data show that both distance measures correlate significantly with entropy, but do not completely overlap with it. Entropy is thus a distinct measure that could add to explaining intelligibility.
Figure 11: Scatter plot of the correlation between lexical distance and orthographic Levenshtein
distance.
For the Micrela project (see section 2.4), Heeringa et al. (2013) calculated the
lexical distance and Levenshtein distance similarly to this study, but using a list of
100 nouns, instead of the 1500 words of all word classes used here. Table 19 shows
the correlations of these distances with the same distances calculated using the data
for this study, and the correlations of Heeringa et al.’s distances with the entropy
measures from this study. The correlations of the lexical and Levenshtein distances
are again very high (r = .90) and clearly significant, suggesting that calculating these
distances based on 100 or 1500 words does not make a big difference. The
correlations of Heeringa et al.’s distances with the entropy from this thesis are much
lower, though all but one (lexical distance with entropy trigrams) are still significant.
This again shows that although entropy correlates with the other linguistic measures,
it is still very distinct, and can contribute to explaining intelligibility in addition to
lexical and Levenshtein distance.
Table 19: Correlations between the three linguistic measures calculated in this study, based on
a list of about 1500 words, and the lexical and Levenshtein distances calculated by Heeringa et
al. (2013), based on a list of 100 words.
                                              r     p        n
Lexical (1500 vs. 100 words)                  .90   < .001   20
Levenshtein (1500 vs. 100 words)              .90   < .001   20
Lexical (100) – entropy 1-gram (1500)         .42   < .05    20
Lexical (100) – entropy 2-gram (1500)         .42   < .05    20
Lexical (100) – entropy 3-gram (1500)         .28   .12      20
Levenshtein (100) – entropy 1-gram (1500)     .66   < .01    20
Levenshtein (100) – entropy 2-gram (1500)     .59   < .01    20
Levenshtein (100) – entropy 3-gram (1500)     .68   < .01    20
7.2 Using linguistic measures to predict intelligibility
Two tests were used to measure intelligibility: a cloze test and a word translation
task, carried out by Femke Swarte as part of the Micrela project. These experiments
were described in section 6.2.
Figure 12: Correlation of the lexical distance with the written cloze test with all languages
included. The cases with English as a stimulus language (marked in red) are outliers.
For the following analyses, all cases including English were removed from the data. The reason for this is that when English was the stimulus language, participants performed at ceiling level (see Figure 12). The cause for this is not likely to lie in linguistic factors, but in the position that English currently has as a lingua franca in Europe. The participants have had so much exposure to and experience with English that they can understand it nearly perfectly. This will cancel out any possible effect of linguistic factors. There is no correlation between lexical distance and written intelligibility in the data shown in Figure 12. Because the Mantel test, which is used to calculate the significances, compares matrices rather than individual numbers, we cannot remove only these four data points: English needs to be excluded completely. Figure 13 shows the data from Figure 12 excluding English. The correlation is highly significant (Pearson's r = -.87, p < .001). When excluding English from the analysis, four languages are left (Danish, Dutch, German and Swedish). This means there are 12 data points to be correlated (each language combined with the other three languages, in both directions).
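Since the Mantel test permutes languages, a language can only be removed by deleting both its row and its column, so that the matrix stays square. The sketch below shows that dropping English from a 5 × 5 matrix leaves four languages and 4 × 3 = 12 off-diagonal cells, the 12 data points mentioned above. The function names and the placeholder matrix values are hypothetical; only the shape matters here.

```python
import numpy as np

languages = ["Danish", "Dutch", "English", "German", "Swedish"]


def drop_language(matrix: np.ndarray, names: list[str], language: str):
    """Remove one language's row and column, keeping the matrix square."""
    k = names.index(language)
    reduced = np.delete(np.delete(matrix, k, axis=0), k, axis=1)
    return reduced, [name for name in names if name != language]


def off_diagonal(matrix: np.ndarray) -> np.ndarray:
    """All ordered language pairs (both directions), excluding the diagonal."""
    return matrix[~np.eye(matrix.shape[0], dtype=bool)]


# Placeholder 5 x 5 matrix with hypothetical values.
scores = np.arange(25, dtype=float).reshape(5, 5)
reduced, remaining = drop_language(scores, languages, "English")
print(remaining, reduced.shape, off_diagonal(reduced).size)
```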
Figure 13: Correlation of the lexical distance with the written cloze test excluding the cases with
English as the stimulus language or as the reader’s language.
7.2.1 Cloze test
Table 20 shows the correlations of the three linguistic measures with the written cloze test. Having excluded the English cases, 12 cases are left. The correlations calculated on the basis of these 12 cases are very high, especially those with lexical distance (-.87) and orthographic Levenshtein distance (-.85). The correlations with the different entropy measures are lower (-.61, -.59 and -.56), but still considerable and significant. A multiple regression analysis including the three entropy measures yields R = .67 (R² = .45), but the model is not significant (F(3,8) = 2.2, p = .17). The contribution of unigrams is the largest in this model, followed by trigrams and bigrams (see Table 21), but none of the predictors is significant.
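The overall F of the entropy-only regression follows from R² with k = 3 predictors and n = 12 language pairs: F = (R²/k) / ((1 - R²)/(n - k - 1)). The sketch below is a plain least-squares illustration with numpy; the statistics software actually used is not stated, and the function names are hypothetical. It computes standardized β weights and R² for a set of predictors, and the corresponding F. With the rounded values R² = .45 (cloze test) and R² = .50 (word translation task, section 7.2.2), it gives F ≈ 2.18 and 2.67, close to the reported 2.21 and 2.71, which were presumably computed from unrounded R².

```python
import numpy as np


def standardized_regression(X: np.ndarray, y: np.ndarray):
    """Ordinary least squares on z-scored variables: returns standardized betas and R^2.

    No intercept is needed because all z-scored variables have mean 0.
    """
    zx = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    zy = (y - y.mean()) / y.std(ddof=1)
    betas, *_ = np.linalg.lstsq(zx, zy, rcond=None)
    residuals = zy - zx @ betas
    r_squared = 1 - (residuals @ residuals) / (zy @ zy)
    return betas, r_squared


def f_statistic(r_squared: float, k: int, n: int) -> float:
    """Overall F for a model with k predictors and n cases, df = (k, n - k - 1)."""
    return (r_squared / k) / ((1 - r_squared) / (n - k - 1))


# Three entropy predictors, 12 language pairs (Danish, Dutch, German, Swedish).
print(round(f_statistic(0.45, k=3, n=12), 2))  # 2.18 (cloze test model)
print(round(f_statistic(0.50, k=3, n=12), 2))  # 2.67 (word translation model)
```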
All correlations are negative, which means that the greater the distance between the languages according to the linguistic measure in question, the lower the intelligibility score for that language pair. The scatter plots in Figure 14, Figure 15 and Figure 16 show a graphical representation of these correlations.
Table 20: Correlations of the three linguistic measurements with the results of the written cloze
test. The n derives from the full matrix (each language combination in both directions) for the
languages Danish, Dutch, German and Swedish.
                                     r      p        n
Lexical distance                     -.87   < .001   12
Orthographic Levenshtein distance    -.85   < .001   12
Orthographic entropy 1-gram          -.61   < .05    12
Orthographic entropy 2-gram          -.59   < .05    12
Orthographic entropy 3-gram          -.56   < .05    12
Table 21: Properties of the three entropy measures as predictors in a multiple regression analysis with the written cloze test as the dependent variable. Model statistics: R² = .45, F(3,8) = 2.21, p = .165.
Predictor    β       t(8)    p
Unigrams     -.586   -1.22   .259
Bigrams      .339    0.41    .695
Trigrams     -.527   -0.86   .417
Figure 14: Correlation of the lexical distance with the written cloze test excluding English.
Figure 15: Scatter plot of the Levenshtein distance with the results of the written cloze test.
Figure 16: Scatter plot of the entropy measures with the results of the written cloze test.
Multiple linear regression analysis was used to find the best possible combination of predictors. The model that best predicted the results of the cloze test was a model with only one predictor: lexical distance (β = -.87, p < .001; R² = .76, p < .001). This model is only slightly better than the model including only Levenshtein distance as a predictor (β = -.85, p < .001; R² = .72, p < .001). In any model containing more than one predictor, all predictors were not significant. The entropy measures do not significantly add anything in combination with the lexical or Levenshtein distances.
7.2.2 Word translation task
Table 22 shows the correlations between the results of the word translation task and
the linguistic measures described in section 6.1. The pattern is similar to the results
of the correlations with the cloze test, though more extreme: lexical distance has the
highest correlation (-.85, p < .01) followed by the Levenshtein distance (-.82, p < .01)
and the correlations with entropy are the lowest and least significant (-.62, -.56, -.56,
p < .05 in all cases). A multiple regression analysis including the three entropy
measures yields R = .71 (R2 = .50), but the model is not significant (F(3,8) = 2.7, p =
.12). The contribution of trigrams is the largest in this model, followed by unigrams
and bigrams (see Table 23), but none of the predictors are significant.
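The overall F statistic of a multiple regression can be recovered from R², the number of predictors k and the number of observations n as F = (R²/k) / ((1 - R²)/(n - k - 1)), with k and n - k - 1 degrees of freedom. With the three entropy measures (k = 3) and the 12 directional language pairs (n = 12) this gives the F(3,8) reported above: R² = .50 yields about 2.67, reported as 2.7 because R² itself is rounded. For the cloze test, R² = .45 gives about 2.18 against the reported 2.21, the same rounding effect. The helper below is a minimal sketch of that arithmetic. Obtaining the p-value would additionally require the F distribution from a statistics library, which is left out here.

```python
from math import sqrt


def f_from_r2(r2: float, k: int, n: int) -> float:
    """Overall F statistic of an OLS model with k predictors fitted to n cases."""
    df_model = k
    df_resid = n - k - 1
    if not 0 <= r2 < 1 or df_resid <= 0:
        raise ValueError("need 0 <= R^2 < 1 and n > k + 1")
    return (r2 / df_model) / ((1 - r2) / df_resid)


# Three entropy predictors, twelve directional language pairs -> F(3, 8).
r2 = 0.50
print(round(f_from_r2(r2, k=3, n=12), 2))  # 2.67
print(round(sqrt(r2), 2))                  # multiple R: 0.71
```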
All correlations are negative, which means that the greater the distance
between the languages is according to the linguistic measure in question, the lower
the intelligibility score for that language pair. All these results show the same pattern
as the results of the correlations with the written cloze test (see section 7.2.1).
The scatter plots showing the correlations graphically can be found in Figure
17, Figure 18 and Figure 19. It is clear from these graphs, too, that the correlation
with lexical distance is the strongest (Figure 17) and that with entropy the weakest
(Figure 19).
A multiple regression analysis showed that the model with only lexical
distance as predictor variable was the best model to predict the results of the word
translation task (β = -.91, p < .001, R2 = .83). As was the case with the results of the
cloze test, the entropy measures do not significantly add predictive power in
combination with the lexical or Levenshtein distances.
Table 22: Correlations of the three linguistic measures with the written word translation task.
The n derives from the full matrix (each language combination in both directions) for the
languages Danish, Dutch, German and Swedish.
Measure                             r      p       n
Lexical distance                    -.85   < .01   12
Orthographic Levenshtein distance   -.82   < .01   12
Orthographic entropy 1-gram         -.62   < .05   12
Orthographic entropy 2-gram         -.56   < .05   12
Orthographic entropy 3-gram         -.56   < .05   12
Table 23: Properties of the three entropy measures as predictors in a multiple regression
analysis with the written word translation task as the dependent variable. Model statistics:
R2 = .50, F(3,8) = 2.71, p = .115.
Predictor   β       t(8)    p
Unigrams    -.760   -1.66   .137
Bigrams     .723    0.91    .389
Trigrams    -.774   -1.32   .223
Figure 17: Scatter plot of the lexical distance with the results of the written word translation task.
Figure 18: Scatter plot of the Levenshtein distance with the results of the written word translation task.
Figure 19: Scatter plot of the orthographic entropy measures with the results of the written word translation task.
8. Discussion
8.1 Linguistic measures
The lexical and Levenshtein distances are largely as expected. Both show that Danish and Swedish are closest to each other, followed by German and Dutch. The lexical distance also clearly singles out English as the most distant from all the other languages, as expected given the history of this language (see Chapter 4).
The results of the entropy calculations, too, clearly put Swedish and Danish closer together than all other language combinations. The other relations expected from the background of the languages, however, do not emerge. A reason for this could be that the entropy calculated here is based on the orthographic forms of the words. As described in Chapter 4, the standard orthographies for these languages were established only in the last few centuries, when the languages were already clearly separated from each other, and these standards developed largely independently of one another. Differences and similarities in orthography, then, need not reflect differences and similarities in the languages themselves. For example, German and Dutch share many cognates that both have the sound /u/ (Buch - boek ‘book’, suchen - zoeken ‘to search’, tun - doen ‘to do’). Each language spells this sound differently, however: German uses <u>, whereas Dutch uses the digraph <oe>. This suggests a difference that is not in fact there. For Swedish and Danish, by contrast, a conscious effort was made to keep the orthographies similar, in order to preserve Scandinavian unity (Vikør 2001). It is not surprising, then, that the entropy between them is the lowest, in both directions. The highest average entropies are found between German and the other languages. This suggests that the correspondences between written German and the other languages are the most irregular.¹⁴
¹⁴ This is not due to the German practice of capitalizing nouns, as capitalization was ignored in the entropy calculations.
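The point about German <u> and Dutch <oe> can be made concrete with a small conditional-entropy calculation. The sketch below takes hand-aligned segment correspondences for the three cognate pairs mentioned in the text and computes H(Dutch | German) in bits. Because German <u> corresponds to Dutch <oe> every time, the differing spelling contributes no entropy at all. The segment alignment and the function name are illustrative assumptions; Moberg et al. (2007) and the scripts used in this thesis may align and count segments differently.

```python
from collections import Counter, defaultdict
from math import log2


def conditional_entropy(pairs):
    """H(Y | X) in bits over a list of aligned (x, y) segment pairs."""
    by_source = defaultdict(Counter)
    for x, y in pairs:
        by_source[x][y] += 1
    total = len(pairs)
    h = 0.0
    for targets in by_source.values():
        n_x = sum(targets.values())
        for count in targets.values():
            p_y_given_x = count / n_x
            # Weight each source segment by how often it occurs.
            h -= (n_x / total) * p_y_given_x * log2(p_y_given_x)
    return h


# Hand-aligned German -> Dutch segments (hypothetical alignment for illustration).
ALIGNED = [
    ("b", "b"), ("u", "oe"), ("ch", "k"),                          # Buch - boek
    ("s", "z"), ("u", "oe"), ("ch", "k"), ("e", "e"), ("n", "n"),  # suchen - zoeken
    ("t", "d"), ("u", "oe"), ("n", "n"),                           # tun - doen
]
print(conditional_entropy(ALIGNED))  # 0.0: each German segment has one Dutch match
```

A different spelling therefore does not by itself raise the entropy; only inconsistent correspondences do. A string distance, by contrast, counts every <u>/<oe> mismatch as a difference.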
A clear asymmetry between Danish and Swedish, which is often reported in intelligibility research (Delsing and Lundin Åkesson 2005; Gooskens et al. 2010; Schüppert 2011; Gooskens and Van Bezooijen 2013), was not found in these entropy measurements. This contrasts with the findings of Moberg et al. (2007). A possible explanation is that the entropy calculations in the present study were carried out only at the orthographic level. The asymmetry in intelligibility has been established more firmly for spoken than for written language. As mentioned in Chapter 4, the orthography of Danish differs considerably from its present-day pronunciation, which is not the case for Swedish. It is possible that the entropy between the spoken versions of these languages, which will be calculated in future research, will show the expected asymmetry.
The greatest asymmetry was found in all combinations involving English: the asymmetry in the sum of the entropies for combinations with English ranged from .15 to .20, whereas the highest asymmetry in the other combinations was .11. In all of these cases, the entropies for pairs with English as the stimulus language were higher than those with English as the reader’s native language. This means that a speaker of one of the other languages reading English encounters more unpredictability than an English speaker reading one of the other languages. This could be due to the relative irregularity of English spelling, described in Chapter 4. It should also be recalled that the number of cognates between English and the other languages was relatively low (see Table 7): for all of these pairs it was below the minimum of 800 words that Moberg et al. (2007) established for reliable entropy measurements. The calculations should therefore be repeated with an extended version of the word list that contains enough cognates for the combinations with English as well.
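Conditional entropy can be asymmetric because it is computed separately for each direction. This is easiest to see in a toy case where two distinct letters in language A both correspond to a single letter in language B: knowing the letter in A fully predicts its counterpart in B, but not the reverse, so H(B | A) and H(A | B) differ. The sketch below computes both directions and their difference. The letters and counts are invented for illustration, and the code is not the entropy script used in this thesis.

```python
from collections import Counter, defaultdict
from math import log2


def conditional_entropy(pairs):
    """H(target | source) in bits over aligned (source, target) pairs."""
    by_source = defaultdict(Counter)
    for source, target in pairs:
        by_source[source][target] += 1
    total = len(pairs)
    h = 0.0
    for targets in by_source.values():
        n_source = sum(targets.values())
        for count in targets.values():
            p = count / n_source
            h -= (n_source / total) * p * log2(p)
    return h


def asymmetry(pairs):
    """Return H(B|A), H(A|B) and their absolute difference for (a, b) pairs."""
    h_b_given_a = conditional_entropy(pairs)
    h_a_given_b = conditional_entropy([(b, a) for a, b in pairs])
    return h_b_given_a, h_a_given_b, abs(h_b_given_a - h_a_given_b)


# Toy alignment: language A distinguishes <c> and <k>, language B writes both as <k>.
toy = [("c", "k"), ("c", "k"), ("k", "k"), ("k", "k")]
print(asymmetry(toy))  # (0.0, 1.0, 1.0)
```

Applied to real word lists, the same difference would be computed per language pair and per n-gram size.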
8.2 Research questions
The first research question of this thesis is: Are orthographic entropy measures a useful predictor of written intelligibility in addition to Levenshtein distance? Significant correlations were found between all entropy measures and the results of the two intelligibility experiments. The correlations with the cloze test ranged from r = -.56 to r = -.61 and those with the word translation task from r = -.56 to r = -.62. This shows that the amount of entropy (irregularity) between languages can predict the level of intelligibility to some extent: when the entropy from one language to another is high, the intelligibility between the two languages tends to be low. A regression analysis combining all three entropy measures as predictors showed the same relation, but the model was not significant for either intelligibility test. Unigrams (in the cloze test) and trigrams (in the word translation task) contributed most to the prediction. This might be because unigrams and trigrams have the lowest correlation with each other, whereas bigrams overlap with both.
Although all correlations with the entropy were significant at the .05 level, the
correlations with lexical distance and Levenshtein distance were much higher
(between -.82 and -.87). In a linear regression analysis, the best fitting model included
lexical distance as the only predictor for both the cloze test and the word translation
task. This suggests that, for these intelligibility experiments, the entropy measures do
not add explanatory power to the lexical and Levenshtein measures of linguistic
distance. There are several possible causes for this.
The main contribution of entropy to predicting intelligibility, in theory, is its inherent ability to register asymmetry between the two directions of a language pair. Asymmetry has been repeatedly shown especially for Danish and Swedish, and Gooskens, Van Bezooijen and Van Heuven (accepted) showed it for Dutch and German too. The only asymmetry found in these data, however, was that between English and the other languages. As mentioned above, a probable cause is that these calculations were based on the written versions of the languages, whereas the asymmetry usually surfaces only in spoken intelligibility (see e.g. Schüppert 2011 for Danish and Swedish).
The asymmetry that did surface in the entropy measurements, namely that involving English, could not be correlated with the intelligibility data, because the participants performed at ceiling level for English. Apparently, northern and western Europeans are exposed to English so much, in school and in their daily lives, that they understand the language well enough to perform these tasks nearly perfectly. This makes it impossible to show whether there is any asymmetry in the level of intelligibility that is due to linguistic factors, such as irregular orthographic correspondences.
There is another issue in using entropy to predict the intelligibility scores in this study. Unlike lexical distance and Levenshtein distance, entropy does not measure actual similarity; it measures the regularity of the correspondences between two languages, regardless of how similar the corresponding elements are. For this regularity to aid intelligibility, the reader or listener needs to have some experience with the stimulus language. The experiments this study draws on, however, were very short: the texts in the cloze test consist of about 200 words, and in the word translation task each participant had to translate 50 words. If the participant has no previous experience with the stimulus language, these experiments do not give him enough input to benefit from a low entropy.
The second and third research questions are related to each other. The answer
to the second research question, Can lexical distance accurately predict written
intelligibility?, is clearly ‘yes’. The lexical distance had a very high correlation with
both intelligibility experiments (r = -.87 and r = -.85). A high lexical distance between
two languages means low intelligibility. In the linear regression analyses, the best
model for both intelligibility tasks included lexical distance as the only predictor, with
R2 = .76 for the cloze test and R2 = .83 for the word translation task.
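Lexical distance can be thought of as the share of word-list items for which two languages do not use a cognate. The minimal sketch below assumes the word list has already been annotated with one cognate judgement per item; that annotation, made with etymological dictionaries, is the hard part and is not modelled here. The function name and the percentage scale are illustrative assumptions.

```python
def lexical_distance(cognate_flags):
    """Percentage of word-list items that are NOT cognates in a language pair.

    cognate_flags: one boolean per item, True if the two words are cognates.
    """
    flags = list(cognate_flags)
    if not flags:
        raise ValueError("empty word list")
    non_cognates = sum(1 for is_cognate in flags if not is_cognate)
    return 100.0 * non_cognates / len(flags)


# Hypothetical annotation of five items for one language pair.
print(lexical_distance([True, True, False, True, False]))  # 40.0
```

Unlike conditional entropy, this measure is symmetric: swapping the two languages does not change the result.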
The third research question, Can orthographic Levenshtein distance accurately predict written intelligibility?, can be answered affirmatively as well. The correlations of the intelligibility data with the Levenshtein distance were very high: r = -.85 and r = -.82. However, contrary to expectations and to results from previous research (Gooskens 2007a, 2007b; Beijering, Gooskens and Heeringa 2008; Kürschner, Gooskens and Van Bezooijen 2008), they were slightly lower than the correlations with lexical distance. There are a few differences between these studies and the present one. Firstly, the lexical distance in this study was calculated from a list of 1500 words instead of the 100-300 words used in previous research, but the high correlation between the lexical distance from this study and that from Heeringa et al. (2013), reported in section 7.1.4, suggests that this should not make a big difference. Moreover, the Levenshtein distance was calculated using the same extensive word list. A more likely explanation is that the studies cited above involved only the Scandinavian languages Danish, Swedish and Norwegian, or dialects of these languages. The lexical variation among these language varieties is not very high (Gooskens 2007a), as is also illustrated by the very low lexical distance between Danish and Swedish in this study (see Table 10 or Table 7). In such a group of languages, lexical distance might not be a very good predictor, as the lexical distances between all varieties are simply very low. The group of languages in this study, however, includes West Germanic languages as well as the North Germanic languages of Scandinavia. More lexical variation is present in this group, which is likely to make lexical distance a better predictor of intelligibility.
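Orthographic Levenshtein distance counts the minimum number of insertions, deletions and substitutions needed to turn one spelling into another. A minimal Python sketch follows, using lowercase forms since capitalization was ignored. Dividing by the length of the longer word is a simplifying assumption made here; implementations in the tradition of Heeringa (2004) normalize by alignment length instead, which can differ when insertions or deletions are involved.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def normalized_levenshtein(a: str, b: str) -> float:
    """Levenshtein distance divided by the longer word's length (a simplification)."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


print(levenshtein("buch", "boek"))             # 3
print(normalized_levenshtein("buch", "boek"))  # 0.75
```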
8.3 Future research
First of all, this research should be expanded to spoken instead of written intelligibility, with linguistic measures calculated from phonological transcriptions of the word lists. The results for spoken intelligibility have proven to differ from those for written intelligibility (see e.g. Schüppert 2011): most notably, spoken intelligibility between Danish and Swedish is asymmetrical, whereas written intelligibility is not. The linguistic measures calculated from phonological transcriptions are likely to differ from those based on orthography as well, as different languages represent pronunciation in their orthography in different ways. In addition, the orthography is not always a faithful representation of the pronunciation of a language, especially in the cases of English and Danish (see Chapter 4). The relations between these languages based on pronunciation could therefore be very different from the orthography-based relations shown in this study. This is already being worked on in the Micrela project, using the same word list as this thesis.
Secondly, the word list needs to be expanded in order to obtain sufficient
cognates for the combinations involving English, as well. In the current study, these
are between 600 and 700 words, whereas a minimum of 800 is required for reliable
entropy calculations (Moberg et al. 2007).
In addition to lexical distance, Levenshtein distance and conditional entropy, there
are other ways to measure linguistic relations between languages. Spruit (2006) and
Nerbonne and Wiersma (2006), for example, developed methods to computationally
measure syntactic and morphological distances between languages. Research
investigating how well linguistic factors on these levels (i.e. other than the word
level) can predict intelligibility between languages should be carried out in the future.
A disadvantage of using entropy to predict intelligibility is that a low entropy does not help readers who have never before encountered the language in front of them. The method of conditional entropy measures whether the correspondences between two languages are regular. It does not measure, however, how transparent these correspondences are to the naive reader. If, for example, the letter <k> in language A corresponds in 100% of the cases to the letter <k> in language B, this yields zero entropy. If that letter <k> instead corresponds in 100% of the cases to the letter <s> in language B, this, too, yields zero entropy: there is, after all, no irregularity. To a reader with native language B who has never before encountered language A, however, the first case will pose no problems for intelligibility, whereas the second most likely will. How would he guess that he needs to replace every <k> with an <s>? After familiarizing himself with language A, however, he might eventually be able to derive the rule. In this case, the lower the entropy (i.e. the lower the irregularity), the easier it will be for the reader to derive the correspondences between a foreign language and his native language. Future research should study how the amount of entropy influences the speed with which a participant learns to understand a new language.
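The <k>/<s> example can be checked directly: both mappings give zero conditional entropy, while a crude transparency score separates them. Here that score is simply the share of aligned letters spelled identically in both languages. This transparency score is a hypothetical measure introduced only for this sketch; it is not used in this thesis or in Moberg et al. (2007).

```python
from collections import Counter, defaultdict
from math import log2


def conditional_entropy(pairs):
    """H(B | A) in bits over aligned (letter in A, letter in B) pairs."""
    by_source = defaultdict(Counter)
    for a, b in pairs:
        by_source[a][b] += 1
    total = len(pairs)
    h = 0.0
    for targets in by_source.values():
        n_a = sum(targets.values())
        for count in targets.values():
            p = count / n_a
            h -= (n_a / total) * p * log2(p)
    return h


def identity_rate(pairs):
    """Hypothetical transparency score: share of aligned letters spelled identically."""
    return sum(1 for a, b in pairs if a == b) / len(pairs)


same = [("k", "k")] * 10     # <k> in A always corresponds to <k> in B
shifted = [("k", "s")] * 10  # <k> in A always corresponds to <s> in B

for name, pairs in [("k->k", same), ("k->s", shifted)]:
    print(name, conditional_entropy(pairs), identity_rate(pairs))
# k->k 0.0 1.0
# k->s 0.0 0.0
```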
9. Conclusion
The main research question of this thesis is: Are orthographic entropy measures a useful predictor of written intelligibility in addition to Levenshtein distance? The entropy measures correlated significantly with the results of both intelligibility experiments, but the correlations of the lexical distance and the Levenshtein distance with the intelligibility data were much higher. In a regression analysis, both Levenshtein distance and entropy were excluded, leaving lexical distance as the strongest predictor. A likely reason is the lack of asymmetry in the entropy calculations, even though capturing asymmetry is the main strength of the conditional entropy method. The asymmetry was present only in the language combinations involving English. In the intelligibility data, however, the items with English had to be excluded due to ceiling effects.
The second and third research questions are connected: Can lexical distance accurately predict written intelligibility? and Can orthographic Levenshtein distance accurately predict written intelligibility? The answer to both of these questions is ‘yes’: the correlations of both Levenshtein distance and lexical distance with the intelligibility data were very high (-.82 at the lowest). However, contrary to previous research, the lexical distance was shown to be the better predictor of the two in a regression analysis. In previous studies (Gooskens 2007a, 2007b; Beijering, Gooskens and Heeringa 2008; Kürschner, Gooskens and Van Bezooijen 2008), the Levenshtein distance tended to be the best predictor.
10. References
Beijering, Karin, Charlotte Gooskens and Wilbert Heeringa (2008). Predicting
intelligibility and perceived linguistic distances by means of the Levenshtein
algorithm. Linguistics in the Netherlands, 13-24.
Bø, Inge (1978). Ungdom og naboland. Stavanger: Rogalandsforskning (rapport 4).
Börestam, Ulla (1987). Dansk-svensk språkgemenskap på undantag. Uppsala: Uppsala
Universitet.
Chambers, Jack and Peter Trudgill (1980). Dialectology. Cambridge: Cambridge
University Press.
Cheng, Chin-Chuan (1997). Measuring relationship among dialects: DOC and related
sources. Computational Linguistics & Chinese Language Processing, 2(1), 41-72.
Delsing, Lars-Olof and Katarina Lundin Åkesson (2005). Håller språket ihop i Norden?
En forskningsrapport om ungdomars förståelse av danska, svenska och norska.
Copenhagen: Nordiska ministerådet.
Engstrand, Olle (1999). Swedish. Handbook of the International Phonetic Association.
Cambridge: Cambridge University Press, 140-142.
Van Gelderen, Elly (2006). A History of the English Language.
Amsterdam/Philadelphia: John Benjamins Publishing Company.
Goebl, Hans (1982). Dialektometrie: Prinzipien und Methoden des Einsatzes der
numerischen Taxonomie im Bereich der Dialektgeographie. Philosophisch-Historische Klasse Denkschriften, 157. Vienna: Verlag der Österreichischen
Akademie der Wissenschaften. With assistance of W.-D. Rase and H. Pudlatz.
Goebl, Hans (1993). Probleme und Methoden der Dialektometrie: Geolinguistik in
globaler Perspektive. In: W. Viereck (ed.), Proceedings of the International
Congress of Dialectologists, 1. Stuttgart: Franz Steiner Verlag, 37-81.
Gooskens, Charlotte (2007a). The contribution of linguistic factors to the intelligibility
of closely related languages. Journal of Multilingual and Multicultural
Development, 28(6), 445-467.
Gooskens, Charlotte (2007b). Contact, attitude and phonetic distance as predictors of
inter-Scandinavian communication. In: J.-M. Eloy and T. Ó hIfearnáin (eds.),
Near languages – Collateral languages. Actes du colloque international réuni à
Limerick, du 16 au 18 juin 2005, 99-109.
Gooskens, Charlotte, Wilbert Heeringa and Karin Beijering (2008). Phonetic and
lexical predictors of intelligibility. International Journal of Humanities and Arts
Computing, 2(1-2), 63-81.
Gooskens, Charlotte and Renée van Bezooijen (2013). Explaining Danish-Swedish
asymmetric word intelligibility – An error analysis. In: C. Gooskens & R. van
Bezooijen (eds.), Phonetics in Europe: Perception and Production. Frankfurt
a.M.: Peter Lang, 59-82.
Gooskens, Charlotte, Renée van Bezooijen and Vincent van Heuven (accepted). Mutual
intelligibility of Dutch-German cognates by children: The devil is in the detail.
Linguistics, 53(2).
Gooskens, Charlotte, Vincent van Heuven, Renée van Bezooijen and Jos Pacilly (2010).
Is spoken Danish less intelligible than Swedish? Speech Communication, 52,
1022-1037.
Gordon, Raymond G., Jr., ed. (2005). Ethnologue: Languages of the World, 15th edn.
Dallas: SIL International. Online version: http://www.ethnologue.com/.
Grønnum, Nina (1998). Danish. Journal of the International Phonetic Association, 28,
99-105.
Gussenhoven, Carlos (1999). Dutch. Handbook of the International Phonetic
Association. Cambridge: Cambridge University Press, 74-77.
Harbert, Wayne (2007). The Germanic languages. Cambridge: Cambridge University
Press.
Haugen, Einar (1953). Nordiske språkproblemer: en opinionsundersøkelse. Nordisk
Tidskrift, 29, 225-249.
Haugen, Einar (1966). Semicommunication: The language gap in Scandinavia.
Sociological Inquiry, 36, 280-297.
Heeringa, Wilbert (2002). Over de indeling van de Nederlandse streektalen. Een
nieuwe methode getoetst. Driemaandelijkse bladen voor taal en volksleven in
het oosten van Nederland, 54(1-4), 111-148.
Heeringa, Wilbert (2004). Measuring dialect pronunciation differences using
Levenshtein distance. PhD thesis. Groningen: Grodil, 46.
Heeringa, Wilbert, Jelena Golubovic, Charlotte Gooskens, Anja Schüppert, Femke
Swarte and Stefanie Voigt (2013). Lexical and orthographic distances between
Germanic, Romance and Slavic languages and their relationship to geographic
distance. In: C. Gooskens & R. van Bezooijen (eds.), Phonetics in Europe:
Perception and Production. Frankfurt a.M.: Peter Lang, 99-137.
Hoppenbrouwers, Cor and Geer Hoppenbrouwers (1988). De
featurefrequentiemethode en de classificatie van Nederlandse dialecten. TABU:
Bulletin voor taalwetenschap, 18(2), 51-92.
Hoppenbrouwers, Cor and Geer Hoppenbrouwers (2001). De indeling van de
Nederlandse streektalen. Dialecten van 156 steden en dorpen geklasseerd
volgens de FFM. Assen: Koninklijke Van Gorcum B.V.
Kessler, Brett (1995). Computational dialectology in Irish Gaelic. In: Proceedings of the
7th Conference of the European Chapter of the Association for Computational
Linguistics. Dublin: EACL, 60-67.
Kohler, Klaus (1999). German. Handbook of the International Phonetic Association.
Cambridge: Cambridge University Press, 86-89.
Kürschner, Sebastian, Charlotte Gooskens and Renée van Bezooijen (2008). Linguistic
determinants of the intelligibility of Swedish words among Danes.
International Journal of Humanities and Arts Computing, 2(1-2), 83-100.
Ladefoged, Peter (1999). English, American. Handbook of the International Phonetic
Association. Cambridge: Cambridge University Press, 41-44.
Mantel, Nathan (1967). The detection of disease clustering and a generalized
regression approach. Cancer Research, 27(2), 209–220.
Maurud, Øivind (1976). Nabospråkforståelse i Skandinavia: en undersøkelse om
gjensidig forståelse av tale- og skriftspråk i Danmark, Norge og Sverige.
Stockholm: Nordiska rådet.
Moberg, Jens, Charlotte Gooskens, John Nerbonne and Nathan Vaillette (2007).
Conditional entropy measures intelligibility among related languages. In: P.
Dirix, I. Schuurman, V. Vandeghinste and F. Van Eynde (eds.), Computational
Linguistics in the Netherlands 2006: Selected papers from the 17th CLIN Meeting.
Utrecht: LOT, 51-66.
Molewijk, G.C. (1992). Spellingverandering van zin naar onzin (1200–heden). The
Hague: Sdu Uitgeverij Koninginnegracht.
Nerbonne, John and Wybo Wiersma (2006). A Measure of Aggregate Syntactic
Distance. In: J. Nerbonne and E. Hinrichs (eds.), Proceedings of the Workshop on
Linguistic Distances, 82-90.
Scheuringer, Hermann and Christian Stang (2004). Die deutsche Rechtschreibung.
Vienna: Edition Praesens.
Schüppert, Anja (2011). Origin of asymmetry. Mutual intelligibility of spoken Danish
and Swedish. PhD thesis. Groningen: Grodil, 94.
Schüppert, Anja, Nanna Haug Hilton and Charlotte Gooskens (accepted). Swedish is
beautiful, Danish is ugly? Investigating the link between language attitudes and
intelligibility. Linguistics, 53(2).
Séguy, Jean (1973). La dialectométrie dans l’Atlas linguistique de la Gascogne.
Revue de linguistique Romane, 37, 1-24.
Spruit, Marco R. (2006). Measuring Syntactic Variation in Dutch Dialects. Literary and
Linguistic Computing, 21(4), 493-506.
Swadesh, Morris (1971). The origin and diversification of language. Chicago: Aldine.
Edited post mortem by Joel Sherzer.
Swarte, Femke (in preparation). Mutual intelligibility in the Germanic Language Area.
PhD thesis. Groningen: Grodil.
Swarte, Femke, Anja Schüppert and Charlotte Gooskens (accepted). Does German help
speakers of Dutch to understand written and spoken Danish words? - The role
of second language knowledge in decoding an unknown but related language.
In: G. De Angelis, U. Jessner and M. Kresic (eds.), Crosslinguistic Influence and
Multilingualism.
Tang, Chaoju and Vincent J. van Heuven (2009). Mutual intelligibility of Chinese
dialects experimentally tested. Lingua, 119(5), 709-732.
Tang, Chaoju and Vincent J. van Heuven (2007). Mutual intelligibility and similarity of
Chinese dialects. In: B. Los and M. van Koppen (eds.), Linguistics in the
Netherlands 2007. Amsterdam: John Benjamins, 223-234.
Vikør, Lars S. (2001). The Nordic Languages: Their Status and Interrelations. Oslo:
Novus forlag (Novus Press).
Voegelin, C.F. and Zellig S. Harris (1951). Methods for determining intelligibility
among dialects of natural languages. Proceedings of the American Philosophical
Society, 95(3), 322-329.
Wolff, Hans (1959). Intelligibility and Inter-Ethnic Attitudes. Anthropological
Linguistics, 1(3), Urbanization and standard language: A symposium presented
at the 1958 meetings of the American Anthropological Association, 34-41.
Etymological dictionaries
Duden (2001). Herkunftswörterbuch: Etymologie der deutschen Sprache (3rd edition).
Mannheim/Leipzig/Wien/Zürich: Dudenverlag.
Katlev, Jan (2000). Politikens Etymologisk Ordbog. Aalborg: Politikens Forlag.
Norstedts etymologiska ordbok (2008). Norstedts Akademiska Förlag.
Oxford English Dictionary (OED), online edition. Online at: www.oed.com (retrieved
April-September 2014).
Philippa, Marlies, Frans Debrabandere and Arend Quak (2005). Etymologisch
Woordenboek van het Nederlands, F-Ka. Amsterdam: Amsterdam University
Press.
Philippa, Marlies, Frans Debrabandere, Arend Quak, Tanneke Schoonheim and
Nicoline van der Sijs. Etymologisch Woordenboek van het Nederlands, web
edition: www.etymologie.nl. Amsterdam: Amsterdam University Press
(retrieved April-June 2014).
Appendix
Appendix A: Excluded Words
No.  Part of Speech  Word
1    a               chief
2    a               due
3    a               key
4    a               labour
5    a               major
6    a               prime
7    adv             all
8    adv             either
9    adv             off
10   adv             to
11   conj            cos
12   det             another
13   det             either
14   det             whatever
15   interjection    no
16   interjection    well
17   interjection    yeah
18   interjection    yes
19   modal           can
20   modal           could
21   modal           may
22   modal           might
23   modal           must
24   modal           ought
25   modal           shall
26   modal           should
27   modal           used
28   modal           will
29   modal           would
30   n               claim
31   n               good
32   n               item
33   n               provision
34   n               rate
35   prep            down
36   prep            into
37   pron            anything
38   pron            herself
39   pron            himself
40   pron            itself
41   pron            myself
42   pron            themselves
43   pron            whom
44   pron            yourself
45   v               face
46   v               manage
47   v               market
48   v               mind
49   v               propos
50   v               result
Appendix B: Word List
No.  Part of Speech     English        Dutch          German         Danish         Swedish
1    det                the            de             der            den            den
2    v                  be             zijn           sein           være           vara
3    prep               of             van            von            af             av
4    conj               and            en             und            og             och
5    det                a              een            ein            en             en
6    prep               in             in             in             i              i
7    infinitive-marker  to             te             zu             at             att
8    v                  have           hebben         haben          have           ha
9    pron               it             het            das            det            det
10   prep               to             naar           zu             til            till
11   prep               for            voor           für            til            till
12   pron               i              ik             ich            jeg            jag
13   conj               that           dat            dass           at             att
14   pron               you            jij            du             du             du
15   pron               he             hij            er             han            han
16   prep               on             aan            an             på             på
17   prep               with           met            mit            med            med
18   v                  do             doen           tun            gøre           göra
19   prep               at             te             bei            ved            vid
20   prep               by             bij            bei            fra            av
21   adv                not            niet           nicht          ikke           inte
22   det                this           dit            dies           dette          detta
23   conj               but            maar           aber           men            men
24   prep               from           van            von            fra            från
25   pron               they           zij            sie            de             de
26   det                his            zijn           sein           hans           hans
27   det                that           dat            das            den            den
28   pron               she            zij            sie            hun            hon
29   conj               or             of             oder           eller          eller
30   det                which          welk           welch          hvilke         vilken
31   conj               as             als            als            som            som
32   pron               we             wij            wir            vi             vi
33   det                an             een            ein            en             en
34   v                  say            zeggen         sagen          sige           säga
35   conj               if             of             wenn           hvis           om
36   det                their          hun            ihr            deres          deras
37   v                  go             gaan           gehen          gå             gå
38   det                what           wat            was            hvad           vad
39   pron               there          daar           da             der            där
40   det                all            al             alle           alle           alla
41   v                  get            krijgen        kriegen        få             få
42   det                her            haar           ihr            hendes         hennes
43   v                  make           maken          machen         lave           göra
44   pron               who            wie            wer            hvem           vem
45   prep               as             als            wie            som            såsom
46   adv                out            uit            aus            ud             ut
47   adv                up             op             oben           op             upp
48   v                  see            zien           sehen          se             se
49   v                  know           weten          wissen         vide           veta
50   n                  time           tijd           zeit           tid            tid
51   v                  take           nemen          nimmen         tage           ta
52   pron               them           hen            ihr            dem            dem
53   det                some           enkele         einige         nogen          någon
54   adv                so             zo             so             så             så
55   pron               him            hem            ihm            ham            honom
56   n                  year           jaar           jahr           år             år
57   det                its            zijn           sein           dens           dess
58   adv                then           dan            dann           så             då
59   v                  think          denken         denken         tænke          tänka
60   det                my             mijn           mein           min            min
61   v                  come           komen          kommen         komme          komma
62   conj               than           dan            dann           end            än
63   adv                more           meer           mehr           mere           mera
64   prep               about          over           über           om             om
65   adv                now            nu             jetzt          nu             nu
66   a                  last           laatst         letzt          sidst          sista
67   det                your           jouw           dein           din            din
68   pron               me             mij            mir            mig            mig
69   det                no             geen           kein           ingen          ingen
70   a                  other          ander          ander          anden          annan
71   v                  give           geven          geben          give           ge
72   adv                just           slechts        nur            bare           bara
73   det                these          deze           dies           disse          dessa
74   n                  people         mensen         leute          mennesker      människor
75   adv                also           ook            auch           også           också
76   adv                well           goed           gut            godt           bra
77   det                any            enig           jeder          nogen          någon
78   adv                only           slechts        nur            kun            endast
79   a                  new            nieuw          neu            ny             ny
80   adv                very           zeer           sehr           meget          mycket
81   conj               when           als            wenn           når            när
82   n                  way            weg            weg            vej            väg
83   v                  look           kijken         schauen        se             titta
84   prep               like           zoals          wie            ligesom        ligesom
85   v                  use            gebruiken      nutzen         bruge          använda
86   pron               her            haar           ihr            hende          henne
87   det                such           zulk           solch          sådan          sådan
88   adv                how            hoe            wie            hvordan        hur
89   conj               because        omdat          weil           fordi          därför att
90   adv                when           wanneer        wann           hvornår        när
91   adv                as             als            als            som            som
92   a                  good           goed           gut            god            bra
93   v                  find           vinden         finden         finde          finna
94   n                  man            man            mann           mand           man
95   det                our            ons            unser          vores          vår
96   v                  want           willen         willen         ville          vilja
97   n                  day            dag            tag            dag            dag
98   prep               between        tussen         zwischen       mellem         mellan
99   adv                even           zelfs          sogar          selv           även
100  adv                there          daar           da             der            där
101  det                many           veel           viel           mange          många
102  det                those          die            diejenigen     dem            dem
103  pron               one            men            man            man            man
104  prep               after          na             nach           efter          efter
105  adv                down           neer           hinunter       ned            ned
106  conj               so             dus            also           så             så
107  n                  thing          ding           ding           ting           ting
108  v                  tell           vertellen      erzählen       fortælle       berätta
109  prep               through        door           durch          gennem         genom
110  adv                back           terug          zurück         tilbage        tillbaka
111  adv                still          steeds         noch           stadig         fortfarande
112  n                  child          kind           kind           barn           barn
113  adv                here           hier           hier           her            här
114  prep               over           over           über           over           över
115  adv                too            ook            auch           også           också
116  v                  put            zetten         stellen        sætte          sätta
117  det                own            eigen          eigen          egen           egen
118  adv                on             aan            zu             på             på
119  v                  work           werken         arbeiten       arbejde        arbeta
120  v                  become         worden         werden         blive          bli
121  det                more           meer           mehr           mere           mera
122  a                  old            oud            alt            gammel         gammal
123  n                  government     regering       regierung      regering       regering
124  v                  mean           menen          meinen         betyde         betyda
125  n                  part           deel           teil           del            del
126  v                  leave          verlaten       verlassen      forlade        lämna
127  n                  life           leven          leben          liv            liv
128  a                  great          groot          groß           stor           stor
129  adv                where          waar           wo             hvor           var
130  n                  case           geval          fall           tilfælde       fall
131  n                  woman          vrouw          frau           kvinde         kvinna
132  adv                over           over           über           over           över
133  v                  seem           lijken         scheinen       synes          tyckas
134  det                same           zelfde         gleich         samme          samma
135  pron               us             ons            uns            os             oss
136  n                  work           werk           arbeit         arbejde        arbete
137  v                  need           nodig hebben   brauchen       have brug for  behöver
138  v                  feel           voelen         fühlen         føle           känna
139  n                  system         systeem        system         system         system
140  det                each           elk            jeder          hver           varje
141  v                  may            mogen          dürfen         måtte          få
142  adv                much           veel           viel           meget          mycken
143  v                  ask            vragen         fragen         spørge         fråga
144  n                  group          groep          gruppe         gruppe         grupp
145  n                  number         nummer         nummer         nummer         nummer
146  adv                however        daarentegen    jedoch         dog            dock
147  adv                again          weer           wieder         igen           igen
148  n                  world          wereld         welt           verden         värld
149  n                  area           gebied         gebiet         område         område
150  v                  show           laten zien     zeigen         vise           visa
151  n                  course         cursus         kurs           kursus         kurs
152  n                  company        bedrijf        firma          entreprise     företag
153  prep               under          onder          unter          under          under
154  n                  problem        probleem       problem        problem        problem
155  prep               against        tegen          gegen          mod            mot
156  adv                never          nooit          nie            aldrig         aldrig
157  adv                most           meest          meist          mest           mest
158 | n | service | dienst | dienst | tjeneste | tjänst
159 | v | try | proberen | versuchen | forsøge | försöka
160 | v | call | bellen | anrufen | ringe op | ringa
161 | n | hand | hand | hand | hånd | hand
162 | n | party | feest | party | fest | fest
163 | a | high | hoog | hoch | høj | hög
164 | adv | about | ongeveer | ungefähr | omkring | omkring
165 | pron | something | iets | etwas | noget | något
166 | n | school | school | schule | skole | skola
167 | adv | in | in | in | i | i
168 | a | small | klein | klein | lille | små
169 | n | place | plaats | platz | plads | plats
170 | prep | before | voor | für | foran | före
171 | adv | why | waarom | warum | hvorfor | varför
172 | conj | while | terwijl | während | mens | medan
173 | adv | away | weg | weg | væk | borta
174 | v | keep | houden | halten | holde | hålla
175 | n | point | punt | punkt | punkt | punkt
176 | n | house | huis | haus | hus | hus
177 | a | different | anders | anders | anderledes | annorlunda
178 | n | country | land | land | land | land
179 | adv | really | echt | wirklich | virkelig | verkligen
180 | v | provide | voorzien | bieten | forsyne | förse
181 | n | week | week | woche | uge | vecka
182 | v | hold | houden | halten | holde | hålla
183 | a | large | groot | groß | stor | stor
184 | n | member | lid | mitglied | medlem | medlem
185 | adv | always | altijd | immer | altid | alltid
186 | det | next | volgend | nächst | næste | nästa
187 | v | follow | volgen | folgen | følge | följa
188 | prep | without | zonder | ohne | uden | utan
189 | v | turn | draaien | drehen | dreje | vända
190 | n | end | einde | ende | ende | ände
191 | prep | within | in | in | inden | inom
192 | a | local | lokaal | lokal | lokal | lokal
193 | conj | where | waar | wo | hvor | där
194 | prep | during | gedurende | während | under | under
195 | v | bring | brengen | bringen | bringe | bringa
196 | det | most | meest | meist | mest | mest
197 | n | word | woord | wort | ord | ord
198 | v | begin | beginnen | anfangen | begynde | börja
199 | conj | although | hoewel | obwohl | skønt | fastän
200 | n | example | voorbeeld | beispiel | eksempel | exempel
201 | adv | next | volgend | nächst | næste | nästa
202 | n | family | familie | familie | familie | familj
203 | adv | rather | liever | eher | snarere | snarare
204 | n | fact | feit | tatsache | faktum | faktum
205 | v | like | leuk vinden | mögen | synes om | tycka om
206 | a | social | sociaal | sozial | social | social
207 | v | write | schrijven | schreiben | skrive | skriva
208 | n | state | staat | staat | stat | stat
209 | n | percent | procent | prozent | procent | procent
210 | adv | quite | behoorlijk | ziemlich | helt | ganska
211 | det | both | beide | beide | begge | båda
212 | v | start | starten | starten | starte | starta
213 | v | run | rennen | laufen | løbe | löpa
214 | a | long | lang | lang | lang | lång
215 | adv | right | goed | richtig | rigtig | rätt
216 | v | set | zetten | setzen | sætte | ställa
217 | v | help | helpen | helfen | hjælpe | hjälpa
218 | det | every | elk | jeder | hver | varje
219 | n | home | thuis | zuhause | hjem | hem
220 | n | month | maand | monat | måned | månad
221 | n | side | zijde | seite | side | sida
222 | n | night | nacht | nacht | nat | natt
223 | a | important | belangrijk | wichtig | vigtig | viktig
224 | n | eye | oog | auge | øje | öga
225 | n | head | hoofd | haupt | hoved | huvud
226 | n | information | informatie | information | information | information
227 | n | question | vraag | frage | spørgsmål | fråga
228 | n | business | zaak | unternehmen | forretning | företag
229 | v | play | spelen | spielen | spille | spela
230 | n | power | macht | macht | magt | makt
231 | n | money | geld | geld | penge | pengar
232 | n | change | verandering | veränderung | forandring | förändring
233 | v | move | verhuizen | umziehen | flytte | flytta
234 | n | interest | interesse | interesse | interesse | intresse
235 | n | order | bestelling | bestellung | bestilling | beställning
236 | n | book | boek | buch | bog | bok
237 | adv | often | vaak | oft | ofte | ofta
238 | n | development | ontwikkeling | entwicklung | udvikling | utveckling
239 | a | young | jong | jung | ung | ung
240 | a | national | nationaal | national | national | nationell
241 | v | pay | betalen | zahlen | betale | betala
242 | v | hear | horen | hören | høre | höra
243 | n | room | kamer | zimmer | værelse | rum
244 | conj | whether | of | ob | hvorvidt | huruvida
245 | n | water | water | wasser | vand | vatten
246 | n | form | vorm | form | form | form
247 | n | car | auto | auto | bil | bil
248 | n | other | ander | ander | anden | annan
249 | adv | yet | nog | noch | endnu | ännu
250 | adv | perhaps | misschien | vielleicht | måske | kanske
251 | v | meet | ontmoeten | treffen | møde | träffa
252 | n | level | niveau | niveau | niveau | nivå
253 | conj | until | tot | bis | til | till
254 | conj | though | hoewel | obwohl | selvom | fast
255 | n | policy | beleid | politik | politik | politik
256 | v | include | bevatten | einbeziehen | indeholde | inkluderar
257 | v | believe | geloven | glauben | tro | tro
258 | n | council | raad | rat | råd | råd
259 | adv | already | al | schon | allerede | redan
260 | a | possible | mogelijk | möglich | mulig | möjlig
261 | pron | nothing | niets | nichts | intet | ingenting
262 | n | line | lijn | linie | line | linje
263 | v | allow | toestaan | erlauben | tillade | tillåta
264 | n | need | behoefte | bedarf | behov | behov
265 | n | effect | effect | effekt | effekt | effekt
266 | a | big | groot | groß | stor | stor
267 | n | use | gebruik | gebrauch | brug | användning
268 | v | lead | leiden | leiten | føre | leda
269 | v | stand | staan | stehen | stå | stå
270 | n | idea | idee | idee | idé | idé
271 | n | study | studie | studie | studium | studie
272 | n | lot | lot | los | lod | lott
273 | v | live | leven | leben | leve | leva
274 | n | job | baan | job | job | jobb
275 | conj | since | sinds | seit | siden | sedan
276 | n | name | naam | name | navn | namn
277 | n | result | resultaat | ergebnis | resultat | resultat
278 | n | body | lichaam | körper | krop | kropp
279 | v | happen | gebeuren | passieren | ske | ske
280 | n | friend | vriend | freund | ven | vän
281 | n | right | recht | recht | rettighed | rättighet
282 | adv | least | minst | mindest | mindst | minst
283 | a | right | rechts | rechts | højre | höger
284 | adv | almost | bijna | fast | næsten | nästan
285 | det | much | veel | viel | megen | mycken
286 | v | carry | dragen | tragen | bære | bära
287 | n | authority | autoriteit | autorität | autoritet | auktoritet
288 | adv | long | lang | lang | langt | långt
289 | a | early | vroeg | früh | tidlig | tidig
290 | n | view | uitzicht | blick | udsigt | utsikt
291 | a | public | publiek | publik | offentlig | offentlig
292 | adv | together | samen | zusammen | sammen | tillsammans
293 | v | talk | praten | reden | snakke | prata
294 | n | report | verslag | report | rapport | rapport
295 | conj | after | na | nach | efter | efter
296 | a | only | enig | einzig | eneste | enda
297 | conj | before | voor | bevor | før | före
298 | n | bit | beetje | bisschen | stykke | bitars
299 | n | face | gezicht | gesicht | ansigt | ansikte
300 | v | sit | zitten | sitzen | sidde | sitta
301 | n | market | markt | markt | marked | marknad
302 | v | appear | verschijnen | scheinen | synes | synas
303 | v | continue | doorgaan | fortsetzen | fortsætte | fortsätta
304 | a | able | capabel | fähig | dygtig | duktig
305 | a | political | politiek | politisch | politisk | politisk
306 | adv | later | later | später | senere | senare
307 | n | hour | uur | stunde | time | timme
308 | n | law | wet | gesetz | lov | lag
309 | n | door | deur | tür | dør | dörr
310 | n | court | hof | gericht | domstol | domstol
311 | n | office | kantoor | büro | kontor | kontor
312 | v | let | laten | lassen | lade | låta
313 | n | war | oorlog | krieg | krig | krig
314 | v | produce | produceren | produzieren | producere | producera
315 | n | reason | reden | grund | grund | skäl
316 | adv | less | minder | weniger | mindre | mindre
317 | n | minister | minister | minister | minister | minister
318 | n | subject | onderwerp | thema | emne | ämne
319 | n | person | persoon | person | person | person
320 | n | term | periode | laufzeit | periode | term
321 | a | particular | bijzonder | besondere | særlig | särskild
322 | a | full | vol | voll | fuld | full
323 | v | involve | betrekken | beteiligen | involvere | involvera
324 | n | sort | soort | sorte | sort | sort
325 | v | require | vereisen | erfordern | kræve | kräva
326 | v | suggest | suggereren | vorschlagen | foreslå | föreslå
327 | a | far | ver | weit | fjern | fjärran
328 | prep | towards | richting | zu | mod | mot
329 | n | period | periode | periode | periode | period
330 | v | consider | beschouwen | betrachten | betragte | överväga
331 | v | read | lezen | lesen | læse | läsa
332 | v | change | veranderen | ändern | forandre | ändra
333 | n | society | maatschappij | gesellschaft | samfund | samhälle
334 | n | process | proces | prozess | proces | process
335 | n | mother | moeder | mutter | mor | mor
336 | v | offer | aanbieden | anbieten | tilbyde | erbjuda
337 | a | late | laat | spät | sen | sen
338 | n | voice | stem | stimme | stemme | röst
339 | adv | both | beide | beide | begge | båda
340 | adv | once | eens | einmal | engang | en gång
341 | n | police | politie | polizei | politi | polis
342 | n | kind | soort | art | slags | slag
343 | v | lose | verliezen | verlieren | tabe | förlora
344 | v | add | toevoegen | hinzufügen | tilføje | tillägga
345 | adv | probably | waarschijnlijk | warscheinlich | sandsynligvis | troligen
346 | v | expect | verwachten | erwarten | forvente | förvänta
347 | adv | ever | ooit | je | nogensinde | någonsin
348 | a | available | beschikbaar | verfügbar | tilgængelig | tillgänglig
349 | adv | no | niet | nicht | nej | nej
350 | n | price | prijs | preis | pris | pris
351 | a | little | klein | klein | lille | liten
352 | n | action | actie | aktion | handling | handling
353 | n | issue | kwestie | frage | emne | fråga
354 | adv | far | ver | weit | fjernt | långt
355 | v | remember | herinneren | erinnern | huske | minnas
356 | n | position | positie | position | position | position
357 | a | low | laag | niedrig | lav | låg
358 | n | cost | kosten | kosten | omkostninger | kostnad
359 | det | little | weinig | wenig | lidt | litet
360 | n | matter | zaak | sache | sag | sak
361 | n | community | gemeenschap | gemeinschaft | fællesskab | gemenskap
362 | v | remain | blijven | bleiben | forblive | förbli
363 | n | figure | figuur | figur | figur | figur
364 | n | type | type | typ | type | typ
365 | n | research | onderzoek | forschung | forskning | forskning
366 | adv | actually | eigenlijk | eigentlich | faktisk | faktiskt
367 | n | education | educatie | bildung | uddannelse | utbildning
368 | v | fall | vallen | fallen | falde | falla
369 | v | speak | spreken | sprechen | tale | tala
370 | n | few | paar | paar | få | få
371 | adv | today | vandaag | heute | i dag | i dag
372 | adv | enough | genoeg | genug | nok | nog
373 | v | open | openen | öffnen | åbne | öppna
374 | a | bad | slecht | schlecht | dårlig | dålig
375 | v | buy | kopen | kaufen | købe | köpa
376 | n | programme | programma | programm | program | program
377 | n | minute | minuut | minut | minut | minut
378 | n | moment | moment | moment | moment | moment
379 | n | girl | meisje | mädchen | pige | flicka
380 | n | age | leeftijd | alter | alder | ålder
381 | n | centre | centrum | zentrum | centrum | centrum
382 | v | stop | stoppen | stoppen | stoppe | stoppa
383 | n | control | controle | kontrolle | kontrol | kontroll
384 | n | value | waarde | wert | værdi | värde
385 | v | send | zenden | schicken | sende | sända
386 | n | health | gezondheid | gesundheit | sundhed | hälsa
387 | v | decide | besluiten | entscheiden | beslutte | besluta
388 | a | main | hoofd | haupt | hoved | huvud
389 | v | win | winnen | gewinnen | vinde | vinna
390 | v | understand | begrijpen | verstehen | forstå | förstå
391 | n | decision | beslissing | entscheidung | beslutning | beslut
392 | v | develop | ontwikkelen | entwickeln | udvikle | utveckla
393 | n | class | klasse | klasse | klasse | klass
394 | n | industry | industrie | industrie | industri | industri
395 | v | receive | ontvangen | empfangen | modtage | mottaga
396 | n | back | rug | rücken | ryg | rygg
397 | det | several | verschillende | verschiedene | adskillige | flera
398 | v | return | terugkeren | zurückkehren | vende tilbage | återvända
399 | v | build | bouwen | bauen | bygge | bygga
400 | v | spend | besteden | ausgeben | bruge | spendera
401 | n | force | kracht | kraft | kraft | kraft
402 | n | condition | conditie | kondition | kondition | kondition
403 | n | paper | papier | papier | papir | papper
404 | prep | off | vanaf | ab | fra | av
405 | v | describe | beschrijven | beschreiben | beskrive | beskriva
406 | v | agree | eens zijn | zustimmen | forliges | enas
407 | a | economic | economisch | wirtschaftlich | økonomisk | ekonomisk
408 | v | increase | toenemen | zunehmen | øge | öka
409 | prep | upon | op | auf | på | på
410 | v | learn | leren | lernen | lære | lära
411 | a | general | algemeen | allgemein | generel | allmän
412 | n | century | eeuw | jahrhundert | århundrede | århundrade
413 | adv | therefore | daarom | deswegen | derfor | därför
414 | n | father | vader | vater | far | far
415 | n | section | sectie | abschnitt | sektion | sektion
416 | n | patient | patiënt | patient | patient | patient
417 | adv | around | rond | rund | omkring | runt
418 | n | activity | activiteit | aktivität | aktivitet | aktivitet
419 | n | road | weg | weg | vej | väg
420 | n | table | tafel | tisch | bord | bord
421 | prep | including | inclusief | einschließlich | inklusive | inklusive
422 | n | church | kerk | kirche | kirke | kyrka
423 | v | reach | bereiken | erreichen | nå | nå
424 | a | real | echt | echt | ægte | äkta
425 | v | lie | liggen | liegen | ligge | ligga
426 | n | mind | geest | geist | sind | sinne
427 | a | likely | waarschijnlijk | warscheinlich | sandsynlig | trolig
428 | prep | among | onder | unter | blandt | bland
429 | n | team | team | team | hold | team
430 | n | experience | ervaring | erfahrung | erfaring | erfarenhet
431 | n | death | dood | tod | død | död
432 | adv | soon | binnenkort | bald | snart | snart
433 | n | act | daad | handlung | handling | handling
434 | n | sense | zin | sinn | mening | mening
435 | n | staff | personeel | personal | personale | personal
436 | a | certain | zeker | sicher | vis | viss
437 | n | student | student | student | studerende | student
438 | det | half | half | halb | halv | halv
439 | prep | around | rond | rund | rundt omkring | runt
440 | n | language | taal | sprache | sprog | språk
441 | v | walk | lopen | laufen | gå | gå
442 | v | die | sterven | sterben | dø | dö
443 | a | special | speciaal | speziell | speciel | speciell
444 | a | difficult | moeilijk | schwierig | svær | svår
445 | a | international | internationaal | international | international | internationell
446 | adv | particularly | bijzonder | besonders | især | särskilt
447 | n | department | afdeling | abteilung | afdeling | avdelning
448 | n | management | beheer | management | ledelse | ledning
449 | n | morning | ochtend | morgen | morgen | morgon
450 | v | draw | tekenen | zeichnen | tegne | rita
451 | v | hope | hopen | hoffen | håbe | hoppas
452 | prep | across | over | über | over | över
453 | n | plan | plan | plan | plan | plan
454 | n | product | product | produkt | produkt | produkt
455 | n | city | stad | stadt | by | stad
456 | adv | early | vroeg | früh | tidligt | tidigt
457 | n | committee | comité | ausschuss | komite | utskott
458 | n | ground | grond | boden | jord | jord
459 | n | letter | letter | buchstabe | bogstav | bokstav
460 | v | create | creëren | schaffen | skabe | skapa
461 | n | evidence | bewijs | beweis | bevis | bevis
462 | n | foot | voet | fuß | fod | fot
463 | a | clear | duidelijk | klar | klar | klar
464 | n | boy | jongen | junge | dreng | pojke
465 | n | game | spel | spiel | spil | spel
466 | n | food | voedsel | nahrung | mad | mat
467 | n | role | rol | rolle | rolle | roll
468 | n | practice | oefening | übung | øvelse | övning
469 | n | bank | bank | bank | bank | bank
470 | adv | else | anders | sonst | andet | annars
471 | n | support | ondersteuning | unterstützung | støtte | stöd
472 | v | sell | verkopen | verkaufen | sælge | sälja
473 | n | event | gebeurtenis | veranstaltung | begivenhed | händelse
474 | n | building | gebouw | gebäude | bygning | byggnad
475 | n | range | gebied | reichweite | rækkevide | räckvidd
476 | prep | behind | achter | hinter | bag | bakom
477 | a | sure | zeker | sicher | sikker | säker
478 | v | report | verslag leggen | berichten | berette | rapportera
479 | v | pass | passeren | passieren | passere | passera
480 | a | black | zwart | schwarz | sort | svart
481 | n | stage | fase | phase | stadie | stadium
482 | n | meeting | vergadering | treffen | møde | möte
483 | adv | sometimes | soms | manchmal | sommetider | ibland
484 | adv | thus | dus | so | således | således
485 | v | accept | accepteren | akzeptieren | acceptere | acceptera
486 | n | town | stad | stadt | by | stad
487 | n | art | kunst | kunst | kunst | konst
488 | adv | further | verder | weiter | videre | vidare
489 | n | club | club | klub | klub | klubb
490 | v | cause | veroorzaken | verursachen | forårsage | orsaka
491 | n | arm | arm | arm | arm | arm
492 | n | history | geschiedenis | vergangenheit | historie | historia
493 | n | parent | ouder | elternteil | forælder | förälder
494 | n | land | land | land | land | land
495 | n | trade | handel | handel | handel | handel
496 | v | watch | kijken | ansehen | iagttage | titta
497 | a | white | wit | weiß | hvid | vit
498 | n | situation | situatie | situation | situation | situation
499 | det | whose | wiens | wessen | hvis | vems
500 | adv | ago | geleden | her | siden | sedan
501 | n | teacher | leraar | lehrer | lærer | lärare
502 | n | record | opname | rekord | rekord | rekord
503 | n | manager | manager | manager | leder | chef
504 | n | relation | relatie | beziehung | forhold | förhållande
505 | a | common | veelvoorkomend | häufig | almindelig | vanlig
506 | a | strong | sterk | stark | stærk | stark
507 | a | whole | geheel | ganz | hel | hel
508 | n | field | veld | feld | felt | fält
509 | a | free | vrij | frei | fri | fri
510 | v | break | breken | brechen | brække | bryta
511 | adv | yesterday | gisteren | gestern | i går | i går
512 | v | support | ondersteunen | unterstützen | støtte | stödja
513 | n | window | raam | fenster | vindue | fönster
514 | n | account | rekening | konto | konto | konto
515 | v | explain | uitleggen | erklären | forklare | förklara
516 | v | stay | blijven | bleiben | blive | stanna
517 | det | few | weinig | wenig | få | få
518 | v | wait | wachten | warten | vente | vänta
519 | adv | usually | gewoonlijk | gewöhnlich | normalt | vanligtvis
520 | n | difference | verschil | unterschied | forskel | skillnad
521 | n | material | materiaal | material | materiale | material
522 | n | air | lucht | luft | luft | luft
523 | n | wife | vrouw | frau | kone | fru
524 | v | cover | bedekken | abdecken | dække | täcka
525 | v | apply | toepassen | anwenden | anvende | tillämpa
526 | n | project | project | projekt | projekt | projekt
527 | v | raise | verhogen | erhöhen | hæve | höja
528 | n | sale | verkoop | verkauf | salg | försäljning
529 | n | relationship | relatie | beziehung | forhold | förhållande
530 | adv | indeed | inderdaad | tatsächlich | virkelig | verkligen
531 | n | light | licht | licht | lys | ljus
532 | v | claim | beweren | behaupten | hævde | hävda
533 | v | form | vormen | bilden | danne | bilda
534 | v | base | baseren | basieren | basere | basera
535 | n | care | verzorging | pflege | pleje | omsorg
536 | pron | someone | iemand | jemand | nogen | någon
537 | pron | everything | alles | alles | alt | allt
538 | adv | certainly | zeker | sicherlich | sikkert | säkert
539 | n | rule | regel | regel | regel | regel
540 | adv | home | thuis | zuhause | hjem | hem
541 | v | cut | snijden | schneiden | skære | skära
542 | v | grow | groeien | wachsen | vokse | växa
543 | a | similar | soortgelijk | ähnlich | lignende | liknande
544 | n | story | verhaal | geschichte | fortælling | historia
545 | n | quality | kwaliteit | qualität | kvalitet | kvalitet
546 | n | tax | belasting | steuer | skat | skatt
547 | n | worker | werker | arbeiter | arbejder | arbetare
548 | n | nature | natuur | natur | natur | natur
549 | n | structure | structuur | struktur | struktur | struktur
550 | n | data | data | daten | data | data
551 | a | necessary | nodig | notwendig | nødvendig | nödvändig
552 | n | pound | pond | pfund | pund | pund
553 | n | method | methode | methode | metode | metod
554 | n | unit | eenheid | einheit | enhed | enhet
555 | a | central | centraal | zentral | central | central
556 | n | bed | bed | bett | seng | säng
557 | n | union | unie | union | union | union
558 | n | movement | beweging | bewegung | bevægelse | rörelse
559 | n | board | bestuur | vorstand | bestyrelse | styrelse
560 | a | true | waar | wahr | sand | sann
561 | adv | simply | simpel | einfach | simpelthen | enkelt
562 | v | contain | bevatten | enthalten | indeholde | innehålla
563 | adv | especially | speciaal | insbesondere | især | speciellt
564 | a | open | open | offen | åben | öppen
565 | a | short | kort | kurz | kort | kort
566 | a | personal | persoonlijk | persönlich | personlig | personlig
567 | n | detail | detail | detail | detalje | detalj
568 | n | model | model | modell | model | modell
569 | v | bear | dragen | tragen | bære | bära
570 | a | single | enkel | einzig | enkelt | enkel
571 | v | join | samenvoegen | verbinden | sammenføje | förbinda
572 | v | reduce | reduceren | reduzieren | reducere | reducera
573 | v | establish | oprichten | gründen | oprette | upprätta
574 | n | wall | muur | mauer | væg | vägg
575 | a | easy | makkelijk | leicht | let | lätt
576 | a | private | privé | privat | privat | privat
577 | n | computer | computer | computer | computer | dator
578 | det | former | vorig | vorherig | tidligere | tidigare
579 | n | hospital | ziekenhuis | krankenhaus | hospital | sjukhus
580 | n | chapter | hoofdstuk | kapitel | kapitel | kapitel
581 | n | scheme | schema | schema | ordning | schema
582 | n | theory | theorie | theorie | teori | teori
583 | v | choose | kiezen | wählen | vælge | välja
584 | v | wish | wensen | wünschen | ønske | önska
585 | n | property | eigendom | eigentum | ejendom | egendom
586 | v | achieve | bereiken | erreichen | opnå | uppnå
587 | a | financial | financieel | finanziell | finansiel | finansiell
588 | a | poor | arm | arm | fattig | fattig
589 | n | officer | officier | offizier | officer | officer
590 | prep | up | op | auf | op | upp
591 | n | charge | kosten | gebühr | omkostning | kostnad
592 | n | director | directeur | direktor | direktør | direktör
593 | v | drive | rijden | fahren | køre | köra
594 | v | deal | uitdelen | austeilen | dele ud | utdela
595 | v | place | plaatsen | platzieren | placere | placera
596 | n | approach | benadering | ansatz | tilgang | närmande
597 | n | chance | kans | chance | chance | chans
598 | n | application | toepassing | anwendung | anvendelse | tillämpning
599 | v | seek | zoeken | suchen | søge | söka
600 | a | foreign | buitenlands | ausländisch | udenlandsk | utländsk
601 | prep | along | langs | entlang | langs | längs
602 | n | top | top | top | top | topp
603 | n | amount | hoeveelheid | menge | mængde | mängd
604 | n | son | zoon | sohn | søn | son
605 | n | operation | operatie | operation | operation | operation
606 | v | fail | falen | scheitern | fejle | misslyckas
607 | a | human | menselijk | menschlich | menneskelig | mänsklig
608 | n | opportunity | kans | chance | lejlighed | tillfälle
609 | a | simple | simpel | einfach | simpel | enkel
610 | n | leader | leider | führer | leder | ledare
611 | n | look | blik | blick | blik | blick
612 | n | share | deel | teil | del | del
613 | n | production | productie | produktion | produktion | produktion
614 | a | recent | recent | kürzlich | nylig | färsk
615 | n | firm | firma | firma | firma | firma
616 | n | picture | afbeelding | bild | billede | bild
617 | n | source | bron | quelle | kilde | källa
618 | n | security | zekerheid | sicherheit | sikkerhed | säkerhet
619 | v | serve | dienen | dienen | tjene | tjäna
620 | prep | according | volgens | nach | ifølge | enligt
621 | v | end | beëindigen | beenden | slutte | sluta
622 | n | contract | contract | vertrag | kontrakt | kontrakt
623 | a | wide | wijd | weit | vid | vid
624 | v | occur | voorkomen | vorkommen | forekomme | förekomma
625 | n | agreement | overeenkomst | vereinbarung | aftale | avtal
626 | adv | better | beter | besser | bedre | bättre
627 | v | kill | doden | töten | dræbe | döda
628 | v | act | handelen | handeln | handle | handla
629 | n | site | plaats | stelle | sted | plats
630 | n | labour | arbeid | arbeit | arbejde | arbete
631 | v | plan | plannen | planen | planlægge | planera
632 | a | various | verschillend | verschiedene | forskellige | diverse
633 | prep | since | sinds | seit | siden | sedan
634 | n | test | test | test | test | test
635 | v | eat | eten | essen | spise | äta
636 | n | loss | verlies | verlust | tab | förlust
637 | v | close | sluiten | schließen | lukke | stänga
638 | v | represent | vertegenwoordigen | darstellen | repræsentere | representera
639 | v | love | houden van | lieben | elske | älska
640 | n | colour | kleur | farbe | farve | färg
641 | adv | clearly | duidelijk | deutlich | tydeligt | tydligt
642 | n | shop | winkel | laden | butik | butik
643 | n | benefit | voordeel | vorteil | fordel | fördel
644 | n | animal | dier | tier | dyr | djur
645 | n | heart | hart | herz | hjerte | hjärta
646 | n | election | verkiezing | wahl | valg | val
647 | n | purpose | doel | zweck | formål | ändamål
648 | n | standard | standaard | standard | standard | standard
649 | n | secretary | secretaris | sekretär | sekretær | sekreterare
650 | v | rise | stijgen | steigen | stige | stiga
651 | n | date | datum | datum | dato | datum
652 | a | hard | hard | hart | hård | hård
653 | n | music | muziek | musik | musik | musik
654 | n | hair | haar | haar | hår | hår
655 | v | prepare | voorbereiden | vorbereiten | forberede | förbereda
656 | n | factor | factor | faktor | faktor | faktor
657 | pron | other | ander | andere | anden | annen
658 | pron | anyone | iemand | jemand | nogen | någon
659 | n | pattern | patroon | muster | mønster | mönster
660 | n | piece | stuk | stück | stykke | stycke
661 | v | discuss | bespreken | besprechen | drøfte | dryfta
662 | v | prove | bewijzen | beweisen | bevise | bevisa
663 | n | front | voorkant | vorderseite | forside | framsida
664 | n | evening | avond | abend | aften | kväll
665 | a | royal | koninklijk | königlich | kongelig | kunglig
666 | n | tree | boom | baum | træ | träd
667 | n | population | populatie | population | befolkning | befolkning
668 | a | fine | fijn | fein | fin | fin
669 | n | plant | plant | pflanze | plante | planta
670 | n | pressure | druk | druck | tryk | tryck
671 | n | response | reactie | reaktion | reaktion | respons
672 | v | catch | vangen | fangen | fange | fånga
673 | n | street | straat | straße | gade | gata
674 | v | pick | plukken | pflücken | plukke | plocka
675 | n | performance | prestatie | leistung | ydeevne | prestation
676 | n | knowledge | kennis | kenntnis | viden | kunskap
677 | prep | despite | ondanks | trotz | trods | trots
678 | n | design | ontwerp | entwurf | design | design
679 | n | page | pagina | seite | side | sida
680 | v | enjoy | genieten | genießen | nyde | njuta
681 | n | individual | individu | einzelne | individ | individ
682 | v | suppose | veronderstellen | annehmen | antage | anta
683 | n | rest | rust | ruhe | hvile | vila
684 | adv | instead | in plaats van | stattdessen | i stedet | i stället
685 | v | wear | dragen | tragen | have på | ha på
686 | n | basis | basis | basis | basis | basis
687 | n | size | maat | größe | størrelse | storlek
688 | n | environment | milieu | umwelt | miljø | miljö
689 | prep | per | per | pro | per | per
690 | n | fire | vuur | feuer | brand | brand
691 | n | series | serie | serie | serie | serie
692 | n | success | succes | erfolg | succes | framgång
693 | a | natural | natuurlijk | natürlich | naturlig | naturlig
694 | a | wrong | verkeerd | falsch | forkert | fel
695 | prep | near | dichtbij | nah | nær | nära
696 | adv | round | rond | rund | rundt | runt
697 | n | thought | gedachte | gedanke | tanke | tanke
698 | n | list | lijst | liste | liste | lista
699 | v | argue | ruzie maken | streiten | drøfte | gräla
700 | a | final | definitief | endgültig | endelig | slutlig
701 | n | future | toekomst | zukunft | fremtid | framtid
702 | v | introduce | introduceren | einführen | introducere | introducera
703 | n | analysis | analyse | analyse | analyse | analys
704 | v | enter | binnengaan | hereinkommen | kommer i | komma in
705 | n | space | ruimte | raum | plads | rymd
706 | v | arrive | aankomen | ankommen | ankomme | anlända
707 | v | ensure | verzekeren | gewährleisten | sikre | försäkra
708 | n | demand | eis | bedarf | krav | krav
709 | n | statement | uitspraak | aussage | udtalelse | uttalande
710 | n | attention | aandacht | achtung | opmærksomhed | uppmärksamhet
711 | n | love | liefde | liebe | kærlighed | kärlek
712 | n | principle | principe | prinzip | princip | princip
713 | v | pull | trekken | ziehen | trække | dra
714 | n | set | stel | set | sæt | set
715 | n | doctor | dokter | doktor | læge | läkare
716 | n | choice | keuze | wahl | valg | val
717 | v | refer | verwijzen | verweisen | henvise | hänvisa
718 | n | feature | kenmerk | merkmal | særpræg | särdrag
719 | n | couple | koppel | paar | par | par
720 | n | step | stap | schritt | skridt | steg
721 | a | following | volgend | folgende | følgende | följande
722 | v | thank | danken | danken | takke | tacka
723 | n | machine | machine | maschine | maskine | maskin
724 | n | income | inkomen | einkommen | indkomst | inkomst
725 | n | training | training | ausbildung | uddannelse | träning
726 | v | present | presenteren | präsentieren | præsentere | presentera
727 | n | association | associatie | assoziation | forening | förening
728 | n | film | film | film | film | film
729 | n | region | regio | region | region | region
730 | n | effort | moeite | mühe | indsats | ansträngning
731 | n | player | speler | spieler | spiller | spelare
732 | pron | everyone | iedereen | jeder | alle | alla
733 | a | present | tegenwoordig | vorhanden | nuværende | nuvarande
734 | n | award | onderscheiding | auszeichnung | pris | pris
735 | n | village | dorp | dorf | landsby | by
736 | v | control | beheren | kontrollieren | kontrollere | kontrollera
737 | n | organisation | organisatie | organisation | organisation | organisation
738 | n | news | nieuws | nachrichten | nyheder | nyheter
739 | a | nice | leuk | nett | dejlig | trevlig
740 | n | difficulty | moeilijkheid | schwierigkeit | vanskelighed | svårighet
741 | a | modern | modern | modern | moderne | modern
742 | n | cell | cel | zelle | celle | cell
743 | a | close | dicht | nah | tæt | nära
744 | a | current | huidig | gegenwärtig | aktuel | aktuell
745 | a | legal | legaal | legal | legal | laglig
746 | n | energy | energie | energie | energi | energi
747 | adv | finally | uiteindelijk | schließlich | endelig | slutligen
748 | n | degree | graad | grad | grad | grad
749 | n | mile | mijl | meile | mil | mil
750 | n | means | middelen | mittel | midler | medel
751 | n | growth | groei | wachstum | vækst | växt
752 | n | treatment | behandeling | behandlung | behandling | behandling
753 | n | sound | geluid | klang | lyd | ljud
754 | prep | above | boven | oben | over | ovan
755 | n | task | taak | aufgabe | opgave | uppgift
756 | v | affect | beïnvloeden | beeinflussen | påvirke | påverka
757 | adv | please | alstublieft | bitte | vær så venlig | snälla
758 | a | red | rood | rot | rød | röd
759 | a | happy | gelukkig | glücklich | lykkelig | lycklig
760 | n | behaviour | gedrag | verhalten | adfærd | beteende
761 | a | concerned | bezorgd | besorgt | bekymret | bekymrad
762 | v | point | wijzen | weisen | pege | peka
763 | n | function | functie | funktion | funktion | funktion
764 | v | identify | identificeren | identifizieren | identificere | identifiera
765 | n | resource | bron | ressource | ressource | resurs
766 | n | defence | verdediging | abwehr | forsvar | försvar
767 | n | garden | tuin | garten | have | trädgård
768 | n | floor | vloer | boden | gulv | golv
769 | n | technology | technologie | technologie | teknologi | teknologi
770 | n | style | stijl | stil | stil | stil
771 | n | feeling | gevoel | gefühl | følelse | känsla
772 | n | science | wetenschap | wissenschaft | videnskab | vetenskap
773 | v | relate | betreffen | betreffen | relatere | relatera
774 | n | doubt | twijfel | zweifel | tvivl | tvivel
775 | n | horse | paard | pferd | hest | häst
776 | v | force | dwingen | zwingen | tvinge | tvinga
777 | n | answer | antwoord | antwort | svar | svar
778 | v | compare | vergelijken | vergleichen | sammenligne | jämföra
779 | v | suffer | lijden | leiden | lide | lida
780 | a | individual | individueel | individuell | individuel | individuell
781 | adv | forward | vooruit | vorwärts | fremad | framåt
782 | v | announce | aankondigen | ankündigen | annoncere | tillkännage
783 | n | user | gebruiker | benutzer | bruger | användare
784 | n | fund | fonds | fonds | fond | fond
785 | n | character | karakter | charakter | karakter | karaktär
786 | n | risk | risico | risiko | risiko | risk
787 | a | normal | normaal | normal | normal | normal
788 | conj | nor | noch | noch | heller ikke | inte heller
789 | n | dog | hond | hund | hund | hund
790 | v | obtain | verkrijgen | erhalten | opnå | erhålla
791 | adv | quickly | snel | schnell | hurtigt | snabbt
792 | n | army | leger | heer | hær | armé
793 | v | indicate | aangeven | angeben | angive | ange
794 | v | forget | vergeten | vergessen | glemme | glömma
795 | n | station | station | bahnhof | station | station
796 | n | glass | glas | glas | glas | glas
797 | n | cup | kop | tasse | kop | kopp
798 | a | previous | vorig | früher | tidligere | tidigare
799 | n | husband | echtgenoot | ehemann | mand | man
800 | adv | recently | recent | kürzlich | nylig | nyligen
801 | v | publish | publiceren | veröffentlichen | udgive | publicera
802 | a | serious | serieus | ernst | seriøs | seriös
803 | adv | anyway | hoe dan ook | irgendwie | i hvert falt | i alla fall
804 | v | visit | bezoeken | besuchen | besøge | besöka
805 | n | capital | kapitaal | kapital | kapital | kapital
806 | n | note | aantekening | anmerkung | note | anteckning
807 | n | season | seizoen | jahreszeit | årstid | årstid
808 | n | argument | argument | argument | argument | argument
809 | v | listen | luisteren | hören | lytte | lyssna
810 | n | show | show | show | show | show
811 | n | responsibility | verantwoordelijkheid | verantwortlichkeit | ansvar | ansvar
812 | a | significant | significant | signifikant | signifikant | signifikant
813 | n | deal | overeenkomst | deal | deal | överenskommelse
814 | n | economy | economie | wirtschaft | økonomi | ekonomi
815 | n | element | element | element | element | element
816 | v | finish | beëindigen | beenden | slutte | sluta
817 | n | duty | plicht | pflicht | pligt | plikt
818 | v | fight | vechten | kämpfen | kæmpe | kämpa
819 | v | train | trainen | trainieren | træne | träna
820 | v | maintain | onderhouden | pflegen | vedligeholde | underhålla
821 | n | attempt | poging | versuch | forsøg | försök
822 | n | leg | been | bein | ben | ben
823 | n | investment | investering | investition | investering | investering
824 | v | save | redden | retten | redde | rädda
825 | prep | throughout | gedurende | während | hele vejen | över hela
826 | v | design | ontwerpen | entwerfen | designe | designa
827 | adv | suddenly | plotseling | plötzlich | pludseligt | plötsligt
828 | n | brother | broer | bruder | bror | bror
829 | v | improve | verbeteren | verbessern | forbedre | förbättra
830 | v | avoid | ontwijken | vermeiden | undgå | undvika
831 | v | wonder | afvragen | sich fragen | undre sig | undra
832 | v | tend | neigen | neigen | have tendens | tendera
833 | n | title | titel | titel | titel | titel
834 | n | hotel | hotel | hotel | hotel | hotell
835 | n | aspect | aspect | aspekt | aspekt | aspekt
836 | n | increase | toename | zunahme | stigning | ökning
837 | n | help | hulp | hilfe | hjælp | hjälp
838 | a | industrial | industrieel | industriell | industriel | industriell
839 | v | express | uiten | äußern | udtrykke | uttrycka
840 | n | summer | zomer | sommer | sommer | sommar
841 | v | determine | vaststellen | bestimmen | bestemme | fastställa
842 | adv | generally | in het algemeen | im allgemeinen | generelt | generellt
843 | n | daughter | dochter | tochter | datter | dotter
844 | v | exist | bestaan | existieren | findes | finnas
845 | v | share | delen | teilen | dele | dela
846 | n | baby | baby | baby | baby | baby
847 | adv | nearly | bijna | fast | næsten | nästan
848 | v | smile | glimlachen | lächeln | smile | le
849 | a | sorry | armzalig | traurig | bedrøvelig | ledsen
850 | n | sea | zee | meer | hav | hav
851 | n | skill | vaardigheid | fähigkeit | færdighed | färdighet
852 | v | treat | behandelen | behandeln | behandle | behandla
853 | v | remove | verwijderen | entfernen | fjerne | förflytta
854 | n | concern | zorg | sorge | bekymring | oro
855 | n | university | universiteit | universität | universitet | universitet
856 | n | left | links | links | venstre | vänster
857 | a | dead | dood | tot | død | död
858 | n | discussion | discussie | diskussion | diskussion | diskussion
859 | a | specific | specifiek | spezifisch | specifik | specifik
860 | n | customer | klant | kunde | kunde | kund
861 | n | box | doos | karton | kasse | låda
862 | prep | outside | buiten | außerhalb | udenfor | utanför
863 | v | state | mededelen | erklären | meddele | uppge
864 | n | conference | conferentie | konferenz | konference | konferens
865 | n | whole | geheel | ganze | hele | helhet
866 | a | total | totaal | total | total | total
867 | n | profit | profijt | gewinn | profit | vinst
868 | n | division | verdeling | teilung | inddeling | uppdelning
869 | v | throw | gooien | werfen | smide | kasta
870 | n | procedure | procedure | verfahren | procedure | procedur
871 | v | fill | vullen | füllen | fylde | fylla
872 | n | king | koning | könig | konge | kung
873 | v | assume | aannemen | annehmen | antage | anta
874 | n | image | beeld | bild | billede | bild
875 | n | oil | olie | öl | olie | olja
876 | adv | obviously | duidelijk | offenbar | åbenbart | uppenbarligen
877 | conj | unless | tenzij | wenn nicht | medmindre | om inte
878 | a | appropriate | gepast | geeignet | passende | lämplig
879 | n | circumstance | omstandigheid | umstand | omstændighed | omständighet
880 | a | military | militair | militärisch | militær | militärisk
881 | n | proposal | voorstel | vorschlag | forslag | förslag
882 | v | mention | noemen | nennen | nævne | nämna
883 | n | client | cliënt | klient | klient | klient
884 | n | sector | sector | sektor | sektor | sektor
885 | n | direction | richting | richtung | retning | riktning
886 | v | admit | toegeven | zugeben | indrømme | erkänna
887 | adv | though | doch | doch | dog | fast
888 | v | replace | vervangen | ersetzen | erstatte | ersätta
889 | a | basic | basis | grundlegend | grundlæggende | grundläggande
890 | adv | hard | moeilijk | schwer | hårdt | hårt
891 | n | instance | voorbeeld | fall | eksempel | exempel
892 | n | sign | teken | zeichen | tegn | tecken
893 | a | original | origineel | original | original | originell
894 | a | successful | succesvol | erfolgreich | succesfuld | framgångsrik
895 | adv | okay | oké | ok | ok | okej
896 | v | reflect | reflecteren | reflektieren | reflektere | reflektera
897 | a | aware | bewust | bewusst | bevidst | medveten
898 | n | measure | maat | maß | mål | mått
899 | n | attitude | houding | haltung | holdning | hållning
900 | n | disease | ziekte | krankheit | sygdom | sjukdom
901 | adv | exactly | precies | genau | præcis | precis
902 | adv | above | boven | oben | over | ovan
903 | n | commission | provisie | kommission | kommission | kommission
904 | v | intend | bedoelen | beabsichtigen | agte | ämna
905 | prep | beyond | voorbij | über | over | bortom
906 | n | seat | zitplaats | sitz | sæde | sittplats
907 | n | president | president | präsident | præsident | president
908 | v | encourage | bemoedigen | ermutigen | tilskynde | uppmuntra
909 | n | addition | toevoeging | zusatz | tilføjelse | tillägg
910 | n | goal | doel | ziel | mål | mål
911 | prep | round | rond | rund | rund | runt
912 | v | miss | missen | vermissen | savne | sakna
913 | a | popular | populair | populär | populær | populär
914 | n | affair | affaire | affäre | affære | affär
915 | n | technique | techniek | technik | teknik | teknik
916 | n | respect | respect | respekt | respekt | respekt
917 | v | drop | laten vallen | fallen lassen | lade falde | tappa
918 | a | professional | professioneel | professionell | professionel | professionell
919 | det | less | minder | weniger | mindre | mindre
920 | conj | once | eens | einmal | engang | en gång
921 | v | fly | vliegen | fliegen | flyve | flyga
922 | v | reveal | onthullen | enthüllen | afsløre | avslöja
923 | n | version | versie | version | version | version
924 | adv | maybe | misschien | vielleicht | måske | kanske
925 | n | ability | vermogen | vermögen | evne | förmåga
926 | v | operate | bedienen | betreiben | betjene | fungera
927 | n | campaign | campagne | kampagne | kampagne | kampanj
928 | a | heavy | zwaar | schwer | tung | tung
929 | n | advice | advies | rat | råd | råd
930 | n | institution | instituut | institution | institution | institution
931 | a | top | top | ober | top | övre
932 | v | discover | ontdekken | entdecken | opdage | upptäcka
933 | n | surface | oppervlak | oberfläche | overflade | yta
934 | n | library | bibliotheek | bibliothek | bibliotek | bibliotek
935 | n | pupil | pupil | schüler | elev | elev
936 | v | record | opnemen | aufnehmen | optage | registrera
937 | v | refuse | weigeren | verweigern | nægte | vägra
938 | v | prevent | voorkomen | verhindern | forhindre | förhindra
939 | n | advantage | voordeel | vorteil | fordel | fördel
940 | a | dark | donker | dunkel | mørk | mörk
941 | v | teach | leren | lehren | lære | lära
942 | n | memory | geheugen | speicher | hukommelse | minne
943 | n | culture | cultuur | kultur | kultur | kultur
944 | n | blood | bloed | blut | blod | blod
945 | v | cost | kosten | kosten | koste | kosta
946 | n | majority | meerderheid | mehrheit | flertal | majoritet
947 | v | answer | antwoorden | antworten | svare | svara
948 | n | variety | variëteit | varietät | sort | variant
949 | n | press | pers | presse | presse | press
950 | v | depend | afhangen | abhängen | afhænge | bero
951 | n | bill | rekening | rechnung | regning | räkning
952 | n | competition | competitie | wettbewerb | konkurrence | konkurrens
953 | a | ready | klaar | fertig | klar | klar
954 | n | general | generaal | general | general | general
955 | n | access | toegang | zugreifen | adgang | tillgång
956 | v | hit | raken | treffen | træffe | träffa
957 | n | stone | steen | stein | sten | sten
958 | a | useful | nuttig | nützlich | nyttig | nyttig
959 | n | extent | omvang | umfang | omfang | omfattning
960 | n | employment | dienst | beschäftigung | beskæftigelse | sysselsättning
961 | v | regard | beschouwen | betrachten | betragte | betrakta
962 | adv | apart | uit elkaar | auseinander | hinanden | isär
963 | n | present | cadeau | geschenk | gave | gåva
964 | n | appeal | appel | beschwerde | appel | vädja
965 | n | text | tekst | text | tekst | text
966 | n | parliament | parlement | parlament | parlament | parlament
967 | n | cause | oorzaak | ursache | årsag | orsak
968 | n | terms | voorwaarden | bedingungen | vilkår | villkor
969 | n | bar | bar | bar | bar | bar
970 | n | attack | aanval | angriff | angreb | angrepp
971 | a | effective | effectief | effektiv | effektiv | effektiv
972 | n | mouth | mond | mund | mund | mun
973 | n | fish | vis | fisch | fisk | fisk
974 | a | future | toekomstig | zukünftig | fremtidig | framtida
975 | n | visit | bezoek | besuch | besøg | besök
976 | adv | little | weinig | wenig | lidt | litet
977 | adv | easily | gemakkelijk | leicht | nemt | lätt
978 | v | attempt | proberen | versuchen | forsøge | försöka
979 | v | enable | in staat stellen | ermöglichen | muliggøre | möjliggöra
980 | n | trouble | problemen | schwierigkeiten | problemer | svårigheter
981 | a | traditional | traditioneel | traditionell | traditionel | traditionell
982 | n | payment | betaling | zahlung | betaling | betalning
983 | adv | best | best | best | bedst | bäst
984 | n | post | post | post | post | post
985 | n | county | graafschap | grafschaft | grevskab | grevskap
986 | n | lady | dame | dame | dame | dam
987 | n | holiday | vakantie | urlaub | ferie | semester
988 | v | realise | realiseren | realisieren | realisere | realisera
989 | n | importance | belang | wichtigkeit | betydning | betydelse
990 | n | chair | stoel | stuhl | stol | stol
991 | n | facility | faciliteit | einrichtung | facilitet | facilitet
992 | v | complete | voltooien | vervollständigen | fuldende | fullborda
993 | n | article | artikel | artikel | artikel | artikel
994 | n | object | object | objekt | objekt | objekt
995 | n | context | context | kontext | kontekst | kontext
996 | n | survey | enquête | umfrage | undersøgelse | undersökning
997 | v | notice | opmerken | bemerken | bemærke | märka
998 | a | complete | compleet | komplett | komplet | komplett
999 | n | turn | draai | wende | drej | vändning
1000 | a | direct | direct | direkt | direkte | direkt
1001 | adv | immediately | onmiddellijk | sofort | straks | genast
1002 | n | collection | collectie | sammlung | samling | samling
1003 | n | reference | referentie | referenz | reference | referens
1004 | n | card | kaart | karte | kort | kort
1005 | a | interesting | interessant | interessant | interessant | intressant
1006 | a | considerable | aanzienlijk | erheblich | betydelig | betydlig
1007 | n | television | televisie | fernseher | fjernsyn | television
1008 | v | extend | uitbreiden | erweitern | udvide | utvidga
1009 | n | communication | communicatie | kommunikation | kommunikation | kommunikation
1010 | n | agency | agentschap | agentur | agentur | agentur
1011 | a | physical | fysiek | physisch | fysisk | fysisk
1012 | conj | except | behalve | außer | undtagen | utom
1013 | v | check | controleren | prüfen | kontrollere | kontrollera
1014 | n | sun | zon | sonne | sol | sol
1015 | n | species | soort | art | art | art
1016 | n | possibility | mogelijkheid | möglichkeit | mulighed | möjlighet
1017 | n | official | ambtenaar | beamte | officiel | tjänsteman
1018 | n | chairman | voorzitter | vorsitzende | formand | ordförande
1019 | n | speaker | spreker | sprecher | taler | talare
1020 | n | second | seconde | sekunde | sekund | sekund
1021 | n | career | carrière | karriere | karriere | karriär
1022 | v | laugh | lachen | lachen | le | skratta
1023 | n | weight | gewicht | gewicht | vægt | vikt
1024 | v | sound | klinken | klingen | lyde | ljuda
1025 | a | responsible | verantwoordelijk | verantwortlich | ansvarlig | ansvarig
1026 | n | base | basis | basis | basis | bas
1027 | n | document | document | dokument | dokument | dokument
1028 | n | solution | oplossing | lösung | løsning | lösning
1029 | n | return | terugkeer | rückkehr | tilbagevenden | återkomst
1030 | a | medical | medisch | medizinisch | medicinsk | medicinsk
1031 | a | hot | heet | heiß | hed | het
1032 | v | recognise | herkennen | erkennen | genkende | erkänna
1033 | n | talk | lezing | vortrag | foredrag | föreläsning
1034 | n | budget | budget | budget | budget | budget
1035 | n | river | rivier | fluss | flod | flod
1036 | v | fit | passen | passen | passe | passa
1037 | n | organization | organisatie | organisation | organisation | organisation
1038 | a | existing | bestaand | vorhanden | eksisterende | existerande
1039 | n | start | start | start | start | start
1040 | v | push | duwen | drücken | skubbe | trycka
1041 | adv | tomorrow | morgen | morgen | i morgen | i morgon
1042 | n | requirement | vereiste | voraussetzung | krav | krav
1043 | a | cold | koud | kalt | kold | kall
1044 | n | edge | rand | rand | kant | kant
1045 | n | opposition | oppositie | opposition | opposition | opposition
1046 | n | opinion | mening | meinung | mening | mening
1047 | n | drug | medicijn | droge | medicin | läkemedel
1048 | n | quarter | kwart | viertel | kvart | kvart
1049 | n | option | optie | option | option | option
1050 | v | sign | gebaren | gebärden | gøre tegn | vinka
1051 | prep | worth | waard | wert | værd | värd
1052 | n | call | oproep | anruf | opkald | samtal
1053 | v | define | definiëren | definieren | definere | definiera
1054 | n | stock | voorraad | vorrat | lager | lager
1055 | n | influence | invloed | einfluss | indflydelse | inflytande
1056 | n | occasion | gelegenheid | gelegenheit | lejlighed | tillfälle
1057 | adv | eventually | uiteindelijk | schliesslich | til sidst | slutligen
1058 | n | software | software | software | software | programvara
1059 | adv | highly | zeer | sehr | meget | mycket
1060 | n | exchange | uitwisseling | austausch | udveksling | utbyte
1061 | n | lack | tekort | mangel | mangel | brist
1062 | v | shake | schudden | schütteln | ryste | skaka
1063 | v | study | studeren | studieren | studere | studera
1064 | n | concept | concept | konzept | koncept | koncept
1065 | a | blue | blauw | blau | blå | blå
1066 | n | star | ster | stern | stjerne | stjärna
1067 | n | radio | radio | radio | radio | radio
1068 | pron | no-one | niemand | niemand | ingen | ingen
1069 | n | arrangement | regeling | anordnung | arrangement | arrangemang
1070 | v | examine | onderzoeken | untersuchen | undersøge | undersöka
1071 | n | bird | vogel | vogel | fugl | fågel
1072 | a | green | groen | grün | grøn | grön
1073 | n | band | band | band | band | band
1074 | n | sex | seks | sex | køn | kön
1075 | n | finger | vinger | finger | finger | finger
1076 | n | past | verleden | vergangenheit | fortid | förflutna
1077 | a | independent | onafhankelijk | unabhängig | uafhængig | oberoende
1078 | n | equipment | uitrusting | ausrüstung | udstyr | utrustning
1079 | n | north | noord | Norden | nord | nord
1080 | n | move | verhuizing | umzug | flytning | flyttning
1081 | n | message | bericht | nachricht | besked | meddelande
1082 | n | fear | angst | angst | angst | rädsla
1083 | n | afternoon | middag | nachmittag | eftermiddag | eftermiddag
1084 | v | drink | drinken | trinken | drikke | dricka
1085 | adv | fully | volledig | völlig | fuldt | fullt
1086 | n | race | ras | rasse | race | ras
1087 | v | gain | verkrijgen | gewinnen | vinde | vinna
1088 | n | strategy | strategie | strategie | strategi | strategi
1089 | a | extra | extra | extra | ekstra | extra
1090 | n | scene | scène | szene | scene | scen
1091 | adv | slightly | enigszins | ein wenig | lidt | lätt
1092 | n | kitchen | keuken | küche | køkken | kök
1093 | n | speech | toespraak | rede | tale | tal
1094 | v | arise | opkomen | entstehen | opstå | uppstå
1095 | n | network | netwerk | netzwerk | netværk | nätverk
1096 | n | tea | thee | tee | te | te
1097 | n | peace | vrede | frieden | fred | fred
1098 | n | failure | mislukking | ausfall | fiasko | misslyckande
1099 | n | employee | werknemer | arbeitnehmer | medarbejder | arbetstagare
1100 | adv | ahead | vooruit | voraus | forude | framåt
1101 | n | scale | schaal | umfang | skala | skala
1102 | adv | hardly | nauwelijks | kaum | næppe | knappt
1103 | v | attend | bijwonen | besuchen | deltage | delta
1104 | n | shoulder | schouder | schulter | skulder | skuldra
1105 | adv | otherwise | anders | sonst | ellers | annars
1106 | n | railway | spoorwegen | eisenbahn | jernbane | järnväg
1107 | adv | directly | direct | direkt | direkte | direkt
1108 | n | supply | voorziening | belieferung | forsyning | tillförsel
1109 | n | expression | uitdrukking | ausdruck | udtryk | uttryck
1110 | n | owner | eigenaar | besitzer | ejer | ägare
1111 | v | associate | associëren | assoziieren | associere | associera
1112 | n | corner | hoek | ecke | hjørne | hörn
1113 | a | past | vroeger | vergangen | tidligere | förgången
1114 | n | match | wedstrijd | match | match | match
1115 | n | sport | sport | sport | sport | sport
1116 | n | status | status | status | status | status
1117 | a | beautiful | mooi | schön | smuk | vacker
1118 | n | offer | aanbod | angebot | tilbud | anbud
1119 | n | marriage | huwelijk | ehe | ægteskab | äktenskap
1120 | v | hang | hangen | hängen | hænge | hänga
1121 | a | civil | burgerlijk | bürgerlich | civil | civil
1122 | v | perform | uitvoeren | ausführen | udføre | utföra
1123 | n | sentence | zin | satz | sætning | mening
1124 | n | crime | misdaad | verbrechen | forbrydelse | brott
1125 | n | ball | bal | ball | bold | boll
1126 | v | marry | trouwen | heiraten | gifte sig | gifta sig
1127 | n | wind | wind | wind | vind | vind
1128 | n | truth | waarheid | wahrheit | sandhed | sanning
1129 | v | protect | beschermen | schützen | beskytte | skydda
1130 | n | safety | veiligheid | sicherheit | sikkerhed | säkerhet
1131 | n | partner | partner | partner | partner | partner
1132 | adv | completely | volledig | völlig | helt | helt
1133 | n | copy | kopie | kopie | kopi | kopia
1134 | n | balance | balans | balance | balance | balans
1135 | n | sister | zus | schwester | søster | syster
1136 | n | reader | lezer | leser | læser | läsare
1137 | adv | below | onder | unten | nedenunder | nedan
1138 | n | trial | proces | prozess | proces | process
1139 | n | rock | rots | fels | klippe | klippa
1140 | n | damage | schade | schaden | skade | skada
1141 | v | adopt | adopteren | adoptieren | adoptere | adoptera
1142 | n | newspaper | krant | zeitung | avis | tidning
1143 | n | meaning | betekenis | bedeutung | betydning | betydelse
1144 | a | light | licht | licht | lys | ljus
1145 | a | essential | essentieel | wesentlich | afgørende | väsentlig
1146 | a | obvious | duidelijk | offensichtlich | åbenbar | uppenbar
1147 | n | nation | natie | nation | nation | nation
1148 | v | confirm | bevestigen | bestätigen | bekræfte | bekräfta
1149 | n | south | zuid | süden | syd | syd
1150 | n | length | lengte | länge | længde | längd
1151 | n | branch | tak | ast | gren | gren
1152 | a | deep | diep | tief | dyb | djup
1153 | pron | none | geen | kein | ingen | ingen
1154 | n | planning | planning | planung | planlægning | planering
1155 | n | trust | vertrouwen | vertrauen | tillid | tillit
1156 | a | working | werkend | arbeitend | arbejdende | arbetande
1157 | n | pain | pijn | schmerz | smerte | smärta
1158 | n | studio | studio | studio | studie | studio
1159 | a | positive | positief | positiv | positiv | positiv
1160 | n | spirit | geest | geist | ånd | ande
1161 | n | college | college | college | kollegium | college
1162 | n | accident | ongeluk | unfall | ulykke | olycka
1163 | n | hope | hoop | hoffnung | håb | hopp
1164 | v | mark | markeren | markieren | mærke | markera
1165 | n | works | werken | werke | værker | verk
1166 | n | league | competitie | liga | liga | liga
1167 | v | clear | ontruimen | löschen | rense | rensa
1168 | v | imagine | voorstellen | vorstellen | forestille | föreställa
1169 | adv | through | door | durch | gennem | igenom
1170 | n | cash | kleingeld | bargeld | kontanter | kontanter
1171 | adv | normally | normaal gesproken | normalerweise | normalt | normalt
1172 | n | play | toneelstuk | theaterstück | spil | pjäs
1173 | n | strength | kracht | kraft | styrke | styrka
1174 | n | train | trein | zug | tog | tåg
1175 | v | travel | reizen | reisen | rejse | resa
1176 | n | target | doel | ziel | mål | mål
1177 | a | very | zeer | sehr | meget | mycket
1178 | n | pair | paar | paar | par | par
1179 | a | male | mannelijk | männlich | mandlig | manlig
1180 | n | gas | gas | gas | gas | gas
1181 | v | issue | uitgeven | ausgeben | udstede | ge ut
1182 | n | contribution | contributie | beitrag | bidrag | bidrag
1183 | a | complex | complex | komplex | kompleks | komplex
1184 | v | supply | voorzien | beliefern | forsyne | förse
1185 | v | beat | slaan | schlagen | slå | slå
1186 | n | artist | artiest | artist | kunstner | artist
1187 | n | agent | agent | agent | agent | agent
1188 | n | presence | aanwezigheid | anwesenheit | tilstedeværelse | närvaro
1189 | adv | along | langs | entlang | langs | längs
1190 | a | environmental | betreffende het milieu | umwelt | miljømæssige | miljöbetingad
1191 | v | strike | staken | streiken | strejke | strejka
1192 | n | contact | contact | kontakt | kontakt | kontakt
1193 | n | protection | bescherming | schutz | beskyttelse | skydd
1194 | n | beginning | begin | anfang | begyndelse | början
1195 | v | demand | eisen | fordern | kræve | kräva
1196 | n | media | media | medien | medier | media
1197 | a | relevant | relevant | relevant | relevant | relevant
1198 | v | employ | in dienst hebben | beschäftigen | ansætte | anställa
1199 | v | shoot | schieten | schiessen | skyde | skjuta
1200 | n | executive | bestuur | exekutive | udøvende | utövande
1201 | adv | slowly | langzaam | langsam | langsomt | långsamt
1202 | adv | relatively | relatief | relativ | relativt | relativt
1203 | n | aid | hulp | hilfe | bistand | bistånd
1204 | a | huge | enorm | riesig | kæmpe | enorm
1205 | adv | late | laat | spät | sent | sent
1206 | n | speed | snelheid | geschwindigkeit | hastighed | hastighet
1207 | n | review | recensie | rezension | bedømmelse | recension
1208 | v | test | testen | testen | teste | testa
1209 | v | order | bestellen | bestellen | bestille | beställa
1210 | n | route | route | route | rute | rutt
1211 | n | consequence | gevolg | folge | konsekvens | konsekvens
1212 | n | telephone | telefoon | telefon | telefon | telefon
1213 | v | release | loslaten | loslassen | løslade | släppa
1214 | n | proportion | proportie | proportion | andel | proportion
1215 | a | primary | primair | primär | primær | primär
1216 | n | consideration | overweging | überlegung | overvejelse | övervägande
1217 | n | reform | hervorming | reform | reform | reform
1218 | n | driver | bestuurder | fahrer | chauffør | förare
1219 | a | annual | jaarlijks | jährlich | årlig | årlig
1220 | a | nuclear | nucleair | nuklear | nukleare | nukleär
1221 | det | latter | laatste | letzter | sidstnævnte | senare
1222 | a | practical | praktisch | praktisch | praktisk | praktisk
1223 | a | commercial | commercieel | kommerziell | kommerciel | kommersiell
1224 | a | rich | rijk | reich | rig | rik
1225 | v | emerge | tevoorschijn komen | entstehen | opstå | uppstå
1226 | adv | apparently | blijkbaar | offenbar | tilsyneladende | tydligen
1227 | v | ring | rinkelen | klingeln | ringe | ringa
1228 | n | distance | afstand | abstand | afstand | avstånd
1229 | n | exercise | oefening | übung | øvelse | övning
1230 | adv | close | dichtbij | nah | tæt | tätt
1231 | n | skin | huid | haut | hud | hud
1232 | n | island | eiland | insel | ø | ö
1233 | a | separate | verschillend | getrennt | særskilt | särskild
1234 | v | aim | richten | zielen | sigte | sikta
1235 | n | danger | gevaar | gefahr | fare | fara
1236 | n | credit | krediet | kredit | kredit | kredit
1237 | a | usual | gewoonlijk | gewöhnlich | sædvanlig | vanlig
1238 | v | link | koppelen | verbinden | forbinde | länka
1239 | n | candidate | kandidaat | kandidat | kandidat | kandidat
1240 | n | track | spoor | spur | spor | spår
1241 | a | safe | veilig | sicher | sikker | säker
1242 | a | interested | geïnteresseerd | interessiert | interesserede | intresserad
1243 | n | assessment | beoordeling | beurteilung | vurdering | bedömning
1244 | n | path | pad | pfad | sti | stig
1245 | adv | merely | slechts | nur | blot | endast
1246 | prep | plus | plus | plus | plus | plus
1247 | n | district | district | bezirk | distrikt | distrikt
1248 | a | regular | regulier | regulär | regelmæssig | regelbunden
1249 | n | reaction | reactie | reaktion | reaktion | reaktion
1250 | n | impact | impact | auswirkung | indvirkning | inverkan
1251 | v | collect | verzamelen | sammlen | samle | samla
1252 | n | debate | debat | debatte | debat | debatt
1253 | v | lay | leggen | legen | lægge | lägga
1254 | n | rise | stijging | anstieg | stigning | stigande
1255 | n | belief | geloof | glaube | tro | tro
1256 | n | conclusion | conclusie | schlussfolgerung | konklusion | slutsats
1257 | n | shape | vorm | form | form | form
1258 | n | vote | stem | stimme | stemme | röst
1259 | n | aim | doel | ziel | mål | mål
1260 | n | politics | politiek | politik | politik | politik
1261 | v | reply | beantwoorden | antworten | besvare | svara
1262 | v | press | drukken | drücken | trykke | trycka
1263 | v | approach | benaderen | angehen | nærme sig | närma sig
1264 | n | file | bestand | datei | fil | fil
1265 | a | western | westers | westlich | vestlig | västlig
1266 | n | earth | aarde | erde | jord | jord
1267 | n | public | publiek | publik | publikum | publik
1268 | v | survive | overleven | überleben | overleve | överleva
1269 | n | estate | landgoed | landgut | ejendom | egendom
1270 | n | boat | boot | boot | båd | båt
1271 | n | prison | gevangenis | gefängnis | fængsel | fängelse
1272 | a | additional | extra | zusätzlich | ekstra | ytterligare
1273 | v | settle | vestigen | siedeln | bosætte sig | bosätta sig
1274 | adv | largely | grotendeels | grossenteils | i vid udstrækning | till stor del
1275 | n | wine | wijn | wein | vin | vin
1276 | v | observe | observeren | beobachten | observere | observera
1277 | v | limit | beperken | begrenzen | begrænse | begränsa
1278 | v | deny | ontkennen | leugnen | benægte | förneka
1279 | conj | for | want | denn | for | för
1280 | adv | straight | recht | gerade | lige | rakt
1281 | pron | somebody | iemand | jemand | nogen | någon
1282 | n | writer | schrijver | autor | forfatter | författere
1283 | n | weekend | weekend | wochenende | weekend | helg
1284 | n | clothes | kleren | kleider | tøj | kläder
1285 | a | active | actief | aktiv | aktiv | aktiv
1286 | n | sight | zicht | sicht | syn | syn
1287 | n | video | video | video | video | video
1288 | n | reality | realiteit | realität | realitet | realitet
1289 | n | hall | hal | halle | hal | hall
1290 | adv | nevertheless | desondanks | trotzdem | alligevel | icke destro mindre
1291 | a | regional | regionaal | regional | regional | regional
1292 | n | vehicle | vervoermiddel | fahrzeug | køretøj | fordon
1293 | v | worry | zorgen maken | sich sorgen | bekymre | bekymra
1294 | a | powerful | krachtig | mächtig | kraftfuld | kraftfull
1295 | adv | possibly | eventueel | möglicherweise | muligvis | möjligtvis
1296 | v | cross | kruisen | überqueren | krydse | korsa
1297 | n | colleague | collega | kollege | kollega | kollega
1298 | v | charge | opladen | aufladen | oplade | ladda
1299 | n | lead | leiding | führung | ledelse | ledning
1300 | n | farm | boerderij | bauernhof | gård | gård
1301 | v | respond | reageren | reagieren | reagere | reagera
1302 | n | employer | werkgever | arbeitgeber | arbejdsgiver | arbetsgivare
1303 | adv | carefully | voorzichtig | vorsichtig | omhyggeligt | försiktigt
1304 | n | understanding | begrip | verständnis | forståelse | förståelse
1305 | n | connection | verbinding | verbindung | forbindelse | förbindelse
1306 | n | comment | commentaar | kommentar | kommentar | kommentar
1307 | v | grant | toewijzen | gewähren | skænke | tillmötesgå
1308 | v | concentrate | concentreren | konzentrieren | koncentrere | koncentrera
1309 | v | ignore | negeren | ignorieren | ignorere | ignorera
1310 | n | phone | telefoon | telefon | telefon | telefon
1311 | n | hole | gat | loch | hul | hål
1312 | n | insurance | verzekering | versicherung | forsikring | försäkring
1313 | n | content | inhoud | inhalt | indhold | innehåll
1314 | n | confidence | vertrouwen | vertrauen | tillid | tillit
1315 | n | sample | monster | probe | prøve | prov
1316 | n | transport | transport | transport | transport | transport
1317 | n | objective | doelstelling | zielsetzung | hensigt | mål
1318 | a | alone | alleen | allein | alene | allena
1319 | n | flower | bloem | blume | blomst | blomma
1320 | n | injury | blessure | verletzung | kvæstelse | skada
1321 | v | lift | optillen | heben | løfte | lyfta
1322 | v | stick | steken | stecken | stikke | sticka
1323 | a | front | voor | vordere | for | främre
1324 | adv | mainly | hoofdzakelijk | hauptsächlich | hovedsagelig | huvudsakligen
1325 | n | battle | gevecht | schlacht | kamp | strid
1326 | n | generation | generatie | generation | generation | generation
1327 | adv | currently | huidig | momentan | øjeblikket | för närvarande
1328 | n | winter | winter | winter | vinter | vinter
1329 | prep | inside | binnen | innerhalb | indenfor | inuti
1330 | a | impossible | onmogelijk | unmöglich | umulig | omöjlig
1331 | adv | somewhere | ergens | irgendwo | et eller andet sted | någonstans
1332 | v | arrange | regelen | anordnen | arrangere | ordna
1333 | n | will | testament | testament | testamente | testamente
1334 | v | sleep | slapen | schlafen | sove | sova
1335 | n | progress | voortgang | fortschritt | fremskridt | framsteg
1336 | n | volume | volume | volumen | volumen | volym
1337 | n | ship | schip | schiff | skib | skepp
1338 | n | legislation | wetgeving | gesetzgebung | lovgivning | lagstiftning
1339 | n | commitment | verplichting | verpflichtung | forpligtelse | förpliktelse
1340 | det | enough | genoeg | genug | nok | nog
1341 | n | conflict | conflict | konflikt | konflikt | konflikt
1342 | n | bag | tas | tasche | taske | påse
1343 | a | fresh | vers | frisch | frisk | färsk
1344 | n | entry | binnenkomst | eintritt | indgang | inträde
1345 | n | smile | glimlach | lächeln | smil | leende
1346 | a | fair | eerlijk | fair | retfærdig | rättvis
1347 | v | promise | beloven | versprechen | love | lova
1348 | n | introduction | introductie | einleitung | introduktion | introduktion
1349 | a | senior | senior | älter | senior | senior
1350 | n | manner | manier | weise | måde | sätt
1351 | n | background | achtergrond | hintergrund | baggrund | bakgrund
1352 | n | key | sleutel | schlüssel | nøgle | nyckel
1353 | v | touch | aanraken | berühren | berøre | beröra
1354 | v | vary | variëren | variieren | variere | variera
1355 | a | sexual | seksueel | sexuell | seksuel | sexuell
1356 | a | ordinary | gewoon | gewöhnlich | almindelig | vanlig
1357 | n | cabinet | kabinet | kabinett | kabinet | kabinett
1358 | n | painting | schilderij | malerei | maleri | målning
1359 | adv | entirely | geheel | vollständig | helt | helt
1360 | n | engine | motor | motor | motor | motor
1361 | adv | previously | vorig | vorher | tidligere | förut
1362 | n | administration | administratie | administration | administration | administration
1363 | adv | tonight | vanavond | heute abend | i aften | i kväll
1364 | n | adult | volwassene | erwachsene | voksen | vuxen
1365 | v | prefer | de voorkeur geven | bevorzugen | foretrække | föredra
1366 | n | author | auteur | autor | forfatter | författere
1367 | a | actual | eigenlijk | tatsächlich | faktisk | faktisk
1368 | n | song | lied | lied | sang | sång
1369 | n | investigation | onderzoek | untersuchung | undersøgelse | undersökning
1370 | n | debt | schuld | schuld | gæld | skuld
1371 | n | visitor | bezoeker | besucher | besøgende | besökare
1372 | n | forest | bos | wald | skov | skog
1373 | v | repeat | herhalen | wiederholen | gentage | upprepa
1374 | n | wood | hout | holz | træ | trä
1375 | n | contrast | contrast | kontrast | kontrast | kontrast
1376 | adv | extremely | extreem | äusserst | ekstremt | extremt
1377 | n | wage | loon | lohn | løn | lön
1378 | a | domestic | huiselijk | häuslich | huslig | huslig
1379 | v | commit | begaan | begehen | begå | begå
1380 | n | threat | bedreiging | bedrohung | trussel | hot
1381 | n | bus | bus | bus | bus | buss
1382 | a | warm | warm | warm | varm | varm
1383 | n | sir | mijnheer | herr | herre | min herre
1384 | n | regulation | regulatie | regulierung | regulering | reglering
1385 | n | drink | drank | getränk | drik | dryck
1386 | n | relief | opluchting | erleichterung | lettelse | lättnad
1387 | a | internal | intern | intern | intern | intern
1388 | a | strange | vreemd | seltsam | mærkelig | konstig
1389 | a | excellent | uitmuntend | ausgezeichnet | fremragende | utmärkt
1390 | n | run | loop | lauf | løbe | lopp
1391 | adv | fairly | aardig | ziemlich | temmelig | ganska
1392 | a | technical | technisch | technisch | teknisk | teknisk
1393 | n | tradition | traditie | tradition | tradition | tradition
1394 | v | measure | meten | messen | måle | mäta
1395 | v | insist | staan op | bestehen | insistere | insistera
1396 | pron | his | zijn | sein | hans | hans
1397 | n | farmer | boer | bauer | landmand | bonde
1398 | prep | until | tot | bis | indtil | till
1399 | n | traffic | verkeer | verkehr | trafik | trafik
1400 | n | dinner | diner | abendessen | middag | middag
1401 | n | consumer | consument | verbraucher | forbruger | konsument
1402 | n | meal | maaltijd | mahlzeit | måltid | måltid
1403 | v | warn | waarschuwen | warnen | advare | varna
1404 | a | living | levend | lebendig | levende | levande
1405 | n | package | pakket | paket | pakke | paket
1406 | n | half | helft | hälfte | halvdel | halva
1407 | adv | increasingly | in toenemende mate | zunehmend | i stigende grad | alltmer
1408 | n | description | beschrijving | beschreibung | beskrivelse | beskrivning
1409 | a | soft | zacht | weich | blød | mjuk
1410 | n | stuff | spullen | sachen | ting | saker
1411 | v | award | toekennen | vergeben | tilkende | tilldela
1412 | n | existence | bestaan | existenz | eksistens | existens
1413 | n | improvement | verbetering | verbesserung | forbedring | förbättring
1414 | n | coffee | koffie | kaffee | kaffe | kaffe
1415 | n | appearance | voorkomen | aussehen | udseende | utseende
1416 | a | standard | standaard | üblich | standard | standard
1417 | v | attack | aanvallen | angreifen | angribe | anfalla
1418 | n | sheet | vel | blatt | ark | ark
1419 | n | category | categorie | kategorie | kategori | kategori
1420 | n | distribution | verdeling | verteilung | fordeling | fördelning
1421 | adv | equally | gelijk | gleichermassen | ligelig | lika
1422 | n | session | sessie | sitzung | session | session
1423 | a | cultural | cultureel | kulturell | kulturel | kulturell
1424 | n | loan | lening | darlehen | lån | lån
1425 | v | bind | binden | binden | binde | binda
1426 | n | museum | museum | museum | museum | museum
1427 | n | conversation | conversatie | gespräch | samtale | konversation
1428 | v | threaten | bedreigen | bedrohen | true | hota
1429 | n | link | link | link | link | länk
1430 | v | launch | lanceren | lancieren | lancere | lansera
1431 | a | proper | eigenlijk | richtig | korrekt | rätt
1432 | n | victim | slachtoffer | opfer | offer | offer
1433 | n | audience | publiek | publikum | publikum | publik
1434 | a | famous | beroemd | berühmt | berømt | berömd
1435 | n | master | meester | meister | mester | mästare
1436 | n | lip | lip | lippe | læbe | läpp
1437 | a | religious | religieus | religiös | religiøs | religiös
1438 | a | joint | gezamenlijk | gemeinsam | fælles | gemensam
1439 | v | cry | huilen | weinen | græde | gråta
1440 | a | potential | potentieel | potenziell | potentiel | potential
1441 | a | broad | breed | breit | bred | bred
1442 | n | exhibition | tentoonstelling | ausstellung | udstilling | utställning
1443 | v | experience | ervaren | erfahren | opleve | uppleva
1444 | n | judge | rechter | richter | dommer | domare
1445 | a | formal | formeel | formal | formel | formell
1446 | n | housing | huisvesting | gehäuse | boliger | bostäder
1447 | prep | past | voorbij | vorüber | forbi | förbi
1448 | v | concern | betreffen | betreffen | vedrøre | angå
1449 | n | freedom | vrijheid | freiheit | frihed | frihet
1450 | n | gentleman | heer | herr | herre | herre
1451 | v | attract | aantrekken | anziehen | tiltrække | attrahera
1452 | n | explanation | uitleg | erklärung | forklaring | förklaring
1453 | v | appoint | benoemen | ernennen | udnævne | utnämna
1454 | v | note | opmerken | beachten | bemærke | märka
1455 | n | total | totaal | summe | total | summa
1456 | a | lovely | lieflijk | lieblich | dejlig | härlig
1457 | a | official | officieel | offiziell | officiel | officiell
1458 | v | date | daten | ausgehen mit | date | sällskapa
1459 | v | demonstrate | demonstreren | demonstrieren | demonstrere | demonstrera
1460 | n | construction | constructie | konstruktion | konstruktion | konstruktion
1461 | n | middle | midden | mitte | midte | mitt
1462 | n | yard | tuin | hof | gård | gård
1463 | a | unable | niet in staat | unfähig | ude af stand | oförmögen
1464 | v | acquire | verkrijgen | erwerben | erhverve | förvärva
1465 | adv | surely | vast | sicherlich | sikkert | säkert
1466 | n | crisis | crisis | krise | krise | kris
1467 | n | west | westen | westen | vest | väst
1468 | v | impose | opleggen | verhängen | pålægge | ålägga
1469 | v | care | zorgen | sich kümmern | pleje | bry sig om
1470 | n | god | god | gott | gud | gud
1471 | n | favour | gunst | gunst | gunst | gunst
1472 | adv | before | voor | zuvor | førend | före
1473 | v | name | noemen | nennen | nævne | nämna
1474 | a | equal | gelijk | gleich | lig | lika
1475 | n | capacity | capaciteit | kapazität | kapacitet | kapacitet
1476 | n | flat | flat | wohnung | lejlighed | lägenhet
1477 | n | selection | selectie | auswahl | udvælgelse | urval
1478 | adv | alone | alleen | allein | alene | endast
1479 | n | football | voetbal | fussball | fodbold | fotboll
1480 | n | victory | overwinning | sieg | sejr | seger
1481 | n | factory | fabriek | fabrik | fabrik | fabrik
1482 | a | rural | landelijk | ländlich | landlig | lantlig
1483 | adv | twice | twee keer | zweimal | to gange | två gånger
1484 | v | sing | zingen | singen | synge | sjunga
1485 | conj | whereas | terwijl | während | hvorimod | då däremot
1486 | v | own | bezitten | besitzen | eje | äga
1487 | v | head | leiden | leiten | lede | leda
1488 | n | examination | examinatie | prüfung | gennemgang | granskning
1489 | v | deliver | bezorgen | liefern | levere | leverera
1490 | pron | nobody | niemand | niemand | ingen | ingen
1491 | a | substantial | substantieel | wesentlich | væsentlig | väsentlig
1492 | v | invite | uitnodigen | einladen | indbyde | inbjuda
1493 | n | intention | intentie | intention | hensigt | avsikt
1494 | n | egg | ei | ei | æg | ägg
1495 | a | reasonable | redelijk | angemessen | rimelig | rimlig
1496 | prep | onto | op | auf | på | på
1497 | v | retain | behouden | behalten | beholde | behålla
1498 | n | aircraft | vliegtuig | flugzeug | fly | flygplan
1499 | n | decade | decennium | jahrzehnt | årti | decennium
1500 | a | cheap | goedkoop | billig | billig | billig
1501 | a | quiet | stil | still | stille | stilla
1502 | a | bright | helder | hell | lys | ljus
1503 | v | contribute | bijdragen | beitragen | bidrage | bidra
1504 | n | row | rij | reihe | række | rad
1505 | n | search | zoektocht | suche | eftersøgning | sökande
1506 | n | limit | limiet | limit | grænse | gräns
1507 | n | definition | definitie | definition | definition | definition
1508 | n | unemployment | werkloosheid | arbeitslosigkeit | arbejdsløshed | arbetslöshet
1509 | v | spread | spreiden | verbreiten | sprede | sprida
1510 | n | mark | merkteken | marke | mærke | märke