Machine Learning for Machine Translation
Transcription
Machine Learning for Machine Translation
Pushpak Bhattacharyya, CSE Dept., IIT Bombay
ISI Kolkata, Jan 6, 2014

Jan 6, 2014 ISI Kolkata, ML for MT 1

NLP Trinity

- Problem: Semantics, Parsing, Part of Speech Tagging, Morph Analysis
- Language: Marathi, French, Hindi, English
- Algorithm: HMM, CRF, MEMM

NLP Layer

Increasing complexity of processing: Morphology → POS tagging → Chunking → Parsing → Semantics Extraction → Discourse and Coreference

Motivation for MT

- MT: NLP complete; NLP: AI complete; AI: CS complete
- How will the world be different when the language barrier disappears?
- The volume of text required to be translated currently exceeds translators' capacity (demand > supply)
- Solution: automation

Taxonomy of MT systems

MT approaches:
- Knowledge based / rule based MT: Interlingua based, Transfer based
- Data driven / machine learning based: Example Based MT (EBMT), Statistical MT

Why is MT difficult?
Language divergence.

Why is MT difficult: Language Divergence

- One of the main complexities of MT: language divergence
- Languages have different ways of expressing meaning: Lexico-Semantic Divergence and Structural Divergence
- Our work on English-IL language divergence, with illustrations from Hindi (Dave, Parikh, Bhattacharyya, Journal of MT, 2002)

Languages differ in expressing thoughts: Agglutination

Finnish: "istahtaisinkohan"
English: "I wonder if I should sit down for a while"
Analysis:
- ist : "sit", verb stem
- ahta : verb derivation morpheme, "to do something for a while"
- isi : conditional affix
- n : 1st person singular suffix
- ko : question particle
- han : a particle for things like reminder (with declaratives) or "softening" (with questions and imperatives)

Language Divergence Theory: Lexico-Semantic Divergences (a few examples)

- Conflational divergence: F: vomir, E: to be sick; E: stab, H: chure se maaranaa (knife-with hit); S: Utrymningsplan, E: escape plan
- Categorial divergence (change is in POS category): "The play is on_PREP" (vs. "The play is Sunday"); "khel chal_rahaa_haai_VM" (vs. "khel ravivaar ko haai")

Language Divergence Theory: Structural Divergences

- SVO → SOV: E: "Peter plays basketball"; H: "piitar basketball kheltaa haai"
- Head swapping divergence: E: "Prime Minister of India"; H: "bhaarat ke pradhaan mantrii" (India-of Prime Minister)

Language Divergence Theory: Syntactic Divergences (a few examples)

- Constituent order divergence: E: "Singh, the PM of India, will address the nation today"; H: "bhaarat ke pradhaan mantrii, singh, …" (India-of PM, Singh, …)
- Adjunction divergence: E: "She will visit here in the summer"; H: "vah yahaa garmii meM aayegii" (she here summer-in will come)
- Preposition-stranding divergence: E: "Who do you want to go with?"; H: "kisake saath aap jaanaa chaahate ho?"
(who with …)

Vauquois Triangle

Kinds of MT Systems (point of entry from source to the target text)

Illustration of transfer, SVO → SOV

English parse: [S [NP [N John]] [VP [V eats] [NP [N bread]]]]
After transfer (SVO → SOV): [S [NP [N John]] [NP [N bread]] [VP [V eats]]]

Universality hypothesis

Universality hypothesis: at the level of "deep meaning", all texts are the "same", whatever the language.

Understanding the Analysis-Transfer-Generation over the Vauquois triangle (1/4)

H1.1: सरकार_ने चुनावो_के_बाद मुंबई में करों_के_माध्यम_से अपने राजस्व_को बढ़ाया ।
T1.1: sarkaar ne chunaawo ke baad Mumbai me karoM ke maadhyam se apne raajaswa ko badhaayaa
G1.1: Government_(ergative) elections_after Mumbai_in taxes_through its revenue_(accusative) increased
E1.1: The Government increased its revenue after the elections through taxes in Mumbai

Interlingual representation: complete disambiguation

"Washington voted Washington to power": the interlingua graph fully disambiguates the nodes, vote@past (is-a: action), Washington (is-a: place), Washington@emphasis (is-a: person), power (is-a: capability), connected by semantic relations such as goal.

Kinds of disambiguation needed for a complete and correct interlingua graph

- N: Name
- P: POS
- A: Attachment
- S: Sense
- C: Co-reference
- R: Semantic Role

Issues to handle

Sentence: I went with my friend, John, to the bank to withdraw some money but was disappointed to find it closed.

Issues (built up over the same sentence, slide by slide):
- Part of speech: "bank", noun or verb?
- NER: "John" is the name of a PERSON
- WSD: financial bank or river bank?
- Co-reference: "it" ↔ "bank"
- Subject drop: pro-drop (subject "I")

Typical NLP tools used

- POS tagger
- Stanford Named Entity Recognizer
- Stanford Dependency Parser
- XLE Dependency Parser
- Lexical resources: WordNet, Universal Word Dictionary (UW++)

System Architecture

Analysis pipeline: NER → Clause Marker → Simplifier → Simple Sentence Analyser (Stanford Dependency Parser / XLE Parser) → Feature Generation → WSD; simple-sentence encodings feed Relation Generation and Attribute Generation, combined by a Merger.

Target Sentence Generation from interlingua

Target sentence generation = Lexical Transfer (word/phrase translation) + Morphological Synthesis (word-form generation) + Syntax Planning (sequencing)

Generation Architecture

Deconversion = Transfer + Generation

Transfer Based MT: Marathi-Hindi

Indian Language to Indian Language Machine Translation (ILILMT)

- Bidirectional machine translation system
- Developed for nine Indian language pairs
- Approach: transfer based
- Modules developed using both rule-based and statistical approaches

Architecture of ILILMT System

Analysis side: Morphological Analyzer, POS Tagger, Chunker, Vibhakti Computation, Named Entity Recognizer, Word Sense Disambiguation. Transfer: Lexical Transfer, Agreement Feature transfer. Generation side: interchunk and intrachunk processing, Word Generator.

M-H MT system: Evaluation

- Subjective evaluation based on machine translation quality
- Accuracy calculated from scores given by linguists
- Score 5: correct translation; Score 4: understandable with minor errors; Score 3: understandable with major errors; Score 2: not understandable; Score 1: nonsense translation
- S5: number of score-5 sentences, S4: number of score-4 sentences, S3: number of score-3 sentences, N: total number of sentences; accuracy is computed from these

Evaluation of Marathi to Hindi MT System

Module-wise precision and recall reported for: Morph Analyzer, POS Tagger, Chunker, Vibhakti Compute, WSD, Lexical Transfer, Word Generator

Evaluation of Marathi to Hindi MT System (contd.)

- Subjective evaluation of translation quality
- Evaluated on 500 web sentences
- Accuracy calculated from scores given according to the translation quality.
Accuracy: 65.32%

Result analysis:
- Morph analyzer, POS tagger, and chunker give more than 90% precision, but the transfer, WSD, and generator modules are below 80%, which degrades MT quality
- Morph disambiguation, parsing, transfer grammar, and FW disambiguation modules are also required to improve accuracy

Statistical Machine Translation

Czech-English data

- [nesu] "I carry"
- [ponese] "He will carry"
- [nese] "He carries"
- [nesou] "They carry"
- [yedu] "I drive"
- [plavou] "They swim"

To translate …

- I will carry.
- They drive.
- He swims.
- They will drive.

Hindi-English data

- [DhotA huM] "I carry"
- [DhoegA] "He will carry"
- [DhotA hAi] "He carries"
- [Dhote hAi] "They carry"
- [chalAtA huM] "I drive"
- [tErte hEM] "They swim"

Bangla-English data

- [bai] "I carry"
- [baibe] "He will carry"
- [bay] "He carries"
- [bay] "They carry"
- [chAlAi] "I drive"
- [sAMtrAy] "They swim"

To translate … (repeated)

- I will carry.
- They drive.
- He swims.
- They will drive.

Foundation

- Data driven approach
- Goal: find the English sentence e, given a foreign language sentence f, for which p(e|f) is maximum
- Translations are generated on the basis of a statistical model
- Parameters are estimated using bilingual parallel corpora

SMT: Language Model

- To detect good English sentences
- The probability of an English sentence w1 w2 … wn can be written as
  Pr(w1 w2 … wn) = Pr(w1) · Pr(w2|w1) · … · Pr(wn|w1 w2 … wn-1)
- Here Pr(wn|w1 w2 … wn-1) is the probability that word wn follows the word string w1 w2 … wn-1: the n-gram model probability; in practice the trigram model probability is used

SMT: Translation Model

P(f|e): probability of some f given the hypothesis English translation e. How to assign the values to p(e|f)?
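Before tackling P(f|e), the language-model side above can be made concrete. Below is a toy sketch of the chain-rule probability truncated to a trigram history, with plain MLE counts on a made-up three-sentence corpus (no smoothing; real systems smooth and train on large monolingual data):

```python
from collections import defaultdict

def train_trigram(corpus):
    """Count trigrams and their bigram histories over tokenized sentences."""
    tri, big = defaultdict(int), defaultdict(int)
    for sent in corpus:
        toks = ["<s>", "<s>"] + sent + ["</s>"]
        for i in range(2, len(toks)):
            tri[(toks[i-2], toks[i-1], toks[i])] += 1
            big[(toks[i-2], toks[i-1])] += 1
    return tri, big

def sentence_prob(sent, tri, big):
    """Pr(w1..wn) approximated as the product of Pr(wi | wi-2, wi-1)."""
    toks = ["<s>", "<s>"] + sent + ["</s>"]
    p = 1.0
    for i in range(2, len(toks)):
        h = (toks[i-2], toks[i-1])
        p *= (tri[(h[0], h[1], toks[i])] / big[h]) if big[h] else 0.0
    return p

corpus = [["he", "carries"], ["they", "carry"], ["he", "will", "carry"]]
tri, big = train_trigram(corpus)
print(sentence_prob(["he", "carries"], tri, big))  # 2/3 * 1/2 * 1 = 1/3
```

In the full noisy-channel setup the decoder combines this with the translation model, choosing e to maximize Pr(e) · Pr(f|e).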
- Sentence level: sentences are infinite, so it is not possible to estimate pairs (e, f) for all sentences
- Word level: introduce a hidden variable a that represents the alignments between the individual words in the sentence pair

Alignment

If the string e = e1^l = e1 e2 … el has l words, and the string f = f1^m = f1 f2 … fm has m words, then the alignment a can be represented by a series a1^m = a1 a2 … am of m values, each between 0 and l, such that if the word in position j of the f-string is connected to the word in position i of the e-string then aj = i, and if it is not connected to any English word then aj = 0.

Example of alignment

English: Ram went to school
Hindi: Raama paathashaalaa gayaa
Raama → Ram, paathashaalaa → school, gayaa → went, and "to" aligned to <Null>

Translation Model: Exact expression

- Choose the length of the foreign language string given e
- Choose the alignment given e and m
- Choose the identity of each foreign word given e, m, a
- Five models for estimating the parameters in the expression [2]: Model-1, Model-2, Model-3, Model-4, Model-5

Proof of Translation Model: Exact expression

Pr(f | e) = Σ_a Pr(f, a | e)                ; marginalization
Pr(f, a | e) = Σ_m Pr(f, a, m | e)          ; marginalization
Pr(f, a, m | e) = Pr(m | e) · Pr(f, a | m, e)
               = Pr(m | e) · Π_{j=1..m} Pr(f_j, a_j | a_1^{j-1}, f_1^{j-1}, m, e)
               = Pr(m | e) · Π_{j=1..m} Pr(a_j | a_1^{j-1}, f_1^{j-1}, m, e) · Pr(f_j | a_1^j, f_1^{j-1}, m, e)

m is fixed for a particular f, hence:
Pr(f, a | e) = Pr(m | e) · Π_{j=1..m} Pr(a_j | a_1^{j-1}, f_1^{j-1}, m, e) · Pr(f_j | a_1^j, f_1^{j-1}, m, e)

Alignment

Fundamental and ubiquitous:
- Spell checking
- Translation
- Transliteration
- Speech to text
- Text to speech

EM for word alignment from sentence alignment: example

English (1): three rabbits (a b);  (2): rabbits of Grenoble (b c d)
French (1): trois lapins (w x);  (2): lapins de Grenoble (x y z)

Initial probabilities (each cell denotes t(a→w), t(a→x), etc.):

        a     b     c     d
  w    1/4   1/4   1/4   1/4
  x    1/4   1/4   1/4   1/4
  y    1/4   1/4   1/4   1/4
  z    1/4   1/4   1/4   1/4

The counts in IBM Model 1

Works by maximizing P(f|e) over the entire corpus. For IBM Model 1 we get the following relationship:

c(w_f | w_e; f, e) = [ t(w_f | w_e) / ( t(w_f | w_e0) + … + t(w_f | w_el) ) ] × #(w_f in f) × #(w_e in e)

where c(w_f | w_e; f, e) is the fractional count of the alignment of w_f with w_e in f and e, t(w_f | w_e) is the probability of w_f being the translation of w_e, #(w_f in f) is the count of w_f in f, and #(w_e in e) is the count of w_e in e.

Example of expected count

c[a→w; (a b)↔(w x)] = [ t(a→w) / ( t(a→w) + t(a→x) ) ] × #(a in "a b") × #(w in "w x")
                    = [ (1/4) / (1/4 + 1/4) ] × 1 × 1 = 1/2

"Counts"

From (a b)↔(w x):          From (b c d)↔(x y z):
        a     b                    b     c     d
  w    1/2   1/2             x    1/3   1/3   1/3
  x    1/2   1/2             y    1/3   1/3   1/3
  (all other cells 0)        z    1/3   1/3   1/3
                             (all other cells 0)

Revised probability: example

t_revised(a→w) = (1/2) / [ (1/2 + 1/2 + 0 + 0) from (a b)↔(w x) + (0 + 0 + 0 + 0) from (b c d)↔(x y z) ]

Revised probabilities table

        a     b      c     d
  w    1/2   1/4     0     0
  x    1/2   5/12   1/3   1/3
  y     0    1/6    1/3   1/3
  z     0    1/6    1/3   1/3

"Revised counts"

From (a b)↔(w x):          From (b c d)↔(x y z):
        a     b                    b     c     d
  w    1/2   3/8             x    5/9   1/3   1/3
  x    1/2   5/8             y    2/9   1/3   1/3
  (all other cells 0)        z    2/9   1/3   1/3
                             (all other cells 0)

Re-revised probabilities table

        a     b        c     d
  w    1/2   3/16      0     0
  x    1/2   85/144   1/3   1/3
  y     0    1/9      1/3   1/3
  z     0    1/9      1/3   1/3

Continue until convergence; notice that the (b, x) binding gets progressively stronger; b = rabbits, x = lapins.

Derivation of EM based Alignment Expressions

V_E = vocabulary of language L1 (say English); V_F = vocabulary of language L2 (say Hindi)

E1: What is in a name?
F1: नाम में क्या है ? (naam meM kya hai ?)
    Gloss: name in what is ?

E2: That which we call rose, by any other name will smell as sweet.
F2: जिसे हम गुलाब कहते हैं, और भी किसी नाम से उसकी खुशबू समान मीठा होगी (Jise hum gulab kahte hai, aur bhi kisi naam se uski khushbu samaan mitha hogii)
    Gloss: that which we rose say, any other name by its smell as sweet

Vocabulary mapping

V_E: what, is, in, a, name, that, which, we, call, rose, by, any, other, will, smell, as, sweet
V_F: naam, meM, kya, hai, jise, hum, gulab, kahte, hai, aur, bhi, kisi, uski, khushbu, saman, mitha, hogii

Key Notations

- English vocabulary: V_E; French vocabulary: V_F
- Number of observations / sentence pairs: S
- The data D, consisting of S observations, looks like:
  e11, e12, …, e1l1 || f11, f12, …, f1m1
  e21, e22, …, e2l2 || f21, f22, …, f2m2
  …
  eS1, eS2, …, eSlS || fS1, fS2, …, fSmS
- Number of words on the English side of the s-th sentence: l_s; on the French side: m_s
- indexE(e_sp) = index of English word e_sp in the English vocabulary/dictionary
- indexF(f_sq) = index of French word f_sq in the French vocabulary/dictionary

(Thanks to Sachin Pawar for helping with the maths formulae processing)

Hidden variables and parameters

Hidden variables (Z): total number of hidden variables = Σ_{s=1..S} l_s · m_s, where each hidden variable is as follows:
- z_pq^s = 1 if, in the s-th sentence, the p-th English word is mapped to the q-th French word
- z_pq^s = 0 otherwise

Parameters (Θ): total number of parameters = |V_E| × |V_F|, where each parameter is as follows:
- P_ij = probability that the i-th word in the English vocabulary is mapped to the j-th word in the French vocabulary

Likelihoods

Data likelihood L(D; Θ); data log-likelihood LL(D; Θ); expected value of the data log-likelihood E(LL(D; Θ))

Constraint and Lagrangian

Σ_{j=1..|V_F|} P_ij = 1, ∀i

Differentiating w.r.t. P_ij

Final E and M steps

Combinatorial considerations

Example

All possible alignments

First fundamental requirement of SMT

Alignment requires evidence of:
- firstly, a translation pair to introduce the POSSIBILITY of a mapping
- then, another pair to establish with CERTAINTY the mapping

For the "certainty"

- We have a translation pair containing the alignment candidates and none of the other words in the translation pair, OR
- We have a translation pair containing all the words in the translation pair except the alignment candidates

Therefore…

If M valid bilingual mappings exist in a translation pair, then an additional M−1 pairs of translations will decide these mappings with certainty.

Rough estimate of data requirement

- SMT system between two languages L1 and L2
- Assume no a-priori linguistic or world knowledge, i.e., no meanings or grammatical properties of any words, phrases or sentences
- Each language has a vocabulary of 100,000 words, which can give rise to about 500,000 word forms through various morphological processes, assuming each word appears in 5 different forms on average
- For example, the word "go" appearing in "go", "going", "went" and "gone"
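The EM updates derived above specialize, for IBM Model 1, to the fractional-count tables walked through on the three rabbits / trois lapins toy corpus. A minimal sketch follows (standard Model 1 as in Brown et al.: each French word's fractional count is normalized over the English words of the pair; the slides' worked "revised counts" normalize per English word over the French side instead, so second-iteration numbers differ slightly, but both drive the rabbits↔lapins binding toward 1):

```python
from collections import defaultdict

# Toy parallel corpus of the running example: a b = three rabbits,
# w x = trois lapins, etc.
corpus = [(["three", "rabbits"], ["trois", "lapins"]),
          (["rabbits", "of", "Grenoble"], ["lapins", "de", "Grenoble"])]

# Uniform initialization t(f|e) = 1/4, as in the initial probabilities table.
t = defaultdict(float)
for es, fs in corpus:
    for e in es:
        for f in fs:
            t[(f, e)] = 0.25

for _ in range(100):                         # EM iterations
    count = defaultdict(float)               # E-step: expected counts c(f|e)
    total = defaultdict(float)
    for es, fs in corpus:
        for f in fs:
            z = sum(t[(f, e)] for e in es)   # normalize over English words
            for e in es:
                count[(f, e)] += t[(f, e)] / z
                total[e] += t[(f, e)] / z
    for (f, e) in count:                     # M-step: renormalize per e
        t[(f, e)] = count[(f, e)] / total[e]

print(round(t[("lapins", "rabbits")], 2))    # approaches 1.0
```

After one iteration this reproduces the revised probabilities table (e.g., t(lapins|rabbits) = 5/12); iterating to convergence pins rabbits↔lapins and three↔trois, while "of" and "Grenoble" share "de"/"Grenoble".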
Reasons for mapping to multiple words

- Synonymy on the target side (e.g., "to go" in English translating to "jaanaa", "gaman karnaa", "chalnaa" etc. in Hindi), a phenomenon called lexical choice or register
- Polysemy on the source side (e.g., "to go" translating to "ho jaanaa" as in "her face went red in anger" → "usakaa cheharaa gusse se laal ho gayaa")
- Syncretism ("went" translating to "gayaa", "gayii", or "gaye": masculine gender, 1st or 3rd person, singular number, past tense, non-progressive aspect, declarative mood)

Estimate of corpora requirement

- Assume that on average a sentence is 10 words long
- An additional 9 translation pairs are needed for getting at one of the 5 mappings, i.e., 10 sentences per mapping per word
- A first approximation puts the data requirement at 5 × 10 × 500,000 = 25 million parallel sentences
- The estimate is not wide of the mark: successful SMT systems like Google and Bing reportedly use hundreds of millions of translation pairs

Our work on factor based SMT

Ananthakrishnan Ramanathan, Hansraj Choudhary, Avishek Ghosh and Pushpak Bhattacharyya, "Case markers and Morphology: Addressing the crux of the fluency problem in English-Hindi SMT", ACL-IJCNLP 2009, Singapore, August 2009.
Case Marker and Morphology crucial in E-H MT

- Order-of-magnitude facelift in fluency and fidelity
- Determined by the combination of suffixes and semantic relations on the English side
- Augment the aligned corpus of the two languages with the correspondence of English suffixes and semantic relations to Hindi suffixes and case markers

Semantic relations + suffixes → case markers + inflections

I ate mangoes
→ I {<agt} ate {eat@past} mangoes {<obj}
→ I {<agt} mangoes {<obj.@pl} {eat@past}
→ mei_ne aam khaa_yaa

Our Approach

- Factored model (Koehn and Hoang, 2007) with the following translation factor: suffix + semantic relation → case marker/suffix
- Experiments with the following relations: dependency relations from the Stanford parser; deeper semantic roles from Universal Networking Language (UNL)

Our Factorization

Experiments

Corpus Statistics

Results: the impact of suffix and semantic factors

Results: the impact of reordering and semantic relations

Subjective evaluation: the impact of reordering and semantic relations

Impact of sentence length (F: fluency; A: adequacy; E: number of errors)

A feel for the improvement: baseline

A feel for the improvement: reorder

A feel for the improvement: semantic relation

A recent study: Pan-Indian SMT

Pan-Indian Language SMT

http://www.cfilt.iitb.ac.in/indic-translator

- SMT systems between 11 languages: 7 Indo-Aryan (Hindi, Gujarati, Bengali, Oriya, Punjabi, Marathi, Konkani), 3 Dravidian (Malayalam, Tamil, Telugu), and English
- Corpus: Indian Language Corpora Initiative (ILCI) corpus, tourism and health domains, 50,000 parallel sentences
- Evaluation with BLEU; METEOR scores also show high correlation with BLEU

SMT Systems Trained

- S1: phrase-based (PBSMT) baseline system
- S2: E-IL PBSMT with source-side reordering rules (Ramanathan et al., 2008)
- S3: E-IL PBSMT with source-side reordering rules (Patel et al., 2013)
- S4: IL-IL PBSMT with transliteration post-editing

Natural partitioning of SMT systems

Baseline PBSMT, % BLEU scores (S1):
- Clear partitioning of translation pairs by language-family pairs, based on translation accuracy
- Shared characteristics within language families make translation simpler
- Divergences among language families make translation difficult

The Challenge of Morphology

Morphological complexity vs BLEU; training corpus size vs BLEU. Vocabulary size is a proxy for morphological complexity. (Note: for Tamil, a smaller corpus was used for computing vocabulary size.)
- Translation accuracy decreases with increasing morphology
- Even if the training corpus is increased, a commensurate improvement in translation accuracy is not seen for morphologically rich languages
- Handling morphology in SMT is critical

Common divergences, shared solutions

Comparison of source reordering methods for E-IL SMT, % BLEU scores (S1, S2, S3):
- All Indian languages have similar word order
- The same structural divergence holds between English and Indian languages: SOV ↔ SVO, etc.
- Common source-side reordering rules improve E-IL translation by 11.4% (generic) and 18.6% (Hindi-adapted)
- Common divergences can be handled in a common framework in SMT systems (this idea has been used for knowledge-based MT systems, e.g., Anglabharati)

Harnessing shared characteristics

PBSMT + transliteration post-editing for E-IL SMT, % BLEU scores (S4):
- Out-of-vocabulary words are transliterated in a post-editing step
- Done using a simple transliteration scheme which harnesses the common phonetic organization of Indic scripts
- Accuracy improvements of 0.5 BLEU points with this simple approach
- Harnessing common characteristics can improve SMT output

Cognition and Translation: Measuring Translation Difficulty

Abhijit Mishra and Pushpak Bhattacharyya, "Automatically Predicting Sentence Translation Difficulty", ACL 2013, Sofia, Bulgaria, 4-9 August 2013.

Scenario

Subjective notion of difficulty:
- "John ate jam": easy
- "John ate jam made from apples": moderate
- "John is in a jam": difficult?

Use behavioural data

Use behavioural data to decipher strong AI algorithms. Specifically:
- For WSD by humans, see where the eye rests for clues
- For the innate translation difficulty of sentences, see how the eye moves back and forth over the sentences

Saccades and Fixations

(Image courtesy: http://www.smashingmagazine.com/2007/10/09/30-usability-issues-to-be-aware-of/)

Eye Tracking data

- Gaze points: position of the eye-gaze on the screen
- Fixations: a long stay of the gaze on a particular object on the screen; fixations have both spatial (coordinates) and temporal (duration) properties
- Saccade: a very rapid movement of the eye between positions of rest
- Scanpath: a path connecting a series of fixations
- Regression: revisiting a previously read segment

Controlling the experimental setup for eye-tracking

Eye movement patterns are influenced by factors like age, working proficiency, environmental distractions etc.
Guidelines for eye tracking:
- Record participants' metadata (age, expertise, occupation etc.)
- Perform a fresh calibration before each new experiment
- Minimize head movement
- Introduce adequate line spacing in the text and avoid scrolling
- Carry out the experiments in a relatively low-light environment

Use of eye tracking

- Used extensively in psychology, mainly to study reading processes
- Seminal work: Just, M.A. and Carpenter, P.A. (1980). A theory of reading: from eye fixations to comprehension. Psychological Review 87(4):329-354
- Used in flight simulators for pilot training

NLP and Eye Tracking research

- Kliegl (2011): predict word frequency and pattern from eye movements
- Doherty et al. (2010): eye-tracking as an automatic machine translation evaluation technique
- Stymne et al. (2012): eye-tracking as a tool for machine translation (MT) error analysis
- Dragsted (2010): coordination of the reading and writing processes during translation
- A relatively new and open research direction

Translation Difficulty Index (TDI)

- Motivation: route sentences to translators with the right competence, as per the difficulty of translating, e.g., on a crowdsourcing platform
- TDI is a function of sentence length (l), degree of polysemy of constituent words (p), and structural complexity (s)

Contributor to TDI: length

What is more difficult to translate?
"John eats jam" vs. "John eats jam made from apples" vs. "John eats jam made from apples grown in orchards" vs. "John eats bread made from apples grown in orchards on black soil"

Contributor to TDI: polysemy

What is more difficult to translate?
"John is in a jam" vs. "John is in difficulty" ("jam" has 4 diverse senses, "difficulty" has 4 related senses)

Contributor to TDI: structural complexity

What is more difficult to translate?
"John is in a jam. His debt is huge. The lenders cause him to shy from them, every moment he sees them."
vs.
"John is in a jam, caused by his huge debt, which forces him to shy from his lenders every moment he sees them."

Measuring translation difficulty through gaze data

Translation difficulty is indicated by:
- the eye staying on segments
- the eye jumping back and forth between segments

Example: The horse raced past the garden fell
बगीचा के पास से दौडाया गया घोड़ा गिर गया (bagiichaa ke pas se doudaayaa gayaa ghodaa gir gayaa)

The translation process will complete the task till "garden", and then backtrack, revise, restart and translate in a different way.

Scanpaths: indicator of translation difficulty (Malsburg et al., 2007)

Sentence 2 is a clear case of "garden pathing", which imposes cognitive load on participants, and they prefer syntactic re-analysis.
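The back-and-forth behaviour above can be quantified directly from a fixation sequence. A toy sketch counting regressive saccades (assuming fixations have already been mapped to word indices; real gaze data first needs the fixation detection and calibration discussed earlier):

```python
def count_regressions(fix_words):
    """Count regressive saccades: transitions that land on an earlier word."""
    return sum(1 for prev, cur in zip(fix_words, fix_words[1:]) if cur < prev)

# "The horse raced past the garden fell": the reader runs up to "fell"
# (index 6), then jumps back to "raced" (index 2): the garden-path effect.
smooth_read = [0, 1, 2, 3, 4, 5, 6]
garden_path = [0, 1, 2, 3, 4, 5, 6, 2, 3, 6]
print(count_regressions(smooth_read), count_regressions(garden_path))  # 0 1
```

A higher regression count over the same sentence is one simple signal of the re-analysis behaviour that scanpaths make visible.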
Translog: a tool for recording Translation Process Data

- Translog (Carl, 2012): a Windows-based program
- Built with the purpose of recording gaze and key-stroke data during translation
- Can be used for other reading- and writing-related studies
- Using Translog, one can: create and customize translation/reading/writing experiments involving eye-tracking and keystroke logging; calibrate the eye-tracker; replay and analyze the recorded log files; manually correct errors in the gaze recording

TPR Database

- The Translation Process Research (TPR) database (Carl, 2012) contains behavioral data for translation activities
- Contains gaze and keystroke information for more than 450 experiments
- 40 different paragraphs are translated from English into 7 different languages by multiple translators, with at least 5 translators per language
- Source and target paragraphs are annotated with POS tags, lemmas, dependency relations etc.
- Easy-to-use XML data format

Experimental setup (1/2)

- Translators translate sentence by sentence, typing into a text box
- The display screen is attached to a remote eye-tracker which constantly records the eye movements of the translator

Experimental setup (2/2)

- Extracted 20 different text categories from the data
- Each piece of text contains 5-10 sentences
- For each category, at least 10 participants translated the text into different target languages

A predictive framework for TDI

Test data → features → regressor → TDI, with the training data labeled through gaze analysis. Direct annotation of TDI is fraught with subjectivity and ad-hocism; we use the translator's gaze data as annotation to prepare training data.

Annotation of TDI (1/4)

First approximation: TDI is equivalent to the "time taken to translate".
- However, the time taken to translate may not be strongly related to translation difficulty
- It is difficult to know what fraction of the total time is spent on translation-related thinking
- It is sensitive to distractions from the environment

Annotation of TDI (2/4)

Instead of the "time taken to translate", consider the "time for which translation-related processing is carried out by the brain". This is called the Translation Processing Time:

T_p = T_comp + T_gen

where T_comp and T_gen are the times for source text comprehension and target text generation respectively.

Annotation of TDI (3/4)

Humans spend time on what they see, and this "time" is correlated with the complexity of the information being processed. With f = fixation, s = saccade, F_s/S_s = fixations/saccades on the source, and F_t/S_t = fixations/saccades on the target:

T_p = Σ_{f ∈ F_s} dur(f) + Σ_{s ∈ S_s} dur(s) + Σ_{f ∈ F_t} dur(f) + Σ_{s ∈ S_t} dur(s)

Annotation of TDI (4/4)

The measured TDI score is T_p normalized over sentence length:

TDI_measured = T_p / sentence_length

Features

- Length: word count of the sentence
- Degree of polysemy: sum of the number of senses of each word in the WordNet, normalized by length
- Structural complexity: if the attachment units lie far from each other, the sentence has higher structural complexity; Lin (1996) defines it as the total length of the dependency links in the dependency structure of the sentence

TDI was measured on the TPR database for 80 sentences.

Experiment and results

- Training data of 80 examples; 10-fold cross validation
- Features computed using Princeton WordNet and the Stanford Dependency Parser
- Support Vector Regression (Joachims et al., 1999) with different kernels
- Error analysis done with the Mean Squared Error estimate
- We also computed the correlation of the predicted TDI with the measured TDI
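The three features can be computed mechanically once a sense inventory and a dependency parse are available. A sketch with stand-in inputs (the sense counts and head indices below are illustrative placeholders, not actual WordNet or Stanford-parser output); the resulting vectors would feed the Support Vector Regressor:

```python
def tdi_features(tokens, senses, heads):
    """Feature vector [length, polysemy, complexity] for TDI prediction.

    tokens : words of the sentence
    senses : dict word -> number of senses (stand-in for WordNet lookups)
    heads  : 1-based head index per token (0 = root), a stand-in for the
             output of a dependency parser
    """
    length = len(tokens)
    # Degree of polysemy: total sense count normalized by sentence length.
    polysemy = sum(senses.get(w, 1) for w in tokens) / length
    # Structural complexity (Lin, 1996): total length of dependency links.
    complexity = sum(abs((i + 1) - h) for i, h in enumerate(heads) if h != 0)
    return [length, polysemy, complexity]

# "John is in a jam", with made-up sense counts and an illustrative parse:
tokens = ["John", "is", "in", "a", "jam"]
senses = {"John": 1, "is": 13, "in": 10, "a": 1, "jam": 4}
heads  = [2, 0, 2, 5, 3]   # John->is, is=root, in->is, a->jam, jam->in
print(tdi_features(tokens, senses, heads))
```

Longer sentences, more ambiguous words, and longer dependency links all push the vector, and hence the predicted TDI, upward.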
Examples from the dataset

Summary

- Covered interlingua-based MT: the oldest approach to MT
- Covered SMT: the newest approach to MT
- Presented some recent studies in the context of Indian languages

Summary (contd.)

- SMT is the ruling paradigm
- But linguistic features can enhance performance, especially factor-based SMT with factors coming from the interlingua
- Large-scale effort sponsored by the Ministry of IT's TDIL program to create MT systems
- Parallel corpora creation is also going on in a consortium mode

Conclusions

- NLP has assumed great importance because of the large amount of text in electronic form
- Machine learning techniques are increasingly applied
- Highly relevant for India, where multilinguality is a way of life
- Machine translation is more fundamental and ubiquitous than just mapping between two languages: utterance → thought, speech-to-speech online translation

Pubs: http://www.cse.iitb.ac.in/~pb
Resources and tools: http://www.cfilt.iitb.ac.in