PhD Thesis: Linguistically Motivated Reordering Modeling for Phrase-Based Statistical Machine Translation
Arianna Bisazza
Advisor: Marcello Federico
Fondazione Bruno Kessler / Università di Trento
PSMT decoding overview
E' necessario incoraggiare tale mobilità garantendo la sicurezza dei percorsi professionali
Arianna Bisazza – PhD Thesis – 19 April 2013
Target, built incrementally: Freedom of movement must be encouraged while ensuring that career paths …
Scores attached to the hypotheses during decoding:
• TM scores (translation model)
• LM scores (language model)
• ReoM scores (reordering model)
Reordering Models
Many solutions have been proposed, with different reordering classes, features, training modes, etc.:
Tillmann 04, Zens & Ney 06, Al-Onaizan & Papineni 06, Galley & Manning 08, Green & al. 10, Feng & al. 10, …
No matter what reordering model is used, the permutation search space must be limited!
The power of all reordering models is bound to the reordering constraints in use.
Reordering Constraints
E' necessario incoraggiare tale mobilità garantendo la sicurezza dei percorsi professionali
#perm = |w|! ≈ 40,000,000
D(wx,wy) = |y − x − 1|
Source-to-source distortion matrix (rows: last translated word wx, first row: sentence start; columns: next word wy)
       w0  w1  w2  w3  w4  w5  w6  w7  w8  w9 w10
start   0   1   2   3   4   5   6   7   8   9  10
w0      -   0   1   2   3   4   5   6   7   8   9
w1      2   -   0   1   2   3   4   5   6   7   8
w2      3   2   -   0   1   2   3   4   5   6   7
w3      4   3   2   -   0   1   2   3   4   5   6
w4      5   4   3   2   -   0   1   2   3   4   5
w5      6   5   4   3   2   -   0   1   2   3   4
w6      7   6   5   4   3   2   -   0   1   2   3
w7      8   7   6   5   4   3   2   -   0   1   2
w8      9   8   7   6   5   4   3   2   -   0   1
w9     10   9   8   7   6   5   4   3   2   -   0
w10    11  10   9   8   7   6   5   4   3   2   -
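The distortion function and the size of the unconstrained search space can be checked with a few lines of Python. This is only a sketch: the function names are hypothetical, row 0 of the matrix stands for the sentence start (x = -1), and `within_limit` assumes that a jump is allowed when its distortion does not exceed the distortion limit.

```python
import math

def distortion(x, y):
    """D(wx, wy) = |y - x - 1|; x = -1 stands for the sentence start."""
    return abs(y - x - 1)

def distortion_matrix(n):
    """Row x + 1 holds the cost of jumping from position x to each position y.

    The diagonal is None because a word cannot follow itself.
    """
    return [[None if x == y else distortion(x, y) for y in range(n)]
            for x in range(-1, n)]

def within_limit(x, y, dl):
    """Assumed reading of the distortion limit: the jump is allowed if D <= DL."""
    return distortion(x, y) <= dl

n_words = 11                                   # length of the example sentence
total_permutations = math.factorial(n_words)   # 39,916,800, about 40 million
matrix = distortion_matrix(n_words)
```

Counting the permutations that survive a given distortion limit depends on the decoder's exact constraint, so the sketch does not try to reproduce the ≈7,000 figure.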
DL: distortion limit
With DL=3: #perm ≈ 7,000
The problem with DL…
[Figure: Arabic-English example (EN and AR sentences)]
[Figure: German-English example (EN and DE sentences)]
Current solution: increasing the distortion limit!
DL=3: #perm ≈ 7,000
DL=7: #perm ≈ 7,000,000
Coarse reordering space definition: slower decoding, worse translations
Observations
• Word reordering is difficult!
• The existing word reordering models are not perfect, but they are expected to guide search over huge search spaces
• One way to go: design a perfect model
  Problem: many have already tried and failed
• Our way: simplify the task for the existing reordering models
Working hypotheses
• A better definition of the reordering search space (i.e. constraints) can simplify the task of the reordering model
• (Shallow) linguistic knowledge can help us to refine the reordering search space for a given language pair
Outline
o The problem
o The solutions:
  • verb reordering lattices
  • modified distortion matrices
  • dynamically pruning the reordering space
o Comparative evaluation & conclusions
The solutions: verb reordering lattices
Bisazza and Federico, Chunk-based Verb Reordering in VSO Sentences for Arabic-English, WMT 2010
Bisazza, Pighin, Federico, Chunk-Lattices for Verb Reordering in Arabic-English Statistical Machine Translation, MT Journal 2012
Idea: keep a low distortion limit and modify the input to allow only specific long reorderings
Reordering patterns in Arabic-English
Example of VSO sentences: the Arabic verb is anticipated with respect to the English order
Typical PSMT outputs:
*The Moroccan monarch King Mohamed VI __ his support to…
*He renewed the Moroccan monarch King Mohamed VI his support to…
Working hypothesis
Uneven distribution of long and short-range word movements:
• few long: verb-subject-object sentences. We try to model them explicitly!
• many short: adjective-noun, head-initial genitive constructions (idafa). We assume they are well handled in standard PSMT
Chunk-based fuzzy reordering rules
Shallow syntax chunking:
• cheaper and easier than deep parsing
• constrains reorderings in a softer way
Fuzzy (non-deterministic) reordering rules:
• generate N permutations for each matching sequence
• final reordering decision is taken during translation, guided by all SMT models (ReoM, LM...)
Few rules per language pair, to capture only long reordering
Chunk-based fuzzy reordering rules
Rule 1: move verb chunk ahead by 1 to N chunks
… CH(*) CH(V) CH(*) CH(*) CH(*) CH(*) CH(*) …
Rule 2: move verb chunk and following chunk ahead by 1 to N chunks
… CH(*) CH(V) CH(*) CH(*) CH(*) CH(*) CH(*) …
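The two rules can be sketched as a single function that generates the candidate permutations for one matching chunk sequence. This is a minimal illustration, not the thesis implementation: the function name, the `max_shift` parameter (standing in for N) and the chunk labels used in the example are assumptions.

```python
def move_verb_chunk(chunks, verb_idx, max_shift, with_following=False):
    """Return the permutations obtained by moving the verb chunk
    (optionally together with the chunk that follows it) to the right
    by 1..max_shift chunks."""
    block_len = 2 if with_following else 1
    block = chunks[verb_idx:verb_idx + block_len]
    if len(block) < block_len:
        return []  # no following chunk to move along with the verb
    rest = chunks[:verb_idx] + chunks[verb_idx + block_len:]
    permutations = []
    for shift in range(1, max_shift + 1):
        pos = verb_idx + shift
        if pos > len(rest):
            break  # cannot move past the end of the sequence
        permutations.append(rest[:pos] + block + rest[pos:])
    return permutations

# Hypothetical chunk sequence with the verb chunk in second position.
chunks = ["CC1", "VC2", "PC3", "NC4", "PC5", "Pct6"]
rule1 = move_verb_chunk(chunks, verb_idx=1, max_shift=3)
rule2 = move_verb_chunk(chunks, verb_idx=1, max_shift=2, with_following=True)
```

Each returned permutation is only a candidate: the final choice is left to the decoder and its models, as stated above.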
Chunk-based verb reordering in parallel data
The optimal reordering is the one that minimizes total distortion
Chunk-based verb reordering in test data
[Figure: reordering options produced by the rules "Move verb chunk" and "Move verb chunk and following chunk"; legend: verb chunk, other chunks]
Experiments
• Task: NIST-MT09 (news translation)
• Systems based on Moses, include lexicalized phrase reordering models [Tillmann 04; Koehn & al 05]
• Non-monotonic lattice decoding [Dyer & al 08]
• Evaluation by:
  - BLEU [Papineni & al 01] for lexical match & local order
  - KRS [Birch & al 10] for global order
Arabic-English: Translation Quality
[Figure: results chart; annotation: +0.5 BLEU, +0.4 KRS]
Arabic-English: Translation Quality and Translation Time
[Figure: results chart; annotation: -0.1 BLEU, -0.3 KRS; time split into Decoding and Pruning]
Test set: eval09-nw
Lattices always used with pre-ordered training
Oracle: test pre-ordered looking at reference
(more details on lattice pruning in the thesis)
Lessons learned
• limiting long reordering to a few chunks only
• use lattice to represent extra reordering
• decoding slowdown
Can we do better?
Observation: lattice topology basically distorts word-to-word distances, i.e. during decoding some distant positions become closer
Can we achieve the same effect more directly?
The solutions: modified distortion matrices
Bisazza and Federico, Modified Distortion Matrices for Phrase-Based Statistical Machine Translation, ACL 2012
Idea: modify the distortion matrix for each test sentence!
DL=3: #perm ≈ 7,000
DL=7: #perm ≈ 7,000,000
DL=3 & modif(D): #perm ≈ 20,000
Refined reordering search space
Entries modified in the source-to-source distortion matrix of the illustration (all others unchanged):
D(w1,w7) = 0, D(w2,w7) = 0, D(w1,w8) = 0, D(w2,w8) = 0, D(w5,w10) = 0, D(w9,w3) = 2, D(w9,w4) = 2
Chunk-based fuzzy reordering rules
Arabic-English: “Move verb chunk (and following chunk) to the right by 1 to N chunks”
w- $Ark fy AltZAhrp E$rAt AlmslHyn mn AlktA}b .
and took part in the march dozens of militants from the Brigades
Chunks: CC1 VC2 PC3 NC4 PC5 Pct6
Candidate reorderings generated by the rules:
CC1 PC3 VC2 NC4 PC5 Pct6
CC1 PC3 NC4 VC2 PC5 Pct6
CC1 PC3 NC4 PC5 VC2 Pct6
CC1 NC4 VC2 PC3 PC5 Pct6
CC1 NC4 PC5 VC2 PC3 Pct6
Reordering selection: reordered source LM
Scores of the five candidates shown on the slide: 0.7 0.1 0.1 0.4 0.9
Reorderings to include in the distortion matrix:
CC1 PC3 VC2 NC4 PC5 Pct6
CC1 NC4 PC5 VC2 PC3 Pct6
Modifying the distortion matrix
Chunks: CC1 = w0, VC2 = w1, PC3 = w2 w3, NC4 = w4 w5, PC5 = w6 w7, Pct6 = w8
Reorderings to include in the distortion matrix:
CC1 PC3 VC2 NC4 PC5 Pct6
CC1 NC4 PC5 VC2 PC3 Pct6
Resulting distortion matrix (rows: last translated word, first row: sentence start; columns: next word):
       w0  w1  w2  w3  w4  w5  w6  w7  w8
start   0   1   2   3   4   5   6   7   8
w0      -   0   0   0   0   0   5   6   7
w1      2   -   0   1   0   0   4   5   6
w2      3   2   -   0   1   2   3   4   0
w3      4   2   2   -   0   1   2   3   0
w4      5   4   3   2   -   0   1   2   3
w5      6   5   4   3   2   -   0   1   2
w6      7   2   5   4   3   2   -   0   1
w7      8   2   6   5   4   3   2   -   0
w8      9   8   7   6   5   4   3   2   -
Decoder input: “ w- $Ark fy AltZAhrp E$rAt AlmslHyn mn AlktA}b . ” with its modified distortion matrix
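The matrix modification walked through above can be reproduced with a short sketch. The update rule used here is read off the example matrices rather than stated on the slides, so treat it as an assumption: for every pair of chunks that become adjacent in a selected reordering but were not adjacent in the original order, the cost between each word of the left chunk and each word of the right chunk is set to 0 for a forward jump and to 2 for a backward jump. Function names are hypothetical.

```python
def distortion_matrix(n):
    """Row x + 1 holds D(wx, wy) = |y - x - 1| for every y; row 0 is the
    sentence start (x = -1). The diagonal is None."""
    return [[None if x == y else abs(y - x - 1) for y in range(n)]
            for x in range(-1, n)]

def add_shortcuts(dist, chunks, reordering):
    """Lower the cost between chunks made adjacent by `reordering`.

    chunks: word positions of each chunk, in original order.
    reordering: permutation of chunk indices.
    The values 0 (forward) and 2 (backward) are an assumption inferred
    from the example slides.
    """
    for left, right in zip(reordering, reordering[1:]):
        if right == left + 1:
            continue  # already adjacent in the original order
        for x in chunks[left]:
            for y in chunks[right]:
                dist[x + 1][y] = 0 if y > x else 2

# CC1 = w0, VC2 = w1, PC3 = w2 w3, NC4 = w4 w5, PC5 = w6 w7, Pct6 = w8
chunks = [[0], [1], [2, 3], [4, 5], [6, 7], [8]]
dist = distortion_matrix(9)
add_shortcuts(dist, chunks, [0, 2, 1, 3, 4, 5])  # CC1 PC3 VC2 NC4 PC5 Pct6
add_shortcuts(dist, chunks, [0, 3, 4, 1, 2, 5])  # CC1 NC4 PC5 VC2 PC3 Pct6
```

With a low distortion limit, the lowered entries are what makes the selected long jumps reachable while all other long jumps stay excluded.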
Experiments
• Tasks: NIST-MT09 for Ar-En, WMT10 for De-En
• Systems based on Moses, include state-of-the-art hierarchical lexicalized reordering models [Tillmann 04; Koehn & al 05; Galley & Manning 08]
• Baseline Distortion Limits: 5 in Ar-En, 10 in De-En
• Evaluation by:
  - BLEU for lexical match & local order
  - KRS for global order
Arabic-English: Translation Quality and Translation Time
[Figure: results chart; annotation: +0.9 BLEU, +0.6 KRS]
Test set: eval09-nw
Distortion modified with 3-best reorderings per rule-matching sequence
German-English: Translation Quality and Translation Time
[Figure: results chart; annotation: +0.5 BLEU, +0.7 KRS]
Test set: newstest10
Distortion modified with 3-best reorderings per rule-matching sequence
Lessons learned
• modified distortion matrices improve reordering without decoding overhead
• language-specific reordering rules are still needed
Can we learn everything from the data?
The solutions: dynamically pruning the reordering space
Bisazza and Federico, Dynamically Shaping the Reordering Search Space of Phrase-Based Statistical Machine Translation, Transactions of ACL 2013 (accepted with minor revisions)
A fully data-driven approach
• Train a binary classifier to learn whether an input word wy is to be translated right after another word wx: the Word-after-Word (WaW) reordering model
  Example: “... anwaltschaft hat ihre Ermittlungen zum Vorfall eingeleitet” (classifier decisions shown on the slide: no, no, no, no, no, yes)
• No rules required, all is learnt from parallel data
• Approach is easily portable to new language pairs with similar reordering characteristics
Decoder-integration
• additional feature function [usual approach]
• dynamically prune the reordering space [novel approach]:
  ➞ use model score to decide (early) if a given reordering path is promising enough to be further explored
Early reordering pruning
Input: Die Budapester Staat~ anwaltschaft hat ihre Ermittlungen zum Vorfall eingeleitet .
[Figure: matrix of classifier scores (values from 0.1 to 0.9) over pairs of input words]
• Test time: run classifier for each input sentence
• Consider a larger space (DL)
• Dynamically prune reorderings before each hypothesis expansion
For example after “Die”… … after “Staat”…
Decoder-integration
How to reduce early pruning errors? Always allow short jumps!
[Figure: the same score matrix divided into a non-prunable zone, a prunable zone, and an off-limits area]
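The pruning step with a non-prunable zone can be sketched as follows. Several details are assumptions made for illustration only: the slides do not state the selection criterion, so the sketch keeps the `top_k` best-scoring long jumps; the zone is measured with the same distortion function used for the distortion limit; and the score table and all names are hypothetical.

```python
def distortion(x, y):
    """D(wx, wy) = |y - x - 1|."""
    return abs(y - x - 1)

def candidate_positions(last, uncovered, waw_score, dl, zone_width, top_k):
    """Positions that may be translated right after position `last`.

    Jumps beyond the distortion limit are off limits. Jumps inside the
    non-prunable zone are always kept. Among the remaining (prunable)
    jumps, only the top_k with the highest classifier score survive
    (assumed criterion).
    """
    within_dl = [y for y in uncovered if distortion(last, y) <= dl]
    short_jumps = [y for y in within_dl if distortion(last, y) <= zone_width]
    long_jumps = [y for y in within_dl if distortion(last, y) > zone_width]
    long_jumps.sort(key=lambda y: waw_score[(last, y)], reverse=True)
    return sorted(short_jumps + long_jumps[:top_k])

# Made-up scores: only the jump from position 0 to position 7 looks promising.
scores = {(0, y): 0.1 for y in range(1, 10)}
scores[(0, 7)] = 0.9
kept = candidate_positions(last=0, uncovered=range(1, 10), waw_score=scores,
                           dl=8, zone_width=2, top_k=1)
```

Because the check runs before each hypothesis expansion, pruned jumps are never scored by the other models, which is where the time saving would come from.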
Experiments
• Same tasks
• Similar baselines, but with early distortion cost [Moore & Quirk 07]
• Baseline Distortion Limit: 8
• Evaluation by:
  - BLEU, KRS
  - KRS-V: weighted KRS, only sensitive to verbs
Arabic-English: Translation Quality
[Figure: results chart; annotation: +0.3 BLEU, +0.8 KRS-V]
Arabic-English: Translation Quality and Translation Time
[Figure: results chart; annotation: +0.6 BLEU, +1.2 KRS-V]
Test set: eval09-nw
Non-prunable zone width: 5
(more metrics and test sets in the thesis)
German-English: Translation Quality
[Figure: results chart; annotation: +0.2 BLEU, +0.7 KRS-V]
German-English: Translation Quality and Translation Time
[Figure: results chart; annotation: +1.3 BLEU, +4.0 KRS-V]
Test set: newstest10
Non-prunable zone width: 5
(more metrics and test sets in the thesis)
Comparative evaluation & conclusions
Experiments
• Same PSMT baselines
• Best enhanced PSMT systems:
  - Ar-En: WaW model & early reordering pruning
  - De-En: reordering lattices pruned with reordered source LM
• Hierarchical phrase-based system:
  - default configuration (max span for rule extraction: 10 words)
  - max span for decoding: 10 or 20
• Evaluation by:
  - BLEU, KRS
  - KRS-V: weighted KRS, only sensitive to verbs
Arabic-English: Translation Quality and Translation Time
[Figure: results chart]
Test set: eval09-nw
Non-prunable zone width: 5
(more metrics and test sets in the thesis)
German-English: Translation Quality and Translation Time
[Figure: results chart]
Test set: newstest10
Lattices pruned with reordered source LM
(more metrics and test sets in the thesis)
Arabic-English examples (1)
Arabic-English examples (2)
German-English examples (1)
German-English examples (2)
Conclusions
• Our techniques advance the state of the art in reordering modeling within the PSMT framework:
  - they capture long-range reordering patterns without sacrificing decoding efficiency
  - they show the importance of refining the reordering search space
• Positive results on a large-scale news translation task in two difficult language pairs:
  - significant gains in reordering-specific metrics while generic scores are preserved or increased
  - our best PSMT systems compare favorably with a strong tree-based approach (HSMT), both in quality and efficiency
Future Directions
• Improve the proposed methods by:
  - refining chunk-based reordering rules with POS or lexical clues
  - increasing accuracy of WaW model with new features
  - combining different reordering scores for early pruning
• Evaluate on language pairs with similar reordering characteristics
• Analyze the effect of improved long reordering on post-editing effort by human translators
• Address the problem of reordering search space definition in HSMT, possibly with analogous strategies
Related publications
• A. Bisazza, M. Federico, “Chunk-based Verb Reordering in VSO Sentences for Arabic-English”, WMT 2010.
• C. Hardmeier, A. Bisazza, M. Federico, “Word Lattices for Morphological Reduction and Chunk-based Reordering”, WMT 2010.
• A. Bisazza, D. Pighin, M. Federico, “Chunk-Lattices for Verb Reordering in Arabic-English Statistical Machine Translation”, MT Journal, Special Issue on MT for Arabic, 2012.
• A. Bisazza, M. Federico, “Modified Distortion Matrices for Phrase-Based Statistical Machine Translation”, ACL 2012.
• A. Bisazza, M. Federico, “Dynamically Shaping the Reordering Search Space of Phrase-Based Statistical Machine Translation”, Transactions of the ACL 2013 (accepted with minor revisions).
Thanks for your attention!