PhD Thesis: Linguistically Motivated Reordering Modeling for Phrase-Based Statistical Machine Translation

Arianna Bisazza
Advisor: Marcello Federico
Fondazione Bruno Kessler / Università di Trento

Arianna Bisazza – PhD Thesis – 19 April 2013

PSMT decoding overview

Source (Italian): E' necessario incoraggiare tale mobilità garantendo la sicurezza dei percorsi professionali
Partial translation: Freedom of movement must be encouraged while ensuring that career paths

[Figure, built over several slides: the decoder extends the English translation left to right, one phrase at a time, choosing which source phrase to cover next; each expansion is scored by the translation model (TM scores), the language model (LM scores) and the reordering model (ReoM scores).]
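The scoring of a hypothesis expansion can be illustrated with a minimal sketch. It assumes the usual log-linear combination of feature scores in PSMT and a distance-based distortion cost |start_i - end_{i-1} - 1| between consecutively translated source phrases; the feature names (`tm`, `lm`, `dist`) and the weights are hypothetical.

```python
from typing import Dict, List, Tuple

def distortion_cost(spans: List[Tuple[int, int]]) -> int:
    """Total distance-based distortion of a phrase-based derivation.

    `spans` lists the source phrases in the order they are translated, as
    inclusive (start, end) word indices. Each jump costs
    |start_i - end_{i-1} - 1|, with end_0 = -1 (before the first word).
    """
    total, prev_end = 0, -1
    for start, end in spans:
        total += abs(start - prev_end - 1)
        prev_end = end
    return total

def loglinear_score(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of feature log-scores (e.g. TM, LM and reordering)."""
    return sum(weights[name] * value for name, value in features.items())

# Hypothetical example: a monotone derivation has zero distortion,
# while skipping ahead and coming back costs 2 + 4 = 6.
monotone = distortion_cost([(0, 2), (3, 5)])
swapped = distortion_cost([(0, 1), (4, 5), (2, 3)])
score = loglinear_score({"tm": -2.0, "lm": -3.0, "dist": -float(swapped)},
                        {"tm": 1.0, "lm": 0.5, "dist": 0.1})
print(monotone, swapped, round(score, 2))  # 0 6 -4.1
```

Here the distortion is passed as a negative log-score so that larger jumps lower the total; the reordering models discussed next replace or complement this simple penalty.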



Reordering Models

Many solutions have been proposed, with different reordering classes, features, training modes, etc.: Tillmann 04, Zens & Ney 06, Al-Onaizan & Papineni 06, Galley & Manning 08, Green et al. 10, Feng et al. 10, …

[Figure: reordering model (ReoM) scores attached to the jumps between phrases of the example sentence.]

No matter which reordering model is used, the permutation search space must be limited!
→ The power of any reordering model is bound by the reordering constraints in use.

Reordering Constraints

Example: E' necessario incoraggiare tale mobilità garantendo la sicurezza dei percorsi professionali (|w| = 11 words, w0 … w10)

• Unconstrained: #perm = |w|! = 11! = 39,916,800 ≈ 40,000,000
• Source-to-source distortion between consecutively translated words: $D(w_x, w_y) = |y - x - 1|$
• DL (distortion limit) = 3, i.e. only jumps with D ≤ 3 are allowed: #perm ≈ 7,000

Source-to-source distortion matrix (row: last translated word w_x; column: next word w_y):

      w0  w1  w2  w3  w4  w5  w6  w7  w8  w9 w10
w0     -   0   1   2   3   4   5   6   7   8   9
w1     2   -   0   1   2   3   4   5   6   7   8
w2     3   2   -   0   1   2   3   4   5   6   7
w3     4   3   2   -   0   1   2   3   4   5   6
w4     5   4   3   2   -   0   1   2   3   4   5
w5     6   5   4   3   2   -   0   1   2   3   4
w6     7   6   5   4   3   2   -   0   1   2   3
w7     8   7   6   5   4   3   2   -   0   1   2
w8     9   8   7   6   5   4   3   2   -   0   1
w9    10   9   8   7   6   5   4   3   2   -   0
w10   11  10   9   8   7   6   5   4   3   2   -
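How quickly a distortion limit shrinks the permutation space can be checked with a small dynamic program over coverage sets. This sketch assumes the only constraint is D(w_x, w_y) ≤ DL for every jump, including the first one from a virtual start position -1; real decoders (e.g. Moses) apply further checks, so its counts need not reproduce the approximate figures given on the slides. The function names are hypothetical.

```python
from functools import lru_cache
from math import factorial
from typing import Optional

def distortion(x: int, y: int) -> int:
    """Source-to-source distortion D(w_x, w_y) = |y - x - 1|."""
    return abs(y - x - 1)

def count_permutations(n: int, dl: Optional[int] = None) -> int:
    """Count the word orders of an n-word sentence whose jumps all have
    distortion <= dl (dl=None: no limit). The first jump starts from a
    virtual position -1, so D(start, w_y) = y."""
    full = (1 << n) - 1  # bitmask with all n words covered

    @lru_cache(maxsize=None)
    def completions(covered: int, last: int) -> int:
        # Ways to finish, given the covered words (bitmask) and the
        # position of the last translated word.
        if covered == full:
            return 1
        total = 0
        for y in range(n):
            if covered & (1 << y):
                continue  # word already translated
            if dl is not None and distortion(last, y) > dl:
                continue  # jump exceeds the distortion limit
            total += completions(covered | (1 << y), y)
        return total

    return completions(0, -1)

unconstrained = count_permutations(11)  # equals factorial(11) = 39,916,800
for limit in (3, 7):
    print(limit, count_permutations(11, limit))
```

Memoizing on (coverage, last position) keeps the work to about 2^n · n states instead of enumerating all n! orders.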

The problem with DL… Arabic-English
[Figure: Arabic (AR) and English (EN) example sentence pair]

The problem with DL… German-English
[Figure: German (DE) and English (EN) example sentence pair]

Current solution: increasing the distortion limit!

• #perm = |w|! ≈ 40,000,000, with $D(w_x, w_y) = |y - x - 1|$
• DL = 3: #perm ≈ 7,000
• DL = 7: #perm ≈ 7,000,000

Coarse reordering space definition: slower decoding, worse translations.

[Figure: source-to-source distortion matrix, with the region allowed by each distortion limit highlighted.]

Observations
• Word reordering is difficult!
• The existing word reordering models are not perfect, but they are expected to guide search over huge search spaces.
• One way to go: design a perfect model. Problem: many have already tried and failed.
• Our way: simplify the task for the existing reordering models.

Working hypotheses
• A better definition of the reordering search space (i.e. constraints) can simplify the task of the reordering model.
• (Shallow) linguistic knowledge can help us to refine the reordering search space for a given language pair.

Outline
o The problem
o The solutions:
  • verb reordering lattices
  • modified distortion matrices
  • dynamically pruning the reordering space
o Comparative evaluation & conclusions

Outline
o The problem
o The solutions:
  • verb reordering lattices
    (Bisazza and Federico, Chunk-based Verb Reordering in VSO Sentences for Arabic-English, WMT 2010;
    Bisazza, Pighin, Federico, Chunk-Lattices for Verb Reordering in Arabic-English Statistical Machine Translation, MT Journal 2012)
  • modified distortion matrices
  • dynamically pruning the reordering space
o Comparative evaluation & conclusions

Idea: keep a low distortion limit and … modify the input to allow only specific long reorderings.

[Figure: source-to-source distortion matrix, $D(w_x, w_y) = |y - x - 1|$; #perm = |w|! ≈ 40,000,000 unconstrained, ≈ 7,000 with DL = 3, ≈ 7,000,000 with DL = 7.]

Reordering patterns in Arabic-English

Example of VSO sentences: the Arabic verb comes earlier than it would in English word order.

Typical PSMT outputs:
*The Moroccan monarch King Mohamed VI __ his support to… (verb missing)
*He renewed the Moroccan monarch King Mohamed VI his support to… (verb left before the subject)

Working hypothesis

Uneven distribution of long- and short-range word movements:
• few long: verb-subject-object sentences → we try to model them explicitly!
• many short: adjective-noun, head-initial genitive constructions (idafa) → we assume they are well handled in standard PSMT

Chunk-based fuzzy reordering rules

Shallow syntax chunking:
• cheaper and easier than deep parsing
• constrains reorderings in a softer way

Fuzzy (non-deterministic) reordering rules:
• generate N permutations for each matching sequence
• the final reordering decision is taken during translation, guided by all SMT models (ReoM, LM, ...)

Few rules per language pair, capturing only long reorderings.

Chunk-based fuzzy reordering rules

• Move the verb chunk to the right by 1 to N chunks:
  … CH(*) CH(V) CH(*) CH(*) CH(*) CH(*) CH(*) …
• Move the verb chunk and the following chunk to the right by 1 to N chunks:
  … CH(*) CH(V) CH(*) CH(*) CH(*) CH(*) CH(*) …
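The two fuzzy rules can be sketched as a generator of chunk permutations. This is an illustration under assumptions: chunks are plain labels, `is_verb` and `is_barrier` are hypothetical predicates, and treating punctuation chunks as barriers that a moved block may not cross is our assumption, not necessarily the thesis's rule.

```python
from typing import Callable, List

def verb_chunk_reorderings(chunks: List[str],
                           is_verb: Callable[[str], bool],
                           is_barrier: Callable[[str], bool],
                           max_shift: int) -> List[List[str]]:
    """Apply two fuzzy rules to a chunk sequence:
    (1) move the verb chunk to the right by 1..max_shift chunks;
    (2) move the verb chunk together with its following chunk likewise.
    A moved block never jumps over a barrier chunk (e.g. punctuation)."""
    results: List[List[str]] = []
    for i, chunk in enumerate(chunks):
        if not is_verb(chunk):
            continue
        for block_len in (1, 2):  # rule (1), then rule (2)
            end = i + block_len   # the moved block is chunks[i:end]
            if end > len(chunks) or any(is_barrier(c) for c in chunks[i:end]):
                continue
            for shift in range(1, max_shift + 1):
                crossed = end + shift - 1  # last chunk the block jumps over
                if crossed >= len(chunks) or is_barrier(chunks[crossed]):
                    break
                results.append(chunks[:i] + chunks[end:end + shift]
                               + chunks[i:end] + chunks[end + shift:])
    return results

sentence = ["CC1", "VC2", "PC3", "NC4", "PC5", "Pct6"]
perms = verb_chunk_reorderings(sentence,
                               is_verb=lambda c: c.startswith("VC"),
                               is_barrier=lambda c: c.startswith("Pct"),
                               max_shift=3)
for p in perms:
    print(" ".join(p))  # five permutations, starting with CC1 PC3 VC2 NC4 PC5 Pct6
```

Every permutation is kept rather than choosing one, which is what makes the rules "fuzzy": the choice is left to the decoder.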

Chunk-based verb reordering in parallel data

The optimal reordering is the one that minimizes total distortion.
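Selecting the optimal reordering of a training sentence can be sketched as follows. It assumes a one-to-one word alignment and measures distortion with the same |y - x - 1| formula while walking the target sentence left to right; in practice the candidate orders would come from the chunk rules. Function names and the toy data are hypothetical.

```python
from typing import List, Sequence, Tuple

def total_distortion(order: Sequence[int],
                     alignment: Sequence[Tuple[int, int]]) -> int:
    """Total distortion of a (reordered) source sentence w.r.t. the target.

    order: original source indices, listed in their new order.
    alignment: one-to-one (source_index, target_index) links.
    Visiting the target left to right, each step from source word x to
    source word y costs |pos(y) - pos(x) - 1| in the reordered sentence.
    """
    pos = {src: k for k, src in enumerate(order)}
    path = [pos[s] for s, _ in sorted(alignment, key=lambda link: link[1])]
    cost, prev = 0, -1
    for y in path:
        cost += abs(y - prev - 1)
        prev = y
    return cost

def best_reordering(candidates: List[Sequence[int]],
                    alignment: Sequence[Tuple[int, int]]) -> Sequence[int]:
    """Pick the candidate reordering with minimal total distortion."""
    return min(candidates, key=lambda order: total_distortion(order, alignment))

# Toy VSO -> SVO example: source (V=0, S=1, O=2), target order S V O.
links = [(1, 0), (0, 1), (2, 2)]
candidates = [[0, 1, 2], [1, 0, 2], [1, 2, 0]]
best = best_reordering(candidates, links)  # [1, 0, 2]: verb moved right by one
```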

Chunk-based verb reordering in test data

[Figure: lattice of reordering options generated by the rules "move verb chunk" and "move verb chunk and following chunk" (legend: verb chunk, other chunks).]

Experiments
• Task: NIST-MT09 (news translation)
• Systems based on Moses, including lexicalized phrase reordering models [Tillmann 04; Koehn et al. 05]
• Non-monotonic lattice decoding [Dyer et al. 08]
• Evaluation by:
  - BLEU [Papineni et al. 01] for lexical match & local order
  - KRS [Birch et al. 10] for global order
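KRS builds on Kendall's tau distance between the word order of a hypothesis and that of the reference. A minimal sketch of the normalized distance between two permutations of the same items follows; it deliberately omits the exact score transformation and the brevity penalty of [Birch et al. 10], which should be taken from that paper.

```python
from itertools import combinations
from typing import Sequence

def kendall_tau_distance(p: Sequence[int], q: Sequence[int]) -> float:
    """Fraction of item pairs that p and q put in opposite order
    (0.0 = same order, 1.0 = fully reversed)."""
    if sorted(p) != sorted(q):
        raise ValueError("p and q must be permutations of the same items")
    n = len(p)
    if n < 2:
        return 0.0
    pos_q = {item: k for k, item in enumerate(q)}
    # combinations(p, 2) yields pairs (a, b) with a before b in p
    discordant = sum(1 for a, b in combinations(p, 2) if pos_q[a] > pos_q[b])
    return discordant / (n * (n - 1) / 2)

print(kendall_tau_distance([0, 1, 2, 3], [1, 0, 2, 3]))  # one swapped pair out of six
```

Because every pair of words counts, a single long-range misplacement (such as an unmoved verb) is penalized much more than a local swap, which is why the measure is used here for global order.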

Arabic-English: +0.5 BLEU, +0.4 KRS

[Chart: translation quality]
Test set: eval09-nw. Lattices always used with pre-ordered training. Oracle: test set pre-ordered by looking at the reference. (More details on lattice pruning in the thesis.)

Arabic-English: -0.1 BLEU, -0.3 KRS

[Charts: translation quality; translation time, split into decoding and pruning]
Test set: eval09-nw. Lattices always used with pre-ordered training. Oracle: test set pre-ordered by looking at the reference. (More details on lattice pruning in the thesis.)

Lessons learned
• limiting long reordering to a few chunks only
• using a lattice to represent extra reorderings → decoding slow-down

Can we do better?

Observation: the lattice topology basically distorts word-to-word distances, i.e. during decoding some distant positions become closer.
Can we achieve the same effect more directly?

Outline
o The problem
o The solutions:
  • verb reordering lattices
  • modified distortion matrices
    (Bisazza and Federico, Modified Distortion Matrices for Phrase-Based Statistical Machine Translation, ACL 2012)
  • dynamically pruning the reordering space
o Comparative evaluation & conclusions


Idea: modify the distortion matrix for each test sentence!

• #perm = |w|! ≈ 40,000,000, with $D(w_x, w_y) = |y - x - 1|$
• DL = 3: #perm ≈ 7,000
• DL = 7: #perm ≈ 7,000,000
• DL = 3 & modif(D): #perm ≈ 20,000 → refined reordering search space

[Figure: source-to-source distortion matrix in which a few entries are lowered (e.g. to 0), so that selected long jumps fall within DL = 3.]

Chunk-based fuzzy reordering rules

Arabic-English: “Move verb chunk (and following chunk) to the right by 1 to N chunks”

Example (Arabic in Buckwalter transliteration, with a chunk-by-chunk English gloss):
[CC1 w‐ 'and'] [VC2 $Ark 'took part'] [PC3 fy AltZAhrp 'in the march'] [NC4 E$rAt AlmslHyn 'dozens of militants'] [PC5 mn AlktA}b 'from the Brigades'] [Pct6 .]

Applying the rules to the example (original order: CC1 VC2 PC3 NC4 PC5 Pct6) generates the following chunk permutations:
• verb chunk moved by 1, 2 and 3 chunks:
  CC1 PC3 VC2 NC4 PC5 Pct6
  CC1 PC3 NC4 VC2 PC5 Pct6
  CC1 PC3 NC4 PC5 VC2 Pct6
• verb chunk and following chunk moved by 1 and 2 chunks:
  CC1 NC4 VC2 PC3 PC5 Pct6
  CC1 NC4 PC5 VC2 PC3 Pct6

Reordering selection with a reordered source LM

Each permutation is scored by a language model of reordered source text (reordered source LM):
  0.7  CC1 PC3 VC2 NC4 PC5 Pct6
  0.1  CC1 PC3 NC4 VC2 PC5 Pct6
  0.1  CC1 PC3 NC4 PC5 VC2 Pct6
  0.4  CC1 NC4 VC2 PC3 PC5 Pct6
  0.9  CC1 NC4 PC5 VC2 PC3 Pct6

Reorderings to include in the distortion matrix: the best-scoring ones, here CC1 PC3 VC2 NC4 PC5 Pct6 and CC1 NC4 PC5 VC2 PC3 Pct6.
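The selection step can be sketched with a toy add-one smoothed bigram model over chunk labels that keeps the k best-scoring permutations. Modelling chunk tags rather than words, the smoothing method, and the training sequences below are all assumptions made for illustration; the reordered source LM used in the thesis may differ.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence

def train_bigram_lm(corpus: Sequence[Sequence[str]]) -> Callable[[Sequence[str]], float]:
    """Return the log-probability function of an add-one smoothed bigram LM."""
    histories: Counter = Counter()
    bigrams: Counter = Counter()
    vocab = {"</s>"}
    for sent in corpus:
        toks = ["<s>", *sent, "</s>"]
        vocab.update(sent)
        histories.update(toks[:-1])
        bigrams.update(zip(toks[:-1], toks[1:]))
    v = len(vocab)  # number of possible next tokens

    def logprob(sent: Sequence[str]) -> float:
        toks = ["<s>", *sent, "</s>"]
        return sum(math.log((bigrams[(a, b)] + 1) / (histories[a] + v))
                   for a, b in zip(toks[:-1], toks[1:]))
    return logprob

def k_best(perms: List[List[str]], logprob: Callable[[Sequence[str]], float],
           k: int) -> List[List[str]]:
    """Keep the k permutations with the highest LM score."""
    return sorted(perms, key=logprob, reverse=True)[:k]

# Hypothetical training data: chunk tags of source sentences already
# reordered into English-like (subject before verb) order.
lm = train_bigram_lm([["CC", "NC", "VC", "PC", "Pct"], ["NC", "VC", "NC", "Pct"]])
candidates = [["CC", "VC", "PC", "NC", "PC", "Pct"],
              ["CC", "PC", "VC", "NC", "PC", "Pct"],
              ["CC", "NC", "VC", "PC", "PC", "Pct"]]
best = k_best(candidates, lm, k=1)
```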

Modifying the distortion matrix

Words and chunks of the example: CC1 = w0, VC2 = w1, PC3 = w2 w3, NC4 = w4 w5, PC5 = w6 w7, Pct6 = w8.

[Figure, built over several slides: the 9 × 9 source-to-source distortion matrix of the example; one by one, the entries corresponding to the jumps required by the selected reorderings are lowered, so that these jumps fall within the distortion limit.]

Decoder input: the original sentence “ w‐ $Ark fy AltZAhrp E$rAt AlmslHyn mn AlktA}b . ” together with its modified distortion matrix.
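A minimal sketch of per-sentence matrix modification follows. It assumes one simple update rule: each entry D[x][y] is lowered to the distortion that the jump x → y would have if the sentence were written in one of the selected orders. The values actually used in the thesis may differ. Word indices follow the example (CC1 = w0, VC2 = w1, PC3 = w2 w3, NC4 = w4 w5, PC5 = w6 w7, Pct6 = w8); function names are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[int]]

def distortion_matrix(n: int) -> Matrix:
    """D[x][y] = |y - x - 1|: cost of translating word y right after word x."""
    return [[abs(y - x - 1) for y in range(n)] for x in range(n)]

def modify_distortion(d: Matrix, reorderings: Sequence[Sequence[int]]) -> Matrix:
    """Lower each entry to the distortion the same jump would have in any of
    the given word orders (each order lists the original word indices)."""
    n = len(d)
    m = [row[:] for row in d]  # copy; the original matrix is left intact
    for order in reorderings:
        pos = {w: k for k, w in enumerate(order)}  # new position of each word
        for x in range(n):
            for y in range(n):
                m[x][y] = min(m[x][y], abs(pos[y] - pos[x] - 1))
    return m

# Selected chunk permutations of the example, expanded to word indices.
r1 = [0, 2, 3, 1, 4, 5, 6, 7, 8]  # CC1 PC3 VC2 NC4 PC5 Pct6
r5 = [0, 4, 5, 6, 7, 1, 2, 3, 8]  # CC1 NC4 PC5 VC2 PC3 Pct6
d = distortion_matrix(9)
m = modify_distortion(d, [r1, r5])
print(d[7][1], m[7][1])  # 7 0: the long jump back from w7 to the verb w1 becomes free
```

The decoder then keeps a low distortion limit but reads jump costs from `m` instead of `d`, so only the jumps needed by the selected reorderings become reachable.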

Experiments
• Tasks: NIST-MT09 for Ar-En, WMT10 for De-En
• Systems based on Moses, including state-of-the-art hierarchical lexicalized reordering models [Tillmann 04; Koehn et al. 05; Galley & Manning 08]
• Baseline distortion limits: 5 in Ar-En, 10 in De-En
• Evaluation by:
  - BLEU for lexical match & local order
  - KRS for global order

Arabic-English: +0.9 BLEU, +0.6 KRS

[Charts: translation quality; translation time]
Test set: eval09-nw. Distortion modified with the 3-best reorderings per rule-matching sequence.

German-English: +0.5 BLEU, +0.7 KRS

[Charts: translation quality; translation time]
Test set: newstest10. Distortion modified with the 3-best reorderings per rule-matching sequence.

Lessons learned
• modified distortion matrices improve reordering without decoding overhead
• language-specific reordering rules are still needed

Can we learn everything from the data?

Outline
o The problem
o The solutions:
  • verb reordering lattices
  • modified distortion matrices
  • dynamically pruning the reordering space
    (Bisazza and Federico, Dynamically Shaping the Reordering Search Space of Phrase-Based Statistical Machine Translation, Transactions of ACL 2013, accepted with minor revisions)
o Comparative evaluation & conclusions

A fully data-driven approach
• Train a binary classifier to learn whether an input word w_y is to be translated right after another word w_x → Word-after-Word (WaW) reordering model

Example: “... anwaltschaft hat ihre Ermittlungen zum Vorfall eingeleitet ” ('... prosecutor's office has opened its investigation into the incident'), with candidate next words labelled no, no, no, no, no, yes.

• No rules required: everything is learnt from parallel data
• The approach is easily portable to new language pairs with similar reordering characteristics
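How WaW training examples might be extracted from one aligned sentence pair can be sketched as follows. It assumes a one-to-one alignment already converted into the order in which source words are translated, and treats as candidates only the words not yet translated that lie within the distortion limit; classifier features and training are left out. The function name is hypothetical.

```python
from typing import List, Optional, Sequence, Set, Tuple

def waw_training_examples(order: Sequence[int], dl: int) -> List[Tuple[int, int, bool]]:
    """order: source positions in the order they are translated (a permutation).
    Returns (x, y, label) triples: after translating word x, each not yet
    translated word y within the distortion limit is a candidate, labelled
    True iff y is the next word actually translated."""
    examples: List[Tuple[int, int, bool]] = []
    covered: Set[int] = set()
    for k, x in enumerate(order):
        covered.add(x)
        nxt: Optional[int] = order[k + 1] if k + 1 < len(order) else None
        for y in range(len(order)):
            if y in covered or abs(y - x - 1) > dl:
                continue
            examples.append((x, y, y == nxt))
    return examples

src = "Die Budapester Staat~ anwaltschaft hat ihre Ermittlungen zum Vorfall eingeleitet .".split()
# English order: The Budapest public prosecutor's office has opened its investigation into the incident .
order = [0, 1, 2, 3, 4, 9, 5, 6, 7, 8, 10]
for x, y, label in waw_training_examples(order, dl=10):
    if src[x] == "hat":
        print(src[x], "->", src[y], "yes" if label else "no")
```

For "hat", only "eingeleitet" is a positive example; with a smaller limit (e.g. dl=3) the long jump to "eingeleitet" is no longer a candidate at all.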

Decoder integration
• [usual approach] additional feature function
• [novel approach] dynamically prune the reordering space:
  ➞ use the model score to decide (early) whether a given reordering path is promising enough to be explored further

Early reordering pruning

Example input: Die Budapester Staat~ anwaltschaft hat ihre Ermittlungen zum Vorfall eingeleitet .

• Test time: run the classifier for each input sentence
• Consider a larger reordering space (DL)
• Dynamically prune reorderings before each hypothesis expansion, for example after “Die”… and after “Staat”…

[Figure, built over several slides: matrix of WaW classifier scores (between 0.1 and 0.9) for each pair of words in the example sentence.]

Decoder integration

How to reduce early pruning errors? → Always allow short jumps!

[Figure: around the last translated word, short jumps fall in a non-prunable zone and are always explored; longer jumps within the DL fall in a prunable zone, where the WaW score decides; jumps beyond the DL are off limits.]
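The pruning decision taken before each hypothesis expansion can be sketched as a filter over candidate next words. The sketch assumes a fixed probability threshold on the WaW score and measures the non-prunable zone by distortion (jumps with D < `zone_width` are never pruned); both choices, and all names, are assumptions rather than the thesis's exact criterion.

```python
from typing import List, Sequence, Set

def allowed_jumps(last: int, covered: Set[int], waw: Sequence[Sequence[float]],
                  dl: int, zone_width: int, threshold: float) -> List[int]:
    """Candidate next source positions after translating word `last`.

    waw[x][y] is the classifier probability that y is translated right after x.
    - distortion > dl: off limits
    - distortion < zone_width: non-prunable zone, always kept
    - otherwise (prunable zone): kept only if waw[last][y] >= threshold
    At the sentence start (last = -1) only the distortion limit applies.
    """
    keep = []
    for y in range(len(waw)):
        if y in covered:
            continue
        d = abs(y - last - 1)
        if d > dl:
            continue
        if d < zone_width or last < 0 or waw[last][y] >= threshold:
            keep.append(y)
    return keep

n = 11
waw = [[0.1] * n for _ in range(n)]
waw[4][9] = 0.9  # e.g. "hat" -> "eingeleitet" in the German example
covered = {0, 1, 2, 3, 4}
print(allowed_jumps(4, covered, waw, dl=10, zone_width=3, threshold=0.5))  # [5, 6, 7, 9]
```

The non-prunable zone guarantees that monotone and short jumps survive even when the classifier is wrong, while a large DL still lets confident long jumps (here to position 9) through.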

Experiments
• Same tasks
• Similar baselines, but with early distortion cost [Moore & Quirk 07]
• Baseline distortion limit: 8
• Evaluation by:
  - BLEU, KRS
  - KRS-V: weighted KRS, only sensitive to verbs

Arabic-English: +0.3 BLEU, +0.8 KRS-V

[Chart: translation quality]
Test set: eval09-nw. Non-prunable zone width: 5 (more metrics and test sets in the thesis).

Arabic-English: +0.6 BLEU, +1.2 KRS-V

[Charts: translation quality; translation time]
Test set: eval09-nw. Non-prunable zone width: 5 (more metrics and test sets in the thesis).

German-English: +0.2 BLEU, +0.7 KRS-V

[Chart: translation quality]
Test set: newstest10. Non-prunable zone width: 5 (more metrics and test sets in the thesis).

German-English: +1.3 BLEU, +4.0 KRS-V

[Charts: translation quality; translation time]
Test set: newstest10. Non-prunable zone width: 5 (more metrics and test sets in the thesis).

Outline
o The problem
o The solutions:
  • verb reordering lattices
  • modified distortion matrices
  • dynamically pruning the reordering space
o Comparative evaluation & conclusions

Experiments
•  Same PSMT baselines
•  Best enhanced PSMT systems:
   -  Ar-En: WaW model & early reo. pruning
   -  De-En: reo. lattices pruned with reo. source LM
•  Hierarchical phrase-based system:
   -  default configuration (max span for rule extract.: 10 words)
   -  max span for decoding: 10 or 20
•  Evaluation by:
   -  BLEU, KRS
   -  KRS-V: weighted KRS, only sensitive to verbs
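The De-En configuration prunes reordering lattices with a language model over reordered source text. A minimal sketch of that idea, assuming an add-one-smoothed bigram model and an explicit list of candidate source orders in place of lattice paths; the training sentences, the placeholder tokens (`SUBJ`, `VERB`, `OBJ`, `PP`) and the function names are hypothetical.

```python
import math
from collections import Counter


def train_bigram_lm(sentences):
    """Return a log-probability function for an add-one smoothed bigram LM."""
    history, bigrams = Counter(), Counter()
    for sent in sentences:
        toks = ["<s>"] + list(sent) + ["</s>"]
        history.update(toks[:-1])                 # counts of each conditioning word
        bigrams.update(zip(toks[:-1], toks[1:]))  # counts of adjacent word pairs
    vocab_size = len({t for s in sentences for t in s} | {"</s>"})

    def logprob(sent):
        toks = ["<s>"] + list(sent) + ["</s>"]
        return sum(
            math.log((bigrams[(a, b)] + 1) / (history[a] + vocab_size))
            for a, b in zip(toks[:-1], toks[1:])
        )

    return logprob


def prune_reorderings(candidates, logprob, n_best=2):
    """Keep the n_best candidate source orders with the highest LM score."""
    return sorted(candidates, key=logprob, reverse=True)[:n_best]


# Hypothetical reordered-source training data and candidate reorderings.
train = [["SUBJ", "VERB", "OBJ"], ["SUBJ", "VERB", "OBJ", "PP"], ["SUBJ", "VERB", "PP"]]
candidates = [["VERB", "SUBJ", "OBJ"], ["SUBJ", "VERB", "OBJ"], ["SUBJ", "OBJ", "VERB"]]
lm = train_bigram_lm(train)
print(prune_reorderings(candidates, lm, n_best=2))
# [['SUBJ', 'VERB', 'OBJ'], ['SUBJ', 'OBJ', 'VERB']]
```

Because the LM is trained on source text that was already reordered toward the target word order, it rewards candidate orders that look like that text and lets the decoder ignore the rest of the lattice.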

Arabic-English:

[Figure: Translation Quality and Translation Time]

Test set: eval09-nw. Non-prunable zone width: 5 (more metrics and test sets in the thesis)

German-English:

[Figure: Translation Quality and Translation Time]

Test set: newstest10. Lattices pruned with reo. source LM (more metrics and test sets in the thesis)

Arabic-English examples (1)


Arabic-English examples (2)


German-English examples (1)


German-English examples (2)


Conclusions
•  Our techniques advance the state of the art in reordering modeling within the PSMT framework:
   -  they capture long-range reordering patterns without sacrificing decoding efficiency
   -  they prove the importance of refining the reordering search space
•  Positive results on a large-scale news translation task in two difficult language pairs:
   -  significant gains in reordering-specific metrics, while generic scores are preserved or increased
   -  our best PSMT systems compare favorably with a strong tree-based approach (HSMT), both in quality and in efficiency

Future Directions
•  Improve the proposed methods by:
   -  refining chunk-based reordering rules with POS or lexical clues
   -  increasing the accuracy of the WaW model with new features
   -  combining different reordering scores for early pruning
•  Evaluate on other language pairs with similar reordering characteristics
•  Analyze the effect of improved long-range reordering on post-editing effort by human translators
•  Address the problem of reordering search space definition in HSMT, possibly with analogous strategies

Related publications
•  A. Bisazza, M. Federico, “Chunk-based Verb Reordering in VSO Sentences for Arabic-English”, WMT 2010.
•  C. Hardmeier, A. Bisazza, M. Federico, “Word Lattices for Morphological Reduction and Chunk-based Reordering”, WMT 2010.
•  A. Bisazza, D. Pighin, M. Federico, “Chunk-Lattices for Verb Reordering in Arabic-English Statistical Machine Translation”, MT Journal, Special Issue on MT for Arabic, 2012.
•  A. Bisazza, M. Federico, “Modified Distortion Matrices for Phrase-Based Statistical Machine Translation”, ACL 2012.
•  A. Bisazza, M. Federico, “Dynamically Shaping the Reordering Search Space of Phrase-Based Statistical Machine Translation”, Transactions of the ACL 2013 (accepted with minor revisions).


[Figure: distortion matrix over source words w0 to w10; the letters overlaid on its cells spell “THANKS FOR YOUR ATTENTION!”]
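The matrix on this closing slide is a distortion matrix, the representation behind the modified distortion matrices listed in the outline: each cell holds the cost of translating source word j right after source word i. A minimal sketch, assuming the standard PSMT distortion |j - i - 1| and the baseline distortion limit of 8 used in these experiments. The `suggested` set is a hypothetical stand-in for rule-proposed long jumps, which here are given zero cost and exempted from the limit; this illustrates the general idea of modifying the matrix and is not necessarily the thesis's exact scheme.

```python
def distortion_matrix(n, limit=8, suggested=()):
    """Cost of translating source word j immediately after source word i.

    Entry [i][j] is the standard distortion |j - i - 1| (0 for a monotone
    step), or None when the jump exceeds `limit` or when i == j.
    `suggested` holds hypothetical (i, j) jumps proposed by reordering
    rules: they cost 0 and bypass the distortion limit.
    """
    suggested = set(suggested)
    matrix = []
    for i in range(n):
        row = []
        for j in range(n):
            cost = abs(j - i - 1)
            if i == j:
                row.append(None)  # a word cannot follow itself
            elif (i, j) in suggested:
                row.append(0)  # rule-suggested jump: free and always allowed
            elif cost > limit:
                row.append(None)  # forbidden by the distortion limit
            else:
                row.append(cost)
        matrix.append(row)
    return matrix


# Hypothetical 12-word sentence where a rule proposes jumping from w0 to w11.
m = distortion_matrix(12, limit=8, suggested={(0, 11)})
print(m[0][1], m[0][9], m[0][10], m[0][11])  # 0 8 None 0
```

Editing individual cells, rather than raising the global distortion limit, opens only the specific long jumps that the rules propose and leaves the rest of the search space unchanged.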

