IWSLT 2014 German→English

| Paper | Basic architecture | #Layers | Hidden size | Algorithm | BLEU | Open-sourced |
|---|---|---|---|---|---|---|
| Multi-agent dual learning [bib] | Transformer | 8 | 256 | Dual learning with multiple primal and dual models providing the feedback signal | 35.56 | No |
| Tied transformer [bib] | Transformer | 8 | 384 | Encoder and decoder share a group of parameters | 35.52 | No |
| Layer-wise coordination [bib] | Transformer | 18 | 256 | Layer-wise coordination between encoder and decoder with parameter sharing | 35.31 | No |
| One model for a pair of dual tasks [bib] | Transformer | 8 | 256 | A single model trained for both De→En and En→De | 35.30 ± 0.1 | No |
| Model-level dual learning [bib] | Transformer | 6 | 256 | Two language modules combined with dual inference | 35.19 | No |
| Learn to teach loss functions [bib] | Transformer | 6 | 256 | Dynamic loss function learned by a teacher model | 34.80 | No |
| Role interactive layer [bib] | Transformer | 3 | 256 | Adds a role interactive layer (RIL) between the word embeddings and the first hidden layer | 34.74 | No |
| FRAGE [bib] | Transformer | 5 | 256 | Adversarial training makes word embeddings frequency-agnostic, closing the gap between rare and popular words | 33.97 | github |
| Variational attention [bib] | BiLSTM | 1 | -- | Variational attention | 33.68 | github |
| Dual transfer learning [bib] | BiLSTM | 2 | 512 | Leverages an En→De model to boost De→En performance | 32.85 | No |
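
Several of the top entries above rely on sharing parameters between encoder and decoder (the tied transformer and layer-wise coordination rows). The PyTorch sketch below illustrates only that general idea, not any listed paper's implementation; the layer composition, sizes, and the choice to share every sublayer are illustrative assumptions, and cross-attention is omitted to keep it short.

```python
import torch
import torch.nn as nn

d_model, n_heads, d_ff, n_layers = 384, 4, 1536, 8


class SharedLayer(nn.Module):
    """One parameter group reused by both the encoder and the decoder."""

    def __init__(self):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model)
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x, attn_mask=None):
        h, _ = self.self_attn(x, x, x, attn_mask=attn_mask)
        x = self.norm1(x + h)
        return self.norm2(x + self.ffn(x))


# A single stack of layers; both passes below reuse these exact weights.
shared = nn.ModuleList(SharedLayer() for _ in range(n_layers))


def encode(src):
    for layer in shared:  # encoder pass through the shared stack
        src = layer(src)
    return src


def decode(tgt, causal_mask):
    # Decoder pass reuses the same weights; a real decoder would also
    # cross-attend to the encoder output, which is omitted here.
    for layer in shared:
        tgt = layer(tgt, attn_mask=causal_mask)
    return tgt


src = torch.randn(2, 7, d_model)  # (batch, src_len, d_model)
tgt = torch.randn(2, 5, d_model)  # (batch, tgt_len, d_model)
mask = torch.triu(torch.ones(5, 5), diagonal=1).bool()  # causal mask
memory = encode(src)
out = decode(tgt, mask)  # same parameters as the encoder pass
```

The point of the sketch is that the shared stack roughly halves the parameter count relative to separate encoder/decoder stacks, which is why the tied models above can afford wider or deeper configurations at a similar budget.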