| Multi-agent dual learning [bib] | Transformer | 8 | 256 | The output model is boosted by dual-learning feedback bridged by multiple models (agents) | 35.56 | No |
| Tied transformer [bib] | Transformer | 8 | 384 | A group of parameters shared by the encoder and decoder (see sketch below) | 35.52 | No |
| Layer-wise coordination [bib] | Transformer | 18 | 256 | Layer-wise coordination between encoder and decoder, plus parameter sharing (see sketch below) | 35.31 | No |
| One model for a pair of dual tasks [bib] | Transformer | 8 | 256 | One model trained for both De->En and En->De (see sketch below) | 35.30 ± 0.1 | No |
| Model-level dual learning [bib] | Transformer | 6 | 256 | Use two language modules + dual inference | 35.19 | No |
| Learn to teach loss functions [bib] | Transformer | 6 | 256 | Dynamic loss function taught by a teacher | 34.80 | No |
| Role interactive layer [bib] | Transformer | 3 | 256 | + RIL, a layer inserted between the word embeddings and the first hidden layer of the network (see sketch below) | 34.74 | No |
| FRAGE [bib] | Transformer | 5 | 256 | + FRAGE (uses a GAN to close the representation gap between rare and popular words; see sketch below) | 33.97 | github |
| Variational attention [bib] | BiLSTM | 1 | -- | Variational attention (see sketch below) | 33.68 | github |
| Dual transfer learning [bib] | BiLSTM | 2 | 512 | Leveraging an En->De model to help boost performance | 32.85 | No |
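
For the tied transformer entry, here is a minimal sketch of one way to share a group of parameters between encoder and decoder, assuming the sharing is done per layer over the self-attention and feed-forward weights. The layer sizes and the choice of which modules to tie are illustrative assumptions, not the paper's exact scheme.

```python
import torch
import torch.nn as nn

def build_tied_transformer(d_model=256, nhead=4, num_layers=6, dim_ff=1024):
    """Transformer whose decoder layers reuse the self-attention and
    feed-forward weights of the corresponding encoder layers (one possible
    reading of "parameters shared by encoder and decoder")."""
    model = nn.Transformer(
        d_model=d_model, nhead=nhead,
        num_encoder_layers=num_layers, num_decoder_layers=num_layers,
        dim_feedforward=dim_ff, batch_first=True,
    )
    for enc_layer, dec_layer in zip(model.encoder.layers, model.decoder.layers):
        # Tie the self-attention projections and the position-wise FFN.
        dec_layer.self_attn = enc_layer.self_attn
        dec_layer.linear1 = enc_layer.linear1
        dec_layer.linear2 = enc_layer.linear2
        # Cross-attention (dec_layer.multihead_attn) stays decoder-specific.
    return model

if __name__ == "__main__":
    model = build_tied_transformer()
    src = torch.randn(2, 7, 256)   # (batch, src_len, d_model)
    tgt = torch.randn(2, 5, 256)   # (batch, tgt_len, d_model)
    tgt_mask = model.generate_square_subsequent_mask(5)
    print(model(src, tgt, tgt_mask=tgt_mask).shape)  # torch.Size([2, 5, 256])
```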
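
For the layer-wise coordination entry, a rough sketch of the coordination idea, assuming it means that decoder layer i cross-attends to the output of encoder layer i rather than to the final encoder state. The original model also shares parameters between the two stacks; that part is omitted here.

```python
import torch
import torch.nn as nn

class LayerCoordinatedTransformer(nn.Module):
    """Decoder layer i attends to the output of encoder layer i (layer-wise
    coordination) instead of the final encoder state."""

    def __init__(self, d_model=256, nhead=4, num_layers=6, dim_ff=1024):
        super().__init__()
        self.enc_layers = nn.ModuleList([
            nn.TransformerEncoderLayer(d_model, nhead, dim_ff, batch_first=True)
            for _ in range(num_layers)])
        self.dec_layers = nn.ModuleList([
            nn.TransformerDecoderLayer(d_model, nhead, dim_ff, batch_first=True)
            for _ in range(num_layers)])

    def forward(self, src, tgt, tgt_mask=None):
        enc_states, h = [], src
        for layer in self.enc_layers:
            h = layer(h)
            enc_states.append(h)            # keep every encoder layer's output
        out = tgt
        for i, layer in enumerate(self.dec_layers):
            # Decoder layer i is coordinated with encoder layer i.
            out = layer(out, memory=enc_states[i], tgt_mask=tgt_mask)
        return out

if __name__ == "__main__":
    model = LayerCoordinatedTransformer()
    src, tgt = torch.randn(2, 7, 256), torch.randn(2, 5, 256)
    mask = torch.triu(torch.full((5, 5), float("-inf")), diagonal=1)
    print(model(src, tgt, tgt_mask=mask).shape)  # torch.Size([2, 5, 256])
```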
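
For the entry that trains one model for both De->En and En->De, a sketch of the most common way to make a single model serve both directions: prepend a direction tag to the source sentence, as in multilingual NMT. The tag strings and helper names below are hypothetical; the cited paper may realize joint training differently.

```python
def tag_example(src_tokens, tgt_tokens, direction):
    """Prepend a direction tag so one model can translate both ways.
    The tag is treated as an ordinary vocabulary item."""
    tag = "<2en>" if direction == "de2en" else "<2de>"
    return [tag] + src_tokens, tgt_tokens

def build_joint_stream(de_en_pairs, en_de_pairs):
    """Mix both directions into one training stream for a single model."""
    stream = []
    for de, en in de_en_pairs:
        stream.append(tag_example(de, en, "de2en"))
    for en, de in en_de_pairs:
        stream.append(tag_example(en, de, "en2de"))
    return stream

if __name__ == "__main__":
    de_en = [(["ein", "haus"], ["a", "house"])]
    en_de = [(["a", "house"], ["ein", "haus"])]
    for src, tgt in build_joint_stream(de_en, en_de):
        print(src, "->", tgt)
```

With this setup all parameters and the vocabulary are shared across the two dual tasks, and the direction tag alone tells the model which way to translate.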
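
For the role interactive layer entry, only the placement of RIL, between the word embeddings and the first hidden layer, is taken from the table; the internal computation below (a gated mix of each token embedding with a small set of learned role vectors) is an assumption made purely for illustration.

```python
import torch
import torch.nn as nn

class RoleInteractiveLayer(nn.Module):
    """Illustrative RIL: mixes each token embedding with learned "role" vectors
    through a sigmoid gate.  The real RIL may compute something different."""

    def __init__(self, d_model=256, num_roles=8):
        super().__init__()
        self.roles = nn.Parameter(torch.randn(num_roles, d_model) * 0.02)
        self.role_scorer = nn.Linear(d_model, num_roles)
        self.gate = nn.Linear(2 * d_model, d_model)

    def forward(self, emb):                                  # (batch, seq, d_model)
        weights = torch.softmax(self.role_scorer(emb), -1)   # (batch, seq, num_roles)
        role_repr = weights @ self.roles                     # (batch, seq, d_model)
        g = torch.sigmoid(self.gate(torch.cat([emb, role_repr], dim=-1)))
        return g * emb + (1 - g) * role_repr

class TinyEncoder(nn.Module):
    """Shows where the layer sits: embeddings -> RIL -> first hidden layer."""

    def __init__(self, vocab=1000, d_model=256):
        super().__init__()
        self.embed = nn.Embedding(vocab, d_model)
        self.ril = RoleInteractiveLayer(d_model)
        self.first_hidden = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)

    def forward(self, tokens):
        return self.first_hidden(self.ril(self.embed(tokens)))

if __name__ == "__main__":
    enc = TinyEncoder()
    print(enc(torch.randint(0, 1000, (2, 7))).shape)  # torch.Size([2, 7, 256])
```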
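
For the FRAGE entry, a sketch of the adversarial ingredient: a discriminator tries to tell popular-word embeddings from rare-word embeddings, and the embeddings are trained to fool it. A gradient-reversal layer is used here only to keep the sketch to a single loss; the original method alternates generator and discriminator updates, and the frequency threshold below is arbitrary.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass, negated gradient in the backward pass,
    so the embeddings are pushed to fool the frequency discriminator."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

def frequency_adversarial_loss(embeddings, token_ids, popular_mask, discriminator, lambd=0.1):
    """FRAGE-style term: predict "popular vs. rare" from the word embedding;
    reversed gradients make the two groups harder to tell apart."""
    emb = GradReverse.apply(embeddings(token_ids), lambd)   # (batch, seq, dim)
    logits = discriminator(emb).squeeze(-1)                 # (batch, seq)
    labels = popular_mask[token_ids].float()                # 1 = popular, 0 = rare
    return F.binary_cross_entropy_with_logits(logits, labels)

if __name__ == "__main__":
    vocab, dim = 1000, 256
    embeddings = nn.Embedding(vocab, dim)
    discriminator = nn.Sequential(nn.Linear(dim, 128), nn.ReLU(), nn.Linear(128, 1))
    popular_mask = torch.arange(vocab) < 200   # pretend the 200 most frequent ids are "popular"
    tokens = torch.randint(0, vocab, (4, 9))
    loss = frequency_adversarial_loss(embeddings, tokens, popular_mask, discriminator)
    loss.backward()
    print(float(loss))
```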
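
For the variational attention entry, a sketch that treats the attention alignment as a latent variable: posterior logits (for example from an inference network that also sees the target) are sampled with a Gumbel-softmax relaxation, and a KL term regularizes them towards ordinary attention scores used as a prior. This relaxation is a simplification, not the cited paper's exact estimator.

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical, RelaxedOneHotCategorical
from torch.distributions.kl import kl_divergence

class VariationalAttention(nn.Module):
    """Attention as a latent alignment: sample the alignment from a relaxed
    categorical posterior and penalize its KL divergence from the prior."""

    def __init__(self, temperature=0.5):
        super().__init__()
        self.temperature = temperature

    def forward(self, prior_logits, posterior_logits, values):
        # prior_logits / posterior_logits: (batch, src_len); values: (batch, src_len, dim)
        posterior = RelaxedOneHotCategorical(torch.tensor(self.temperature),
                                             logits=posterior_logits)
        align = posterior.rsample()                           # relaxed one-hot alignment
        context = torch.einsum("bs,bsd->bd", align, values)   # attended context vector
        kl = kl_divergence(Categorical(logits=posterior_logits),
                           Categorical(logits=prior_logits))  # (batch,)
        return context, kl

if __name__ == "__main__":
    attn = VariationalAttention()
    prior = torch.randn(2, 7)
    posterior = torch.randn(2, 7, requires_grad=True)
    values = torch.randn(2, 7, 256)
    context, kl = attn(prior, posterior, values)
    print(context.shape, kl.shape)   # torch.Size([2, 256]) torch.Size([2])
```

In training, the KL term would be added to the translation loss so the stochastic alignment stays close to the deterministic attention prior.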