IWSLT 2014 German→English

| Paper | Basic architecture | #Layers | Hidden size | Algorithm | BLEU | Open-sourced |
|---|---|---|---|---|---|---|
| Multi-agent dual learning [bib] | Transformer | 8 | 256 | Dual learning with multiple primal and dual models providing the feedback signal | 35.56 | No |
| Tied transformer [bib] | Transformer | 8 | 384 | Encoder and decoder share a group of parameters | 35.52 | No |
| Layer-wise coordination [bib] | Transformer | 18 | 256 | Layer-wise coordination between encoder and decoder with parameter sharing | 35.31 | No |
| One model for a pair of dual tasks [bib] | Transformer | 8 | 256 | A single model trained for both De→En and En→De | 35.30 ± 0.1 | No |
| Model-level dual learning [bib] | Transformer | 6 | 256 | Two language modules combined with dual inference | 35.19 | No |
| Learn to teach loss functions [bib] | Transformer | 6 | 256 | Dynamic loss function learned by a teacher model | 34.80 | No |
| Role interactive layer [bib] | Transformer | 3 | 256 | Adds a role interactive layer (RIL) between the word embeddings and the first hidden layer | 34.74 | No |
| FRAGE [bib] | Transformer | 5 | 256 | Adversarial training makes word embeddings frequency-agnostic, closing the gap between rare and popular words | 33.97 | github |
| Variational attention [bib] | BiLSTM | 1 | -- | Variational attention | 33.68 | github |
| Dual transfer learning [bib] | BiLSTM | 2 | 512 | Leverages an En→De model to boost De→En performance | 32.85 | No |
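
Several of the top entries above rely on sharing parameters between encoder and decoder (the tied transformer and layer-wise coordination rows). The PyTorch sketch below illustrates only that general idea, not any listed paper's implementation; the layer composition, sizes, and the choice to share every sublayer are illustrative assumptions, and cross-attention is omitted to keep it short.

```python
import torch
import torch.nn as nn

d_model, n_heads, d_ff, n_layers = 384, 4, 1536, 8


class SharedLayer(nn.Module):
    """One parameter group reused by both the encoder and the decoder."""

    def __init__(self):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model)
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x, attn_mask=None):
        h, _ = self.self_attn(x, x, x, attn_mask=attn_mask)
        x = self.norm1(x + h)
        return self.norm2(x + self.ffn(x))


# A single stack of layers; both passes below reuse these exact weights.
shared = nn.ModuleList(SharedLayer() for _ in range(n_layers))


def encode(src):
    for layer in shared:  # encoder pass through the shared stack
        src = layer(src)
    return src


def decode(tgt, causal_mask):
    # Decoder pass reuses the same weights; a real decoder would also
    # cross-attend to the encoder output, which is omitted here.
    for layer in shared:
        tgt = layer(tgt, attn_mask=causal_mask)
    return tgt


src = torch.randn(2, 7, d_model)  # (batch, src_len, d_model)
tgt = torch.randn(2, 5, d_model)  # (batch, tgt_len, d_model)
mask = torch.triu(torch.ones(5, 5), diagonal=1).bool()  # causal mask
memory = encode(src)
out = decode(tgt, mask)  # same parameters as the encoder pass
```

The point of the sketch is that the shared stack roughly halves the parameter count relative to separate encoder/decoder stacks, which is why the tied models above can afford wider or deeper configurations at a similar budget.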