Machine translation has achieved great success in the past few decades. The emergence and development of neural machine translation (NMT) have pushed the performance and practicality of machine translation to new heights. While NMT has obtained state-of-the-art results as a new paradigm, it still suffers from drawbacks introduced both by the new framework and by machine translation itself.
Standard NMT builds a translation model from the source language to the target language. Its modeling and training procedures are carried out independently, without interaction with other NMT models such as the inverse translation model. In this book, we propose approaches to jointly training two directional NMT models, covering the following topics:
1. Improving the attention mechanism: The attention mechanism has proved effective in capturing long-distance dependencies in NMT. However, due to the intricate structural divergence between natural languages, unidirectional attention-based models may capture only partial aspects of attentional regularities. We propose agreement-based joint training to encourage the two complementary models to agree on the word alignment matrices they produce for the same training data.
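As an illustrative sketch (the notation here is assumed for exposition, not taken from the chapters themselves): writing $\overrightarrow{\theta}$ and $\overleftarrow{\theta}$ for the source-to-target and target-to-source models and $\overrightarrow{A}$, $\overleftarrow{A}$ for their attention-derived alignment matrices on a sentence pair $(\mathbf{x}, \mathbf{y})$, an agreement-based joint objective might take the form:

```latex
J(\overrightarrow{\theta}, \overleftarrow{\theta})
  = \sum_{(\mathbf{x}, \mathbf{y})} \Big[
      \log P(\mathbf{y} \mid \mathbf{x}; \overrightarrow{\theta})
    + \log P(\mathbf{x} \mid \mathbf{y}; \overleftarrow{\theta})
    - \lambda \, \Delta\big( \overrightarrow{A}(\mathbf{x}, \mathbf{y}),
                             \overleftarrow{A}(\mathbf{x}, \mathbf{y}) \big)
  \Big]
```

where $\Delta(\cdot,\cdot)$ is some measure of disagreement between the two alignment matrices and $\lambda$ balances translation likelihood against alignment agreement.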
2. Incorporating monolingual corpora: NMT systems rely heavily on parallel corpora for parameter estimation. Since parallel corpora are usually limited in quantity, quality, and coverage, especially for low-resource languages, it is appealing to exploit monolingual corpora to improve NMT. We propose a semi-supervised approach for training NMT models on the concatenation of labeled parallel corpora and unlabeled monolingual corpora. The approach uses an autoencoder to reconstruct monolingual sentences, in which the source-to-target and target-to-source translation models serve as the encoder and decoder, respectively.
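The autoencoder view described above can be sketched as follows (again with assumed notation): for a monolingual source sentence $\mathbf{x}$, the source-to-target model proposes translations $\mathbf{y}$, and the target-to-source model reconstructs the original sentence from them, giving a reconstruction probability

```latex
P(\mathbf{x}' \mid \mathbf{x}; \overrightarrow{\theta}, \overleftarrow{\theta})
  = \sum_{\mathbf{y}}
      P(\mathbf{y} \mid \mathbf{x}; \overrightarrow{\theta}) \,
      P(\mathbf{x}' \mid \mathbf{y}; \overleftarrow{\theta})
```

so that a semi-supervised objective can combine the usual likelihood on the parallel corpus with reconstruction terms of this form on the source and target monolingual corpora. In practice the sum over all translations $\mathbf{y}$ is intractable and is typically approximated, for example by sampling from the encoder model.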
3. Improving pivot-based translation: NMT systems suffer from data scarcity for resource-scarce language pairs. Although this problem can be alleviated by exploiting a pivot language to bridge the source and target languages, the source-to-pivot and pivot-to-target translation models are usually trained independently. In this work, we introduce a joint training algorithm for pivot-based NMT that connects the two models closely and enables them to interact with each other.
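One way to picture such a joint objective (a sketch with assumed notation, not the book's exact formulation): given source-pivot pairs $(\mathbf{x}, \mathbf{z})$ and pivot-target pairs $(\mathbf{z}, \mathbf{y})$, the two directional likelihoods can be tied together by a connection term:

```latex
J(\theta_{x \to z}, \theta_{z \to y})
  = \sum_{(\mathbf{x}, \mathbf{z})} \log P(\mathbf{z} \mid \mathbf{x}; \theta_{x \to z})
  + \sum_{(\mathbf{z}, \mathbf{y})} \log P(\mathbf{y} \mid \mathbf{z}; \theta_{z \to y})
  + \lambda \, R(\theta_{x \to z}, \theta_{z \to y})
```

where $R(\cdot,\cdot)$ is a regularizer that makes the two models interact, for instance by encouraging them to agree on their representations of the pivot language, and $\lambda$ controls its strength.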
4. Integrating bidirectional dependencies: Standard NMT captures only unidirectional dependencies, modeling the translation procedure from source to target. However, the inverse information is also available and can reinforce the confidence of the translation process. We propose an end-to-end bidirectional NMT model that connects the source-to-target and target-to-source translation models, opening up parameter interaction between the two directional models. A contrastive learning approach is also adopted to further enhance information sharing.
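To give a rough sense of the contrastive idea (a generic sketch, not necessarily the exact loss used in the book): given a sentence pair $(\mathbf{x}, \mathbf{y})$ and a perturbed, erroneous translation $\mathbf{y}^{-}$, a margin-based contrastive term pushes the model to score the reference above the contrastive example:

```latex
\mathcal{L}_{\mathrm{CTL}}
  = \max\Big( 0,\; \eta
    + \log P(\mathbf{y}^{-} \mid \mathbf{x}; \theta)
    - \log P(\mathbf{y} \mid \mathbf{x}; \theta) \Big)
```

where $\eta > 0$ is a margin hyperparameter. A term of this form can be added to the bidirectional training objective so that both directional models share information about which translations to avoid.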
This book not only introduces four research works built on the novel idea of combining multiple directional NMT models but also covers the basic techniques of NMT and some potential research directions. It can help novice researchers enter the NMT field quickly and broaden their view of advanced developments in NMT.
Beijing, China Dr. Yong Cheng
June 2019