7.2.1 Building your own MT models

In the field of machine translation, pre-trained models have brought a significant shift, offering valuable starting points for creating translation systems. Platforms like Hugging Face provide access to a range of pre-trained models suitable for various language pairs and directions. For instance, the Helsinki-NLP repository houses both unidirectional and multilingual models trained using parallel data sourced from OPUS. Evaluations of these models across different benchmark datasets are accessible at https://opus.nlpl.eu/dashboard/.
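
As a quick illustration, one of these pre-trained OPUS-MT models can be loaded and used for inference in a few lines with the Hugging Face `transformers` library. This is a minimal sketch; the English-French model `Helsinki-NLP/opus-mt-en-fr` is used here purely as an example, and the `transformers` and `sentencepiece` packages are assumed to be installed:

```python
# Minimal sketch: inference with a pre-trained OPUS-MT model from the
# Helsinki-NLP repository on Hugging Face (model name is an example choice).
from transformers import pipeline

# Downloads the model on first use.
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-fr")

result = translator("Pre-trained models are a valuable starting point.")
print(result[0]["translation_text"])
```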

Recently, multilingual models, which accommodate multiple language directions simultaneously, have gained prominence. Meta AI's NLLB model stands out, supporting a remarkable 200 languages. However, it's crucial to acknowledge that translation quality can differ substantially among language pairs, as the quality and size of the available training data also vary widely.
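
Because NLLB is a many-to-many model, the source and target languages must be stated explicitly using FLORES-200 language codes. Below is a minimal sketch with the distilled 600M checkpoint (`facebook/nllb-200-distilled-600M`); English to Kinyarwanda is chosen only as an example direction:

```python
# Minimal sketch: translating with NLLB, which requires explicit language codes.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "facebook/nllb-200-distilled-600M"  # distilled checkpoint, example choice
tokenizer = AutoTokenizer.from_pretrained(model_name, src_lang="eng_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

inputs = tokenizer("Language technology helps communities communicate.",
                   return_tensors="pt")
# Force the decoder to start with the target-language token (Kinyarwanda here).
generated = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("kin_Latn"),
    max_length=64,
)
print(tokenizer.batch_decode(generated, skip_special_tokens=True)[0])
```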

The BLEU scores of the NLLB model on the FLORES-200 benchmark set offer insights into its performance across different language directions. While these models offer impressive out-of-the-box capabilities, practical use typically requires customization for specific language pairs and domains.
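
BLEU scores like these can also be computed on your own test sets. Here is a minimal sketch using the `sacrebleu` package; the hypothesis and reference sentences are placeholders:

```python
# Minimal sketch: corpus-level BLEU with sacrebleu (placeholder sentences).
import sacrebleu

hypotheses = ["The cat sits on the mat."]          # system outputs
references = [["The cat is sitting on the mat."]]  # one inner list per reference set

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU = {bleu.score:.2f}")
```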

These pre-trained models are advantageous due to their adaptability. Researchers can download them and further fine-tune them on their own collected data, provided they have access to sufficient computational resources.
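
As a rough sketch of what such fine-tuning looks like with the `transformers` Trainer API: the file name `parallel.csv`, its `source`/`target` column names, and all hyperparameters below are illustrative assumptions, not values from this playbook:

```python
# Rough sketch: fine-tuning a pre-trained MT model on your own parallel data.
# File name, column names, and hyperparameters are illustrative assumptions.
from datasets import load_dataset
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

model_name = "Helsinki-NLP/opus-mt-en-fr"  # example base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Assumes a CSV with "source" and "target" columns of parallel sentences.
dataset = load_dataset("csv", data_files="parallel.csv")["train"]

def preprocess(batch):
    # Tokenize source sentences; text_target handles the label (target) side.
    return tokenizer(batch["source"], text_target=batch["target"],
                     truncation=True, max_length=128)

tokenized = dataset.map(preprocess, batched=True,
                        remove_columns=dataset.column_names)

args = Seq2SeqTrainingArguments(
    output_dir="finetuned-mt",        # where checkpoints are written
    per_device_train_batch_size=16,
    learning_rate=2e-5,
    num_train_epochs=3,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
trainer.save_model("finetuned-mt")
```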

In 2022, CLEAR Global collaborated with Digital Umuganda to fine-tune the NLLB model for Kinyarwanda in two domains: financial education and tourism. You can find the source code for the training and evaluation scripts, together with the results, at this link.

Several other well-known machine translation frameworks have also gained traction in the MT landscape, each bringing its own strengths and features to the table and catering to different needs and preferences.

OpenNMT, for instance, stands out as an open-source toolkit that provides comprehensive support for neural machine translation. With its modular design, OpenNMT offers flexibility in building and fine-tuning translation models.

You can consult CLEAR Global’s codebase for training OpenNMT models. It contains the necessary scripts and short instructions for training and evaluating a neural machine translation system from scratch.

The HuggingFace Transformers library was created with the objective of providing a unified interface for loading, training, and storing Transformer models, streamlining the process for NLP practitioners. Notably, the library boasts ease of use, enabling users to download and employ cutting-edge NLP models for inference with just a few lines of code. Moreover, it seamlessly integrates with the open models and datasets on the Hugging Face platform, making loading a dataset or a model as easy as one line of Python. Refer to their NLP course for a general overview of NLP in practice, and to the Translation section for a deeper understanding of developing translation models.
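
For instance, pulling a public parallel corpus and a pre-trained model from the Hub each takes a single call. The `opus_books` dataset with its `en-fr` configuration is used here purely as an example:

```python
# One-liners for loading from the Hugging Face Hub (example identifiers).
from datasets import load_dataset
from transformers import AutoModelForSeq2SeqLM

books = load_dataset("opus_books", "en-fr")                                   # parallel corpus
model = AutoModelForSeq2SeqLM.from_pretrained("Helsinki-NLP/opus-mt-en-fr")  # MT model
```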

MarianNMT is another notable framework that focuses on efficiency and high performance. It leverages advanced optimization techniques to achieve fast and accurate translations, making it a preferred choice for various MT applications.

JoeyNMT is recognized for its user-friendly interface and ease of use. Built on top of PyTorch, JoeyNMT simplifies the process of creating, training, and deploying translation models. Its straightforward configuration and accessibility have made it popular among developers and researchers.

Sockeye is an open-source sequence-to-sequence framework for Neural Machine Translation built on PyTorch. It implements distributed training and optimized inference for state-of-the-art models, powering Amazon Translate and other MT applications.
