7.3.1 Overview of chatbot technologies and the Rasa framework

Rasa is an open-source machine learning framework for automating text- and voice-based conversations. We use version 2.x, which is described in the official Rasa documentation. The rest of this section briefly explains how a chatbot works.

The user interacts with the chatbot through a messaging interface, which can be a website, messaging app, or voice assistant. The user sends messages, and the chatbot responds with text or voice.

Natural language understanding (NLU) is the component that interprets each incoming message, typically after basic preprocessing such as tokenization. It covers tasks like intent recognition (understanding what the user wants) and entity recognition (identifying specific pieces of information within the user's message).
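To make intent and entity recognition concrete, here is a minimal sketch in plain Python. It is a keyword and regular-expression toy, not how Rasa's NLU pipeline works internally; the intent names, the keyword lists, the city pattern, and the function parse_message are all assumptions made up for illustration.

```python
import re
from dataclasses import dataclass, field

# Hypothetical keyword lists per intent; a real NLU model learns these from data.
INTENT_KEYWORDS = {
    "greet": {"hello", "hi", "hey"},
    "ask_weather": {"weather", "forecast", "rain"},
}

# Hypothetical rule: a capitalised word following "in" is treated as a city.
CITY_PATTERN = re.compile(r"\bin ([A-Z][a-z]+)")


@dataclass
class ParsedMessage:
    text: str
    intent: str
    entities: dict = field(default_factory=dict)


def parse_message(text: str) -> ParsedMessage:
    """Return the most likely intent and any entities found in the text."""
    tokens = set(re.findall(r"[a-z']+", text.lower()))
    # Score each intent by how many of its keywords occur in the message.
    scores = {name: len(tokens & kws) for name, kws in INTENT_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    intent = best if scores[best] > 0 else "fallback"

    entities = {}
    match = CITY_PATTERN.search(text)
    if match:
        entities["city"] = match.group(1)
    return ParsedMessage(text, intent, entities)


parsed = parse_message("What is the weather in Berlin?")
print(parsed.intent, parsed.entities)
```

A trained model replaces the keyword lookup with learned patterns, but the output has the same shape: one intent plus a set of extracted entities.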

Based on the recognized intent and entities, the chatbot's dialogue management system decides how to respond. This involves selecting the appropriate response from a pool of predefined responses or generating a dynamic response.
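The decision step can be sketched as a small rule-based policy. This is only an illustration under stated assumptions: the response texts, the action names (loosely modelled on the common utter_ and action_ naming style), and the functions next_action and respond are hypothetical, and real dialogue managers usually combine rules with learned policies.

```python
# Pool of predefined responses, keyed by a hypothetical response name.
RESPONSES = {
    "utter_greet": "Hello! How can I help you?",
    "utter_ask_city": "Which city are you interested in?",
    "utter_default": "Sorry, I did not understand that.",
}


def next_action(intent: str, entities: dict) -> str:
    """Choose the next action from the recognized intent and entities."""
    if intent == "greet":
        return "utter_greet"
    if intent == "ask_weather":
        # Without a city the bot has to ask a follow-up question first.
        return "action_fetch_weather" if "city" in entities else "utter_ask_city"
    return "utter_default"


def respond(intent: str, entities: dict) -> str:
    """Return a static response, or a placeholder for a dynamic one."""
    action = next_action(intent, entities)
    if action in RESPONSES:
        return RESPONSES[action]
    # Anything not in the pool would be generated dynamically.
    return f"[dynamic response from {action}]"


print(respond("ask_weather", {}))
```

The same intent leads to different actions depending on the entities available, which is the core job of dialogue management.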

If the response is not a straightforward static message, natural language generation (NLG) comes into play. It generates human-like text that is coherent and contextually relevant to the user's query.

In some cases, the chatbot may need to interact with external systems or APIs to fetch information or perform specific actions, like retrieving weather data or booking flights.
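A call to an external system might look like the sketch below. To keep it runnable without network access, the external API is replaced by a stand-in function with canned data; fake_weather_service, weather_action, and the data values are all hypothetical, and a real bot would call an HTTP API at that point.

```python
from typing import Callable


def fake_weather_service(city: str) -> dict:
    """Stand-in for a real weather API; returns canned data only."""
    data = {"Berlin": {"temp_c": 18, "condition": "cloudy"}}
    if city not in data:
        raise KeyError(city)
    return data[city]


def weather_action(
    entities: dict,
    fetch: Callable[[str], dict] = fake_weather_service,
) -> str:
    """Build a reply from externally fetched data, handling missing input."""
    city = entities.get("city")
    if city is None:
        return "Which city are you interested in?"
    try:
        report = fetch(city)
    except KeyError:
        # The external system has no answer; fail gracefully.
        return f"Sorry, I have no weather data for {city}."
    return f"It is {report['temp_c']} degrees C and {report['condition']} in {city}."


print(weather_action({"city": "Berlin"}))
```

Passing the fetch function as a parameter keeps the action testable: the stub can be swapped for a real client without changing the reply logic.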

Finally, the chatbot assembles the final response, which may include text, images, or links, and sends it back to the user through the messaging interface.

How does a chatbot learn?

A chatbot learns from conversation data collected during user interactions. Human annotators or linguistic tools label the intents and entities in this data to create training sets, and the machine learning models behind NLU and NLG are trained on those sets to recognize patterns and context in user input. User feedback, both positive and negative, further guides the system, and advanced chatbots may use reinforcement learning to fine-tune responses based on it. The models are updated periodically with new data and insights, and each update is followed by rigorous testing and validation. Continuous monitoring shows where the chatbot still fails, and domain experts can contribute domain-specific knowledge to improve accuracy. Through this iterative process the chatbot keeps refining its understanding and its responses, which makes the conversational experience more effective and user-friendly.
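Training on labeled examples can be illustrated with a tiny intent classifier. The sketch below is a naive Bayes model with add-one smoothing written in plain Python; it is a stand-in chosen for brevity, not the model Rasa uses, and the training sentences, intent names, and the functions train and predict are invented for this example.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical labeled data: (user message, intent assigned by an annotator).
TRAINING = [
    ("hello there", "greet"),
    ("hi", "greet"),
    ("good morning", "greet"),
    ("what is the weather today", "ask_weather"),
    ("will it rain tomorrow", "ask_weather"),
    ("weather forecast please", "ask_weather"),
]


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


def train(examples):
    """Count words per intent; these counts are the whole 'model'."""
    word_counts = defaultdict(Counter)
    intent_counts = Counter()
    vocab = set()
    for text, intent in examples:
        intent_counts[intent] += 1
        for tok in tokenize(text):
            word_counts[intent][tok] += 1
            vocab.add(tok)
    return word_counts, intent_counts, vocab


def predict(model, text):
    """Return the intent with the highest smoothed log-probability."""
    word_counts, intent_counts, vocab = model
    total = sum(intent_counts.values())
    best_intent, best_score = None, float("-inf")
    for intent, n in intent_counts.items():
        score = math.log(n / total)
        denom = sum(word_counts[intent].values()) + len(vocab)
        for tok in tokenize(text):
            if tok in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[intent][tok] + 1) / denom)
        if score > best_score:
            best_intent, best_score = intent, score
    return best_intent


model = train(TRAINING)
print(predict(model, "is the weather nice"))
```

Retraining with newly labeled conversations is just another call to train on the extended list, which mirrors the periodic model updates described above.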

Overall, building a chatbot involves integrating components such as NLU, dialogue management, and NLG to create a seamless conversational experience for users. It is an iterative process that requires collaboration between developers, linguists, and domain experts, informed by user feedback, to refine the chatbot's performance and capabilities.

A chatbot does not learn the language itself from the provided training data. Most commonly, the chatbot framework starts from a foundational language model that supplies a general understanding of the language. This base model is pre-trained on vast amounts of text from the internet, which lets it capture grammar, vocabulary, and context. A general language model still needs customization to serve a particular purpose, so the chatbot developer fine-tunes it on curated training data containing examples of user interactions from the target domain or task. This adaptation lets the chatbot recognize intents, generate relevant responses, and engage users effectively within the intended context.

Training data format

The exact format of the training data depends on the chatbot framework you use. Rasa 2.x, for example, stores NLU training data, responses, stories, and rules in YAML files. For more information, refer to the official Rasa documentation.
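As a rough illustration of the NLU part of that format, the sketch below turns a dictionary of labeled examples into text that follows the Rasa 2.x layout: a version key, an nlu list, and one examples block per intent, with entities annotated inline as [value](entity_name). The helper to_rasa_nlu_yaml and the sample intents are hypothetical; check the official documentation before relying on details.

```python
def to_rasa_nlu_yaml(examples_by_intent: dict) -> str:
    """Render {intent: [example, ...]} as Rasa 2.x style NLU training data."""
    lines = ['version: "2.0"', "", "nlu:"]
    for intent, examples in examples_by_intent.items():
        lines.append(f"- intent: {intent}")
        # Examples are stored as one literal block, one "- " item per line.
        lines.append("  examples: |")
        for example in examples:
            lines.append(f"    - {example}")
    return "\n".join(lines) + "\n"


# Hypothetical sample data; [Berlin](city) marks "Berlin" as a city entity.
SAMPLE = {
    "greet": ["hey", "hello"],
    "ask_weather": ["what is the weather in [Berlin](city)"],
}

print(to_rasa_nlu_yaml(SAMPLE))
# Prints:
# version: "2.0"
#
# nlu:
# - intent: greet
#   examples: |
#     - hey
#     - hello
# - intent: ask_weather
#   examples: |
#     - what is the weather in [Berlin](city)
```

In practice such a file is usually written by hand or with annotation tools; generating it from a dictionary is shown here only to make the structure explicit.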
