Guarding the Chat Flow

Chatbots are very hyped these days, and they come with their own problems. One of them is managing the chat flow.

As a party host or a company, you want to choose the topics that folks talk about, while the participants love natural conversations. The same holds for the bits and bytes: users want a natural conversation, and they also want to get things done through that conversation with the computer. If the bot is able to schedule something on your calendar, it should go ahead and do it! If it is not able to, it needs to say so in an understanding manner, or at least fake it.

On top of these, what if you want only a few topics with only a few conversation flows and nothing more? Yes, you are the owner of this party, and what you wish should happen. There is a perfect tool (actually not exactly perfect, but it is getting better day by day) named NeMo Guardrails that leverages LLMs to answer the user in an understanding manner while also guardrailing your conversation flows. You can find more information in its documentation, so I will not promote the tool any further. Instead, I will note what I learned from its codebase and from another tool named Rasa, which is also mentioned in the NeMo Guardrails paper.

This post describes my understanding in my own words, and this is the disclaimer for any wrong information caused by my misunderstanding. But hey, if I got something wrong and it still works, that might just be a new way to do the same thing 🤗

Baby Steps: Conversation

Ask yourself: when you have a conversation, what steps occur in every conversation turn?

Maybe we can summarize a simple conversation turn with these steps:

  • Understand the person's intent
    • Is that person trying to ask something?
    • Is that person asking for some output?
    • etc...
  • Think about what you should do for that input
    • Should you do something?
    • Should you just give your thoughts?
    • ...
  • Do what needs to be done and give an output for the next turn
    • "Oh maybe you should break up with her"
    • "Got it! Here are some holiday trips for you ..."
    • "..." (just being silent for whatever reason)
  • Wait for the next response.

So, we can say that these are the 4 main parts of a conversation turn (and this is not from the literature 🤫).

Okay, this is great: we know the parts of a conversation turn. What can we do with them? We can convert them into computational blocks, which means "implement this".

For a computer, of course, these descriptions are far from precise, but they can still be converted somehow. So, I am leaving them as they are for now, and we will get some drawings later to visualize them.
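To make "implement this" a bit more concrete, here is a minimal sketch of the four steps as plain Python functions. Everything in it is an assumption for illustration: the function names, the `Turn` type, and the toy question-mark rule are mine, not something taken from NeMo Guardrails or Rasa.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    """What one conversation turn produced (illustrative names only)."""
    intent: str
    action: str
    output: str


def understand(user_input: str) -> str:
    """Step 1: guess the intent. A toy rule standing in for a real classifier."""
    return "ask_question" if user_input.strip().endswith("?") else "make_statement"


def decide(intent: str) -> str:
    """Step 2: think about what to do for that intent."""
    return "answer" if intent == "ask_question" else "acknowledge"


def act(action: str, user_input: str) -> str:
    """Step 3: do what needs to be done and produce the output."""
    if action == "answer":
        return f"Let me think about: {user_input}"
    return "Got it!"


def run_turn(user_input: str) -> Turn:
    """Runs steps 1-3; step 4 (waiting) is the caller asking for the next input."""
    intent = understand(user_input)
    action = decide(intent)
    return Turn(intent, action, act(action, user_input))
```

Calling `run_turn(...)` once per user message gives you one turn; the loop around it is the "wait for the next response" part.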

Being The Nice Guy

We have learned the parts of the conversation turn, and now you can talk with your crush. But to avoid being awkward, you have to be the nicest guy ever (or the jerk, depending on who is crushing your heart). Also, you don't want to be caught off guard by talking about things you don't know.

To do this, you have to control the conversation somehow, and the easiest way to explain how is with a quote:

“Think before you speak. Read before you think.”
Fran Lebowitz

While applying this motto, we have to tell the computer "Think before you speak" with a slight change, which is probably "Take the input and check whether it is okay to talk about it." After interpreting the input in your head, you will either say, "Oh, I know that one, and I may say something about it," or "Well, I don't know that one, but I know how to cook pasta!".

You did a good job guarding the input, but you have to control yourself, too! What if you say something that crosses a red line for the person you are talking with? I have to say, someone is going to keep sleeping alone for years if that keeps happening. To overcome this issue, apply the same process you used for the input to your own output: "Take your output and check whether it is okay to say."

What have we got now?

💡
There is more than one strategy to find the intent:
1. You can use an LLM to categorize the input into a set of options,
2. or you can create an index of user/bot utterances and run a vector search for similar ones. I would recommend giving https://qdrant.tech/ a try locally.
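The second strategy can be sketched without any external service. The snippet below is only a toy stand-in: it uses word counts as "vectors" and cosine similarity as the search, where a real setup would use an embedding model and a vector database. The intent names, example utterances, and the 0.3 threshold are all made up for illustration.

```python
import math
import re
from collections import Counter
from typing import Optional

# Hypothetical example utterances per intent; a real index would hold many more.
EXAMPLES = {
    "ask_tulips": ["Where tulips are grown?", "Which countries grow tulips?"],
    "plan_travel": ["Could you please create a travel plan for us?", "Plan a trip for me"],
}


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts instead of a learned vector."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def find_intent(user_input: str, threshold: float = 0.3) -> Optional[str]:
    """Returns the intent of the most similar example, or None if nothing is close."""
    query = embed(user_input)
    best_intent, best_score = None, 0.0
    for intent, utterances in EXAMPLES.items():
        for utterance in utterances:
            score = cosine(query, embed(utterance))
            if score > best_score:
                best_intent, best_score = intent, score
    return best_intent if best_score >= threshold else None
```

Returning `None` when nothing is close enough is what lets the bot say "I don't know that one" instead of forcing every input into some intent.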

The blue ones are what we covered in this part: the checks on the input and on the output. If you are familiar with programming, you may already be saying, "it is just a middleware/interceptor," and you are right. As the wise person says, "Why complicate your life?"
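Since the middleware comparison fits so well, here is a small sketch of it as a Python decorator: one check before the bot responds and one check after. The topic list, the red-line list, and the fallback sentences are hypothetical, and the simple keyword checks stand in for whatever you would really use (an LLM call, a vector search, and so on).

```python
from typing import Callable

# Hypothetical lists, purely for illustration.
ALLOWED_TOPICS = ("tulip", "travel", "pasta")
RED_LINES = ("your ex",)


def input_guard(user_input: str) -> bool:
    """'Think before you speak': is this something we can talk about?"""
    text = user_input.lower()
    return any(topic in text for topic in ALLOWED_TOPICS)


def output_guard(bot_output: str) -> bool:
    """Checks the reply itself: True if it stays away from every red line."""
    text = bot_output.lower()
    return not any(phrase in text for phrase in RED_LINES)


def guarded(respond: Callable[[str], str]) -> Callable[[str], str]:
    """Wraps a responder with an input check before it and an output check after it."""
    def wrapper(user_input: str) -> str:
        if not input_guard(user_input):
            return "Well, I don't know that one, but I know how to cook pasta!"
        reply = respond(user_input)
        if not output_guard(reply):
            return "Let's talk about something else."
        return reply
    return wrapper


@guarded
def echo_bot(user_input: str) -> str:
    """A stand-in for the real bot: it only repeats what it heard."""
    return f"You said: {user_input}"
```

The bot in the middle does not need to know the guards exist, which is exactly the point of treating them as middleware.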

The Marble Machine

Alright, I have covered the basic components, and now they need to be dressed up. I am going to share the secret behind "What if there are multiple flows that have the same utterance and you can't decide which one is correct?"

When it comes to deciding who or what is the best, the answer is mostly to race the candidates and look at their scores. The same thing happens here. We will race every flow, and we will decide the winner at every dialog turn. I will also explain why we decide every time and how that is fair. Just be patient.

Now, let's define some mock flows and think about their states through an example dialog.

The syntax below looks like Colang, the language developed under the NeMo Guardrails project, but the example is not actual Colang. I only borrow a similar syntax to mock the process.
flow picking the right flower
  user says "Where tulips are grown?"
  > bot searches and finds where tulips are mostly grown

flow travel planner
  user says "Could you please create a travel plan for us?"
  > bot says "Which country you would like to travel to?"
  user says $region
  > bot plans a travel around $region

  

mock conversation flows

And the dialog goes like this

user > "Could you please create a travel plan for us?"
bot > "Which country you would like to travel to?"

Aaand you realize you don't have a good answer, but you want to go somewhere that has tulips. Let's think about natural conversation again: what would you do if you were not thinking of this as a machine?

You would probably ask where tulips grow and then continue with the planning, right? Something like this:

user > "Could you please create a travel plan for us?"
bot  > "Which country you would like to travel to?"
user > "Where tulips are grown?"
bot  > ... (searches) "According to Google, The Netherlands is the world's main producer"
user > "Let's plan for The Netherlands"
bot  > ... (makes some plans) "Here are some places you can visit ..."

Looks natural, right? To support this, each flow needs its own state to follow, and those states are iterated at every dialog turn. You may ask why this happens at every turn: can't we just update the single matching flow? The answer is no, since there can be multiple flows with the same utterances.

💡
With an analogy;
If you have seen or played with the toy where you move horses forward by pressing buttons, you can imagine every horse as a flow and every step as one dialog turn. So, at any time, any horse can jump ahead into first place.
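Here is a rough sketch of that per-flow state, using the two mock flows from above. It is heavily simplified and entirely my own assumption: each flow is reduced to the list of user utterances it expects, `$region` is treated as a slot that accepts anything, and matching is exact. It is not how Colang or NeMo Guardrails represents flows internally.

```python
from dataclasses import dataclass

# Each mock flow reduced to the user utterances it expects, in order.
FLOWS = {
    "picking the right flower": ["Where tulips are grown?"],
    "travel planner": ["Could you please create a travel plan for us?", "$region"],
}


@dataclass
class FlowState:
    name: str
    position: int = 0  # index of the next expected user step

    def expects(self) -> str:
        return FLOWS[self.name][self.position]

    def finished(self) -> bool:
        return self.position >= len(FLOWS[self.name])


def step_all(states: list[FlowState], user_input: str) -> list[str]:
    """Offers the input to every flow; returns the names of the flows that advanced."""
    active = [s for s in states if not s.finished()]
    # A literal match wins over a slot, so "$region" does not swallow known utterances.
    literal = [s for s in active if s.expects() == user_input]
    slots = [s for s in active if s.expects().startswith("$")]
    advanced = literal or slots
    for state in advanced:
        state.position += 1
    return [s.name for s in advanced]
```

Feeding the example dialog through `step_all` turn by turn, the travel planner advances first, then waits at its `$region` step while the flower flow handles the tulip question, and finally picks up "Let's plan for The Netherlands" as the region. The horse that paused did not lose its place.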

In a conflicting scenario like this, the race comes into play. We look at each flow's state change history, check which flows match the new input, and score them by how well they match. This tells us which flow to continue, and it also lets us switch flows whenever the conversation changes direction.

💡
I believe this scoring algorithm can vary between cases, but as far as I can see, NeMo Guardrails scores roughly like this: a full match scores 1.0, a fuzzy match scores less than 1.0, no match scores 0.0, and a mismatch scores -1.
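To show what such a race could look like, here is a small sketch built on that scoring idea. Please treat it as my own interpretation and not as NeMo Guardrails code: I use Python's `difflib.SequenceMatcher` for the fuzzy part, I made up the 0.8 threshold, and I assume "no match" (0.0) means the flow is not waiting for user input at all, while "mismatch" (-1) means it was waiting for something else.

```python
from difflib import SequenceMatcher
from typing import Optional

FUZZY_THRESHOLD = 0.8  # assumed cut-off for "close enough"; tune for real data


def score_step(expected: str, user_input: str) -> float:
    """Scores one expected utterance against the actual input."""
    a, b = expected.lower().strip(), user_input.lower().strip()
    if a == b:
        return 1.0  # full match
    ratio = SequenceMatcher(None, a, b).ratio()
    if ratio >= FUZZY_THRESHOLD:
        return ratio  # fuzzy match: positive but below 1.0
    return -1.0  # the flow expected something else: mismatch


def pick_winner(expectations: dict[str, Optional[str]], user_input: str) -> Optional[str]:
    """Races all flows for one turn. None as a value means 'not waiting for user input'."""
    scores = {
        name: 0.0 if expected is None else score_step(expected, user_input)
        for name, expected in expectations.items()
    }
    if not scores:
        return None
    best = max(scores, key=scores.get)
    # Only a positive score counts as a win; otherwise no flow claims this turn.
    return best if scores[best] > 0 else None
```

Because the race is rerun at every turn, a flow that was losing a moment ago can win the next one, which is what allows the conversation to change direction.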

Serving the Burrito

This is overall what I learned while reading the code and some of the documentation. It mostly covers how NeMo Guardrails works, but it is not specific to it, since alternatives such as Rasa, which is also mentioned in the NeMo Guardrails paper, have similar approaches and components. This was my first time reading and working on something like this, and I found it worth writing down to reinforce what I learned.

If you have read this far, I would like to say thank you and recommend that you stay skeptical about what you have read.

Before saying that's a wrap, it is time to let the marbles fall and make some noise.

Goodbye!

Reference

Introduction to Rasa Open Source & Rasa Pro
Introduction — NVIDIA NeMo Guardrails latest documentation
GitHub - NVIDIA/NeMo-Guardrails: NeMo Guardrails is an open-source toolkit for easily adding programmable guardrails to LLM-based conversational systems.