This chapter provides guidelines and best practices that Conversational AI developers can use when working with their custom data, leveraging Omilia’s Machine Learning technology.
NLU model development and evaluation best practices
This chapter provides guidelines and best practices on how to tune, train, and evaluate your NLU model for optimal performance.
Creating your own intents
Follow a top-down approach when you build your intents. Define generic intents and build your way up to more complex ones. For example, if you want to build intents around Bank Account Balance inquiries, you can work your way up following the methodology below. Keep in mind that it all comes down to the level of understanding you want to get to and the effort you want to invest in the model’s preparation.
Use at least 5 utterances per intent to get decent accuracy.
Create a generic Account intent with the following utterances:
it's about my account
I want to ask something about my account
question about an account
inquiry about my account
Move to the more specific Account.Balance intent and use phrases like the following:
an inquiry about my account’s balance
can I get my balance, please
I want the balance of my account
account remaining balance
whats my account remaining balance
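The top-down layout above can be kept in a plain mapping from intent name to utterances. The following is a minimal Python sketch; the dictionary layout and the parent_intent helper are hypothetical illustrations and are not part of Omilia's tooling or import format.

```python
# Hypothetical in-memory layout: intent name -> training utterances.
# The dotted naming (Account -> Account.Balance) follows the example above.
training_data = {
    "Account": [
        "it's about my account",
        "I want to ask something about my account",
        "question about an account",
        "inquiry about my account",
    ],
    "Account.Balance": [
        "an inquiry about my account’s balance",
        "can I get my balance, please",
        "I want the balance of my account",
        "account remaining balance",
        "whats my account remaining balance",
    ],
}


def parent_intent(name):
    """Return the more generic intent for a dotted name, or None at the top level."""
    head, sep, _ = name.rpartition(".")
    return head if sep else None


for intent, utterances in training_data.items():
    print(intent, "| parent:", parent_intent(intent), "| utterances:", len(utterances))
```

Note that the generic Account intent lists four utterances in the example, so one more would be needed to reach the recommended minimum of 5.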
Training your model
The best guidelines and practices for training an NLU model are given below.
The intents you create should have a minimum of 5 utterances each.
The more utterances you use per intent, the better.
Keep your intents balanced. The intent with the most utterances should not have more than double the utterance count of the intent with the fewest.
Use a variety of different utterances for each intent. It increases the model’s generalization.
Do not use the same utterance more than once within an intent. Duplicates are NOT taken into consideration when training the model.
Adjust the number of utterances and their corresponding generalization according to the granularity of your intents.
Creating similar intents requires a more careful selection of utterances.
Similar utterances for different intents can lead to poor performance.
Create meta intents if you see that your end customers express such requests. For example:
In the absence of the Meta.Negative intent, the utterance "No, I do not want to know my balance" could be mistakenly identified as the
Could you repeat please? → intent
If needed, create OOScope intents, that is, out-of-scope intents.
These are in-domain intents that you are aware of but have intentionally chosen not to service.
Delete my account? → Account-Deletion → could be out-of-scope for your banking agent.
Once your solution goes live, it may be exposed to out-of-domain utterances that are considered in-domain. Periodically populate an Unknown intent with the incoming confusing utterances.
Work/life balance →
Avoid special characters like #, and so on.
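Several of the training guidelines above are mechanical enough to check before training. The following is a rough Python sketch, assuming the training data is held as a mapping of intent name to utterance list; the check_training_set function, its thresholds as constants, and the sample data are hypothetical and not part of any Omilia API. The cross-intent check only catches exact matches, so merely similar utterances still need manual review.

```python
from collections import Counter

MIN_UTTERANCES = 5        # minimum per intent, from the guidelines above
MAX_BALANCE_RATIO = 2     # largest intent at most double the smallest
SPECIAL_CHARS = set("#")  # assumption: only '#' is named in the text; extend as needed


def check_training_set(data):
    """Return a list of warnings for a {intent: [utterances]} mapping."""
    warnings = []
    # Duplicates are ignored in training, so count unique utterances only.
    unique = {intent: set(utts) for intent, utts in data.items()}

    for intent, utts in data.items():
        if len(unique[intent]) < len(utts):
            warnings.append(f"{intent}: contains duplicate utterances")
        if len(unique[intent]) < MIN_UTTERANCES:
            warnings.append(f"{intent}: fewer than {MIN_UTTERANCES} unique utterances")
        if any(ch in SPECIAL_CHARS for u in utts for ch in u):
            warnings.append(f"{intent}: contains special characters")

    counts = [len(u) for u in unique.values()]
    if counts and max(counts) > MAX_BALANCE_RATIO * min(counts):
        warnings.append(f"intents are imbalanced (largest > {MAX_BALANCE_RATIO}x smallest)")

    # The same utterance under different intents gives the model conflicting labels.
    owners = Counter(u for utts in unique.values() for u in utts)
    shared = sorted(u for u, n in owners.items() if n > 1)
    if shared:
        warnings.append(f"utterances shared across intents: {shared}")
    return warnings


# Placeholder data for illustration only.
sample = {
    "Account": [
        "it's about my account",
        "question about an account",
        "question about an account",  # duplicate on purpose
    ],
    "Account.Balance": [f"placeholder utterance {i}" for i in range(7)],
}
warnings_found = check_training_set(sample)
for warning in warnings_found:
    print(warning)
```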
Evaluation of a trained model
The evaluation of a trained model requires an evaluation set. The following best practices can help you draft a proper one:
Avoid using the same utterances that you already used to train your model.
Your evaluation set must include all the intents you built. Do NOT include xPack intents: they are not part of the model’s training process because they are already pre-tuned and ready to be used at runtime.
Your evaluation set must be as balanced as possible.
Adopt the same text formatting for both training and evaluation utterances (upper/lower case, punctuation, and so on).
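The first two evaluation-set practices above can be verified with simple set operations. This is a minimal Python sketch under the assumption that both sets are available as mappings of intent name to utterance list; check_evaluation_set and its xpack_intents parameter are hypothetical names, and the list of xPack intent names would have to be supplied by the caller.

```python
def check_evaluation_set(train, evaluation, xpack_intents=frozenset()):
    """Compare {intent: [utterances]} mappings for training and evaluation."""
    problems = []
    train_utts = {u for utts in train.values() for u in utts}
    eval_utts = {u for utts in evaluation.values() for u in utts}

    # Evaluation utterances should not already appear in the training set.
    overlap = sorted(train_utts & eval_utts)
    if overlap:
        problems.append(f"utterances reused from training: {overlap}")

    # Every intent you built should be represented in the evaluation set.
    missing = sorted(set(train) - set(evaluation))
    if missing:
        problems.append(f"intents missing from evaluation set: {missing}")

    # xPack intents are pre-tuned and should be left out (names supplied by caller).
    included_xpack = sorted(set(evaluation) & set(xpack_intents))
    if included_xpack:
        problems.append(f"xPack intents should be left out: {included_xpack}")
    return problems


train = {
    "Account": ["it's about my account"],
    "Account.Balance": ["I want the balance of my account"],
}
evaluation = {"Account": ["it's about my account"]}
problems_found = check_evaluation_set(train, evaluation)
for problem in problems_found:
    print(problem)
```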
The use cases below illustrate what can happen when the best practices discussed in this section are NOT followed.
Your evaluation set utterances are identical to the ones you used for training your model. →
Your evaluation set only includes a single intent, the model’s favorable one (the one with the most training utterances). →
Your evaluation set contains all the training set intents, but it is heavily imbalanced (for example, 990 utterances for the model’s favorable intent and one utterance for each of the remaining 10 intents). → Accuracy 99% (even though it fails on 10 intents and succeeds in 1).
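The arithmetic behind the imbalanced scenario can be reproduced in a few lines. The Python sketch below assumes a hypothetical model that always predicts the favorable intent, and the intent names are placeholders. Per-intent recall and its unweighted average (macro recall) are not mentioned in the text above; they are shown here only as one common way to expose what overall accuracy hides.

```python
from collections import defaultdict


def accuracy(pairs):
    """Fraction of (expected, predicted) pairs where the prediction is correct."""
    return sum(e == p for e, p in pairs) / len(pairs)


def per_intent_recall(pairs):
    """Recall per expected intent: correct predictions / utterances of that intent."""
    total, correct = defaultdict(int), defaultdict(int)
    for expected, predicted in pairs:
        total[expected] += 1
        correct[expected] += expected == predicted
    return {intent: correct[intent] / total[intent] for intent in total}


# Hypothetical model that always predicts the favorable intent:
# 990 favorable utterances plus one utterance for each of 10 other intents.
pairs = [("Favorable", "Favorable")] * 990
pairs += [(f"Other{i}", "Favorable") for i in range(10)]

recall = per_intent_recall(pairs)
macro_recall = sum(recall.values()) / len(recall)

print(f"accuracy:     {accuracy(pairs):.2%}")  # 990 correct out of 1000
print(f"macro recall: {macro_recall:.2%}")     # 1 intent out of 11 recognized
```

Overall accuracy is 990/1000, while only one of the 11 intents is ever recognized, which is why a balanced evaluation set gives a more trustworthy picture.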