I retrained one model with 52 intents 100 times. Conclusions from that retraining:
- With only the Logistic Regression Classifier, the scores are always the same.
- With the neural classifier as well, the scores differ slightly between runs, but they do not decrease; they just move around the same values.
About your problem: without the corpus to check, it's impossible to see what's happening. But I can guess that:
- You have a very large number of utterances.
- Utterances that should be in the same intent are in different intents.
- You train intents with only one utterance.
That's why usually nobody lets a chatbot be trained by its users. The trainers are chosen from those who know the chatbot's business, because something like this can happen:
user1> train "What is your favourite color" as "color"
user2> Train "What is your favourite color" as "favourite_color"
user3> Train "What is your color favourite" as "whatever"
...
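A rough way to spot this kind of conflict before training is to audit the corpus yourself. The sketch below is only an illustration in Python: the `(utterance, intent)` pair format, the `audit_corpus` and `normalize` names, and the word-sorting normalization are all assumptions of mine, not part of any library. It reports utterances that end up under more than one intent, and intents that have only one distinct utterance.

```python
from collections import defaultdict


def normalize(utterance: str) -> str:
    """Lowercase and sort the words so reordered phrasings compare equal.

    This is a deliberately crude, hypothetical normalization; a real
    pipeline would use the same tokenizer/stemmer as the classifier.
    """
    return " ".join(sorted(utterance.lower().split()))


def audit_corpus(samples):
    """Audit an iterable of (utterance, intent) pairs (assumed format).

    Returns a tuple (conflicts, single_utterance_intents):
    - conflicts maps a normalized utterance to the sorted list of
      intents it was labeled with, when there is more than one;
    - single_utterance_intents lists intents with one distinct utterance.
    """
    intents_by_utterance = defaultdict(set)
    utterances_by_intent = defaultdict(set)
    for utterance, intent in samples:
        key = normalize(utterance)
        intents_by_utterance[key].add(intent)
        utterances_by_intent[intent].add(key)

    conflicts = {
        key: sorted(intents)
        for key, intents in intents_by_utterance.items()
        if len(intents) > 1
    }
    single_utterance_intents = sorted(
        intent
        for intent, keys in utterances_by_intent.items()
        if len(keys) == 1
    )
    return conflicts, single_utterance_intents


# The three user-submitted trainings from the example above.
samples = [
    ("What is your favourite color", "color"),
    ("What is your favourite color", "favourite_color"),
    ("What is your color favourite", "whatever"),
]
conflicts, single_utterance_intents = audit_corpus(samples)
print(conflicts)
print(single_utterance_intents)
```

On the three trainings above, all three intents are flagged twice: they share what is effectively the same utterance, and each of them has only that one utterance.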
So perhaps you can share your corpus so it can be checked.