I am working on a web application that tracks helpdesk entries. We want to stop people from copying and pasting their notes about common issues; we want an original helpdesk entry written for every trouble call.
We have thousands of entries, and some of them are similar. I am trying to find a way to compare them all to each other and flag any entries that are very similar to others, e.g. entries that are 80% or more likely to be a direct copy.
I've looked into similar_text() and a few other built-in PHP functions, but I'd like to hear whether anyone has done something similar before. I don't think I can use similar_text() efficiently, since it compares exactly two strings and I need to compare many entries against one another.
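For reference, a brute-force version using similar_text() might look like the sketch below. It is a minimal illustration, assuming the entries are already loaded into an array keyed by entry ID. The function name `findSimilarPairs`, the lowercase/whitespace normalisation, and the sample entries are all hypothetical. The 80% threshold mirrors the figure above. It relies on the fact that similar_text() writes a percentage into its optional third (by-reference) argument.

```php
<?php
// Hypothetical helper: returns every pair of entries whose similar_text()
// percentage meets or exceeds $threshold. Each unordered pair is compared
// once, so n entries cost n*(n-1)/2 calls.
function findSimilarPairs(array $entries, float $threshold = 80.0): array
{
    // Assumed normalisation: lowercase and collapse runs of whitespace so
    // trivial formatting changes don't hide a copy.
    $normalised = [];
    foreach ($entries as $id => $text) {
        $normalised[$id] = preg_replace('/\s+/', ' ', strtolower(trim($text)));
    }

    $ids = array_keys($normalised);
    $count = count($ids);
    $matches = [];

    for ($i = 0; $i < $count; $i++) {
        for ($j = $i + 1; $j < $count; $j++) {
            // The third argument receives the similarity as a percentage.
            similar_text($normalised[$ids[$i]], $normalised[$ids[$j]], $percent);
            if ($percent >= $threshold) {
                $matches[] = [$ids[$i], $ids[$j], round($percent, 1)];
            }
        }
    }

    return $matches;
}

// Illustrative sample data only.
$entries = [
    101 => 'Printer on floor 3 jammed; cleared paper tray and restarted.',
    102 => 'Printer on floor 3 jammed;  cleared paper tray and rebooted.',
    103 => 'User locked out of email; reset password via admin console.',
];

$matches = findSimilarPairs($entries);
print_r($matches);
```

The nested loop is what makes this approach slow at scale. With 3,000 entries, it makes roughly 4.5 million similar_text() calls. The PHP manual also notes that similar_text() itself is O(N^3) in the length of the longest string.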
Any input is appreciated.