I have a dataset of about 1000 entries.
Each entry has two relevant features:
- `weight` (float)
- `origin` (string / another entity)
I have to sort those entries into groups of at most four entries. Groups can contain fewer entries, though.
- group of four entries: very good
- group of three entries: good
- group of two entries: not so good
- group of one entry: really bad (but possible if not avoidable)
Now, the way those entries are sorted into the groups depends on their features in the following way:
- The max. delta of `weight` within a group can be 10%.
- There should be as many different values for `origin` as possible in each group. Having one duplicate is not so bad, but having three or more entries with the same `origin` should be avoided.
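To make the two rules concrete, here is a minimal sketch of how a single group could be checked. It is in Python only for brevity. The entry layout (a `(weight, origin)` tuple), the helper names, and the reading of "10%" as "the heaviest entry is at most 10% above the lightest one" are my own assumptions for illustration, not fixed requirements.

```python
from collections import Counter

# Assumption: the 10% delta is measured relative to the lightest entry.
MAX_WEIGHT_DELTA = 0.10


def weight_ok(group):
    """True if the heaviest entry is at most 10% above the lightest one."""
    weights = [weight for weight, _ in group]
    return max(weights) <= min(weights) * (1 + MAX_WEIGHT_DELTA)


def max_same_origin(group):
    """Number of entries that share the most common origin in the group."""
    return max(Counter(origin for _, origin in group).values())


def describe(group):
    """Summarise a group of (weight, origin) tuples against both rules."""
    return {
        "size": len(group),
        "weight_ok": weight_ok(group),
        "distinct_origins": len({origin for _, origin in group}),
        "max_same_origin": max_same_origin(group),
    }
```

A group is ideal when `size` is 4, `weight_ok` is true and `max_same_origin` is 1. A `max_same_origin` of 2 is tolerable, and 3 or more is what I want to avoid.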
Within the dataset, `weight` has a range of roughly 20.0 to 120.0.
There are about 50 different possible values for `origin`.
I have to implement this in PHP, but answering with a PHP implementation is not necessary. The algorithm alone would be enough.
I have tried sorting all entries by their `weight` and then simply splitting the sorted list after every fourth entry. But the groups I then get are hard to rearrange with regard to the `origin` value. I think I could somehow get this done through a nasty implementation, but I hope there is a very elegant algorithm that can do just that.
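For reference, the attempt described above amounts to roughly the following sketch (Python for brevity; the `(weight, origin)` tuple layout and the function name `naive_groups` are just assumptions for this example). It ignores `origin` entirely, and a chunk can also span more than the allowed 10% in `weight`, which is why the result needs so much rearranging afterwards.

```python
def naive_groups(entries, size=4):
    """Sort (weight, origin) tuples by weight, then cut into chunks of `size`."""
    ordered = sorted(entries, key=lambda entry: entry[0])
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


# Hypothetical sample data: three of the four lightest entries share origin "A".
sample = [(22.0, "B"), (20.0, "A"), (50.0, "C"), (21.5, "A"), (21.0, "A")]
groups = naive_groups(sample)
```

With this sample, the first group contains three entries with origin `"A"`, and the remaining entry ends up alone in a group of one.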
Thanks in advance!