I have a large amount of data that I need to sort in a specific way based on a search query, but I'm not sure of the best approach to take.
The data I'm trying to sort is a list of courses, grouped by schools. Each course is taught by one school. Each school may belong to any number of "partnerships", which represents a relationship between a number of schools. A user can search for any number of courses by the name of the course.
I need to sort the data as follows:
Courses are grouped by school, with 10 schools appearing per page.
Schools that can provide every course that the user has searched for should appear first in the list.
After these results, schools that belong to a partnership that can accommodate all of the courses the user searched for should appear next to each other.
Here is an example:
- A teaches History, French and English courses.
- B teaches French and Mathematics.
- C teaches History.
- B and C are in a partnership together.
D teaches History.
The user searches for "History" and "French".
A should appear first in the results, with its History and French courses, as it can provide both of the courses the user is looking for.
B, followed by C appears next, with the relevant course it teaches listed after it, as the partnership can provide both of the user's required courses.
D Appears next as it only provides 1 relevant course.
The data is stored in an Microsoft SQL Server database across a few tables. Here is a simplified schema:
Course:
- int id
- varchar name
- int schoolId
School:
- int id
- varchar name
Partnership:
- int id
- varchar partnershipName
SchoolPartnership:
- int id
- int schoolId
- int partnershipId
There are over 100000 courses and around 300 schools. I don't know of a way to sort the courses as specified in SQL, which I think is my biggest problem. I only need to display 10 results per page, but as I can't do the sorting in the SQL query, I have to extract the entire result set and sort it manually in PHP before I can cut the result set down to 10 results.
I'm currently extracting the data I need in a single query with multiple joins using Doctrine 2, hydrating the results as an array. Then the plan is to manipulate this big array of records in PHP to get it into the correct order. Due to the size of this array, I'm worried that this sorting process is going to be very slow, so I'm looking for advice on how to make this quicker, either by:
- Handling the sorting in the SQL query.
- Suggesting how an algorithm such as the one described could be implemented in a search engine such as Solr (I have a little experience of the basics of this, but not performing complex sorting).
- Suggestions on how best to perform the sorting in PHP, if the other two options are not viable.
EDIT:
I've made some good progress on this, thanks (particularly @Neil). I've opened a separate question up ( Groupwise MAX() on a subquery ), which contains some of my progress so far.