dphw5101 2013-02-09 17:00
浏览 62
已采纳

针对大量数据的自定义排序算法

I have a large amount of data that I need to sort in a specific way based on a search query, but I'm not sure of the best approach to take.

The data I'm trying to sort is a list of courses, grouped by schools. Each course is taught by one school. Each school may belong to any number of "partnerships", which represents a relationship between a number of schools. A user can search for any number of courses by the name of the course.

I need to sort the data as follows:

  • Courses are grouped by school, with 10 schools appearing per page.

  • Schools that can provide every course that the user has searched for should appear first in the list.

  • After these results, schools that belong to a partnership that can accommodate all of the courses the user searched for should appear next to each other.

Here is an example:

  • A teaches History, French and English courses.
  • B teaches French and Mathematics.
  • C teaches History.
  • B and C are in a partnership together.
  • D teaches History.

  • The user searches for "History" and "French".

  • A should appear first in the results, with its History and French courses, as it can provide both of the courses the user is looking for.

  • B, followed by C appears next, with the relevant course it teaches listed after it, as the partnership can provide both of the user's required courses.

  • D Appears next as it only provides 1 relevant course.

The data is stored in an Microsoft SQL Server database across a few tables. Here is a simplified schema:

Course:

  • int id
  • varchar name
  • int schoolId

School:

  • int id
  • varchar name

Partnership:

  • int id
  • varchar partnershipName

SchoolPartnership:

  • int id
  • int schoolId
  • int partnershipId

There are over 100000 courses and around 300 schools. I don't know of a way to sort the courses as specified in SQL, which I think is my biggest problem. I only need to display 10 results per page, but as I can't do the sorting in the SQL query, I have to extract the entire result set and sort it manually in PHP before I can cut the result set down to 10 results.

I'm currently extracting the data I need in a single query with multiple joins using Doctrine 2, hydrating the results as an array. Then the plan is to manipulate this big array of records in PHP to get it into the correct order. Due to the size of this array, I'm worried that this sorting process is going to be very slow, so I'm looking for advice on how to make this quicker, either by:

  • Handling the sorting in the SQL query.
  • Suggesting how an algorithm such as the one described could be implemented in a search engine such as Solr (I have a little experience of the basics of this, but not performing complex sorting).
  • Suggestions on how best to perform the sorting in PHP, if the other two options are not viable.

EDIT:

I've made some good progress on this, thanks (particularly @Neil). I've opened a separate question up ( Groupwise MAX() on a subquery ), which contains some of my progress so far.

  • 写回答

3条回答 默认 最新

  • doucigua0278 2013-02-09 20:56
    关注

    To find schools by the number of matching courses is simple:

    SELECT schoolId, COUNT(*) AS schoolCount
      FROM Courses
      WHERE name IN ('History', 'French')
      GROUP BY schoolId
    

    If this was all you needed, you could ORDER BY schoolCount DESC to get them in the order you want.

    To find partnerships with matching courses, you first need to find the partnerships that have the course at at least one school:

    SELECT partnershipId, COUNT(DISTINCT name) AS partnershipCount
      FROM SchoolPartnership
      INNER JOIN Courses ON Course.schoolId = SchoolPartnership.schoolId
      WHERE name IN ('History', 'French')
      GROUP BY partnershipId
    

    Note that DISTINCT is needed because we don't care how many schools in the partnership have that course. If you don't have DISTINCT then you could use a subselect instead:

    SELECT partnershipId, COUNT(*) AS partnershipCount
      FROM (
        SELECT DISTINCT partnershipId, name
          FROM SchoolPartnership
          INNER JOIN Courses ON Course.schoolId = SchoolPartnership.schoolId
          WHERE name IN ('History', 'French'))
      GROUP BY partnershipId
    

    You can then use the first and last query above as subselects in a join with SchoolPartnership to order the schools in descending order of partnershipMatches and schoolMatches. (Note that I assume that all schools are in a partnership of at least one school.) I think the final query will look like this:

    SELECT SchoolMatches.schoolID
      FROM (
        SELECT schoolId, COUNT(*) AS schoolCount
          FROM Courses
          WHERE name IN ('History', 'French')
          GROUP BY schoolId
      ) SchoolMatches
      JOIN SchoolPartnership ON SchoolMatches.schoolID = SchoolPartnership.schoolID
      JOIN (
        SELECT partnershipId, COUNT(DISTINCT name) AS partnershipCount
          FROM SchoolPartnership
          INNER JOIN Courses ON Course.schoolId = SchoolPartnership.schoolId
          WHERE name IN ('History', 'French')
          GROUP BY partnershipId
       ) PartnershipMatches ON SchoolPartnership.schoolId = PartnershipMatches.schoolId
       ORDER BY PartnershipMatches.partnershipCount DESC, SchoolMatches.SchoolCount DESC
    
    本回答被题主选为最佳回答 , 对您是否有帮助呢?
    评论
查看更多回答(2条)

报告相同问题?

悬赏问题

  • ¥15 目详情-五一模拟赛详情页
  • ¥15 有了解d3和topogram.js库的吗?有偿请教
  • ¥100 任意维数的K均值聚类
  • ¥15 stamps做sbas-insar,时序沉降图怎么画
  • ¥15 买了个传感器,根据商家发的代码和步骤使用但是代码报错了不会改,有没有人可以看看
  • ¥15 关于#Java#的问题,如何解决?
  • ¥15 加热介质是液体,换热器壳侧导热系数和总的导热系数怎么算
  • ¥100 嵌入式系统基于PIC16F882和热敏电阻的数字温度计
  • ¥15 cmd cl 0x000007b
  • ¥20 BAPI_PR_CHANGE how to add account assignment information for service line