I have an index of about 100k documents that represent a movie entity.
Users can put films on various lists (like favorites etc.)
These lists are stored in a mysql database and are not indexed in solr.
I could store the user ids in multivalued fields that represent a list, but that is quite bad because the fields would get very, very long and the indexing would be problematic too.
So currently i do the following (pseudocode):
$favorites = SELECT document_id FROM favorites WHERE user_id = $user_id
$documents = 'http://solr.com:8393/select/?q=XYZ&fq=document_id:('.join(' OR ',$favorites);
this works great and fast but the number of items in filter queries is limited to 1024 (i tried that). also filter queries add up. so if i have one filter query with 500 values to filter i can have another values to 524 filters on another field.
It's okay for now because I limited the entries per list to 1024, and that's quite a lot but I think this approach is very clumsy and produces a lot of overhead.
Isn't there a better solution? Like writing a solr module that directly connects to the database or something? I'd like to do it in php.
If there is no other way, can i somehow raise the 1024 limit? because it works very fast now! I think with good hardware more wouldn't be a problem.
Edit: as asked in the comments i here post my original schema and a working example query.
<field name="film_id" type="int" indexed="true" stored="true" required="true"/>
<field name="imdb_id" type="int" indexed="true" stored="true" />
<field name="parent_id" type="int" indexed="true" stored="true"/>
<field name="malus" type="int" indexed="true" stored="true"/>
<field name="type" type="int" indexed="true" stored="true"/>
<field name="year" type="int" indexed="true" stored="true" termVectors="true"/>
<field name="locale_title" type="string" indexed="false" stored="true"/>
<field name="aka_title" type="filmtitle" indexed="true" stored="true" multiValued="true" omitNorms="true" termVectors="true" />
<field name="sort_title" type="string" indexed="true" stored="true"/>
<field name="director" type="person" indexed="true" stored="true" multiValued="true" omitNorms="true"/>
<field name="director_phonetic" type="person_phonetic" multiValued="true" omitNorms="true"/>
<field name="actor" type="person" indexed="true" stored="true" multiValued="true" omitNorms="true"/>
<field name="actor_phonetic" type="person_phonetic" multiValued="true" omitNorms="true"/>
<field name="country" type="string" indexed="true" stored="true" multiValued="true"/>
<field name="description" type="text" indexed="true" stored="true" />
<field name="genre" type="genre" indexed="true" stored="true" multiValued="true" termVectors="true"/>
<field name="url" type="string" indexed="true" stored="true" multiValued="false"/>
<field name="image_url" type="string" indexed="false" stored="true" multiValued="false"/>
<field name="rating" type="int" indexed="true" stored="true" required="false" default="50"/>
<field name="affiliate" type="string" indexed="true" stored="true" multiValued="true"/>
<field name="product_type" type="string" indexed="true" stored="true" multiValued="true"/>
<dynamicField name="product_*" type="string" indexed="true" stored="true" multiValued="true"/>
<field name="blockbuster" type="boolean" indexed="true" stored="true" />
<copyField source="film_id" dest="id"/>
<field name="director_id" type="string" indexed="true" stored="true" multiValued="true" termVectors="true"/>
<field name="actor_id" type="string" indexed="true" stored="true" multiValued="true" termVectors="true"/>
theese are my additions to the default schema.xml
a sample search result can be viewed here.
a sample query would be:
http://my-server.com:8983/solr/select/?
q=description:nazis
&fq=product_bluray:amazon
&fq=film_id:(1185616 1054606 88763 361748 78748)
here the user would search for movies that are:
- available on amazon as a bluray
- that have the term "nazis" in the description
- AND that are on his favorite list
the list includes the movies (documents) with the ids 1185616 1054606 88763 361748 78748 and are stored in the mysql database.
ps: I don't know whether I formulated the question well, I hope its understandable. If not, please feel free to edit!