I'm creating a website where the admin uploads documents that are available only to paid members. However, I do want search engines to crawl and index the documents so that they appear in search results. The documents are DOC, DOCX and PDF files.
For example, I have a document that contains this text: "the quick brown fox jumped over the lazy dog". Now someone searches Google for "brown fox". Assuming my page ranks well enough, I would want it to appear in the Google results. When the user clicks on the result, I want them to land on a page, rather than on the document itself, that shows a preview of the text with a link to become a member and view the full document.
My plan is to save the preview of the document into the database when the document is uploaded, so that it is easily visible and crawlable on the page. For the full document, the only approach I could think of is to let search engines crawl the file itself. But if I let search engines crawl it, I'll be giving users access as well. And if I use .htaccess to block direct access to the documents, then I'm shutting the crawlers out too.
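To make the preview idea concrete, here is a rough sketch of how the preview might be generated at upload time. It assumes the text has already been extracted from the DOC, DOCX or PDF file by some other tool; the function name `make_preview` and the 200-character limit are just placeholders I made up for illustration.

```python
import re


def make_preview(full_text: str, max_chars: int = 200) -> str:
    """Return a short, single-line preview of already-extracted document text.

    Hypothetical helper: the name and the default limit are illustrative only.
    """
    # Collapse runs of whitespace (newlines, tabs) into single spaces.
    text = re.sub(r"\s+", " ", full_text).strip()
    if len(text) <= max_chars:
        return text
    # Cut at the last space at or before the limit so no word is split.
    cut = text.rfind(" ", 0, max_chars + 1)
    if cut <= 0:
        # No space found (one very long word): fall back to a hard cut.
        cut = max_chars
    return text[:cut].rstrip() + "..."


if __name__ == "__main__":
    sample = "the quick brown fox jumped over the lazy dog"
    print(make_preview(sample, max_chars=15))
```

The returned string is what would be stored in the database next to the document record and shown on the landing page.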
I also considered extracting all of the document text and storing it in the database, so that the full text could be served to crawlers only. But I read somewhere that it is very difficult to distinguish between a user and a spider, and that relying on the user agent is a bad idea because it is very easy to spoof.
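For reference, one alternative to trusting the User-Agent header is the reverse-then-forward DNS check that Google describes for verifying Googlebot. Below is a minimal sketch of that check, not a tested solution. The function name `is_verified_crawler` and the injectable lookup parameters are hypothetical, and the hostname suffixes are an assumption that should be checked against Google's current documentation.

```python
import socket

# Assumed hostname suffixes for Google's crawlers; verify against current docs.
TRUSTED_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_crawler(ip, reverse_lookup=None, forward_lookup=None):
    """Return True if `ip` reverse-resolves to a trusted host that resolves back to `ip`.

    The lookup callables can be replaced (e.g. in tests); by default they
    use the standard library resolver, which handles IPv4 only here.
    """
    if reverse_lookup is None:
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        forward_lookup = lambda host: socket.gethostbyname_ex(host)[2]
    try:
        # Step 1: reverse DNS, IP address to hostname.
        host = reverse_lookup(ip)
        if not host.lower().endswith(TRUSTED_SUFFIXES):
            return False
        # Step 2: forward DNS, hostname back to IP addresses; must include `ip`.
        return ip in forward_lookup(host)
    except OSError:
        # Lookup failures (socket.herror, socket.gaierror) count as unverified.
        return False
```

A spoofed user agent does not pass this check, because the requester cannot control the reverse DNS record for an IP address it does not own. Whether serving different content to verified crawlers is acceptable to search engines is a separate question that this sketch does not answer.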
So I'm confused as to how I should go about this. Any help will be appreciated. Thank you in advance!