I have built RSS, Twitter, and other content aggregators for clients using PHP/MySQL. It typically involves a cron job, some feed parsing, and inserting the data into a database for storage and later re-publishing, deleting, archiving, etc. Nothing ground-breaking.
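For context, the cron-driven pattern above boils down to "parse the feed, insert anything new, skip what you already have". A minimal sketch (in Python with SQLite purely for illustration; the table name, columns, and sample feed are all invented here, not from any real project) looks like this:

```python
import sqlite3
import xml.etree.ElementTree as ET

# Invented sample feed standing in for a fetched RSS document.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0"><channel><title>Demo</title>
<item><title>First post</title><link>http://example.com/1</link></item>
<item><title>Second post</title><link>http://example.com/2</link></item>
</channel></rss>"""

def store_items(rss_xml, conn):
    """Parse one RSS document and insert items not already stored.

    Returns the number of newly inserted rows, so a cron run can be
    re-executed safely: duplicates are ignored via the primary key.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS items (link TEXT PRIMARY KEY, title TEXT)"
    )
    before = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
    root = ET.fromstring(rss_xml)
    for item in root.iter("item"):
        # INSERT OR IGNORE dedupes on the link, so re-running the same
        # cron job does not create duplicate rows.
        conn.execute(
            "INSERT OR IGNORE INTO items (link, title) VALUES (?, ?)",
            (item.findtext("link"), item.findtext("title")),
        )
    conn.commit()
    after = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
    return after - before

conn = sqlite3.connect(":memory:")
first_run = store_items(SAMPLE_RSS, conn)   # inserts both items
second_run = store_items(SAMPLE_RSS, conn)  # inserts nothing new
```

The dedup-on-unique-key detail matters at scale: it is what lets overlapping or restarted jobs be idempotent.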
But now I am tasked with building an aggregator service for a public audience. I expect it will need to scale quickly, since each person with access to the service can add dozens, if not hundreds, of source feeds. Within a few months we may be regularly parsing thousands of feeds, and perhaps 100,000 within a year, or more with any luck.
I guess the ultimate model is something similar to what Google Reader does.
So, what is a good strategy for this? Multiple overlapping cron jobs, continuously running, reading feeds and connecting to APIs to pull content? Should I plan to run multiple instances of Elastic Cloud or something similar as demand grows?
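One common alternative to overlapping crons (offered here as a sketch, not a recommendation of any specific product) is a scheduler with a worker pool: keep a per-feed "next due" time in a priority queue, have workers pop whatever is due, fetch it, and push it back with a new due time based on how often that feed actually changes. The feed URLs and intervals below are made up for illustration:

```python
import heapq

def make_schedule(feeds, now=0):
    # Min-heap of (due_time, feed_url); every feed starts due immediately.
    heap = [(now, url) for url in feeds]
    heapq.heapify(heap)
    return heap

def run_due(heap, now, poll_interval):
    """Pop every feed due at `now`, 'fetch' it, and reschedule it.

    A real worker would fetch and parse here; this sketch just records
    which feeds it would have polled on this pass.
    """
    fetched = []
    while heap and heap[0][0] <= now:
        _, url = heapq.heappop(heap)
        fetched.append(url)  # placeholder for fetch/parse/store
        heapq.heappush(heap, (now + poll_interval[url], url))
    return fetched

# Hypothetical feeds: one updates often, one rarely.
intervals = {"fast.example/rss": 300, "slow.example/rss": 3600}
sched = make_schedule(intervals)
batch_1 = run_due(sched, now=0, poll_interval=intervals)    # both are due
batch_2 = run_due(sched, now=600, poll_interval=intervals)  # only the fast feed
```

The appeal over stacked cron jobs is that frequently updated feeds get polled often while stale ones do not waste fetches, and the queue itself can be moved into something shared (a database table, Redis, a message queue) so you can add worker machines as the feed count grows.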