dsb238100 2017-12-19 02:17
Views: 78

PHP: Indexing a large array of RSS feeds

Currently, I retrieve individual RSS feeds and, for every source (about 100), store the data I need from them in a JSON format like this:

{
"status": "ok",
"source": "source-string",
"sortBy": "top",
"unixTimeStampLastUpdated": 1513555729,
"articles": [{
    "author": null,
    "title": "Article Title",
    "description": "Short Description",
    "urlToImage": null,
    "publishedAt": 1513536447,
    "id": "2017-12_5a370775559fa"
},
 ...and so on

I store a monthly JSON file for each source (about 100 sources) in that format.

From that, I generate pages based on each source's monthly JSON file. Each article listed has a unique ID that needs to point to something on my server; to do this, I keep an ENORMOUS monthly array of just the article IDs and a few of their attributes, like this:

{
"2017-12_5a3701fb89c99": {
    "title": "Sample Article Title",
    "url": "https:\/\/www.example.com\/",
    "feed": "the-source",
    "origin": "2017-12"
},
"2017-12_5a3701fba9c9a": {
    "title": "Sample Article Title",
    "url": "https:\/\/www.example.com\/",
    "feed": "the-source-2",
    "origin": "2017-12"
},

My Question:

What is the best way to retrieve articles, index them, display them, and act on callbacks that reference them by ID, in a way that is lightning fast and organized?

I am not sure if a SQL Database will solve my problems, as I have not had to set one up yet and I think this could be simpler...

Is there a way I could do this with each article listed in only one JSON file instead of being referenced in a few places? Or would that be too slow?

Any input would be greatly appreciated!

1 answer

  • doudao7113 2017-12-19 02:39
    Sounds like your data isn't terribly relational and you want:

    1. A key-value/document store. [fast retrieval, e.g. id -> JSON doc]
    2. Something to build and search indexes on top of data with loose schemas. [fast search, e.g. author -> doc id]
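To make the two roles above concrete, here is a minimal in-memory sketch in PHP. It is not a real NoSQL store: it only assumes the ID index from the question has been loaded into an array, and the function names `getArticle` and `buildFeedIndex` are hypothetical helpers made up for illustration.

```php
<?php
// Hypothetical sample data in the shape of the question's monthly ID index.
$articles = [
    '2017-12_5a3701fb89c99' => [
        'title'  => 'Sample Article Title',
        'url'    => 'https://www.example.com/',
        'feed'   => 'the-source',
        'origin' => '2017-12',
    ],
    '2017-12_5a3701fba9c9a' => [
        'title'  => 'Sample Article Title',
        'url'    => 'https://www.example.com/',
        'feed'   => 'the-source-2',
        'origin' => '2017-12',
    ],
];

// Role 1, key-value lookup: id -> document.
// PHP arrays are hash tables, so this is a constant-time lookup on average.
function getArticle(array $store, string $id): ?array
{
    return $store[$id] ?? null;
}

// Role 2, secondary index: feed -> list of document ids.
// Built once, it answers "which articles belong to this feed?" without a scan.
function buildFeedIndex(array $store): array
{
    $index = [];
    foreach ($store as $id => $doc) {
        $index[$doc['feed']][] = $id;
    }
    return $index;
}

$feedIndex  = buildFeedIndex($articles);
$doc        = getArticle($articles, '2017-12_5a3701fb89c99');
$idsForFeed = $feedIndex['the-source-2'] ?? [];
```

A dedicated store does the same two things, but persists the data and keeps the indexes up to date for you instead of rebuilding them on every request.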

    Welcome to NoSQL land.

    There are plenty of simple services that each accomplish one task or the other [e.g. Lucene or Solr for search], and plenty of consolidated services that accomplish both. If you're running this app in a public cloud, chances are the provider already has a service that does what you want [e.g. AWS DynamoDB, GCP Datastore]; otherwise you'll probably want to look into something like Couchbase, Cassandra, or Elasticsearch.

    I've tried to be as broad as possible so as not to ignite a holy war, but your question itself really borders on both "Too Broad" and "Primarily Opinion-based" to begin with.

    Lastly, if all this is too daunting, you can always cobble together a loose approximation of a NoSQL system inside an RDBMS. In fact, Postgres has some fairly nice tools for interacting with schemaless data, such as its JSON and JSONB column types.
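As a rough illustration of the RDBMS route, the sketch below stores each article as a JSON string keyed by its ID, with the feed pulled out into an indexed column. It makes several assumptions: it uses an in-memory SQLite database through PDO as a stand-in for Postgres (so it needs the `pdo_sqlite` extension), and the table, column, and function names are hypothetical.

```php
<?php
// Assumption: pdo_sqlite is installed; SQLite stands in for Postgres here.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Hypothetical schema: the id is the key, the feed is an indexed column,
// and the rest of the article is kept as an opaque JSON string.
$pdo->exec('CREATE TABLE articles (id TEXT PRIMARY KEY, feed TEXT NOT NULL, doc TEXT NOT NULL)');
$pdo->exec('CREATE INDEX idx_articles_feed ON articles (feed)');

// Insert or overwrite one article. INSERT OR REPLACE is SQLite syntax;
// Postgres would use INSERT ... ON CONFLICT instead.
function putArticle(PDO $pdo, string $id, array $doc): void
{
    $stmt = $pdo->prepare('INSERT OR REPLACE INTO articles (id, feed, doc) VALUES (?, ?, ?)');
    $stmt->execute([$id, $doc['feed'], json_encode($doc)]);
}

// Key-value style lookup: id -> decoded document, or null if unknown.
function getArticleById(PDO $pdo, string $id): ?array
{
    $stmt = $pdo->prepare('SELECT doc FROM articles WHERE id = ?');
    $stmt->execute([$id]);
    $json = $stmt->fetchColumn();
    return $json === false ? null : json_decode($json, true);
}

// Index style lookup: feed -> list of article ids.
function getIdsByFeed(PDO $pdo, string $feed): array
{
    $stmt = $pdo->prepare('SELECT id FROM articles WHERE feed = ? ORDER BY id');
    $stmt->execute([$feed]);
    return $stmt->fetchAll(PDO::FETCH_COLUMN);
}

putArticle($pdo, '2017-12_5a3701fb89c99', [
    'title' => 'Sample Article Title', 'url' => 'https://www.example.com/',
    'feed' => 'the-source', 'origin' => '2017-12',
]);
putArticle($pdo, '2017-12_5a3701fba9c9a', [
    'title' => 'Sample Article Title', 'url' => 'https://www.example.com/',
    'feed' => 'the-source-2', 'origin' => '2017-12',
]);
```

With this layout every article lives in exactly one place, and both lookups go through an index rather than a scan of a large JSON file.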
