2020-12-26 08:01

Crawler issues

The crawler continues to ping remote URLs even when the Elasticsearch backend is reporting problems, such as when disk space runs low and the index switches to read-only mode. I missed the error from the crawler itself, but this is what the search side logs:

INFO 2018/04/29 00:16:04 search.go:327: elastic: Error 403 (Forbidden): blocked by: [FORBIDDEN/12/index read-only / allow delete (api)]; [type=cluster_block_exception]

It might be better to trap this condition in the crawler, report its cause, and exit gracefully, or perhaps to catch it in the search subsystem when the error above shows up in the logs.

Also related: the crawler will error out when the Elasticsearch daemon goes offline.


