weixin_39627665
2021-01-01 03:34

Exited with HTTP 429 when offlining

Let me try again with speed=0.1, I guess.


Command 'mwoffliner --mwUrl=https://es.vikidia.org/ --adminEmail=contact.org --format=nopic --format=novid --useCache --addNamespaces=102 --redis=redis://redis --outputDirectory=/output' in image 'openzim/mwoffliner:1.8.0' returned non-zero exit status 2: b'Failed to run mwoffliner after [585s]: { Error: Request failed with status code 429
    at createError (/usr/local/lib/node_modules/mwoffliner/node_modules/axios/lib/core/createError.js:16:15)
    at settle (/usr/local/lib/node_modules/mwoffliner/node_modules/axios/lib/core/settle.js:18:12)
    at IncomingMessage.handleStreamEnd (/usr/local/lib/node_modules/mwoffliner/node_modules/axios/lib/adapters/http.js:201:11)
    at IncomingMessage.emit (events.js:194:15)
    at endReadableNT (_stream_readable.js:1125:12)
    at process._tickCallback (internal/process/next_tick.js:63:19)
  config:
   { adapter: [Function: httpAdapter],
     transformRequest: { \'0\': [Function: transformRequest] },
     transformResponse: { \'0\': [Function: transformResponse] },
     timeout: 0,
     xsrfCookieName: \'XSRF-TOKEN\',
     xsrfHeaderName: \'X-XSRF-TOKEN\',
     maxContentLength: -1,
     validateStatus: [Function: validateStatus],
     headers:
      { Accept: \'application/json, text/plain, */*\',
        \'User-Agent\': \'axios/0.18.0\' },
     method: \'get\',
     responseType: \'json\',
     url:
      \'https://es.vikidia.org/w/api.php?action=parse&format=json&page=Argumento_de_Dragon_Ball_Z&prop=modules%7Cjsconfigvars%7Cheadhtml\',
     data: undefined },
  request:
   ClientRequest {
     _events:
      [Object: null prototype] {
        socket: [Function],
        abort: [Function],
        aborted: [Function],
        error: [Function],
        timeout: [Function],
        prefinish: [Function: requestOnPrefinish] },
     _eventsCount: 6,
     _maxListeners: undefined,
     output: [],
     outputEncodings: [],
     outputCallbacks: [],
     outputSize: 0,
     writable: true,
     _last: true,
     chunkedEncoding: false,
     shouldKeepAlive: false,
     useChunkedEncodingByDefault: false,
     sendDate: false,
     _removedConnection: false,
     _removedContLen: false,
     _removedTE: false,
     _contentLength: 0,
     _hasBody: true,
     _trailer: \'\',
     finished: true,
     _headerSent: true,
     socket:
      TLSSocket {
        _tlsOptions: [Object],
        _secureEstablished: true,
        _securePending: false,
        _newSessionPending: false,
        _controlReleased: true,
        _SNICallback: null,
        servername: \'es.vikidia.org\',
        alpnProtocol: false,
        authorized: true,
        authorizationError: null,
        encrypted: true,
        _events: [Object],
        _eventsCount: 8,
        connecting: false,
        _hadError: false,
        _handle: [TLSWrap],
        _parent: null,
        _host: \'es.vikidia.org\',
        _readableState: [ReadableState],
        readable: true,
        _maxListeners: undefined,
        _writableState: [WritableState],
        writable: false,
        allowHalfOpen: false,
        _sockname: null,
        _pendingData: null,
        _pendingEncoding: \'\',
        server: undefined,
        _server: null,
        ssl: [TLSWrap],
        _requestCert: true,
        _rejectUnauthorized: false,
        parser: null,
        _httpMessage: [Circular],
        [Symbol(res)]: [TLSWrap],
        [Symbol(asyncId)]: 55658,
        [Symbol(lastWriteQueueSize)]: 0,
        [Symbol(timeout)]: null,
        [Symbol(kBytesRead)]: 0,
        [Symbol(kBytesWritten)]: 0,
        [Symbol(connect-options)]: [Object] },
     connection:
      TLSSocket {
        _tlsOptions: [Object],
        _secureEstablished: true,
        _securePending: false,
        _newSessionPending: false,
        _controlReleased: true,
        _SNICallback: null,
        servername: \'es.vikidia.org\',
        alpnProtocol: false,
        authorized: true,
        authorizationError: null,
        encrypted: true,
        _events: [Object],
        _eventsCount: 8,
        connecting: false,
        _hadError: false,
        _handle: [TLSWrap],
        _parent: null,
        _host: \'es.vikidia.org\',
        _readableState: [ReadableState],
        readable: true,
        _maxListeners: undefined,
        _writableState: [WritableState],
        writable: false,
        allowHalfOpen: false,
        _sockname: null,
        _pendingData: null,
        _pendingEncoding: \'\',
        server: undefined,
        _server: null,
        ssl: [TLSWrap],
        _requestCert: true,
        _rejectUnauthorized: false,
        parser: null,
        _httpMessage: [Circular],
        [Symbol(res)]: [TLSWrap],
        [Symbol(asyncId)]: 55658,
        [Symbol(lastWriteQueueSize)]: 0,
        [Symbol(timeout)]: null,
        [Symbol(kBytesRead)]: 0,
        [Symbol(kBytesWritten)]: 0,
        [Symbol(connect-options)]: [Object] },
     _header:
      \'GET /w/api.php?action=parse&format=json&page=Argumento_de_Dragon_Ball_Z&prop=modules%7Cjsconfigvars%7Cheadhtml HTTP/1.1\\r\
Accept: application/json, text/plain, */*\\r\
User-Agent: axios/0.18.0\\r\
Host: es.vikidia.org\\r\
Connection: close\\r\
\\r\
\',
     _onPendingData: [Function: noopPendingOutput],
     agent:
      Agent {
        _events: [Object],
        _eventsCount: 1,
        _maxListeners: undefined,
        defaultPort: 443,
        protocol: \'https:\',
        options: [Object],
        requests: {},
        sockets: [Object],
        freeSockets: {},
        keepAliveMsecs: 1000,
        keepAlive: false,
        maxSockets: Infinity,
        maxFreeSockets: 256,
        maxCachedSessions: 100,
        _sessionCache: [Object] },
     socketPath: undefined,
     timeout: undefined,
     method: \'GET\',
     path:
      \'/w/api.php?action=parse&format=json&page=Argumento_de_Dragon_Ball_Z&prop=modules%7Cjsconfigvars%7Cheadhtml\',
     _ended: true,
     res:
      IncomingMessage {
        _readableState: [ReadableState],
        readable: false,
        _events: [Object],
        _eventsCount: 3,
        _maxListeners: undefined,
        socket: [TLSSocket],
        connection: [TLSSocket],
        httpVersionMajor: 1,
        httpVersionMinor: 1,
        httpVersion: \'1.1\',
        complete: true,
        headers: [Object],
        rawHeaders: [Array],
        trailers: {},
        rawTrailers: [],
        aborted: false,
        upgrade: false,
        url: \'\',
        method: null,
        statusCode: 429,
        statusMessage: \'\',
        client: [TLSSocket],
        _consuming: false,
        _dumped: false,
        req: [Circular],
        responseUrl:
         \'https://es.vikidia.org/w/api.php?action=parse&format=json&page=Argumento_de_Dragon_Ball_Z&prop=modules%7Cjsconfigvars%7Cheadhtml\',
        redirects: [] },
     aborted: undefined,
     timeoutCb: null,
     upgradeOrConnect: false,
     parser: null,
     maxHeadersCount: null,
     _redirectable:
      Writable {
        _writableState: [WritableState],
        writable: true,
        _events: [Object],
        _eventsCount: 2,
        _maxListeners: undefined,
        _options: [Object],
        _ended: true,
        _ending: true,
        _redirectCount: 0,
        _redirects: [],
        _requestBodyLength: 0,
        _requestBodyBuffers: [],
        _onNativeResponse: [Function],
        _currentRequest: [Circular],
        _currentUrl:
         \'https://es.vikidia.org/w/api.php?action=parse&format=json&page=Argumento_de_Dragon_Ball_Z&prop=modules%7Cjsconfigvars%7Cheadhtml\' },
     [Symbol(isCorked)]: false,
     [Symbol(outHeadersKey)]:
      [Object: null prototype] { accept: [Array], \'user-agent\': [Array], host: [Array] } },
  response:
   { status: 429,
     statusText: \'\',
     headers:
      { date: \'Fri, 15 Mar 2019 19:01:41 GMT\',
        \'content-length\': \'0\',
        connection: \'close\',
        \'set-cookie\': [Array],
        \'strict-transport-security\': \'max-age=15552000; includeSubDomains; preload\',
        \'x-content-type-options\': \'nosniff\',
        \'expect-ct\':
         \'max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"\',
        server: \'cloudflare\',
        \'cf-ray\': \'4b80bcc829f7239c-IAD\' },
     config:
      { adapter: [Function: httpAdapter],
        transformRequest: [Object],
        transformResponse: [Object],
        timeout: 0,
        xsrfCookieName: \'XSRF-TOKEN\',
        xsrfHeaderName: \'X-XSRF-TOKEN\',
        maxContentLength: -1,
        validateStatus: [Function: validateStatus],
        headers: [Object],
        method: \'get\',
        responseType: \'json\',
        url:
         \'https://es.vikidia.org/w/api.php?action=parse&format=json&page=Argumento_de_Dragon_Ball_Z&prop=modules%7Cjsconfigvars%7Cheadhtml\',
        data: undefined },
     request:
      ClientRequest {
        _events: [Object],
        _eventsCount: 6,
        _maxListeners: undefined,
        output: [],
        outputEncodings: [],
        outputCallbacks: [],
        outputSize: 0,
        writable: true,
        _last: true,
        chunkedEncoding: false,
        shouldKeepAlive: false,
        useChunkedEncodingByDefault: false,
        sendDate: false,
        _removedConnection: false,
        _removedContLen: false,
        _removedTE: false,
        _contentLength: 0,
        _hasBody: true,
        _trailer: \'\',
        finished: true,
        _headerSent: true,
        socket: [TLSSocket],
        connection: [TLSSocket],
        _header:
         \'GET /w/api.php?action=parse&format=json&page=Argumento_de_Dragon_Ball_Z&prop=modules%7Cjsconfigvars%7Cheadhtml HTTP/1.1\\r\
Accept: application/json, text/plain, */*\\r\
User-Agent: axios/0.18.0\\r\
Host: es.vikidia.org\\r\
Connection: close\\r\
\\r\
\',
        _onPendingData: [Function: noopPendingOutput],
        agent: [Agent],
        socketPath: undefined,
        timeout: undefined,
        method: \'GET\',
        path:
         \'/w/api.php?action=parse&format=json&page=Argumento_de_Dragon_Ball_Z&prop=modules%7Cjsconfigvars%7Cheadhtml\',
        _ended: true,
        res: [IncomingMessage],
        aborted: undefined,
        timeoutCb: null,
        upgradeOrConnect: false,
        parser: null,
        maxHeadersCount: null,
        _redirectable: [Writable],
        [Symbol(isCorked)]: false,
        [Symbol(outHeadersKey)]: [Object] },
     data: \'\' } }


**********

Request failed with status code 429

**********
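
The object printed above is an axios error: `response.status` is 429, and the Cloudflare response headers (`date`, `content-length`, `set-cookie`, `server`, `cf-ray` and so on) include no `Retry-After` hint. As a minimal sketch, assuming only that error shape, a caller could separate rate limiting from other failures as follows. `classifyHttpError` and `retryAfterMs` are hypothetical helpers, not part of mwoffliner or axios.

```javascript
// Hypothetical helpers, not mwoffliner or axios API. They rely only on the
// error shape visible in the log: err.response.status and err.response.headers.

// Convert a Retry-After header (delay in seconds or an HTTP date) to milliseconds.
// Returns null when the header is missing or unparseable.
function retryAfterMs(headers, now = Date.now()) {
  const raw = headers ? headers['retry-after'] : undefined;
  if (raw === undefined || raw === null) return null;
  const text = String(raw).trim();
  if (text === '') return null;
  const seconds = Number(text);
  if (Number.isFinite(seconds)) return Math.max(0, seconds * 1000);
  const date = Date.parse(text);
  return Number.isNaN(date) ? null : Math.max(0, date - now);
}

// Decide how a failed request should be treated by a retry loop.
function classifyHttpError(err) {
  const status = err && err.response ? err.response.status : undefined;
  if (status === 429) return 'rate-limited'; // too many requests: slow down, then retry
  if (status === undefined || status >= 500) return 'retryable'; // network or server error
  return 'fatal'; // other 4xx: retrying the same request will not help
}

// An error shaped like the one in the log (status 429, no Retry-After header).
const logged = { response: { status: 429, headers: { server: 'cloudflare', 'content-length': '0' } } };
console.log(classifyHttpError(logged)); // rate-limited
console.log(retryAfterMs(logged.response.headers)); // null
```

With no `Retry-After` value, as in this log, the client has to choose its own delay, for example with exponential backoff.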

This question comes from the open-source project: openzim/mwoffliner


18 answers

  • weixin_39619858 (4 months ago)

    We really need an efficient and standardized way to deal with this error. There are new recent cases on the Zimfarm.

  • weixin_39611308 (4 months ago)

    > We really need to have an efficient and normalized way to deal with this error. We have new recent cases on the Zimfarm.

    Is there a particular wiki I should look at in the Zimfarm?

  • weixin_39619858 (4 months ago)

    This happens with Wikimedia websites... and, like any 429 error, it happens when the scraper sends too many requests from the same IP. I want a robust system where, each time this happens, the request is retried and the speed is reduced. Please explain the system you plan to build here before implementing it.

  • weixin_39875842 (4 months ago)

    This issue has been automatically marked as stale because it has not had recent activity. It will now be reviewed manually. Thank you for your contributions.

  • weixin_39619858 (4 months ago)

    Could you please have a look at this?

  • weixin_39939665 (4 months ago)

    I managed to scrape this wiki with no issues (now re-running with --addNamespaces=102).

    We have a backoff strategy with 5 retries on all HTTP requests.

  • weixin_39619858 (4 months ago)

    Looking at the log, it does not look like it retries. HTTP 429 should also be handled differently from a standard error; see #496. So:

    * Why does it not retry?
    * Why do we see nothing that decreases the speed?

  • weixin_39627665 (4 months ago)

    It might be because I was running multiple Vikidia offliners at the same time. If so, running one at a time should succeed.

  • weixin_39939665 (4 months ago)

    With verbose logging on, you will see a log message like "Failed to get [XX] [XX] times".

    We don't log each individual failure and retry. Perhaps we should.

  • weixin_39939665 (4 months ago)

    627

  • weixin_39619858 (4 months ago)

    We should log each individual retry and each speed change. mwoffliner should not crash because of HTTP 429; it should slow down until the 429 errors disappear. This was the purpose of #496.

  • weixin_39939665 (4 months ago)

    I agree, we should never crash because of a failed request. I need to look into how and why this one wasn't handled properly.

  • weixin_39619858 (4 months ago)

    I tried again with 1.8.1, same problem:

    
    **********
    
    Request failed with status code 429
    
    **********
    
  • weixin_39619858 (4 months ago)

    It also happens with en.wikipedia.org.

  • weixin_39939665 (4 months ago)

    The changes above were made after 1.8.1 was published, so I wouldn't expect 1.8.1 to include the fix.

  • weixin_39619858 (4 months ago)

    This has not been fixed (properly): it still fails to adapt the speed. See here for an example: https://farm.openzim.org/pipeline/5e4a0fda922acabdac6e000f/debug

  • weixin_39619858 (4 months ago)

    Do you think this can still happen now that you have fixed #1051?

  • weixin_39611308 (4 months ago)

    Yes, this issue can still happen after my fix. As I can see in the logs, it fails while fetching article IDs from a namespace, whereas the fix for #1051 specifically handles failures while downloading content.

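
The replies above ask for a request layer that retries every failed request (a 5-retry backoff already exists), logs each retry and each speed change, and slows down on HTTP 429 instead of crashing, including for requests that list article IDs in a namespace rather than download content. A minimal sketch of that behavior follows. Everything in it is an assumption for illustration: `requestWithRetry` and `SpeedController` are hypothetical names, `fetchFn` stands for any promise-returning HTTP call that rejects with an axios-style error carrying `response.status`, and the halve-on-429 / add-0.1-after-20-successes policy (an additive-increase, multiplicative-decrease scheme) is one possible choice, not mwoffliner's actual logic.

```javascript
// Hypothetical sketch: names, defaults and the halve / +0.1 policy are
// illustrative assumptions, not mwoffliner's implementation.

// Keeps a "speed" multiplier: halved on every HTTP 429 (multiplicative
// decrease), raised by 0.1 after a run of successes (additive increase).
class SpeedController {
  constructor({ speed = 1, min = 0.1, max = 1, successesBeforeIncrease = 20, log = console.log } = {}) {
    this.speed = speed;
    this.min = min;
    this.max = max;
    this.successesBeforeIncrease = successesBeforeIncrease;
    this.successes = 0;
    this.log = log;
  }

  // Clamp to [min, max], round away float noise, and log every actual change.
  setSpeed(next, reason) {
    const previous = this.speed;
    this.speed = Math.round(Math.min(this.max, Math.max(this.min, next)) * 1000) / 1000;
    if (this.speed !== previous) this.log(`${reason}: speed ${previous} -> ${this.speed}`);
  }

  onRateLimited() {
    this.successes = 0;
    this.setSpeed(this.speed / 2, 'HTTP 429');
  }

  onSuccess() {
    this.successes += 1;
    if (this.successes < this.successesBeforeIncrease) return;
    this.successes = 0;
    this.setSpeed(this.speed + 0.1, `${this.successesBeforeIncrease} successes`);
  }

  // Assumed mapping from the multiplier to a number of parallel requests.
  concurrency(base) {
    return Math.max(1, Math.floor(base * this.speed));
  }
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Retries network errors, 5xx and 429 up to `retries` times with exponential
// backoff, logs every retry, and lowers the speed on each 429. Other statuses
// are rethrown at once; after the last retry the final error is rethrown too.
async function requestWithRetry(fetchFn, url, { retries = 5, baseDelayMs = 1000, speed = null, log = console.log, wait = sleep } = {}) {
  for (let attempt = 0; ; attempt += 1) {
    try {
      const result = await fetchFn(url);
      if (speed) speed.onSuccess();
      return result;
    } catch (err) {
      const status = err && err.response ? err.response.status : undefined;
      const retryable = status === undefined || status === 429 || status >= 500;
      if (!retryable || attempt >= retries) throw err;
      if (status === 429 && speed) speed.onRateLimited();
      // 1 s, 2 s, 4 s, ... doubled once more when the server answered 429.
      const delay = baseDelayMs * 2 ** attempt * (status === 429 ? 2 : 1);
      const reason = status || (err && (err.code || err.message));
      log(`GET ${url} failed (${reason}), retry ${attempt + 1}/${retries} in ${delay} ms`);
      await wait(delay);
    }
  }
}
```

Routing every API call through one wrapper like this, namespace listing included, is what would keep a single 429 from ending the run; once the retry budget is spent the last error is still rethrown, so the caller decides whether to stop. `concurrency(base)` is only one assumed way to turn the multiplier into a number of parallel requests.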
