weixin_39986973
2020-12-28 19:42

[Backup NG] Merge operation timed out inexplicably

Context

  • XO origin: the sources
  • Versions:
      • Node: 8.11
      • xo-web: 5.28.0
      • xo-server: 5.28.0
  • XCP-ng 7.5

Nightly Delta Backup NG job with 13 VMs.

Expected behavior

Merge operation completes correctly.

Current behavior

2 merge operations out of 11 failed inexplicably. Nothing in the system logs indicates any problem.


transfer ✔
    Start time: Sunday, November 4th 2018, 2:17:13 am
    End time: Sunday, November 4th 2018, 2:22:04 am
    Duration: 5 minutes
    Size: 6.82 GiB
    Speed: 24.02 MiB/s 

merge 🚨
    Start time: Sunday, November 4th 2018, 2:22:04 am
    End time: Sunday, November 4th 2018, 2:25:21 am
    Duration: 3 minutes
    Error: operation timed out 

transfer ✔
    Start time: Sunday, November 4th 2018, 2:22:35 am
    End time: Sunday, November 4th 2018, 2:24:34 am
    Duration: 2 minutes
    Size: 5.37 GiB
    Speed: 46.44 MiB/s 

merge 🚨
    Start time: Sunday, November 4th 2018, 2:24:34 am
    End time: Sunday, November 4th 2018, 2:30:16 am
    Duration: 6 minutes
    Error: operation timed out 

Any ideas why this could happen? Why such a quick timeout? All other merge operations completed rapidly, with merge speeds between 50 and 90 MB/s. It can't be a network glitch because the remote is actually a local mount point on the system: a dedicated RAID 10 volume for backups.
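
For reference, the exact elapsed time of each failed merge can be worked out from the timestamps in the report above. This is only a minimal sketch: the `FAILED_MERGES` list and the `elapsed_seconds` helper are illustrative names, not part of xo-server, and the timestamps are treated as naive local times.

```python
from datetime import datetime

# Start/end timestamps copied from the two failed merge entries in the report.
# Treated as naive local times (no timezone handling); names are illustrative.
FAILED_MERGES = [
    ("2018-11-04 02:22:04", "2018-11-04 02:25:21"),
    ("2018-11-04 02:24:34", "2018-11-04 02:30:16"),
]

FMT = "%Y-%m-%d %H:%M:%S"


def elapsed_seconds(start: str, end: str) -> int:
    """Return the number of whole seconds between two timestamps."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return int(delta.total_seconds())


durations = [elapsed_seconds(start, end) for start, end in FAILED_MERGES]
for seconds in durations:
    print(f"{seconds // 60} min {seconds % 60} s")
```

The two failures took 197 s and 342 s respectively, so the timeout does not seem to fire after one fixed interval measured from the start of the merge.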

Also, it's not clear what I need to do (if anything) to recover from this. Do I have to delete the VM snapshots for these two so that a full backup is taken on the next run?

This question originates from the open-source project: vatesfr/xen-orchestra

9 answers

  • weixin_39988779 4 months ago

    Thanks for your report, this has been fixed by #3632.

  • weixin_39986973 4 months ago

    Should I delete the snapshots of the affected VM to trigger a full backup on the next run?

  • weixin_39988779 4 months ago

    That may be better after a merge failure; I'm making changes to automate this.

  • weixin_39986973 4 months ago

    There was another merge failure during our last backup:

    
    merge 🚨
        Start time: Sunday, December 2nd 2018, 2:08:18 am
        End time: Sunday, December 2nd 2018, 2:18:30 am
        Duration: 10 minutes
        Error: operation timed out 
    

    Is there a way to recover this backup without running a full backup? The full backup is over 6TB...

    Also, do you have any idea what can be done to avoid this problem? The remote is dedicated to backups, so nothing else should be blocking these merge operations...

  • weixin_39986973 4 months ago

    Would it help to set the backup concurrency to 1 instead of 0?

    Oh, and for the record: our backup remote is actually a local filesystem...

  • weixin_39986973 4 months ago

    I also noticed that, in the same backup report, all merge operations report a size of 0 B and a speed of 0 B/s, like this:

    
    merge ✔
    
        Start time: Sunday, December 2nd 2018, 2:02:21 am
        End time: Sunday, December 2nd 2018, 2:04:43 am
        Duration: 2 minutes
        Size: 0 B
        Speed: 0 B/s 
    

    Does this mean that all merge operations failed?

  • weixin_39988779 4 months ago

    Please provide the full log of the run.

  • weixin_39986973 4 months ago

    OK, I'll open a support ticket to send the logs instead of posting them publicly.

  • weixin_39988779 4 months ago

    Perfect, thank you 🙂
