weixin_39737224
weixin_39737224
2020-12-30 06:01

NO_NEXT_RESC_FOUND due to minimum free space constraint incorrectly calculated

iRODS Version, OS and Version

irods 4.1.10 OS Ubuntu 12.04

What did you try to do?

iput an smallish xml file to a composite resource tree whose leaf resources have context strings set of minimum_free_space_for_create_in_bytes along with free_space being populated on resource information.

All leaf resources have 10's of TB available.

Expected behavior

iput file is uploaded

Observed behavior


Mar  8 09:33:05 pid:5067 NOTICE: replica_exceeds_resource_free_space: minimum free space constraint [4781801644851] > free space [59108425728] on resource [irods-arch-sr02-fg]
Mar  8 09:33:05 pid:5067 NOTICE: unix_file_redirect_create: resource [irods-arch-sr02-fg] exceeds free space, voting 0.0
Mar  8 09:33:05 pid:5067 NOTICE: replica_exceeds_resource_free_space: minimum free space constraint [3177150498816] > free space [59108450304] on resource [irods-arch-sr1-bc]
Mar  8 09:33:05 pid:5067 NOTICE: unix_file_redirect_create: resource [irods-arch-sr1-bc] exceeds free space, voting 0.0
Mar  8 09:33:05 pid:5067 NOTICE: replica_exceeds_resource_free_space: minimum free space constraint [4781801644851] > free space [59108466688] on resource [irods-arch-sr02-de]
Mar  8 09:33:05 pid:5067 NOTICE: unix_file_redirect_create: resource [irods-arch-sr02-de] exceeds free space, voting 0.0
Mar  8 09:33:05 pid:5067 NOTICE: replica_exceeds_resource_free_space: minimum free space constraint [3177150498816] > free space [59109539840] on resource [irods-arch-sr1-de]
Mar  8 09:33:05 pid:5067 NOTICE: unix_file_redirect_create: resource [irods-arch-sr1-de] exceeds free space, voting 0.0
Mar  8 09:33:05 pid:5067 NOTICE: replica_exceeds_resource_free_space: minimum free space constraint [3177150498816] > free space [88816431104] on resource [irods-arch-sr1-fg]
Mar  8 09:33:05 pid:5067 NOTICE: unix_file_redirect_create: resource [irods-arch-sr1-fg] exceeds free space, voting 0.0
Mar  8 09:33:05 pid:5067 NOTICE: replica_exceeds_resource_free_space: minimum free space constraint [4776454303744] > free space [88816427008] on resource [irods-arch-sr02-bc]
Mar  8 09:33:05 pid:5067 NOTICE: unix_file_redirect_create: resource [irods-arch-sr02-bc] exceeds free space, voting 0.0
Mar  8 09:33:05 pid:5067 ERROR: [-]     iRODS/server/api/src/rsGetRemoteZoneResc.cpp:68:rsGetRemoteZoneResc :  status [NO_NEXT_RESC_FOUND]  errno [] -- message [failed for [/archive/GAPI/exp/infinium/0f/e0/23/200796210115_D_Grn.xml]]
        [-]     iRODS/server/core/src/irods_resource_redirect.cpp:291:resolve_hier_for_create :  status [NO_NEXT_RESC_FOUND]  errno [] -- message []
                [-]     iRODS/server/core/src/irods_resource_redirect.cpp:74:request_vote_for_file_object :  status [NO_NEXT_RESC_FOUND]  errno [] -- message [failed in call to redirect host [irods-arch-g1-vm]  hier []]
                        [-]     librepl.cpp:1528:replRedirect :  status [NO_NEXT_RESC_FOUND]  errno [] -- message []
                                [-]     librepl.cpp:1476:repl_redirect_impl :  status [NO_NEXT_RESC_FOUND]  errno [] -- message [repl_redirect_impl - Failed to redirect to all children.]
                                        [-]     librepl.cpp:1307:replRedirectToChildren :  status [NO_NEXT_RESC_FOUND]  errno [] -- message [replRedirectToChildren - Failed calling redirect on the child "green"]
                                                [-]     librandom.cpp:832:random_redirect :  status [NO_NEXT_RESC_FOUND]  errno [] -- message [Failed getting next valid child.]
                                                        [-]     librandom.cpp:761:get_next_valid_child_resource :  status [NO_NEXT_RESC_FOUND]  errno [] -- message [No valid child found.]

However, its not clear why the resources voted no, since all the filesystems both had enough space, and, unless I am misreading, have sufficient free space set in the resource, and a low enough minimum_free_space_for_create_in_bytes



# df -h /irods-arch-i01-*
Filesystem                   Size  Used Avail Use% Mounted on
/dev/mapper/irods--sd--bc-1   58T   24T   34T  41% /irods-arch-i01-bc
/dev/mapper/irods--sd--de-1   58T   20T   38T  34% /irods-arch-i01-de
/dev/mapper/irods--sd--fg-1   58T   35T   23T  61% /irods-arch-i01-fg

name: demoirods-arch-i01 path: /usr/local/iRODS/Vault status: up context_string: minimum_free_space_for_create_in_bytes=3434289562
name: irods-arch-i01-bc path: /irods-arch-i01-bc status: up context_string: minimum_free_space_for_create_in_bytes=3177150498816
name: irods-arch-i01-de path: /irods-arch-i01-de status: up context_string: minimum_free_space_for_create_in_bytes=3177150498816
name: irods-arch-i01-fg path: /irods-arch-i01-fg status: up context_string: minimum_free_space_for_create_in_bytes=3177150498816

$ iadmin lr ${HOSTNAME}-bc | grep free_space
free_space: 37119101583360
free_space_ts 2018-03-08.11:50:09
resc_context: minimum_free_space_for_create_in_bytes=3177150498816
$ iadmin lr ${HOSTNAME}-de | grep free_space
free_space: 41531622506496
free_space_ts 2018-03-08.11:55:51
resc_context: minimum_free_space_for_create_in_bytes=3177150498816
$ iadmin lr ${HOSTNAME}-fg | grep free_space
free_space: 24860432900096
free_space_ts 2018-03-08.11:55:52
resc_context: minimum_free_space_for_create_in_bytes=317715049881

# df -h /irods-arch-i02-*
Filesystem                   Size  Used Avail Use% Mounted on
/dev/mapper/irods--sd--bc-1   87T   13T   74T  15% /irods-arch-i02-bc
/dev/mapper/irods--sd--de-1   87T   13T   75T  15% /irods-arch-i02-de
/dev/mapper/irods--sd--fg-1   87T   12T   75T  14% /irods-arch-i02-fg

name: demoirods-arch-i02 path: /usr/local/iRODS/Vault status: up context_string: minimum_free_space_for_create_in_bytes=5165540762
name: irods-arch-i02-bc path: /irods-arch-i02-bc status: up context_string: minimum_free_space_for_create_in_bytes=4776454303744
name: irods-arch-i02-de path: /irods-arch-i02-de status: up context_string: minimum_free_space_for_create_in_bytes=4781801644851
name: irods-arch-i02-fg path: /irods-arch-i02-fg status: up context_string: minimum_free_space_for_create_in_bytes=4781801644851

$ iadmin lr ${HOSTNAME}-bc | grep free_space
free_space: 81061998485504
free_space_ts 2018-03-08.11:50:58
resc_context: minimum_free_space_for_create_in_bytes=4776454303744
$ iadmin lr ${HOSTNAME}-de | grep free_space
free_space: 81410829148160
free_space_ts 2018-03-08.11:55:51
resc_context: minimum_free_space_for_create_in_bytes=4781801644851
$ iadmin lr ${HOSTNAME}-fg | grep free_space
free_space: 81605770838016
free_space_ts 2018-03-08.11:55:51
resc_context: minimum_free_space_for_create_in_bytes=4781801644851

# df -h /irods-arch-sr1-*
Filesystem                   Size  Used Avail Use% Mounted on
/dev/mapper/irods--sd--bc-1   58T   41T   17T  71% /irods-arch-sr1-bc
/dev/mapper/irods--sd--de-1   58T   15T   43T  26% /irods-arch-sr1-de
/dev/mapper/irods--sd--fg-1   58T   20T   38T  34% /irods-arch-sr1-fg

name: demoResc-irods-arch-sr1 path: /usr/local/iRODS/Vault status: up context_string: minimum_free_space_for_create_in_bytes=3434289562
name: irods-arch-sr1-de path: /irods-arch-sr1-de status: up context_string: minimum_free_space_for_create_in_bytes=3177150498816
name: irods-arch-sr1-bc path: /irods-arch-sr1-bc status: up context_string: minimum_free_space_for_create_in_bytes=3177150498816
name: irods-arch-sr1-fg path: /irods-arch-sr1-fg status: up context_string: minimum_free_space_for_create_in_bytes=3177150498816

$ iadmin lr ${HOSTNAME}-bc | grep free_space
free_space: 18592824238080
free_space_ts 2018-03-08.11:50:52
resc_context: minimum_free_space_for_create_in_bytes=3177150498816
$ iadmin lr ${HOSTNAME}-de | grep free_space
free_space: 46707319218176
free_space_ts 2018-03-08.11:55:32
resc_context: minimum_free_space_for_create_in_bytes=3177150498816
$ iadmin lr ${HOSTNAME}-fg | grep free_space
free_space: 41665955786752
free_space_ts 2018-03-08.11:55:32
resc_context: minimum_free_space_for_create_in_bytes=3177150498816

# df -h /irods-arch-sr02-*
Filesystem                   Size  Used Avail Use% Mounted on
/dev/mapper/irods--sd--bc-1   87T   16T   71T  19% /irods-arch-sr02-bc
/dev/mapper/irods--sd--de-1   87T   12T   75T  14% /irods-arch-sr02-de
/dev/mapper/irods--sd--fg-1   87T   12T   75T  14% /irods-arch-sr02-fg

name: irods-arch-sr02-de path: /irods-arch-sr02-de status: up context_string: minimum_free_space_for_create_in_bytes=4781801644851
name: demoirods-arch-sr02 path: /usr/local/iRODS/Vault status: up context_string: minimum_free_space_for_create_in_bytes=5165540762
name: irods-arch-sr02-bc path: /irods-arch-sr02-bc status: up context_string: minimum_free_space_for_create_in_bytes=4776454303744
name: irods-arch-sr02-fg path: /irods-arch-sr02-fg status: up context_string: minimum_free_space_for_create_in_bytes=4781801644851

$ iadmin lr ${HOSTNAME}-bc | grep free_space
free_space: 77526593134592
free_space_ts 2018-03-08.11:50:27
resc_context: minimum_free_space_for_create_in_bytes=4776454303744
$ iadmin lr ${HOSTNAME}-de | grep free_space
free_space: 81603150082048
free_space_ts 2018-03-08.11:55:51
resc_context: minimum_free_space_for_create_in_bytes=4781801644851
$ iadmin lr ${HOSTNAME}-fg | grep free_space
free_space: 81483930464256
free_space_ts 2018-03-08.11:55:50
resc_context: minimum_free_space_for_create_in_bytes=4781801644851

Steps to reproduce

Alas, it's intermittent, but also stopping us reliably uploading into this zone. We've seen it on this zone, but also on another; the other zones not but I suspect that's because they get much less data uploaded..

该提问来源于开源项目:irods/irods

  • 点赞
  • 写回答
  • 关注问题
  • 收藏
  • 复制链接分享
  • 邀请回答

12条回答

  • weixin_39737224 weixin_39737224 4月前
    
    {
        "build_system_information": "Linux default-cloud-hostname 3.11.0-15-generic #25~precise1-Ubuntu SMP Thu Jan 30 17:39:31 UTC 2014 x86_64 x86_64", 
        "catalog_schema_version": 4, 
        "commit_id": "bb48cd38c4a543ad8bf45082fa06b15290378b50", 
        "compile_time": "2016-11-04T17:24:49Z", 
        "compiler_version": "g++ (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3", 
        "configuration_schema_version": 2, 
        "installation_time": "2017-02-13T10:16:28Z", 
        "irods_version": "4.1.10", 
    

    However, we had also installed a debug build from you folks when we were investigating the last reoccurrence of the issue

    点赞 评论 复制链接分享
  • weixin_39647977 weixin_39647977 4月前

    Ah, right.

    Can you share the result of: ls -l /var/lib/irods/plugins/resources/libunixfilesystem.so and sha256sum /var/lib/irods/plugins/resources/libunixfilesystem.so

    点赞 评论 复制链接分享
  • weixin_39647977 weixin_39647977 4月前

    I have found the code that produced your binary - it is the 4.1.11 libunixfilesystem.so + a single commit that produces these helpful debugging logs. The logs are printing the intended variables. Now we just need to figure out why 59... instead of 81...

    Do any of your resources have ~55GB (59,108,466,688) in their free_space value in the catalog? Is there a chance we're picking up a good value from the wrong resource/location somewhere?

    点赞 评论 复制链接分享
  • weixin_39737224 weixin_39737224 4月前
    
    for resc in $(iadmin lr); do iadmin lr ${resc} | grep -e space -e resc_name; done
    resc_name: RENCI_tmp
    free_space: 59091353600
    free_space_ts 2018-03-09.10:15:28
    resc_context: minimum_free_space_for_create_in_bytes=3434289562
    resc_name: bundleResc
    free_space: 
    free_space_ts Never
    resc_name: demoResc
    free_space: 2507554816
    free_space_ts 2018-03-09.10:15:30
    resc_context: minimum_free_space_for_create_in_bytes=620294963
    resc_name: demoResc-irods-arch-sr1
    free_space: 59214921728
    free_space_ts 2018-03-09.10:15:46
    resc_context: minimum_free_space_for_create_in_bytes=3434289562
    resc_name: demoirods-arch-i01
    free_space: 59091353600
    free_space_ts 2018-03-09.10:15:28
    resc_context: minimum_free_space_for_create_in_bytes=3434289562
    resc_name: demoirods-arch-i02
    free_space: 88799735808
    free_space_ts 2018-03-09.10:15:52
    resc_context: minimum_free_space_for_create_in_bytes=5165540762
    resc_name: demoirods-arch-sr02
    free_space: 83359223808
    free_space_ts 2018-03-09.10:15:34
    resc_context: minimum_free_space_for_create_in_bytes=5165540762
    resc_name: green
    free_space: 
    free_space_ts Never
    resc_name: irods-arch-i01-bc
    free_space: 37113816014848
    free_space_ts 2018-03-09.10:15:27
    resc_context: minimum_free_space_for_create_in_bytes=3177150498816
    resc_name: irods-arch-i01-de
    free_space: 41527274385408
    free_space_ts 2018-03-09.10:15:27
    resc_context: minimum_free_space_for_create_in_bytes=3177150498816
    resc_name: irods-arch-i01-fg
    free_space: 24855821549568
    free_space_ts 2018-03-09.10:15:28
    resc_context: minimum_free_space_for_create_in_bytes=3177150498816
    resc_name: irods-arch-i02-bc
    free_space: 81061998485504
    free_space_ts 2018-03-09.10:15:52
    resc_context: minimum_free_space_for_create_in_bytes=4776454303744
    resc_name: irods-arch-i02-de
    free_space: 81410829148160
    free_space_ts 2018-03-09.10:15:52
    resc_context: minimum_free_space_for_create_in_bytes=4781801644851
    resc_name: irods-arch-i02-fg
    free_space: 81605770838016
    free_space_ts 2018-03-09.10:15:53
    resc_context: minimum_free_space_for_create_in_bytes=4781801644851
    resc_name: irods-arch-sr02-bc
    free_space: 77526593134592
    free_space_ts 2018-03-09.10:15:35
    resc_context: minimum_free_space_for_create_in_bytes=4776454303744
    resc_name: irods-arch-sr02-de
    free_space: 81603150082048
    free_space_ts 2018-03-09.10:15:34
    resc_context: minimum_free_space_for_create_in_bytes=4781801644851
    resc_name: irods-arch-sr02-fg
    free_space: 81483930464256
    free_space_ts 2018-03-09.10:15:34
    resc_context: minimum_free_space_for_create_in_bytes=4781801644851
    resc_name: irods-arch-sr1-bc
    free_space: 18618689138688
    free_space_ts 2018-03-09.10:15:46
    resc_context: minimum_free_space_for_create_in_bytes=3177150498816
    resc_name: irods-arch-sr1-de
    free_space: 46733185204224
    free_space_ts 2018-03-09.10:15:46
    resc_context: minimum_free_space_for_create_in_bytes=3177150498816
    resc_name: irods-arch-sr1-fg
    free_space: 41691841646592
    free_space_ts 2018-03-09.10:15:46
    resc_context: minimum_free_space_for_create_in_bytes=3177150498816
    resc_name: red
    free_space: 
    free_space_ts Never
    resc_name: repl
    free_space: 
    free_space_ts Never
    resc_name: root
    free_space: 
    free_space_ts Never
    

    resource tree structure

    
    ilsresc
    RENCI_tmp
    demoResc:unix file system
    demoResc-irods-arch-sr1:unix file system
    demoirods-arch-i01:unix file system
    demoirods-arch-i02:unix file system
    demoirods-arch-sr02:unix file system
    root:passthru
    └── repl:replication
        ├── green:random
        │   ├── irods-arch-sr02-bc:unix file system
        │   ├── irods-arch-sr02-de:unix file system
        │   ├── irods-arch-sr02-fg:unix file system
        │   ├── irods-arch-sr1-bc:unix file system
        │   ├── irods-arch-sr1-de:unix file system
        │   └── irods-arch-sr1-fg:unix file system
        └── red:random
            ├── irods-arch-i01-bc:unix file system
            ├── irods-arch-i01-de:unix file system
            ├── irods-arch-i01-fg:unix file system
            ├── irods-arch-i02-bc:unix file system
            ├── irods-arch-i02-de:unix file system
            └── irods-arch-i02-fg:unix file system
    

    the Resource RENCI_tmp looks suspiciously close (and its the first in the list and based on a / filesystem as below);

    
    resc_net: irods-arch-i01
    resc_def_path: /RENCI_tmp
    
    
    $ iquest "SELECT DATA_NAME WHERE DATA_RESC_NAME like 'RENCI_tmp%'"
    CAT_NO_ROWS_FOUND: Nothing was found matching your query
    

    Likewise a couple of the demoResc's have very close values (their Vaults are on the /var filesystem, same as the logs, so would expect to be the slightly decreasing in free space over time.

    rulebase has a msiSetDefaultResc("root","forced"), so none of those should be considered, of course;

    
        "re_rulebase_set": [
            {
                "filename": "arch"
            },
            {
                "filename": "core"
            }
        ], 
    
    
    cat /etc/irods/arch.re 
    #ensure that the compound resource tree is used no matter what the user has
    # in their irrds Environment
    acSetRescSchemeForCreate {msiSetDefaultResc("root","forced"); }
    
    #ensure checksumming
    acPostProcForPut {msiSysChksumDataObj; }
    
    #turn on Quotas
     "acRescQuotaPolicy"
    
     "acAclPolicy"
    
     "acTrashPolicy"
    
    acPostProcForParallelTransferReceived(*leaf_resource) {
        msi_update_unixfilesystem_resource_free_space(*leaf_resource);
    }
    acPostProcForDataCopyReceived(*leaf_resource) {
        msi_update_unixfilesystem_resource_free_space(*leaf_resource);
    }
    
    #4.1.10 included new PEP's, so adding definitions here to hopefully
    # stop DEBUG errors
     "4110additionalPEPDefinitions"
    
    点赞 评论 复制链接分享
  • weixin_39647977 weixin_39647977 4月前

    The size on / looks suspiciously close, I agree... So a fallthrough match to / could explain it - if we could find something like that... looking through the resource resolution logic yet again... thanks.

    点赞 评论 复制链接分享
  • weixin_39737224 weixin_39737224 4月前

    Forgive me if you've thought of this, but could it be a regression of #3173 ?

    点赞 评论 复制链接分享
  • weixin_39647977 weixin_39647977 4月前

    no worries - all things are possible, but we have this code bracketed under/threshold/over in the tests now...

    the math and logic are good, the value it's using in that math is the mystery (and the value comes from the resource table in the database, which should rule out filesystem/mountpoint shenanigans).

    点赞 评论 复制链接分享
  • weixin_39612228 weixin_39612228 4月前

    Looking at some of the server logs, we found this line:

    
    Apr 20 15:01:22 pid:14412 NOTICE: msi_update_unixfilesystem_resource_free_space: path to stat [/irods-seq-sr02-ddn-ra08-6-7-8] for resource [irods-seq-sr02-ddn-ra08-6-7-8] doesn't exist, moving to parent
    

    Which comes from here: https://github.com/irods/irods/blob/921f1bc45d4a03f15bb143e4941ca056e430a922/plugins/microservices/administration/msi_update_unixfilesystem_resource_free_space/libmsi_update_unixfilesystem_resource_free_space.cpp#L76-L79

    The parent of the specified vault path /irods-seq-sr02-ddn-ra08-6-7-8 is /.

    We think this is the cause of pulling the "wrong" (but correct) information from the database. This could probably benefit from a warning about stat'ing a raw /.

    What do you think would be the best explanation for why /irods-seq-sr02-ddn-ra08-6-7-8 was not found?

    点赞 评论 复制链接分享
  • weixin_39737224 weixin_39737224 4月前

    Hi folks,

    After a ❤️ stopping moment , I can verify that resource irods-seq-sr02-ddn-ra08-6-7-8 has a non-root path.

    
    resc_def_path: /irods-seq-sr02-ddn-ra08-6-7-8
    

    Now that my heart rate is back to approaching normal....

    My theories;

    1. IO timeout on the dir stat - I don't know how the code would behave if the stat function timed out, or even if it does.. However, those disks can get very busy at times (loads of over 1500 are not unusual at peak activity), so its a possibility..
    2. file system corruption - the problem with this is the directory (and its underlying SAN volume) is still present on the system, and would display other errors (e.g. dmesg/kmesg) or give other irods errors failing to retrieve files (assuming files are read/written to that dir during the timeframe, not guaranteed)
    3. add an exception handler to boost::filesystem::path path_to_stat(resource_vault_path);? My knowledge of C extends to spelling it, but the docs imply its possible to add an exception handler to catch whatever the error actually is under the hood.. I expect you have thought of this.. :-)
    4. more 1 and 2 really, but we do get qla2xxx [0000:07:00.1]-3020:8: QUEUE FULL detected., which is a SAN throughput issue (sometimes fabric, sometimes storage), and the system should retry, but this may introduce a delay in the response..
    5. a quick check of the logs shows it occurs to different filesystems at different times, so its not the same LUN (which might imply a storage system issue)
    6. the problem occurs on both systems with attached disk and with SAN attached storage, so its likely we can rune out the storage itself..
    点赞 评论 复制链接分享
  • weixin_39647977 weixin_39647977 4月前

    Seems adding a retry and logging (4 above) for the path_to_stat() is reasonable and good.

    点赞 评论 复制链接分享
  • weixin_39647977 weixin_39647977 4月前

    So... focusing on... irods-arch-sr02-de...

    I see:

    
    Mar  8 09:33:05 pid:5067 NOTICE: replica_exceeds_resource_free_space: minimum free space constraint [4781801644851] > free space [59108466688] on resource [irods-arch-sr02-de]
    Mar  8 09:33:05 pid:5067 NOTICE: unix_file_redirect_create: resource [irods-arch-sr02-de] exceeds free space, voting 0.0
    
    
    Filesystem                   Size  Used Avail Use% Mounted on
    /dev/mapper/irods--sd--de-1   87T   12T   75T  14% /irods-arch-sr02-de
    
    
    name: irods-arch-sr02-de
    path: /irods-arch-sr02-de
    status: up
    context_string: minimum_free_space_for_create_in_bytes=4781801644851
    
    
    $ iadmin lr ${HOSTNAME}-de | grep free_space
    free_space: 81603150082048
    free_space_ts 2018-03-08.11:55:51
    resc_context: minimum_free_space_for_create_in_bytes=4781801644851
    

    So the question is why, for this comparison, is the code using ~55GB available (59,108,466,688) when the system and resource itself are reporting ~75TB available (81,603,150,082,048).

    点赞 评论 复制链接分享
  • weixin_39647977 weixin_39647977 4月前

    Can you re-re-confirm and share the commit_id from one of the Ubuntu12 servers' /var/lib/irods/VERSION.json?

    点赞 评论 复制链接分享

相关推荐