weixin_39634132
2021-01-07 08:17

Failed to start the daemon: no "source" property found for the storage pool

Required information

  • Distribution: Ubuntu 18.04
  • Storage backend in use: zfs

Issue description

This seems to be an issue similar to https://github.com/lxc/lxd/issues/4629

I have a 2-machine LXD cluster, and I recently changed from a class C network to a class B network, which destroyed the LXD cluster.

I tried to fix the local and global SQLite DBs to see if everything could be restored.

Printing local.db and the global db.bin from both lxd-pod1 and lxd-pod2:

root-pod1:/var/lib/lxd# for t in $(sqlite3 ./database/local.db ".tables"); do echo "#### ${t} #####"; sqlite3 ./database/local.db "select * from ${t}"; done > local.db.txt
root-pod1:/var/lib/lxd# cat local.db.txt

config

3|core.https_address|10.1.0.5:8443

patches

1|invalid_profile_names|1543768094 2|leftover_profile_config|1543768094 3|network_permissions|1543768094 4|storage_api|1543768094 5|storage_api_v1|1543768094 6|storage_api_dir_cleanup|1543768094 7|storage_api_lvm_keys|1543768094 8|storage_api_keys|1543768094 9|storage_api_update_storage_configs|1543768095 10|storage_api_lxd_on_btrfs|1543768095 11|storage_api_lvm_detect_lv_size|1543768095 12|storage_api_insert_zfs_driver|1543768095 13|storage_zfs_noauto|1543768095 14|storage_zfs_volume_size|1543768095 15|network_dnsmasq_hosts|1543768095 16|storage_api_dir_bind_mount|1543768095 17|fix_uploaded_at|1543768095 18|storage_api_ceph_size_remove|1543768095 19|devices_new_naming_scheme|1543768095 20|storage_api_permissions|1543768095 21|container_config_regen|1543768095 22|lvm_node_specific_config_keys|1543768096 23|candid_rename_config_key|1543768096 24|shrink_logs_db_file|1545319854

raft_nodes

1|10.1.0.5:8443 2|10.2.0.5:8443

schema

1|37|1543768094

root-pod1:/var/lib/lxd# for t in $(sqlite3 ./database/global/db.bin ".tables"); do echo "#### ${t} #####"; sqlite3 ./database/global/db.bin "select * from ${t}"; done > global.db.txt
root-pod1:/var/lib/lxd# cat global.db.txt

certificates

2|166bc60e63e4c9535b912a81e9851c00eb4d6b0f829066f9392a2a316b2794af|1|10.1.0.5|-----BEGIN CERTIFICATE----- MIIFejCCA2KgAwIBAgIRAINXhGGMhSgYBQ5XMb/xCPowDQYJKoZIhvcNAQELBQAw QzEcMBoGA1UEChMTbGludXhjb250YWluZXJzLm9yZzEjMCEGA1UEAwwacm9vdEB1 MTgwNC1seGQxLnNhdHlhbi5jb20wHhcNMTgxMTI2MTczOTIxWhcNMjgxMTIzMTcz OTIxWjBDMRwwGgYDVQQKExNsaW51eGNvbnRhaW5lcnMub3JnMSMwIQYDVQQDDBpy b290QHUxODA0LWx4ZDEuc2F0eWFuLmNvbTCCAiIwDQYJKoZIhvcNAQEBBQADggIP ADCCAgoCggIBAKzziXz1L02NBHN8bqlIIAIaQEgcVKN4SdH1a04Co8SFv40dEEjq JgPXwpUR8bGrm0LnkZxd/T2g1lUzDGSyxl3Au7HDfBID7pH16HLdRuaz4REs57my MFxw2u4n/tz33kMzSnvtZQXwiV7SgBWZBTpucbocpPqrso9+7h44wfJji4/HLgAR 3GPeFjnR6ZKhc/5QIj/d07BfzarlgVR2Xy/jChAE06v03RjLMhEoEq4Sjmr3o+gC z+UPw0WOnwguqhL2FWHoQojQxxPBbzt8va7dWsj4qc3Zur7g+ZDocP/CPbP9hVnB t0CDDI3ckbGEfcaGpSH/lOWpeWdUmv8IXqjMIUyDiKzDNMbxNQM0B79vyls7IW8h XqbX4RJlvsBzMU23q9vFBZjEx0RVdq/957ahtznwauPBaXvWgVb+if3wu4GnCO/m +Wp06mWZsH58TcXMwKclzAoFjWzNBT/BUdpFytyNU9COttkUaSMCWzilwNdbXrV9 aa4rjyJCwb3v8STNm9Lr0sSArTpvC8iu6wPXEsBZbtO4l2Lq4ZWapHaTVNmp/Yj2 orGzxKQt2krMbO3ot7fcRbnIcqosh2anfFoqliVkHAT/b1o7O0L09TXCDwQgdoFy q5nrDvS8grI1JbvxH2uZGOKjMQcuY8Fbrtk1S1OtDS7udS2TXBF8MwPZAgMBAAGj aTBnMA4GA1UdDwEB/wQEAwIFoDATBgNVHSUEDDAKBggrBgEFBQcDAjAMBgNVHRMB Af8EAjAAMDIGA1UdEQQrMCmCFXUxODA0LWx4ZDEuc2F0eWFuLmNvbYcEwKgBCIcE wKgBB4cEwKh6ATANBgkqhkiG9w0BAQsFAAOCAgEAnxKjKmJ56RsLdJSnKWCsfMTw P4YZ4rZX/acGJlNIo8cM4ejQG5+R65iS3i/tcNWY8CWzegGVEa2gLJy2vg30cTgw Qurjw35m4Jx3dtB0i9290Eu0o9kxOYaXOlGSgvQ+HFMw7O/IzcyQ92lMYPom8T+I iC2aeBtv0KJNRqu4Dki2LenIFi8u/0SwIEvhnkM7c+9ZB3QVUk687cqDjCoCII+v zI9hBwHrXjDRWvL4KnYxIPNY7oQ+CKMDGmDyBsHLxZ62ihrpl3iOmnDdlx6HUGJ1 O6KRLYDh/SAUflQ09Pah+/sIRwCstu7RtOewVvTMgTCADBB15/ukKIOWcm8JlvST rCT37OfY2zXqSIlqRmFS8XNthX/x5vmqGkBfau6RX5j+6gMSWiRsPQW3JS3hf9/P 98v7HjYipwjZ6w/37b0HE+cC20m/GsMzc8SLYqk54G78+jOol89MjVWyS79Abo9p JxGpGEY1HohHbGw+XzAEnz4DRfXwp9NAIvtTuh1VGuSeOeRiRzXFxWaA7HHW5+3s FbgDVABS2jyNx18AZhpug9m3P4vwWPpB0qjjk3rQ1p21C/w7jxOoXRGTdqAAh5sq NO6QRr0zXCQIBBycUi/HBfP+i+MraMggS/wdm96X/IZ07IztM+cxB2i13CQf563B co9El5penyHhgGllXZ0= -----END CERTIFICATE-----

networks_nodes

config

7|core.trust_password|25ed52b33557bba23c4ece1eb4f13c625329430350468999d9ece6dda161c60bbed329a6764edafcf8379a1a4f142ec5b8f7b44315d09c9ad54c3de8f063d5f316f46c3724bcbd52c6f7aaf972e00400801e1ed5afe67f47a24e0804796d4a32

nodes

1|none||0.0.0.0|7|85|2019-01-12 22:41:49|0

containers

operations

containers_config

profiles

1|default|Default LXD profile 2|maas|Default LXD profile

containers_devices

profiles_config

17|2|raw.lxc|lxc.cgroup.devices.allow = c 10:237 rwm lxc.apparmor.profile = unconfined lxc.cgroup.devices.allow = b 7:* rwm 18|2|security.privileged|true

containers_devices_config

profiles_devices

1|1|root|2 2|1|eth0|1 49|2|loop5|4 50|2|loop7|4 51|2|root|2 52|2|loop0|4 53|2|loop2|4 54|2|loop3|4 55|2|loop4|4 56|2|loop6|4 57|2|eth0|1 58|2|loop1|4

containers_profiles

profiles_devices_config

1|1|path|/ 2|1|pool|local 3|2|parent|br0 4|2|name|eth0 5|2|nictype|bridged 79|49|path|/dev/loop5 80|50|path|/dev/loop7 81|51|path|/ 82|51|pool|local 83|52|path|/dev/loop0 84|53|path|/dev/loop2 85|54|path|/dev/loop3 86|55|path|/dev/loop4 87|56|path|/dev/loop6 88|57|name|eth0 89|57|nictype|bridged 90|57|parent|br0 91|58|path|/dev/loop1

images

15|84a71299044bc3c3563396bef153c0da83d494f6bf3d38fecc55d776b1e19bf9|ubuntu-18.04-server-cloudimg-amd64-lxd.tar.xz|183218960|0|2|2018-12-06 00:00:00+00:00|2023-04-26 00:00:00+00:00|2018-12-24 21:51:31.090988404+00:00|1|2019-01-04 22:30:15.129603176+00:00|1 36|1a37b19805e7f6077eff5cb8bdc8bd062c5d3e6069fe0806a9cfd2ef039a6136|lxd.tar.xz|130724540|0|2|2019-01-12 00:00:00+00:00|1970-01-01 00:00:00+00:00|2019-01-12 12:09:10.048742676+00:00|1|2019-01-04 02:59:02.877431134+00:00|1

schema

1|7|1543768124

images_aliases

storage_pools

1|local|zfs||1

images_nodes

storage_pools_config

images_properties

71|15|0|description|ubuntu 18.04 LTS amd64 (release) (20181206) 72|15|0|os|ubuntu 73|15|0|release|bionic 74|15|0|version|18.04 75|15|0|architecture|amd64 76|15|0|label|release 77|15|0|serial|20181206 178|36|0|serial|20190112_10:58 179|36|0|description|Centos 7 amd64 (20190112_10:58) 180|36|0|os|Centos 181|36|0|release|7 182|36|0|architecture|amd64

storage_pools_nodes

images_source

15|15|https://cloud-images.ubuntu.com/releases|2||18.04 36|36|https://images.linuxcontainers.org|2||centos/7/amd64

storage_volumes

networks

storage_volumes_config

networks_config

root-pod2:/var/lib/lxd# for t in $(sqlite3 ./database/local.db ".tables"); do echo "#### ${t} #####"; sqlite3 ./database/local.db "select * from ${t}"; done > local.db.txt
root-pod2:/var/lib/lxd# cat local.db.txt

config

1|core.https_address|10.2.0.5:8443

patches

1|invalid_profile_names|1543769800 2|leftover_profile_config|1543769801 3|network_permissions|1543769801 4|storage_api|1543769801 5|storage_api_v1|1543769801 6|storage_api_dir_cleanup|1543769801 7|storage_api_lvm_keys|1543769801 8|storage_api_keys|1543769801 9|storage_api_update_storage_configs|1543769801 10|storage_api_lxd_on_btrfs|1543769801 11|storage_api_lvm_detect_lv_size|1543769801 12|storage_api_insert_zfs_driver|1543769801 13|storage_zfs_noauto|1543769801 14|storage_zfs_volume_size|1543769801 15|network_dnsmasq_hosts|1543769801 16|storage_api_dir_bind_mount|1543769801 17|fix_uploaded_at|1543769801 18|storage_api_ceph_size_remove|1543769801 19|devices_new_naming_scheme|1543769801 20|storage_api_permissions|1543769801 21|container_config_regen|1543769801 22|lvm_node_specific_config_keys|1543769801 23|candid_rename_config_key|1543769801 24|shrink_logs_db_file|1545319811

raft_nodes

1|10.1.0.5:8443 2|10.2.0.5:8443

schema

1|37|1543769800

root-pod2:/var/lib/lxd# for t in $(sqlite3 ./database/global/db.bin ".tables"); do echo "#### ${t} #####"; sqlite3 ./database/global/db.bin "select * from ${t}"; done > global.db.txt
root-pod2:/var/lib/lxd# cat global.db.txt

certificates

2|166bc60e63e4c9535b912a81e9851c00eb4d6b0f829066f9392a2a316b2794af|1|10.1.0.5|-----BEGIN CERTIFICATE----- MIIFejCCA2KgAwIBAgIRAINXhGGMhSgYBQ5XMb/xCPowDQYJKoZIhvcNAQELBQAw QzEcMBoGA1UEChMTbGludXhjb250YWluZXJzLm9yZzEjMCEGA1UEAwwacm9vdEB1 MTgwNC1seGQxLnNhdHlhbi5jb20wHhcNMTgxMTI2MTczOTIxWhcNMjgxMTIzMTcz OTIxWjBDMRwwGgYDVQQKExNsaW51eGNvbnRhaW5lcnMub3JnMSMwIQYDVQQDDBpy b290QHUxODA0LWx4ZDEuc2F0eWFuLmNvbTCCAiIwDQYJKoZIhvcNAQEBBQADggIP ADCCAgoCggIBAKzziXz1L02NBHN8bqlIIAIaQEgcVKN4SdH1a04Co8SFv40dEEjq JgPXwpUR8bGrm0LnkZxd/T2g1lUzDGSyxl3Au7HDfBID7pH16HLdRuaz4REs57my MFxw2u4n/tz33kMzSnvtZQXwiV7SgBWZBTpucbocpPqrso9+7h44wfJji4/HLgAR 3GPeFjnR6ZKhc/5QIj/d07BfzarlgVR2Xy/jChAE06v03RjLMhEoEq4Sjmr3o+gC z+UPw0WOnwguqhL2FWHoQojQxxPBbzt8va7dWsj4qc3Zur7g+ZDocP/CPbP9hVnB t0CDDI3ckbGEfcaGpSH/lOWpeWdUmv8IXqjMIUyDiKzDNMbxNQM0B79vyls7IW8h XqbX4RJlvsBzMU23q9vFBZjEx0RVdq/957ahtznwauPBaXvWgVb+if3wu4GnCO/m +Wp06mWZsH58TcXMwKclzAoFjWzNBT/BUdpFytyNU9COttkUaSMCWzilwNdbXrV9 aa4rjyJCwb3v8STNm9Lr0sSArTpvC8iu6wPXEsBZbtO4l2Lq4ZWapHaTVNmp/Yj2 orGzxKQt2krMbO3ot7fcRbnIcqosh2anfFoqliVkHAT/b1o7O0L09TXCDwQgdoFy q5nrDvS8grI1JbvxH2uZGOKjMQcuY8Fbrtk1S1OtDS7udS2TXBF8MwPZAgMBAAGj aTBnMA4GA1UdDwEB/wQEAwIFoDATBgNVHSUEDDAKBggrBgEFBQcDAjAMBgNVHRMB Af8EAjAAMDIGA1UdEQQrMCmCFXUxODA0LWx4ZDEuc2F0eWFuLmNvbYcEwKgBCIcE wKgBB4cEwKh6ATANBgkqhkiG9w0BAQsFAAOCAgEAnxKjKmJ56RsLdJSnKWCsfMTw P4YZ4rZX/acGJlNIo8cM4ejQG5+R65iS3i/tcNWY8CWzegGVEa2gLJy2vg30cTgw Qurjw35m4Jx3dtB0i9290Eu0o9kxOYaXOlGSgvQ+HFMw7O/IzcyQ92lMYPom8T+I iC2aeBtv0KJNRqu4Dki2LenIFi8u/0SwIEvhnkM7c+9ZB3QVUk687cqDjCoCII+v zI9hBwHrXjDRWvL4KnYxIPNY7oQ+CKMDGmDyBsHLxZ62ihrpl3iOmnDdlx6HUGJ1 O6KRLYDh/SAUflQ09Pah+/sIRwCstu7RtOewVvTMgTCADBB15/ukKIOWcm8JlvST rCT37OfY2zXqSIlqRmFS8XNthX/x5vmqGkBfau6RX5j+6gMSWiRsPQW3JS3hf9/P 98v7HjYipwjZ6w/37b0HE+cC20m/GsMzc8SLYqk54G78+jOol89MjVWyS79Abo9p JxGpGEY1HohHbGw+XzAEnz4DRfXwp9NAIvtTuh1VGuSeOeRiRzXFxWaA7HHW5+3s FbgDVABS2jyNx18AZhpug9m3P4vwWPpB0qjjk3rQ1p21C/w7jxOoXRGTdqAAh5sq NO6QRr0zXCQIBBycUi/HBfP+i+MraMggS/wdm96X/IZ07IztM+cxB2i13CQf563B co9El5penyHhgGllXZ0= -----END CERTIFICATE-----

networks_nodes

config

7|core.trust_password|25ed52b33557bba23c4ece1eb4f13c625329430350468999d9ece6dda161c60bbed329a6764edafcf8379a1a4f142ec5b8f7b44315d09c9ad54c3de8f063d5f316f46c3724bcbd52c6f7aaf972e00400801e1ed5afe67f47a24e0804796d4a32

nodes

1|none||0.0.0.0|7|85|2019-01-12 22:41:49|0

containers

operations

containers_config

profiles

1|default|Default LXD profile 2|maas|Default LXD profile

containers_devices

profiles_config

17|2|raw.lxc|lxc.cgroup.devices.allow = c 10:237 rwm lxc.apparmor.profile = unconfined lxc.cgroup.devices.allow = b 7:* rwm 18|2|security.privileged|true

containers_devices_config

profiles_devices

1|1|root|2 2|1|eth0|1 49|2|loop5|4 50|2|loop7|4 51|2|root|2 52|2|loop0|4 53|2|loop2|4 54|2|loop3|4 55|2|loop4|4 56|2|loop6|4 57|2|eth0|1 58|2|loop1|4

containers_profiles

profiles_devices_config

1|1|path|/ 2|1|pool|local 3|2|parent|br0 4|2|name|eth0 5|2|nictype|bridged 79|49|path|/dev/loop5 80|50|path|/dev/loop7 81|51|path|/ 82|51|pool|local 83|52|path|/dev/loop0 84|53|path|/dev/loop2 85|54|path|/dev/loop3 86|55|path|/dev/loop4 87|56|path|/dev/loop6 88|57|name|eth0 89|57|nictype|bridged 90|57|parent|br0 91|58|path|/dev/loop1

images

15|84a71299044bc3c3563396bef153c0da83d494f6bf3d38fecc55d776b1e19bf9|ubuntu-18.04-server-cloudimg-amd64-lxd.tar.xz|183218960|0|2|2018-12-06 00:00:00+00:00|2023-04-26 00:00:00+00:00|2018-12-24 21:51:31.090988404+00:00|1|2019-01-04 22:30:15.129603176+00:00|1 36|1a37b19805e7f6077eff5cb8bdc8bd062c5d3e6069fe0806a9cfd2ef039a6136|lxd.tar.xz|130724540|0|2|2019-01-12 00:00:00+00:00|1970-01-01 00:00:00+00:00|2019-01-12 12:09:10.048742676+00:00|1|2019-01-04 02:59:02.877431134+00:00|1

schema

1|7|1543768124

images_aliases

storage_pools

1|local|zfs||1

images_nodes

storage_pools_config

images_properties

71|15|0|description|ubuntu 18.04 LTS amd64 (release) (20181206) 72|15|0|os|ubuntu 73|15|0|release|bionic 74|15|0|version|18.04 75|15|0|architecture|amd64 76|15|0|label|release 77|15|0|serial|20181206 178|36|0|serial|20190112_10:58 179|36|0|description|Centos 7 amd64 (20190112_10:58) 180|36|0|os|Centos 181|36|0|release|7 182|36|0|architecture|amd64

storage_pools_nodes

images_source

15|15|https://cloud-images.ubuntu.com/releases|2||18.04 36|36|https://images.linuxcontainers.org|2||centos/7/amd64

storage_volumes

networks

storage_volumes_config

networks_config

I have 6 to 7 containers on each node, and the ZFS disk image is at /var/lib/lxd/disks/local.img (93 GB and 94 GB).

I also noticed that the containers directory is empty and the ZFS pool is not mounted.

How can I fix the storage issue? The SELECT * queries against the storage_* tables are coming back empty.

Appreciate any help.

Information to attach

  • [ ] Any relevant kernel output (dmesg)
  • [ ] Container log (lxc info NAME --show-log)
  • [ ] Container configuration (lxc config show NAME --expanded)
  • [ ] Main daemon log (at /var/log/lxd/lxd.log or /var/snap/lxd/common/lxd/logs/lxd.log)
  • [ ] Output of the client with --debug
  • [ ] Output of the daemon with --debug (alternatively output of lxc monitor while reproducing the issue)

This question originates from the open-source project: lxc/lxd


27 replies

  • weixin_39634132 (4 months ago)

After restarting, the LXD daemon seems to be coming up successfully. Note: lxc list comes up empty and it doesn't seem to know about its cluster membership.

    
    systemctl stop lxd lxd.socket
    systemctl start lxd lxd.socket
    
    lxd-pod1
    root-pod1:/var/lib/lxd/database# lxc list
    +------+-------+------+------+------+-----------+----------+
    | NAME | STATE | IPV4 | IPV6 | TYPE | SNAPSHOTS | LOCATION |
    +------+-------+------+------+------+-----------+----------+
    root-pod1:/var/lib/lxd/database# lxc cluster list
    Error: LXD server isn't part of a cluster
    root-pod1:/var/lib/lxd/database# cd
    root-pod1:~# lxc profile show default
    config: {}
    description: Default LXD profile
    devices:
      eth0:
        name: eth0
        nictype: bridged
        parent: br0
        type: nic
      root:
        path: /
        pool: local
        type: disk
    name: default
    used_by: []
    root-pod1:~#
    
    LXD POD 2 
    root-pod2:/var/lib/lxd/database# cd
    root-pod2:~# lxc list
    +------+-------+------+------+------+-----------+----------+
    | NAME | STATE | IPV4 | IPV6 | TYPE | SNAPSHOTS | LOCATION |
    +------+-------+------+------+------+-----------+----------+
    root-pod2:~# lxc cluster list
    Error: LXD server isn't part of a cluster
    root-pod2:~# lxc profile show default
    config: {}
    description: Default LXD profile
    devices:
      eth0:
        name: eth0
        nictype: bridged
        parent: br0
        type: nic
      root:
        path: /
        pool: local
        type: disk
    name: default
    used_by: []
    root-pod2:~#
    
    
    
    
    LXD POD1 ZFS list
    root-pod1:~# zfs list
    NAME                                                                            USED  AVAIL  REFER  MOUNTPOINT
    local                                                                          31.8G  58.3G    24K  /lxd-pod1
    local/containers                                                               31.2G  58.3G    24K  /lxd-pod1/containers
    local/containers/u1804-crowd                                                   3.19G  58.3G  3.19G  /var/lib/lxd/storage-pools/local/containers/u1804-crowd
    local/containers/u1804-ispconfig                                               10.8G  58.3G  10.8G  /var/lib/lxd/storage-pools/local/containers/u1804-ispconfig
    local/containers/u1804-jenkins                                                  748M  58.3G   927M  /var/lib/lxd/storage-pools/local/containers/u1804-jenkins
    local/containers/u1804-jira                                                    2.11G  58.3G  2.11G  /var/lib/lxd/storage-pools/local/containers/u1804-jira
    local/containers/u1804-maas-rack                                               2.18G  58.3G  2.45G  /var/lib/lxd/storage-pools/local/containers/u1804-maas-rack
    local/containers/u1804-maas-rack-vlan20                                        1.94G  58.3G  2.21G  /var/lib/lxd/storage-pools/local/containers/u1804-maas-rack-vlan20
    local/containers/u1804-nagios                                                  1.27G  58.3G  1.27G  /var/lib/lxd/storage-pools/local/containers/u1804-nagios
    local/containers/u1804-nexus                                                   1.99G  58.3G  1.99G  /var/lib/lxd/storage-pools/local/containers/u1804-nexus
    local/containers/u1804-pgsql                                                   1.52G  58.3G  1.52G  /var/lib/lxd/storage-pools/local/containers/u1804-pgsql
    local/containers/u1804-splunk                                                  5.56G  58.3G  5.56G  /var/lib/lxd/storage-pools/local/containers/u1804-splunk
    local/custom                                                                     24K  58.3G    24K  none
    local/deleted                                                                    48K  58.3G    24K  none
    local/deleted/images                                                             24K  58.3G    24K  none
    local/images                                                                    525M  58.3G    24K  none
    local/images/84a71299044bc3c3563396bef153c0da83d494f6bf3d38fecc55d776b1e19bf9   334M  58.3G   334M  none
    local/images/e9b96e469242c7a3cd5c05faa0a0d930fc84a3671c8d7218ba0cc7b99f714e64   190M  58.3G   190M  none
    local/snapshots                                                                  24K  58.3G    24K  none
    root-pod1:~#
    root-pod1:~# zpool status
      pool: local
     state: ONLINE
      scan: scrub repaired 0B in 0h16m with 0 errors on Sun Jan 13 00:40:52 2019
    config:
    
            NAME                            STATE     READ WRITE CKSUM
            local                           ONLINE       0     0     0
              /var/lib/lxd/disks/local.img  ONLINE       0     0     0
    
    errors: No known data errors
    
    
    
    root-pod2:~# zfs list
    NAME                                                                                    USED  AVAIL  REFER  MOUNTPOINT
    local                                                                                  29.6G  60.5G    24K  none
    local/containers                                                                       28.1G  60.5G    24K  none
    local/containers/c7-ansible                                                            97.1M  60.5G   299M  /var/lib/lxd/storage-pools/local/containers/c7-ansible
    local/containers/u1804-bitbucket                                                       1.72G  60.5G  1.72G  /var/lib/lxd/storage-pools/local/containers/u1804-bitbucket
    local/containers/u1804-confluence                                                      2.45G  60.5G  2.45G  /var/lib/lxd/storage-pools/local/containers/u1804-confluence
    local/containers/u1804-juju                                                             272M  60.5G   452M  /var/lib/lxd/storage-pools/local/containers/u1804-juju
    local/containers/u1804-ldap                                                             645M  60.5G   645M  /var/lib/lxd/storage-pools/local/containers/u1804-ldap
    local/containers/u1804-maas-region                                                     4.44G  60.5G  4.71G  /var/lib/lxd/storage-pools/local/containers/u1804-maas-region
    local/containers/u1804-mariadb                                                          741M  60.5G   741M  /var/lib/lxd/storage-pools/local/containers/u1804-mariadb
    local/containers/u1804-mattermost                                                       321M  60.5G   507M  /var/lib/lxd/storage-pools/local/containers/u1804-mattermost
    local/containers/u1804-observium                                                       1.26G  60.5G  1.26G  /var/lib/lxd/storage-pools/local/containers/u1804-observium
    local/containers/u1804-zabbix                                                          16.2G  60.5G  16.2G  /var/lib/lxd/storage-pools/local/containers/u1804-zabbix
    local/custom                                                                             24K  60.5G    24K  none
    local/deleted                                                                           398M  60.5G    24K  none
    local/deleted/images                                                                    398M  60.5G    24K  none
    local/deleted/images/6ccf58e975a0e50c066c88f337fbbad305b3619db546871124d16585dca91c87   190M  60.5G   190M  none
    local/deleted/images/9c5414a2cb8f2818ef161d4edd8c74a334862f18fd64b266e5e74dbcc36b04aa   208M  60.5G   208M  none
    local/images                                                                           1.09G  60.5G    24K  none
    local/images/1c8d9341050a00c12c43d7b210e924107d56f4a8f428b11324df8a604103bcf3           190M  60.5G   190M  none
    local/images/544e95c2fe23a8e6b196cd4bcd0765e0f34478610771137a42d44801ea201b02           208M  60.5G   208M  none
    local/images/84a71299044bc3c3563396bef153c0da83d494f6bf3d38fecc55d776b1e19bf9           334M  60.5G   334M  none
    local/images/8828bdf6ab978d47715a5e8e4f384a4f4dfade1155aa4cbcbcda16e9be34f6a9           190M  60.5G   190M  none
    local/images/ee627b9870a268417099e3b4d2053e764e1f507ee3f9d8e64700b5fb8bac3488           190M  60.5G   190M  none
    local/snapshots                                                                          24K  60.5G    24K  none
    root-pod2:~#
    root-pod2:~# zpool status
      pool: local
     state: ONLINE
      scan: scrub repaired 0B in 0h22m with 0 errors on Sun Jan 13 00:46:15 2019
    config:
    
            NAME                            STATE     READ WRITE CKSUM
            local                           ONLINE       0     0     0
              /var/lib/lxd/disks/local.img  ONLINE       0     0     0
    
    errors: No known data errors
    
  • weixin_39688875 (4 months ago)

Right, so I don't know exactly what happened to your database to get it into this state. The fact that you're not clustered anymore is slightly annoying: recovering the containers on the two individual nodes is pretty straightforward, but once you do that you will not be able to cluster them together, as you need one node to be empty for that to work.

Anyway, I suspect having your containers back online is what matters most at this point, so let's do that first. Then you may be able to relocate all containers onto one node temporarily so you can reset LXD on the other, join them as a cluster again, and then move things back to their respective nodes.

    To import your containers, run:

    • lxd import c7-ansible
    • lxd import u1804-bitbucket
    • ...

    Doing that should re-create the missing DB entries for the container from the on-disk backup file we generate. Once you're done with all your containers, confirm that lxc list looks correct, then do the same on the other node.
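    A loop form of that, as a sketch: it assumes the container names match the local/containers/<name> datasets that zfs list showed on each node, and that the pool is named local as in your output.

    # Re-create the DB entries for every container found on the local ZFS pool.
    for ds in $(zfs list -H -o name -r local/containers | tail -n +2); do
        lxd import "${ds##*/}"
    done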

    At that point, you'll end up having two standalone LXDs each with their respective containers.

You can then either choose to keep things that way, or you can add the second server as a remote on the first, then use lxc move to move all the containers of the second server onto the first, then enable clustering on the first with lxc config set core.https_address 1.2.3.4:8443 (replace with the expected IP) and lxc config set core.trust_password some-password, followed by lxc cluster enable. On the second server, you'll want to delete the storage pool and network that you have right now, then run lxd init and have it join the cluster.

    At that point you'd have a cluster again with one of the two nodes empty, you can then move containers back to it using lxc move with the --target option.
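    A condensed sketch of that whole sequence, using the addresses from this thread; the password, the example container name, and the exact flags are placeholders and may vary with the LXD version:

    # On the first server: add the second one as a remote and pull its containers over.
    lxc remote add pod2 10.2.0.5 --password some-password --accept-certificate
    lxc move pod2:u1804-zabbix u1804-zabbix   # repeat per container; containers should be stopped first

    # Still on the first server: turn it into a (single-node) cluster.
    lxc config set core.https_address 10.1.0.5:8443
    lxc config set core.trust_password some-password
    lxc cluster enable

    # On the second server: delete its now-unused storage pool (and any LXD-managed network), then rejoin.
    lxc storage delete local
    lxd init    # answer yes to joining the existing cluster

    # Finally, move containers back to their original node.
    lxc move u1804-zabbix --target pod2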

    Sorry for the inconvenience, sadly it doesn't look like we have enough information available here to do a full rebuild of the DB of both nodes to get things back online without having to rebuild things from the backup files.

    I hope this helps.

  • weixin_39634132 (4 months ago)

Stephane, no need for apologies; I'm interested in documenting the problem and seeing what the path of recovery would be. I don't host any mission-critical apps, but people who do might find this useful during recovery.

And yes, my immediate priority is to recover the containers, move them to a spare LXD host, and set up clustering again on the 2 main LXD hosts (adding a 3rd spare LXD node).

Thanks to both of you, much appreciated.

  • weixin_39634132 (4 months ago)

Hi, this is strange. I imported the containers on each LXD host as per what was listed by zfs list (shown earlier), but even though both hosts are not in clustered mode, I see the containers listed on both. The location shows as none, which, if they were clustered, should have shown the right cluster member name.

    root-pod1:~# lxc list
    +------------------------+---------+---------------------+------+------------+-----------+----------+
    |          NAME          |  STATE  |        IPV4         | IPV6 |    TYPE    | SNAPSHOTS | LOCATION |
    +------------------------+---------+---------------------+------+------------+-----------+----------+
    | c7-ansible             | STOPPED |                     |      | PERSISTENT | 0         | none     |
    | u1804-bitbucket        | STOPPED |                     |      | PERSISTENT | 0         | none     |
    | u1804-confluence       | STOPPED |                     |      | PERSISTENT | 0         | none     |
    | u1804-crowd            | RUNNING | 10.1.255.247 (eth0) |      | PERSISTENT | 0         | none     |
    | u1804-ispconfig        | RUNNING | 10.1.255.249 (eth0) |      | PERSISTENT | 0         | none     |
    | u1804-jenkins          | STOPPED |                     |      | PERSISTENT | 0         | none     |
    | u1804-jira             | RUNNING | 10.1.255.246 (eth0) |      | PERSISTENT | 0         | none     |
    | u1804-juju             | STOPPED |                     |      | PERSISTENT | 0         | none     |
    | u1804-ldap             | STOPPED |                     |      | PERSISTENT | 0         | none     |
    | u1804-maas-rack        | STOPPED |                     |      | PERSISTENT | 0         | none     |
    | u1804-maas-rack-vlan20 | STOPPED |                     |      | PERSISTENT | 0         | none     |
    | u1804-maas-region      | STOPPED |                     |      | PERSISTENT | 0         | none     |
    | u1804-mariadb          | STOPPED |                     |      | PERSISTENT | 0         | none     |
    | u1804-mattermost       | STOPPED |                     |      | PERSISTENT | 0         | none     |
    | u1804-nagios           | RUNNING | 10.1.255.245 (eth0) |      | PERSISTENT | 0         | none     |
    | u1804-nexus            | RUNNING | 10.1.255.250 (eth0) |      | PERSISTENT | 0         | none     |
    | u1804-observium        | STOPPED |                     |      | PERSISTENT | 0         | none     |
    | u1804-pgsql            | RUNNING | 10.1.255.248 (eth0) |      | PERSISTENT | 0         | none     |
    | u1804-splunk           | RUNNING | 10.1.255.244 (eth0) |      | PERSISTENT | 0         | none     |
    | u1804-zabbix           | STOPPED |                     |      | PERSISTENT | 0         | none     |
    +------------------------+---------+---------------------+------+------------+-----------+----------+

    Note: the ones listed here as stopped are the containers on the other host.

    root-pod2:~# lxc list
    +------------------------+---------+------------------+------+------------+-----------+----------+
    |          NAME          |  STATE  |       IPV4       | IPV6 |    TYPE    | SNAPSHOTS | LOCATION |
    +------------------------+---------+------------------+------+------------+-----------+----------+
    | c7-ansible             | RUNNING | 10.2.7.9 (eth0)  |      | PERSISTENT | 0         | none     |
    | u1804-bitbucket        | RUNNING | 10.2.7.8 (eth0)  |      | PERSISTENT | 0         | none     |
    | u1804-confluence       | RUNNING | 10.2.7.10 (eth0) |      | PERSISTENT | 0         | none     |
    | u1804-crowd            | STOPPED |                  |      | PERSISTENT | 0         | none     |
    | u1804-ispconfig        | STOPPED |                  |      | PERSISTENT | 0         | none     |
    | u1804-jenkins          | STOPPED |                  |      | PERSISTENT | 0         | none     |
    | u1804-jira             | STOPPED |                  |      | PERSISTENT | 0         | none     |
    | u1804-juju             | RUNNING | 10.2.7.2 (eth0)  |      | PERSISTENT | 0         | none     |
    | u1804-ldap             | RUNNING | 10.2.7.5 (eth0)  |      | PERSISTENT | 0         | none     |
    | u1804-maas-rack        | STOPPED |                  |      | PERSISTENT | 0         | none     |
    | u1804-maas-rack-vlan20 | STOPPED |                  |      | PERSISTENT | 0         | none     |
    | u1804-maas-region      | RUNNING |                  |      | PERSISTENT | 0         | none     |
    | u1804-mariadb          | RUNNING | 10.2.7.7 (eth0)  |      | PERSISTENT | 0         | none     |
    | u1804-mattermost       | RUNNING | 10.2.7.4 (eth0)  |      | PERSISTENT | 0         | none     |
    | u1804-nagios           | STOPPED |                  |      | PERSISTENT | 0         | none     |
    | u1804-nexus            | STOPPED |                  |      | PERSISTENT | 0         | none     |
    | u1804-observium        | RUNNING | 10.2.7.6 (eth0)  |      | PERSISTENT | 0         | none     |
    | u1804-pgsql            | STOPPED |                  |      | PERSISTENT | 0         | none     |
    | u1804-splunk           | STOPPED |                  |      | PERSISTENT | 0         | none     |
    | u1804-zabbix           | RUNNING | 10.2.7.3 (eth0)  |      | PERSISTENT | 0         | none     |
    +------------------------+---------+------------------+------+------------+-----------+----------+

    Note: the ones listed here as stopped are the containers on the other host.

    Thanks,

  • weixin_39688875 (4 months ago)

    Oh, that's interesting, so it may be that the database is still replicating and we'd just need to manually re-add the node information to get it back online.

    Can you try doing a simple change like:

    • lxc profile set default user.foo bar

    And then on the other server, do:

    • lxc profile get default user.foo

    That would confirm whether the DB is being replicated. That may be a pretty fun one to fix, but should be doable :)

  • weixin_39634132 (4 months ago)

Looks like it did replicate.

    root-pod1:~# lxc profile set default user.foo bar

    root-pod2:~# lxc profile get default user.foo
    bar

    Below is the lxc profile show default output:

    root-pod1:~# lxc profile show default
    config:
      user.foo: bar
    description: Default LXD profile
    devices:
      eth0:
        name: eth0
        nictype: bridged
        parent: br0
        type: nic
      root:
        path: /
        pool: local
        type: disk
    name: default
    used_by:
    - /1.0/containers/u1804-nexus
    - /1.0/containers/u1804-crowd
    - /1.0/containers/u1804-ispconfig
    - /1.0/containers/u1804-jenkins
    - /1.0/containers/u1804-jira
    - /1.0/containers/u1804-nagios
    - /1.0/containers/u1804-pgsql
    - /1.0/containers/u1804-splunk
    - /1.0/containers/c7-ansible
    - /1.0/containers/u1804-bitbucket
    - /1.0/containers/u1804-confluence
    - /1.0/containers/u1804-juju
    - /1.0/containers/u1804-ldap
    - /1.0/containers/u1804-mariadb
    - /1.0/containers/u1804-mattermost
    - /1.0/containers/u1804-observium
    - /1.0/containers/u1804-zabbix

    root-pod2:~# lxc profile show default
    config:
      user.foo: bar
    description: Default LXD profile
    devices:
      eth0:
        name: eth0
        nictype: bridged
        parent: br0
        type: nic
      root:
        path: /
        pool: local
        type: disk
    name: default
    used_by:
    - /1.0/containers/u1804-nexus
    - /1.0/containers/u1804-crowd
    - /1.0/containers/u1804-ispconfig
    - /1.0/containers/u1804-jenkins
    - /1.0/containers/u1804-jira
    - /1.0/containers/u1804-nagios
    - /1.0/containers/u1804-pgsql
    - /1.0/containers/u1804-splunk
    - /1.0/containers/c7-ansible
    - /1.0/containers/u1804-bitbucket
    - /1.0/containers/u1804-confluence
    - /1.0/containers/u1804-juju
    - /1.0/containers/u1804-ldap
    - /1.0/containers/u1804-mariadb
    - /1.0/containers/u1804-mattermost
    - /1.0/containers/u1804-observium
    - /1.0/containers/u1804-zabbix

    The 2nd profile, maas:

    root-pod1:~# lxc profile show maas config: raw.lxc: |- lxc.cgroup.devices.allow = c 10:237 rwm lxc.apparmor.profile = unconfined lxc.cgroup.devices.allow = b 7:* rwm security.privileged: "true" description: Default LXD profile devices: eth0: name: eth0 nictype: bridged parent: br0 type: nic loop0: path: /dev/loop0 type: unix-block loop1: path: /dev/loop1 type: unix-block loop2: path: /dev/loop2 type: unix-block loop3: path: /dev/loop3 type: unix-block loop4: path: /dev/loop4 type: unix-block loop5: path: /dev/loop5 type: unix-block loop6: path: /dev/loop6 type: unix-block loop7: path: /dev/loop7 type: unix-block root: path: / pool: local type: disk name: maas used_by: - /1.0/containers/u1804-maas-rack - /1.0/containers/u1804-maas-rack-vlan20 - /1.0/containers/u1804-maas-region

    root-pod2:~# lxc profile show maas config: raw.lxc: |- lxc.cgroup.devices.allow = c 10:237 rwm lxc.apparmor.profile = unconfined lxc.cgroup.devices.allow = b 7:* rwm security.privileged: "true" description: Default LXD profile devices: eth0: name: eth0 nictype: bridged parent: br0 type: nic loop0: path: /dev/loop0 type: unix-block loop1: path: /dev/loop1 type: unix-block loop2: path: /dev/loop2 type: unix-block loop3: path: /dev/loop3 type: unix-block loop4: path: /dev/loop4 type: unix-block loop5: path: /dev/loop5 type: unix-block loop6: path: /dev/loop6 type: unix-block loop7: path: /dev/loop7 type: unix-block root: path: / pool: local type: disk name: maas used_by: - /1.0/containers/u1804-maas-rack - /1.0/containers/u1804-maas-rack-vlan20 - /1.0/containers/u1804-maas-region

    root-pod1:~# lxc cluster list
    Error: LXD server isn't part of a cluster

    root-pod2:~# lxc cluster list
    Error: LXD server isn't part of a cluster

  • weixin_39634132 (4 months ago)

    lxd-pod1.zip lxd-pod2.zip

    Attaching the database.zip files from both lxd-pod1 & lxd-pod2

  • weixin_39688875 (4 months ago)

Ok, so your local databases still have the raft entries; that explains why replication still works.

    Can you show lxd sql global "SELECT * FROM nodes;"?

With that output I should be able to get you what's needed to re-enable part of clustering; that should get lxc cluster list to behave. We'll then need to adjust the node_id of various entries to reflect where things are, and then you should be mostly back online.

  • weixin_39787089 (4 months ago)

As I had mentioned earlier, from the db dump that was pasted we can see that the nodes table in the global database was only listing one node (as though each instance was not clustered at all). This is why each node thinks it's not clustered.
  • weixin_39688875 (4 months ago)

Right, I just want to get the current schema and API information from it so we can restore the nodes table to what it should be, based on the local raft_nodes list.

    Once that's done, we need to update the node_id to the expected value in a bunch of tables and things should be back to normal.
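    For reference, a sketch of what that node_id adjustment could look like, assuming node IDs 1 and 2 end up mapping to pod1 and pod2 and that c7-ansible lives on pod2; the table and column names should be verified against your own dump before running anything:

    # Hypothetical example: reassign one container and its root volume to node 2.
    lxd sql global "UPDATE containers SET node_id=2 WHERE name='c7-ansible';"
    lxd sql global "UPDATE storage_volumes SET node_id=2 WHERE name='c7-ansible';"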

  • weixin_39634132 (4 months ago)

    root-pod1:~# lxd sql global "SELECT * FROM nodes;"
    +----+------+-------------+---------+--------+----------------+--------------------------------+---------+
    | id | name | description | address | schema | api_extensions |           heartbeat            | pending |
    +----+------+-------------+---------+--------+----------------+--------------------------------+---------+
    | 1  | none |             | 0.0.0.0 | 7      | 85             | 2019-01-18T14:25:17.559600834Z | 0       |
    +----+------+-------------+---------+--------+----------------+--------------------------------+---------+

    root-pod2:~# lxd sql global "SELECT * FROM nodes;"
    +----+------+-------------+---------+--------+----------------+--------------------------------+---------+
    | id | name | description | address | schema | api_extensions |           heartbeat            | pending |
    +----+------+-------------+---------+--------+----------------+--------------------------------+---------+
    | 1  | none |             | 0.0.0.0 | 7      | 85             | 2019-01-18T14:24:48.586437598Z | 0       |
    +----+------+-------------+---------+--------+----------------+--------------------------------+---------+

  • weixin_39634132 (4 months ago)

    global.lxd-pod1.txt global.lxd-pod2.txt

Attaching the "select * from {t}" output from the global db.bin.

  • weixin_39688875 (4 months ago)

Ok, let's start with:

    • lxd sql global "UPDATE nodes SET name='pod1', address='10.1.0.5:8443';"
    • lxd sql global "INSERT INTO nodes (name, address, schema, api_extensions) VALUES ('pod2', '10.2.0.5:8443', 7, 85);"

    This should get the nodes table to make sense again, please show lxd sql global "SELECT * FROM nodes;" after the fact to confirm it's good.

This may also get lxc cluster list to behave. If it does, then you should reload LXD on both nodes to have it notice that it's part of a cluster; do that with systemctl reload snap.lxd.daemon on both nodes.

  • weixin_39634132 (4 months ago)

Apologies for the delayed response. I tried the above and LXD stopped coming up; it was throwing another error about storage. When LXD was running earlier, I had added a remote LXD and copied all the containers to it.

I am going to rebuild these 2 hosts as an LXD cluster and try taking backups, automating that as well.

I appreciate all the help. I feel the sqlite3 db.bin is in a catch-22 situation: when I add the nodes, the storage doesn't have the required 2nd node details, so it fails to come up.

Let me know if you want to close this or if any more details are required.

  • weixin_39688875 (4 months ago)

Yeah, though if that happens, it can be fixed using a patch file to inject a SQL statement between the DB coming up and it being accessed for host config. Not that you should really be getting into such situations in the first place, but we do have a recovery mechanism to get you out of this :)

  • weixin_39688875 (4 months ago)

    Closing this as you're planning on just rebuilding the cluster and moving the containers back to it, which will sort this all out.

    I suspect the issue was that manually re-clustering with the instructions I gave you then required some additional entries to be marked as node-specific. We could certainly have gone through those one by one and fixed them with more .sql patches but that would have taken a few more roundtrips so I understand why you'd instead just go with reinstalling :)

    For your backups, I'd recommend you use lxd sql global .dump (on any cluster host) and lxd sql local .dump (on every cluster host), this will spit out a .sql which should compress pretty well and should be sufficient to re-assemble the database if something goes wrong as well as make it easier to figure out exactly what's in there. Backing up db.bin isn't a good idea as that's just a temporary representation of the clustered database and not guaranteed to match what's in the actual database.
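    A minimal sketch of such a backup script, with the destination directory and retention left as placeholders:

    #!/bin/sh
    # Dump the clustered (global) DB and this host's local DB as compressed SQL.
    STAMP=$(date +%Y%m%d-%H%M%S)
    mkdir -p /root/lxd-db-backup
    lxd sql global .dump | gzip > /root/lxd-db-backup/global-$STAMP.sql.gz
    lxd sql local .dump | gzip > /root/lxd-db-backup/local-$(hostname -s)-$STAMP.sql.gz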

  • weixin_39634132 (4 months ago)

Thank you. I just wanted to document the recovery steps as much as possible. Going forward, I think what I missed was taking backups, which would be very helpful and provide more confidence.

I appreciate all the patience, advice, and help.

  • weixin_39688875 (4 months ago)

Can you try to help with this one?

    As the daemon is failing to start, we can't use lxd sql, but if we can figure out what's actually in the database, then we can possibly provide a .sql patch file to load on startup to get things going again.

  • weixin_39787089 (4 months ago)

I was about to start figuring out what .sql patch could restore the missing config keys in the storage_pools_config table. But I just noticed that the storage_volumes table is also empty, which seems quite bad considering that 6/7 containers per node were mentioned. What do you think? I could cook something up, just to try, but it feels like it might be a rabbit hole.
  • weixin_39787089 (4 months ago)

    The containers table is empty as well.

  • weixin_39787089 (4 months ago)

And the nodes table contains just one node. I believe this database can't be easily recovered: in order to even just populate the storage_pools_config table, I'd need to know the database ID of each of the two nodes of the cluster, and both of those seem to be gone.
  • weixin_39688875 (4 months ago)

    So if we get the storage pool to behave, you may be able to re-import the local containers using lxd import, having LXD scan the storage and re-add the DB entries.

But yeah, something bad happened to that database.

  • weixin_39634132 (4 months ago)

The lxd and lxc commands won't work, as the daemon doesn't seem to be running.

    In my best effort to at least recover the containers, I have done the following (a scripted form is sketched below):

    • Mount the ZFS datasets manually: zfs list, then zfs mount -O <dataset>, e.g. zfs mount -O local/containers/u1804-nexus
    • Tar up each container: tar --numeric-owner -czvf u1804-nexus.tar.gz ./*
    • Move the archives to NFS for now with cp.
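    A scripted form of that salvage, as a sketch: it assumes the local/containers/<name> dataset layout from the zfs list output above and an NFS mount point of your choosing:

    #!/bin/sh
    # Mount each container dataset, archive it, and copy the archive off-host.
    for ds in $(zfs list -H -o name -r local/containers | tail -n +2); do
        name=${ds##*/}
        zfs mount "$ds" 2>/dev/null || true          # already-mounted datasets are fine
        mnt=$(zfs get -H -o value mountpoint "$ds")
        tar --numeric-owner -czf "/tmp/${name}.tar.gz" -C "$mnt" .
        cp "/tmp/${name}.tar.gz" /mnt/nfs/lxd-salvage/
    done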

If the database cannot be restored, my best bet is to uninstall LXD, reinstall it, recreate the cluster, and restore the containers using the backed-up tar files. This time, though, I will take backups of the database.

The bug, if agreed, would be that the tables should not be getting wiped. If these tables had time-series-based change tracking, we could set up monitoring using Prometheus / Grafana (lots of opportunities).

    Again, it could be my mistake as well; I did perform a VACUUM operation on the WAL file:

    sqlite3 db.bin
    sqlite3> .tables
    sqlite3> .VACUUM
    sqlite3> .q
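    If you need to inspect db.bin again, opening it read-only avoids any chance of writing to it; the -readonly flag is part of the standard sqlite3 shell:

    sqlite3 -readonly /var/lib/lxd/database/global/db.bin ".tables"
    sqlite3 -readonly /var/lib/lxd/database/global/db.bin "SELECT * FROM nodes;"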

  • weixin_39634132 (4 months ago)

Let me know if I should close this as a non-issue.

  • weixin_39688875 (4 months ago)

The .sql fix mentioned before will get your LXD working again, at which point lxd import will work; running it for every one of your containers on ZFS will have LXD inspect ZFS and re-create the DB entries for you.

Can you provide the .sql fix so their LXD can get back online?

  • weixin_39787089 (4 months ago)

You can try to create a file named /var/lib/lxd/database/patch.global.sql with this content:
    
    INSERT INTO storage_pools_config(storage_pool_id, node_id, key, value) VALUES (1, 1, 'size', '93GB');
    INSERT INTO storage_pools_config(storage_pool_id, node_id, key, value) VALUES (1, 1, 'source', '/var/lib/lxd/disks/local.img');
    INSERT INTO storage_pools_config(storage_pool_id, node_id, key, value) VALUES (1, 1, 'zfs.pool_name', 'default');
    

    and then try to start the LXD daemon. Same on both nodes.

    I'd be a bit surprised if it helps in any way, but it won't hurt trying.

Note that poking directly at the db.bin file is useless: although it's a SQLite file, it's not the authoritative one, as it gets generated from the raft data.

Also, if you build another cluster, it's highly recommended to have at least 3 nodes, because then you can still maintain a quorum if one of them becomes unavailable. With 2 nodes, anything that happens to one node makes the other one unusable.

  • weixin_39634132 (4 months ago)

Thanks, the script helped. The pool name was local, so I used that in the insert script instead of default. The LXD daemon seems to be coming up, but it fails on downloading images.
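    For anyone following along, the patch that worked here was presumably the same three inserts with the pool name corrected, i.e. the last line becomes:

    INSERT INTO storage_pools_config(storage_pool_id, node_id, key, value) VALUES (1, 1, 'zfs.pool_name', 'local');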

    LOGS from lxd-pod1:

    t=2019-01-15T12:18:04+0000 lvl=info msg="LXD 3.0.3 is starting in normal mode" path=/var/lib/lxd
    t=2019-01-15T12:18:04+0000 lvl=info msg="Kernel uid/gid map:"
    t=2019-01-15T12:18:04+0000 lvl=info msg=" - u 0 0 4294967295"
    t=2019-01-15T12:18:04+0000 lvl=info msg=" - g 0 0 4294967295"
    t=2019-01-15T12:18:04+0000 lvl=info msg="Configured LXD uid/gid map:"
    t=2019-01-15T12:18:04+0000 lvl=info msg=" - u 0 100000 65536"
    t=2019-01-15T12:18:04+0000 lvl=info msg=" - g 0 100000 65536"
    t=2019-01-15T12:18:04+0000 lvl=warn msg="CGroup memory swap accounting is disabled, swap limits will be ignored."
    t=2019-01-15T12:18:04+0000 lvl=info msg="Kernel features:"
    t=2019-01-15T12:18:04+0000 lvl=info msg=" - netnsid-based network retrieval: no"
    t=2019-01-15T12:18:04+0000 lvl=info msg=" - unprivileged file capabilities: yes"
    t=2019-01-15T12:18:04+0000 lvl=info msg="Initializing local database"
    t=2019-01-15T12:18:04+0000 lvl=info msg="Starting /dev/lxd handler:"
    t=2019-01-15T12:18:04+0000 lvl=info msg=" - binding devlxd socket" socket=/var/lib/lxd/devlxd/sock
    t=2019-01-15T12:18:04+0000 lvl=info msg="REST API daemon:"
    t=2019-01-15T12:18:04+0000 lvl=info msg=" - binding Unix socket" inherited=true socket=/var/lib/lxd/unix.socket
    t=2019-01-15T12:18:04+0000 lvl=info msg=" - binding TCP socket" socket=10.1.0.5:8443
    t=2019-01-15T12:18:04+0000 lvl=info msg="Initializing global database"
    t=2019-01-15T12:18:08+0000 lvl=warn msg="Raft: Heartbeat timeout from \"\" reached, starting election"
    t=2019-01-15T12:18:09+0000 lvl=info msg="Updating the LXD global schema. Backup made as \"global.bak\""
    t=2019-01-15T12:18:10+0000 lvl=info msg="Initializing storage pools"
    t=2019-01-15T12:18:10+0000 lvl=info msg="Initializing networks"
    t=2019-01-15T12:18:10+0000 lvl=info msg="Pruning leftover image files"
    t=2019-01-15T12:18:10+0000 lvl=info msg="Done pruning leftover image files"
    t=2019-01-15T12:18:10+0000 lvl=info msg="Loading daemon configuration"
    t=2019-01-15T12:18:10+0000 lvl=info msg="Pruning expired images"
    t=2019-01-15T12:18:10+0000 lvl=info msg="Done pruning expired images"
    t=2019-01-15T12:18:10+0000 lvl=info msg="Expiring log files"
    t=2019-01-15T12:18:10+0000 lvl=info msg="Done expiring log files"
    t=2019-01-15T12:18:10+0000 lvl=info msg="Updating instance types"
    t=2019-01-15T12:18:10+0000 lvl=info msg="Done updating instance types"
    t=2019-01-15T12:18:11+0000 lvl=info msg="Updating images"
    t=2019-01-15T12:18:11+0000 lvl=info msg="Done updating images"
    t=2019-01-15T12:18:12+0000 lvl=info msg="Downloading image" alias=centos/7/amd64 server=https://images.linuxcontainers.org
    t=2019-01-15T12:18:24+0000 lvl=eror msg="Failed to update the image" err="UNIQUE constraint failed: images.fingerprint" fp=1a37b19805e7f6077eff5cb8bdc8bd062c5d3e6069fe0806a9cfd2ef039a6136

    root-pod1:/var/lib/lxd/database# systemctl status lxd
    ● lxd.service - LXD - main daemon
       Loaded: loaded (/lib/systemd/system/lxd.service; indirect; vendor preset: enabled)
       Active: active (running) since Tue 2019-01-15 12:18:11 UTC; 3min 25s ago
         Docs: man:lxd(1)
      Process: 5117 ExecStartPost=/usr/bin/lxd waitready --timeout=600 (code=exited, status=0/SUCCESS)
      Process: 5101 ExecStartPre=/usr/lib/x86_64-linux-gnu/lxc/lxc-apparmor-load (code=exited, status=0/SUCCESS)
     Main PID: 5116 (lxd)
        Tasks: 27
       CGroup: /system.slice/lxd.service
               └─5116 /usr/lib/lxd/lxd --group lxd --logfile=/var/log/lxd/lxd.log

    Jan 15 12:18:04 lxd-pod1.basementcloud systemd[1]: Starting LXD - main daemon...
    Jan 15 12:18:04 lxd-pod1.basementcloud lxd[5116]: t=2019-01-15T12:18:04+0000 lvl=warn msg="CGroup memory swap accounting is disabl
    Jan 15 12:18:08 lxd-pod1.basementcloud lxd[5116]: t=2019-01-15T12:18:08+0000 lvl=warn msg="Raft: Heartbeat timeout from \"\" reach
    Jan 15 12:18:11 lxd-pod1.basementcloud systemd[1]: Started LXD - main daemon.
    Jan 15 12:18:24 lxd-pod1.basementcloud lxd[5116]: t=2019-01-15T12:18:24+0000 lvl=eror msg="Failed to update the image" err="UNIQUE

    LOGS from lxd-pod2:

    t=2019-01-15T12:18:05+0000 lvl=info msg="LXD 3.0.3 is starting in normal mode" path=/var/lib/lxd
    t=2019-01-15T12:18:05+0000 lvl=info msg="Kernel uid/gid map:"
    t=2019-01-15T12:18:05+0000 lvl=info msg=" - u 0 0 4294967295"
    t=2019-01-15T12:18:05+0000 lvl=info msg=" - g 0 0 4294967295"
    t=2019-01-15T12:18:05+0000 lvl=info msg="Configured LXD uid/gid map:"
    t=2019-01-15T12:18:05+0000 lvl=info msg=" - u 0 100000 65536"
    t=2019-01-15T12:18:05+0000 lvl=info msg=" - g 0 100000 65536"
    t=2019-01-15T12:18:05+0000 lvl=warn msg="CGroup memory swap accounting is disabled, swap limits will be ignored."
    t=2019-01-15T12:18:05+0000 lvl=info msg="Kernel features:"
    t=2019-01-15T12:18:05+0000 lvl=info msg=" - netnsid-based network retrieval: no"
    t=2019-01-15T12:18:05+0000 lvl=info msg=" - unprivileged file capabilities: yes"
    t=2019-01-15T12:18:05+0000 lvl=info msg="Initializing local database"
    t=2019-01-15T12:18:05+0000 lvl=info msg="Starting /dev/lxd handler:"
    t=2019-01-15T12:18:05+0000 lvl=info msg=" - binding devlxd socket" socket=/var/lib/lxd/devlxd/sock
    t=2019-01-15T12:18:05+0000 lvl=info msg="REST API daemon:"
    t=2019-01-15T12:18:05+0000 lvl=info msg=" - binding Unix socket" inherited=true socket=/var/lib/lxd/unix.socket
    t=2019-01-15T12:18:05+0000 lvl=info msg=" - binding TCP socket" socket=10.2.0.5:8443
    t=2019-01-15T12:18:05+0000 lvl=info msg="Initializing global database"
    t=2019-01-15T12:18:09+0000 lvl=info msg="Updating the LXD global schema. Backup made as \"global.bak\""
    t=2019-01-15T12:18:09+0000 lvl=info msg="Initializing storage pools"
    t=2019-01-15T12:18:09+0000 lvl=info msg="Initializing networks"
    t=2019-01-15T12:18:09+0000 lvl=info msg="Pruning leftover image files"
    t=2019-01-15T12:18:09+0000 lvl=info msg="Done pruning leftover image files"
    t=2019-01-15T12:18:09+0000 lvl=info msg="Loading daemon configuration"
    t=2019-01-15T12:18:10+0000 lvl=info msg="Pruning expired images"
    t=2019-01-15T12:18:10+0000 lvl=info msg="Done pruning expired images"
    t=2019-01-15T12:18:10+0000 lvl=info msg="Updating images"
    t=2019-01-15T12:18:10+0000 lvl=info msg="Done updating images"
    t=2019-01-15T12:18:10+0000 lvl=info msg="Updating instance types"
    t=2019-01-15T12:18:10+0000 lvl=info msg="Done updating instance types"
    t=2019-01-15T12:18:10+0000 lvl=info msg="Expiring log files"
    t=2019-01-15T12:18:10+0000 lvl=info msg="Done expiring log files"
    t=2019-01-15T12:18:12+0000 lvl=info msg="Downloading image" alias=centos/7/amd64 server=https://images.linuxcontainers.org
    t=2019-01-15T12:18:21+0000 lvl=info msg="Image downloaded" alias=centos/7/amd64 server=https://images.linuxcontainers.org
    t=2019-01-15T12:18:24+0000 lvl=eror msg="Error loading image" err="No such object" fp=84a71299044bc3c3563396bef153c0da83d494f6bf3d38fecc55d776b1e19bf9

    root-pod2:/var/lib/lxd/database# systemctl status lxd
    ● lxd.service - LXD - main daemon
       Loaded: loaded (/lib/systemd/system/lxd.service; indirect; vendor preset: enabled)
       Active: active (running) since Tue 2019-01-15 12:18:10 UTC; 4min 25s ago
         Docs: man:lxd(1)
      Process: 4828 ExecStartPost=/usr/bin/lxd waitready --timeout=600 (code=exited, status=0/SUCCESS)
      Process: 4812 ExecStartPre=/usr/lib/x86_64-linux-gnu/lxc/lxc-apparmor-load (code=exited, status=0/SUCCESS)
     Main PID: 4827 (lxd)
        Tasks: 25
       CGroup: /system.slice/lxd.service
               └─4827 /usr/lib/lxd/lxd --group lxd --logfile=/var/log/lxd/lxd.log

    Jan 15 12:18:05 lxd-pod2.basementcloud systemd[1]: Starting LXD - main daemon...
    Jan 15 12:18:05 lxd-pod2.basementcloud lxd[4827]: t=2019-01-15T12:18:05+0000 lvl=warn msg="CGroup memory swap accounting is disabl
    Jan 15 12:18:10 lxd-pod2.basementcloud systemd[1]: Started LXD - main daemon.
    Jan 15 12:18:24 lxd-pod2.basementcloud lxd[4827]: t=2019-01-15T12:18:24+0000 lvl=eror msg="Error loading image" err="No such objec

