weixin_39928940
weixin_39928940
2020-11-23 13:10

Crashes when patching kvm_arch_vm_ioctl

I've noticed kernel oopses, invalid opcodes, or corrupted stacks when trying to produce a livepatch for CVE-2015-7513. In starting to debug this I've pared down the problem and tested a few things but have not root caused this yet.

Steps to reproduce: 1. Produce a livepatch using [1]. This simply adds a printk to kvm_arch_vm_ioctl. 2. I built this using 'kpatch-build' and have tested it using kernel livepatch and kpatch.ko. 3. Insert the KVM module, and insert the Livepatch module. (Order doesn't matter for reproduciblilty.) 4. Start a VM

Here's you'll notice crashes as follows:


[  128.546935] BUG: unable to handle kernel paging request at ffffffff87e626c2
[  128.547750] IP: [<ffffffff87e626c2>] 0xffffffff87e626c2
[  128.548283] PGD 1c0f067 PUD 1c10063 PMD 0 
[  128.548808] Oops: 0010 [#1] SMP 
[  128.549215] Modules linked in: vhost_net vhost macvtap macvlan xt_conntrack ipt_REJECT nf_reject_ipv4 kpatch_patch(OE) isofs ebtable_filter ebtables ip6table_filter ip6_tables xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack xt_tcpudp bridge stp llc iptable_filter ip_tables x_tables ppdev joydev serio_raw parport_pc parport kvm_intel kvm autofs4 psmouse floppy
[  128.554499] CPU: 0 PID: 1249 Comm: qemu-system-x86 Tainted: G           OE K 4.2.0-16-generic #19-Ubuntu
[  128.555335] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[  128.556178] task: ffff88007c0d1900 ti: ffff880036d08000 task.ti: ffff880036d08000
[  128.556847] RIP: 0010:[<ffffffff87e626c2>]  [<ffffffff87e626c2>] 0xffffffff87e626c2
[  128.557683] RSP: 0018:ffff880036d0bcf0  EFLAGS: 00010282
[  128.558274] RAX: 0000000000000051 RBX: ffff88007bc8bc00 RCX: ffff88007aeb5100
[  128.559021] RDX: 002b40bd8b4980c2 RSI: 0000000000000246 RDI: ffff88007aeb5100
[  128.559772] RBP: ffff880036d0bdb8 R08: 000000000000000a R09: 0000000000000240
[  128.560517] R10: 000000000000a210 R11: 0000000000000240 R12: 00007ffc28e2c860
[  128.561260] R13: ffff880036f44000 R14: ffffffff8208ae63 R15: 00007ffc28e2c860
[  128.562003] FS:  00007f1172dd2b00(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
[  128.562931] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  128.563559] CR2: ffffffff87e626c2 CR3: 000000007bf0d000 CR4: 00000000000026f0
[  128.564310] Stack:
[  128.564630]  ffffffffc01eddc3 ffff880036f44000 ffff880036d0be78 000000008208ae63
[  128.565683]  ffff88007d3d78c0 ffff8800374a6d38 0000000000000058 0000000000000001
[  128.566734]  ffff88007c336f00 00007ffc28e2c860 00007ffc28e2c860 000000008208ae63
[  128.567793] Call Trace:
[  128.568154]  [<ffffffffc01eddc3>] ? kvm_arch_vm_ioctl+0xb43/0xc30 [kpatch_patch]
[  128.569026]  [<ffffffffc01ed280>] ? patch_exit+0x4f/0x4f [kpatch_patch]
[  128.569740]  [<ffffffffc004bd8d>] kvm_vm_ioctl+0x9d/0x740 [kvm]
[  128.570416]  [<ffffffff81248af7>] ? eventfd_ctx_read+0x67/0x210
[  128.571084]  [<ffffffff810a5b60>] ? wake_up_q+0x70/0x70
[  128.571700]  [<ffffffff812108a5>] do_vfs_ioctl+0x295/0x480
[  128.572327]  [<ffffffff81212e65>] ? SyS_ppoll+0xf5/0x1b0
[  128.572936]  [<ffffffff81210b09>] SyS_ioctl+0x79/0x90
[  128.573527]  [<ffffffff817ef9f2>] entry_SYSCALL_64_fastpath+0x16/0x75
[  128.574231] Code:  Bad RIP value.
</ffffffff817ef9f2></ffffffff81210b09></ffffffff81212e65></ffffffff812108a5></ffffffff810a5b60></ffffffff81248af7></ffffffffc004bd8d></ffffffffc01ed280></ffffffffc01eddc3></ffffffff87e626c2></ffffffff87e626c2></ffffffff87e626c2>

Note, depending on the patch this may manifest in invalid opcodes, oopses or stack corruption. I've also reproduced this on 4.5-rc2 as well.

In addition to testing using kpatch produced artifacts, I attempted to patch using just a module [2]. Due to the complexities in patching kvm_arch_vm_ioctl, I hacked the base kernel and added a __kvm_arch_vm_ioctl function that was an exact duplicate of the original function. Thus the patch can re-use the original code. Using this method I was able to successfully patch the function, but only if I exported __kvm_arch_vm_ioctl. If I didn't export the function and tried to use kallsyms_lookup_name I got the following error:


[   85.478518] ioctl = 0x4008ae48
[   85.478537] kernel tried to execute NX-protected page - exploit attempt? (uid: 112)
[   85.479253] BUG: unable to handle kernel paging request at ffffffffc02bd5c0
[   85.479881] IP: [<ffffffffc02bd5c0>] __kvm_arch_vm_ioctl+0x0/0xffffffffffffea40 [livepatch]
[   85.480600] PGD 1c0f067 PUD 1c11067 PMD 7bc77067 PTE 800000003651b163
[   85.481204] Oops: 0011 [#1] SMP 
[   85.481512] Modules linked in: vhost_net(E) vhost(E) macvtap(E) macvlan(E) xt_conntrack(E) ipt_REJECT(E) nf_re
ject_ipv4(E) livepatch(OE) nls_utf8(E) isofs(E) ebtable_filter(E) ebtables(E) ip6table_filter(E) ip6_tables(E) xt
_CHECKSUM(E) iptable_mangle(E) ipt_MASQUERADE(E) nf_nat_masquerade_ipv4(E) iptable_nat(E) nf_conntrack_ipv4(E) nf
_defrag_ipv4(E) nf_nat_ipv4(E) nf_nat(E) nf_conntrack(E) xt_tcpudp(E) bridge(E) stp(E) llc(E) iptable_filter(E) i
p_tables(E) x_tables(E) ppdev(E) input_leds(E) joydev(E) serio_raw(E) 8250_fintek(E) parport_pc(E) parport(E) mac
_hid(E) i2c_piix4(E) kvm_intel(E) kvm(E) autofs4(E) psmouse(E) floppy(E) cirrus(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) ttm(E) drm_kms_helper(E) drm(E) pata_acpi(E)
[   85.487546] CPU: 0 PID: 1504 Comm: qemu-system-x86 Tainted: G           OE K 4.2.3+ #1
[   85.488203] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[   85.488989] task: ffff880078118c80 ti: ffff88007be84000 task.ti: ffff88007be84000
[   85.489613] RIP: 0010:[<ffffffffc02bd5c0>]  [<ffffffffc02bd5c0>] __kvm_arch_vm_ioctl+0x0/0xffffffffffffea40 [livepatch]
[   85.490521] RSP: 0018:ffff88007be87d90  EFLAGS: 00010246
[   85.490968] RAX: 0000000000000012 RBX: 000000004008ae48 RCX: 0000000000000000
[   85.491571] RDX: 00007ffcf1c26bf8 RSI: 000000004008ae48 RDI: ffff88007bfc5c00
[   85.492166] RBP: ffff88007be87db8 R08: 000000000000000a R09: 000000000000022f
[   85.492762] R10: 0000000000009d74 R11: 000000000000022f R12: ffff88007bfc5c00
[   85.493356] R13: 00007ffcf1c26bf8 R14: 000000004008ae48 R15: 00007ffcf1c26bf8
[   85.494329] FS:  00007f27d64deb00(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
[   85.494999] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   85.495491] CR2: ffffffffc02bd5c0 CR3: 000000007bed1000 CR4: 00000000000026f0
[   85.496080] Stack:
[   85.496080] Stack:
[   85.496259]  ffffffffc02bb034 ffff88007be87e78 000000004008ae48 ffff88007be20000
[   85.496922]  00007ffcf1c26bf8 ffff88007be87e78 ffffffffc0132d8d ffff88007be87dd8
[   85.497586]  ffffffff8121aaf5 ffff88007be87e28 ffffffffc0142e88 ffff88007be20000
[   85.498250] Call Trace:
[   85.498464]  [<ffffffffc02bb034>] ? livepatch_kvm_arch_vm_ioctl+0x34/0x60 [livepatch]
[   85.499123]  [<ffffffffc0132d8d>] kvm_vm_ioctl+0x9d/0x740 [kvm]
[   85.499629]  [<ffffffff8121aaf5>] ? fd_install+0x25/0x30
[   85.500081]  [<ffffffffc0142e88>] ? kvm_arch_dev_ioctl+0x158/0x1b0 [kvm]
[   85.500651]  [<ffffffffc012d67a>] ? kvm_vm_ioctl_check_extension_generic+0x4a/0x70 [kvm]
[   85.501322]  [<ffffffffc0130862>] ? kvm_dev_ioctl+0x172/0x4a0 [kvm]
[   85.501843]  [<ffffffff81210315>] do_vfs_ioctl+0x295/0x480
[   85.502298]  [<ffffffff8120c02b>] ? putname+0x5b/0x60
[   85.502721]  [<ffffffff811fbc2f>] ? do_sys_open+0x1bf/0x280
[   85.503184]  [<ffffffff81210579>] SyS_ioctl+0x79/0x90
[   85.503617]  [<ffffffff817ef0b2>] entry_SYSCALL_64_fastpath+0x16/0x75
[   85.504149] Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <c0> d5 2b c0 ff ff ff ff 00 00 00 00 00 00 00 00 00 00 00 00 00 
[   85.506536] RIP  [<ffffffffc02bd5c0>] __kvm_arch_vm_ioctl+0x0/0xffffffffffffea40 [livepatch]
[   85.507243]  RSP <ffff88007be87d90>
[   85.507552] CR2: ffffffffc02bd5c0
</ffff88007be87d90></ffffffffc02bd5c0></c0></ffffffff817ef0b2></ffffffff81210579></ffffffff811fbc2f></ffffffff8120c02b></ffffffff81210315></ffffffffc0130862></ffffffffc012d67a></ffffffffc0142e88></ffffffff8121aaf5></ffffffffc0132d8d></ffffffffc02bb034></ffffffffc02bd5c0></ffffffffc02bd5c0></ffffffffc02bd5c0>

I'd be happy to post this on the kernel livepatch ML as well.

--chris

[1] - http://people.canonical.com/~arges/livepatch_issue/patch.diff [2] - http://people.canonical.com/~arges/livepatch_issue/livepatch_kvm_arch_vm_ioctl/

该提问来源于开源项目:dynup/kpatch

  • 点赞
  • 写回答
  • 关注问题
  • 收藏
  • 复制链接分享
  • 邀请回答

31条回答

  • weixin_39942451 weixin_39942451 5月前

    But more generally, it would be a problem in the case where the replacement instructions do have relocations.

    Is there a particular example when the replacement instructions have relocations that really need dynrelas? Or is it more of a precaution that such things may appear in the future?

    As for paravirt operations on x86, it seems that only the indirect calls via the pointers stored in pv_*ops structures need relocations. However, most pv_*ops structures are exported by the kernel (except pv_init_ops), so, I suppose, everything could be resolved without dynrelas for these, right?

    点赞 评论 复制链接分享
  • weixin_39519554 weixin_39519554 5月前

    Is there a particular example when the replacement instructions have relocations that really need dynrelas? Or is it more of a precaution that such things may appear in the future?

    Good question. So far it's just been theoretical. Today I've been looking for a specific instance and I haven't been able to find one.

    I think if we tried to convert a rela in the .rela.altinstr_replacement section to a dynrela, it would fail. But I haven't found a place in the kernel where a dynrela would be needed for that section. Because all the relas in that section seem to be either for exported symbols or for code locations local to the function being patched.

    As for paravirt operations on x86, it seems that only the indirect calls via the pointers stored in pv__ops structures need relocations. However, most pv__ops structures are exported by the kernel (except pv_init_ops), so, I suppose, everything could be resolved without dynrelas for these, right?

    Yeah, after looking at how jump and call paravirt sites are applied, I think I agree.

    So maybe there's no issue and we can just skip the dynrela when the target data is non-zero already. But that does make me nervous, since we could be covering up some other issue. I suppose we could spit out a warning when skipping a dynrela, but that's also not ideal.

    点赞 评论 复制链接分享
  • weixin_39942451 weixin_39942451 5月前

    So maybe there's no issue and we can just skip the dynrela when the target data is non-zero already. But that does make me nervous, since we could be covering up some other issue. I suppose we could spit out a warning when skipping a dynrela, but that's also not ideal.

    I will patch the core module this way and experiment with it a bit. Let us see, how it goes.

    Another approach could be to patch create-diff-object to avoid generating dynrelas for paravirt stuff. Haven't tried that yet either.

    点赞 评论 复制链接分享
  • weixin_39928940 weixin_39928940 5月前

    I will patch the core module this way and experiment with it a bit. Let us see, how it goes.

    I'd be happy to test on this end too.

    Another approach could be to patch create-diff-object to avoid generating dynrelas for paravirt stuff. Haven't tried that yet either.

    Would this mean creating a blacklist for symbols such as pv_cpu_ops and ensuring we don't create dynrelas for those symbols? Or is there another mechanism to detect if a function contains pv instructions.

    点赞 评论 复制链接分享
  • weixin_39845039 weixin_39845039 5月前

    I am very confused about the dynrela skipping idea. What kind of dynrelas are we talking about? If we have a section .altinstr_replacement and a corresponding rela section .rela.altinstr_replacement and dynrelas that also apply to .altinstr_replacement, why would it be OK to skip those dynrelas? Or am I misunderstanding something completely?

    On the topic of dynrelas affecting replacement instructions...

    Is there a particular example when the replacement instructions have relocations that really need dynrelas? Or is it more of a precaution that such things may appear in the future?

    Yes, see below example.

    I think if we tried to convert a rela in the .rela.altinstr_replacement section to a dynrela, it would fail. But I haven't found a place in the kernel where a dynrela would be needed for that section. Because all the relas in that section seem to be either for exported symbols or for code locations local to the function being patched.

    If we create a kpatch module with Chris' sample kvm patch, and step through kpatch_create_dynamic_rela_sections() with gdb, we'll see that dynrelas are being created from the relas in .rela.altinstr_replacement, because these relas reference symbols that are global but not exported, therefore requiring dynrelas (The symbols in question, if you want specifics: kvm_arch_vcpu_ioctl_set_sregs, kvm_put_guest_fpu).

    In the end the entire original .rela.altinstr_replacement section is stripped because all the relas have been converted to dynrelas, and that is why you don't see a rela section for .altinstr_replacement when you dump the section table with readelf. If you leave the relas as they are and do not create dynrelas for them, the module loader will reject the patch module, as the relas from .rela.altinstr_replacement reference symbols that are not exported (which is why we needed the dynrelas in the first place). I don't see how skipping or ignoring these dynrelas solves the issue, we still need these symbols resolved before the apply_{alternatives,paravirt} code runs, so the ordering issue still stands. The dynrelas need to be written before the paravirt/alt code runs.

    点赞 评论 复制链接分享
  • weixin_39942451 weixin_39942451 5月前

    Would this mean creating a blacklist for symbols such as pv_cpu_ops and ensuring we don't create dynrelas for those symbols? Or is there another mechanism to detect if a function contains pv instructions.

    .parainstructions section is available in the object file, along with its rela section. So, create-diff-object could read these, get the locations of paravirt instruction sites and skip generating dynrelas for those instructions.

    we'll see that dynrelas are being created from the relas in .rela.altinstr_replacement, because these relas reference symbols that are global but not exported, therefore requiring dynrelas (The symbols in question, if you want specifics: kvm_arch_vcpu_ioctl_set_sregs, kvm_put_guest_fpu).

    Thanks for pointing this out. I did not think alternatives use such functions. Will take a look.

    As for paravirt, I wonder why create-diff-object created dynrelas for the pv_*_ops-based calls in this example despite pv_*_ops are exported.

    GDB will tell, I suppose. ;-)

    点赞 评论 复制链接分享
  • weixin_39942451 weixin_39942451 5月前

    I'd be happy to test on this end too.

    Here is the patch: https://github.com/euspectre/kpatch/commit/7e9184a04ce8fb35da92e572cb83bf76738e047a

    With it, KVM seems to work OK for my VMs when your binary patch for kvm_arch_vm_ioctl is used.

    I still do not understand, however, why create-diff-object created dynrelas for pv_irq_ops, which is exported by the kernel. Need to dig further in.

    点赞 评论 复制链接分享
  • weixin_39942451 weixin_39942451 5月前

    If we create a kpatch module with Chris' sample kvm patch, and step through kpatch_create_dynamic_rela_sections() with gdb, we'll see that dynrelas are being created from the relas in .rela.altinstr_replacement, because these relas reference symbols that are global but not exported, therefore requiring dynrelas (The symbols in question, if you want specifics: kvm_arch_vcpu_ioctl_set_sregs, kvm_put_guest_fpu).

    This is quite strange. I checked .altinstr_replacement of kvm.ko (kernel 3.10.x, a RHEL-like OS, x86_64). Only the following kinds of replacement instructions are listed there: - mfence, lfence, xsaveopt64(%rdi), "rex popcnt %edi,%eax", stac, clac - these ones do not need relocations at all. - call near relative to copy_user_generic_string and copy_user_enhanced_fast_string - these instructions need relocations, indeed. However, both functions are exported by the kernel, so the ordinary relas should have been enough.

    Other kernel modules contain only such relocatable instructions in .altinstr_replacement on that system.

    So, it seems, .altinstr_replacement is not much of a problem as well.

    点赞 评论 复制链接分享
  • weixin_39942451 weixin_39942451 5月前

    Well, as for dynrelas for pv_irq_ops, GDB did tell what happened, indeed. ;-)

    When a module should be patched, and dynrelas are being prepared, create-diff-object only checks if a symbol is global and if it is defined in the module itself, right?

    lookup_global_symbol() does not find pv_irq_ops symbol in kvm.ko, so the symbol is considered external and a dynrela is created. No matter whether the symbol is exported or not. I could have guessed it before.

    Anyway, this is probably not a problem. apply_alternatives() relocates the calls explicitly (http://lxr.free-electrons.com/source/arch/x86/kernel/alternative.c?v=3.10#L280).

    Both paravirt and alternatives are applied to the modules after relocations, so they should not depend on dynrelas as well, I suppose. As long as kpatch.ko skips the appropriate dynrelas, everything should be fine.

    Or, perhaps, there are other problematic scenarios?

    点赞 评论 复制链接分享
  • weixin_39928940 weixin_39928940 5月前

    FWIW, I translated what you did for kpatch into livepatch code, since I haven't really been using the kpatch core code here:

     diff
    diff --git a/kernel/livepatch/core.c b/kernel/livepatch/core.c
    index d68fbf6..7726462 100644
    --- a/kernel/livepatch/core.c
    +++ b/kernel/livepatch/core.c
    @@ -235,6 +235,7 @@ static int klp_write_object_relocations(struct module *pmod,
     {
            int ret = 0;
            unsigned long val;
    +       unsigned long size;
            struct klp_reloc *reloc;
    
            if (WARN_ON(!klp_is_object_loaded(obj)))
    @@ -263,6 +264,20 @@ static int klp_write_object_relocations(struct module *pmod,
                    if (ret)
                            goto out;
    
    +               /* hack to get reloc size */
    +               size = reloc->type == R_X86_64_64? 8 : 4;
    +
    +               /*
    +                * skip if the instruction to be relocated has been changed
    +                * already (paravirt of alternatives may do this)
    +                */
    +               if (memchr_inv((void *)reloc->loc, 0, size)) {
    +                       pr_info("skipping relocation for %s (0x%lx name, reloc->loc, val);
    +                       continue;
    +               }
    +
    +
                    ret = klp_write_module_reloc(pmod, reloc->type, reloc->loc,
                                                 val + reloc->addend);
                    if (ret) {
    

    Using this I was able to patch with my original patch and start a VM without a crash.

    However, noting -toast 's comments here: https://github.com/dynup/kpatch/issues/580#issuecomment-200570699 She mentioned that we be applying these relocations and there may be cases that get missed using this method.

    Looking back in this thread, I also see -toast 's comment: https://github.com/dynup/kpatch/issues/580#issuecomment-183001652 Noting that she had reordered the patch applications and got this working as a hack, in addition she mentioned that once the arch-independent livepatch set landed, this would be easier to work out. Is it worth looking more into now as a general solution?

    点赞 评论 复制链接分享
  • weixin_39845039 weixin_39845039 5月前

    First, thanks for taking the time to look into this issue! I can confirm that both patches fixed the panic and I'm able to run a VM without any issues.

    I still have some concerns. I need to preface this by noting that I am not entirely familiar with the paravirtualization code in the kernel, so someone who is more familiar should weigh in and let me know if my concerns are valid or not.

    Since dynrelas are written after module load as opposed to the expected order, which is to 1) apply all relocations to a function first before 2) applying paravirt and alternative patches (i.e. the order is apply_relocations() --> apply_{paravirt,alternative}(), in kernel/module.c), I am not sure if we are breaking any expectations here. My main concern is leaving apply_paravirt() to patch a function without all its relas written in first (which becomes the case when we skip dynrelas).

    A paravirt patch site does in fact refer to the old instruction that is expected to be there, which may be "incomplete" if there are still relas (including dynrelas) to be written (i.e., I'm referring to the address of where the old instruction is expected to be in the .text, in this case, the text of the function kvm_arch_vm_ioctl). See the struct definition of paravirt_patch_site and the field instr: http://lxr.free-electrons.com/source/arch/x86/include/asm/paravirt_types.h#L674. Actually, I'm not even sure if the old instruction even ever gets used, because it looks like the instruction buffer in apply_paravirt() just gets overwritten in the end anyway.

    So, even though in this case these patches work, I'm just worried we'll be shooting ourselves in the foot later on. IMHO, I think the least risky solution is to avoid hard coding edge cases and to apply both relocations and paravirt patches in the original expected order.

    Yep, so my hack fix was to simply have livepatch finish writing its relas before calling apply_paravirt(), which is easy to do with the arch independent patchset since it allows a module to access all its Elf section information. I guess the ugly part of that solution is the irony of introducing more arch code (x86 specific in this case) which we were trying to avoid in the first place :P. I think we should bring this up on the livepatch mailing list, since we'll get more opinions there.

    tl;dr, My main concern is the possible consequences (if at all any...) of letting apply_paravirt patch a function that may possibly not have all its relas and dynrelas written in beforehand. I haven't been able to think of concrete problems yet, so perhaps I'm being overly paranoid.

    点赞 评论 复制链接分享
  • weixin_39942451 weixin_39942451 5月前

    Just to make it clearer: I agree, that skipping dynrelas is error-prone and livepatch should use a finer solution suggested by -toast.

    Some projects may still rely on the Kpatch core module though, untill livepatch matures. For kpatch.ko, there are three options, as far as I can see: - skip these dynrelas; - check for the exported symbols using Module.symvers, similar to https://github.com/dynup/kpatch/pull/571. - detect if the instructions have changed and refuse to apply the binary patch in such cases.

    All these options may have their problems, of course. Still, they should allow using kpatch.ko till livepatch is production-ready.

    点赞 评论 复制链接分享
  • weixin_39845039 weixin_39845039 5月前

    Thanks for posting on the livepatch mailing list :-)

    Ah, so we are on the same page then :-) Yeah, I agree for kpatch.ko we will have to go with one of the solutions you listed. I think I like the solution you have now, it is IMO the most succinct way to go about "bandaging" kpatch.ko until we are able to deprecate it. is away for this week, so maybe he could weigh in on this a bit later.

    点赞 评论 复制链接分享
  • weixin_39519554 weixin_39519554 5月前

    Your patch looks good to me as a temporary band-aid, with a few minor changes: - It should probably use a higher priority printk, maybe a pr_notice() or pr_warning()? In theory it could be covering up a bug, and we need to relay that possibility to the user somehow. - The printk should try to give a slightly more user-friendly message. Though the concept of dynrelas is hard to describe in plain language... It should at least somehow relay that it's probably not a bug, but it could be, so the user needs to be confident the patch creator knew what they were doing :-)

    点赞 评论 复制链接分享
  • weixin_39519554 weixin_39519554 5月前

    For reference, the livepatch version of this bug is discussed here:

    https://lkml.kernel.org/r/20160329120518.GA21252.com

    点赞 评论 复制链接分享
  • weixin_39942451 weixin_39942451 5月前

    Your patch looks good to me as a temporary band-aid, with a few minor changes

    Thanks for the review.

    I'll update the patch and send a pull request then.

    点赞 评论 复制链接分享
  • weixin_39845039 weixin_39845039 5月前

    I think we can close this issue now. The original issue reported with patching kvm_arch_vm_ioctl() has been fixed. We've implemented the fix in upstream livepatch, the kpatch build tools now have support for .klp.arch. sections, and the kpatch core module has a workaround implemented by PR #590

    点赞 评论 复制链接分享
  • weixin_39519554 weixin_39519554 5月前

    I was able to recreate on Fedora:

    
    [ 3224.133584] ioctl = 1074310728
    [ 3224.133587] ioctl = 44615
    [ 3224.133630] ioctl = 44640
    [ 3224.161867] ioctl = 1077980791
    [ 3224.161924] ioctl = 44657
    [ 3224.179096] ioctl = -2113360285
    [ 3224.180457] ioctl = 1081126560
    [ 3224.180462] ioctl = -2113360285
    [ 3224.180465] ioctl = -2113360285
    [ 3224.262730] ioctl = 1081126560
    [ 3224.262737] ioctl = 1076932219
    [ 3224.262758] BUG: unable to handle kernel paging request at 000000004030ae7b
    [ 3224.262780] IP: [<000000004030ae7b>] 0x4030ae7b
    [ 3224.262793] PGD 0 
    [ 3224.262799] Oops: 0010 [#1] SMP 
    [ 3224.262809] Modules linked in: vhost_net vhost macvtap macvlan kpatch_patch(OE) kpatch(OE) nls_utf8 isofs rfcomm fuse cmac xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun ccm nf_conntrack_netbios_ns nf_conntrack_broadcast ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_nat ebtable_broute bridge ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw bnep arc4 intel_rapl iosf_mbi x86_pkg_temp_thermal coretemp iwlmvm uvcvideo mac80211 kvm_intel snd_hda_codec_realtek videobuf2_vmalloc kvm snd_hda_codec_generic snd_hda_codec_hdmi videobuf2_core iTCO_wdt videobuf2_memops
    [ 3224.263008]  iTCO_vendor_support snd_hda_intel iwlwifi v4l2_common btusb snd_hda_codec btrtl videodev btbcm btintel snd_hda_core cfg80211 snd_hwdep bluetooth snd_seq i2c_i801 media snd_seq_device lpc_ich snd_pcm thinkpad_acpi snd_timer snd mei_me mei rfkill ie31200_edac soundcore shpchp edac_core tpm_tis tpm nfsd auth_rpcgss nfs_acl lockd grace sunrpc dm_crypt 8021q garp stp llc mrp nouveau i915 mxm_wmi ttm crct10dif_pclmul i2c_algo_bit crc32_pclmul drm_kms_helper crc32c_intel e1000e sdhci_pci drm sdhci serio_raw mmc_core ptp pps_core wmi fjes video
    [ 3224.263159] CPU: 5 PID: 31596 Comm: qemu-system-x86 Tainted: G     U     OE   4.3.4-200.fc22.x86_64 #1
    [ 3224.263180] Hardware name: LENOVO 20EGS0R600/20EGS0R600, BIOS GNET71WW (2.19 ) 02/05/2015
    [ 3224.263197] task: ffff8803864d9e00 ti: ffff88038503c000 task.ti: ffff88038503c000
    [ 3224.263213] RIP: 0010:[<000000004030ae7b>]  [<000000004030ae7b>] 0x4030ae7b
    [ 3224.263229] RSP: 0018:ffff88038504bcc9  EFLAGS: 00010046
    [ 3224.263240] RAX: 0000000000000000 RBX: ffffffffffffffea RCX: 0000000000000000
    [ 3224.263255] RDX: 0000000000000000 RSI: 00007ffe3c441c50 RDI: ffff88038503fd80
    [ 3224.263269] RBP: ffff88038503fdf0 R08: 000000000000000a R09: 000000000000042f
    [ 3224.263284] R10: ffff8803e8841a38 R11: 000000000000042f R12: ffff8804100b4000
    [ 3224.263299] R13: 00007ffe3c441c20 R14: 000000004030ae7b R15: 00007ffe3c441c20
    [ 3224.263315] FS:  00007f10735e4c00(0000) GS:ffff88047e340000(0000) knlGS:0000000000000000
    [ 3224.263331] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 3224.263344] CR2: 000000004030ae7b CR3: 0000000396035000 CR4: 00000000001426e0
    [ 3224.263359] Stack:
    [ 3224.263364]  55f811af7b49b54a 1d9072c4091185ba 118ae3479e9fd33e 8df56f86166badac
    [ 3224.263383]  e23218310a1b6e5e bee9eac5dc0467ab d59f2a4fb21541fa 5f245dbf01515e94
    [ 3224.263401]  d6ad0081f6599b18 66da9bd2cdca4edb de2271ab386faf2f 41544048517279f9
    [ 3224.263419] Call Trace:
    [ 3224.263425]  <unk> 
    [ 3224.263430] Code: 
    [ 3224.263436]  Bad RIP value.
    [ 3224.263440] RIP  [<000000004030ae7b>] 0x4030ae7b
    [ 3224.263450]  RSP <ffff88038504bcc9>
    [ 3224.263458] CR2: 000000004030ae7b
    [ 3224.269121] ---[ end trace 35303edd7fcbbd9d ]---
    </ffff88038504bcc9></unk>
    点赞 评论 复制链接分享
  • weixin_39519554 weixin_39519554 5月前

    Looks like a problem with kpatch-build. The relocations look all wrong.

    The original function's rela section has 67 entries, whereas the replacement function only has 2.

    Original:

    
    [1418] .rela.text.kvm_arch_vm_ioctl RELA         0000000000000000 00539de0 00000648 24 I     3863 1417  8
    
    Relocation section [1418] '.rela.text.kvm_arch_vm_ioctl' for section [1417] '.text.kvm_arch_vm_ioctl' at offset 0x539de0 contains 67 entries:
      Offset              Type            Value               Addend Name
      0x0000000000000001  X86_64_NONE     000000000000000000      -4 __fentry__
      0x0000000000000028  X86_64_32S      000000000000000000   +2665 .rodata.str1.1
      0x0000000000000040  X86_64_PC32     000000000000000000      -4 printk
      0x0000000000000095  X86_64_32S      000000000000000000     +16 pv_irq_ops
      0x000000000000009f  X86_64_PC32     000000000000000000      -4 ktime_get_with_offset
      0x00000000000000b3  X86_64_32S      000000000000000000     +24 pv_irq_ops
      0x00000000000000f9  X86_64_PC32     000000000000000000      -4 _copy_to_user
    ...
    

    Replacement:

    
    [12] .rela.text.kvm_arch_vm_ioctl RELA         0000000000000000 000d52a0 00000030 24 I     55  11  8
    
    Relocation section [12] '.rela.text.kvm_arch_vm_ioctl' for section [11] '.text.kvm_arch_vm_ioctl' at offset 0xd52a0 contains 2 entries:
      Offset              Type            Value               Addend Name
      0x0000000000000001  X86_64_PC32     000000000000000000      -4 __fentry__
      0x0000000000000028  X86_64_32S      000000000000000000   +1437 .rodata.str1.1
    
    点赞 评论 复制链接分享
  • weixin_39519554 weixin_39519554 5月前

    Actually, scratch that, the relocations might be missing because they were converted into dynrelas. Still looking...

    点赞 评论 复制链接分享
  • weixin_39519554 weixin_39519554 5月前

    I added some trace_printk() calls and it seems to be dying in local_irq_disable(). I bet there's some weird interaction between paravirt patching and kpatch dynrela patching.

    点赞 评论 复制链接分享
  • weixin_39519554 weixin_39519554 5月前

    So this seems to be another module loading order issue.

    In load_module(), first it does normal relocations, then it calls post_relocation() which calls module_finalize() which applies paravirt patches. Then, when the notifier is called, kpatch applies the dynrelas. The problem is that it tries to apply the dynrelas to the original code, not the paravirt-patched code. So it patches the wrong instructions and creates corrupt code.

    点赞 评论 复制链接分享
  • weixin_39519554 weixin_39519554 5月前

    The dynrelas need to be written before the paravirt and "alternative" patches are applied. For livepatch we'll need to change the module load initialization order again. The current order is: 1. apply normal relocations 2. initialize kallsyms 3. apply paravirt and alternative patches 4. enable ftrace 5. enable livepatch (apply livepatch relocations and patch functions)

    We need to split step 5 into 2 steps, and change the order to something like: 1. apply normal relocations 2. initialize kallsyms (note: maybe we can swap this with 1) 3. apply livepatch relocations (this needs kallsyms) 4. apply paravirt patches and alternative patches (this needs to be done after livepatch relocations) 5. enable ftrace 6. patch functions (this needs ftrace)

    That's for livepatch. For the kpatch core module, unfortunately I don't think this problem can be reasonably fixed because it's an out of tree module and it relies on the notifier, which is called after paravirt patching. kpatch.ko will hopefully soon be deprecated anyway in favor of livepatch, once we get a consistency model.

    One thing we can improve in kpatch.ko is, when writing a relocation, ensure that the memory location is all zeros before writing to it. If not, skip the relocation and spit out a fat warning that this patch may be problematic for the kpatch core module and there could be corruption (because the code which was copied from the .parainstructions site to the function might not have gotten its dynrela relocations resolved before it was copied).

    点赞 评论 复制链接分享
  • weixin_39845039 weixin_39845039 5月前

    Nice findings. I was also able to confirm the issue and reproduce the general protection faults. I can confirm that applying the paravirt and alternative patches after klp_write_module_relocs() resolved the issue (patch applies without issue, VM runs, and no more kernel panics).

    The hacky fix I put together involved having module_finalize() skip applying the patches if it is a livepatch module, and then having livepatch apply them manually after klp_write_module_relocs() runs. Unfortunately this did involve modifying x86 module loader code (which I'm not sure I like) and adding more x86 livepatch code (to just call apply_paravirt() and apply_alternatives(). With the arch-independent reloc patchset, this will be easy since we now have the elf section table and other information we need to call these functions. Another solution might be to rename .parainstructions to .klp.parainstructions so that the x86 module loader ignores them altogether (so we won't have to modify the x86 module loader at all), and then have x86 livepatch code call apply_{paravirt,alternatives}. There might be better solutions, so I'm still thinking.

    点赞 评论 复制链接分享
  • weixin_39519554 weixin_39519554 5月前

    -toast What do you think about the following idea? It seems so simple that I'm probably missing something.

    What if we resolved klp symbols in simplify_symbols() and resolved klp relocations in apply_relocations() like so (this is based on top of both of your pending patch sets on the livepatch mailing list):

    
    diff --git a/kernel/module.c b/kernel/module.c
    index 85ba022..a441087 100644
    --- a/kernel/module.c
    +++ b/kernel/module.c
    @@ -2207,7 +2207,7 @@ static int simplify_symbols(struct module *mod, const struct load_info *info)
                            break;
    
                    case SHN_LIVEPATCH:
    -                       /* Livepatch symbols are resolved by livepatch */
    +                       ret = klp_resolve_symbol(mod, info, i);
                            break;
    
                    case SHN_UNDEF:
    @@ -2258,11 +2258,9 @@ static int apply_relocations(struct module *mod, const struct load_info *info)
                    if (!(info->sechdrs[infosec].sh_flags & SHF_ALLOC))
                            continue;
    
    -               /* Livepatch relocation sections are applied by livepatch */
                    if (info->sechdrs[i].sh_flags & SHF_RELA_LIVEPATCH)
    -                       continue;
    -
    -               if (info->sechdrs[i].sh_type == SHT_REL)
    +                       err = klp_apply_relocation(mod, info, i);
    +               else if (info->sechdrs[i].sh_type == SHT_REL)
                            err = apply_relocate(info->sechdrs, info->strtab,
                                                 info->index.sym, i, mod);
                    else if (info->sechdrs[i].sh_type == SHT_RELA)
    @@ -3492,7 +3490,7 @@ static int prepare_coming_module(struct module *mod)
            int err;
    
            ftrace_module_enable(mod);
    -       err = klp_module_coming(mod);
    +       err = klp_module_apply_patch(mod);
            if (err)
                    return err;
    

    So basically split klp_module_coming() into three different functions: resolve symbol, apply relocation, and apply patch. If we did it that way, could we get rid of the klp_info struct in your arch-independent livepatch patches??

    点赞 评论 复制链接分享
  • weixin_39519554 weixin_39519554 5月前

    If we did it that way, could we get rid of the klp_info struct in your arch-independent livepatch patches??

    I knew I missed something obvious... I guess we still need that info for patching a module after it has already been loaded.

    点赞 评论 复制链接分享
  • weixin_39519554 weixin_39519554 5月前

    If we did it that way, could we get rid of the klp_info struct in your arch-independent livepatch patches??

    I knew I missed something obvious... I guess we still need that info for patching a module after it has already been loaded.

    But still, if I didn't miss anything else, something like the above patch might be a good way to solve the panics.

    点赞 评论 复制链接分享
  • weixin_39845039 weixin_39845039 5月前

    Hm, I think we would still run into the same ordering problem (apply_paravirt getting called before livepatch applies relocations) for unloaded modules, wouldn't we? Say we are patching an unloaded module, then klp_apply_relocation() in your sample code would have to be delayed until the module loads. Then module_finalize() gets called immediately afterwards and applies the paravirt and "alternative" patches, and then only when the target module loads can klp_apply_relocation finish its work (so we still get ordering 1) apply_paravirt 2) klp_write_module_reloc).

    点赞 评论 复制链接分享
  • weixin_39519554 weixin_39519554 5月前

    Ok, that patch is garbage :stuck_out_tongue_winking_eye: I completely overlooked the fact that 1) SHF_RELA_LIVEPATCH and SHN_LIVEPATCH are in the patch module, not the patched module; and 2) the patch can be loaded after the patched module. So this problem is worse than I realized. Now I understand your suggestion solution.

    点赞 评论 复制链接分享
  • weixin_39942451 weixin_39942451 5月前

    A question concerning https://github.com/dynup/kpatch/issues/580#issuecomment-182925526, namely:

    One thing we can improve in kpatch.ko is, when writing a relocation, ensure that the memory location is all zeros before writing to it. If not, skip the relocation and spit out a fat warning that this patch may be problematic for the kpatch core module and there could be corruption (because the code which was copied from the .parainstructions site to the function might not have gotten its dynrela relocations resolved before it was copied).

    Why is it problematic for kpatch.ko to just skip relocations if they happen to be at the locations modified by paravirt mechanism?

    If apply_paravirt() replaces an instruction with another one, we should not actually care about the dynrela for the original one, right?

    As far as this particular example is concerned, apply_paravirt() replaces the calls to pv_irq_ops[n]() for local_irq_disable and local_irq_enable with "cli" and "sti" instructions, respectively, followed by nops. Looks like, if kpatch core module did not apply dynrela to the result, everything would be fine. Or, may be, I am missing something.

    点赞 评论 复制链接分享
  • weixin_39519554 weixin_39519554 5月前

    Why is it problematic for kpatch.ko to just skip relocations if they happen to be at the locations modified by paravirt mechanism?

    If apply_paravirt() replaces an instruction with another one, we should not actually care about the dynrela for the original one, right?

    As far as this particular example is concerned, apply_paravirt() replaces the calls to pv_irq_opsn for local_irq_disable and local_irq_enable with "cli" and "sti" instructions, respectively, followed by nops. Looks like, if kpatch core module did not apply dynrela to the result, everything would be fine. Or, may be, I am missing something.

    You're right for this specific example because the paravirt replacement instructions don't have relocations. But more generally, it would be a problem in the case where the replacement instructions do have relocations. In such a case we'd need to apply the dynrelas to the replacement instructions before the other patching code (paravirt, alternatives, etc.) gets a chance to copy them. At least I think that would be the simplest way to handle it.

    点赞 评论 复制链接分享

相关推荐