weixin_39581995
2020-12-09 00:31

CFI violations due to run_on_irqstack_cond

When building a linux-next kernel and booting it in WSL2, I see the following problem in dmesg:


[    2.127859] ------------[ cut here ]------------
[    2.127859] CFI failure (target: __sysvec_hyperv_callback+0x0/0x8):
[    2.127859] WARNING: CPU: 0 PID: 1 at kernel/cfi.c:29 __cfi_check_fail+0x33/0x40
[    2.127859] Modules linked in:
[    2.127859] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W         5.7.0-next-20200612-microsoft-standard-cfi #1
[    2.127859] RIP: 0010:__cfi_check_fail+0x33/0x40
[    2.127859] Code: 48 c7 c7 90 b4 85 9d 48 c7 c6 3a 1b 4e 9d e8 74 e7 60 00 85 c0 75 02 5b c3 48 c7 c7 c3 66 49 9d 48 89 de 31 c0 e8 4d 04 eb ff <0f> 0b 5b c3 00 00 cc cc 00 00 cc cc 00 85 f6 74 25 41 b9 ea ff ff
[    2.127859] RSP: 0018:ffff9dc5c0003e28 EFLAGS: 00010046
[    2.127859] RAX: 7fbc94f118b6c700 RBX: ffffffff9cd64720 RCX: ffffffff9d83a7f8
[    2.127859] RDX: ffff8dbaebbb37c0 RSI: 0000000000000082 RDI: ffffffff9dbb8ddc
[    2.127859] RBP: 0000000000000000 R08: 0000000000000001 R09: ffff8dbaebbb7800
[    2.127859] R10: 00000000000000f6 R11: 0000000000000001 R12: 0000000000000000
[    2.127859] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[    2.127859] FS:  0000000000000000(0000) GS:ffff8dbad1a00000(0000) knlGS:0000000000000000
[    2.127859] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    2.127859] CR2: 0000000000000000 CR3: 00000002ce80c000 CR4: 0000000000340eb0
[    2.127859] Call Trace:
[    2.127859]  <IRQ>
[    2.127859]  ? __cfi_check+0x475e8/0x4b5c0
[    2.127859]  ? sysvec_hyperv_callback+0xb8/0xc0
[    2.127859]  ? asm_sysvec_hyperv_callback+0x12/0x20
[    2.127859]  ? dequeue_task_fair+0x8/0x8
[    2.127859]  ? vmbus_on_msg_dpc+0x1dc/0x2b0
[    2.127859]  ? vmbus_on_msg_dpc+0x1bb/0x2b0
[    2.127859]  ? acpi_pci_bridge_d3+0x10/0x10
[    2.127859]  ? __const_udelay.cfi_jt+0x8/0x8
[    2.127859]  ? tasklet_action_common+0x55/0x190
[    2.127859]  ? nla_reserve_nohdr.cfi_jt+0x10/0x10
[    2.127859]  ? __do_softirq+0x159/0x394
[    2.127859]  ? asm_call_on_stack+0x12/0x20
[    2.127859]  </IRQ>
[    2.127859]  ? do_softirq_own_stack+0x57/0x90
[    2.127859]  ? __irq_exit_rcu.llvm.7102834934463874888+0xb1/0xc0
[    2.127859]  ? sysvec_hyperv_callback+0x8d/0xc0
[    2.127859]  ? asm_sysvec_hyperv_callback+0x12/0x20
[    2.127859]  ? macvlan_uninit+0x8/0x8
[    2.127859]  ? native_restore_fl+0x10/0x10
[    2.127859]  ? register_netdev+0x5/0x40
[    2.127859]  ? loopback_net_init+0x46/0xa0
[    2.127859]  ? raw_sysctl_init+0x8/0x8
[    2.127859]  ? ops_init+0x8f/0x150
[    2.127859]  ? register_pernet_operations.llvm.14706795064528744033+0xbc/0x2f0
[    2.127859]  ? register_pernet_device+0x2e/0x60
[    2.127859]  ? net_dev_init+0x1fe/0x26e
[    2.127859]  ? __initstub__loopback__587_277_blackhole_netdev_init6+0x8/0x8
[    2.127859]  ? do_one_initcall+0xde/0x310
[    2.127859]  ? ida_alloc_range+0x39e/0x3e0
[    2.127859]  ? ignore_unknown_bootoption+0x5/0x8
[    2.127859]  ? parse_args+0x1ae/0x450
[    2.127859]  ? do_initcall_level+0x92/0x124
[    2.127859]  ? do_initcalls+0x49/0x72
[    2.127859]  ? cpu_stopper_thread+0x8/0x8
[    2.127859]  ? kernel_init_freeable+0xe3/0x163
[    2.127859]  ? kthreadd.cfi_jt+0x8/0x8
[    2.127859]  ? kernel_init+0xa/0x1a0
[    2.127859]  ? ret_from_fork+0x22/0x30
[    2.127859] ---[ end trace c0124fb9bcdf5f82 ]---

From what I can tell, that is fixed with this diff (although I highly doubt this is correct):

diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
index d203c541a65a..76bd76010402 100644
--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -237,7 +237,7 @@ static __always_inline void __##func(struct pt_regs *regs, u8 vector)
  * Runs the function on the interrupt stack if the entry hit kernel mode
  */
 #define DEFINE_IDTENTRY_SYSVEC(func)                                   \
-static void __##func(struct pt_regs *regs);                            \
+static void __##func(void *regs);                                      \
                                                                        \
 __visible noinstr void func(struct pt_regs *regs)                      \
 {                                                                      \
@@ -252,7 +252,7 @@ __visible noinstr void func(struct pt_regs *regs)                   \
        idtentry_exit_cond_rcu(regs, rcu_exit);                         \
 }                                                                      \
                                                                        \
-static noinline void __##func(struct pt_regs *regs)
+static noinline void __##func(void *regs)

 /**
  * DEFINE_IDTENTRY_SYSVEC_SIMPLE - Emit code for simple system vector IDT

Once I patch that up and disable FTRACE, I see the following trace occasionally, which I believe is the same issue:


[    8.506329] ------------[ cut here ]------------
[    8.506333] CFI failure (target: handle_edge_irq.cfi_jt+0x0/0x8):
[    8.506337] WARNING: CPU: 3 PID: 0 at kernel/cfi.c:29 __cfi_check_fail+0x2e/0x40
[    8.506338] Modules linked in:
[    8.506340] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 5.7.0-next-20200612-microsoft-standard-cfi #1
[    8.506341] RIP: 0010:__cfi_check_fail+0x2e/0x40
[    8.506342] Code: 48 c7 c7 10 45 63 a5 48 c7 c6 a1 9e 46 a5 e8 99 99 55 00 85 c0 75 02 5b c3 48 c7 c7 c5 a3 42 a5 48 89 de 31 c0 e8 c2 90 f0 ff <0f> 0b 5b c3 00 00 cc cc 00 00 cc cc 00 00 cc cc 00 00 85 f6 74 25
[    8.506342] RSP: 0018:ffffa5bf401c4e00 EFLAGS: 00010046
[    8.506343] RAX: 0deea02a7b0d0300 RBX: ffffffffa4bf10f8 RCX: ffffffffa5627858
[    8.506343] RDX: ffff96d02bbb37c0 RSI: 0000000000000086 RDI: ffffffffa5823ddc
[    8.506344] RBP: ffff96cf9f6d8c00 R08: 0000000000000001 R09: ffff96d02bbb9800
[    8.506344] R10: 000000000000015b R11: 0000000000000003 R12: ffffa5bf401c4e58
[    8.506345] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000021
[    8.506346] FS:  0000000000000000(0000) GS:ffff96d011ac0000(0000) knlGS:0000000000000000
[    8.506347] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    8.506347] CR2: 00007f92731f7870 CR3: 0000000530082000 CR4: 0000000000340ea0
[    8.506349] Call Trace:
[    8.506350]  <IRQ>
[    8.506351]  ? handle_bad_irq.cfi_jt+0x8/0x8
[    8.506352]  ? __cfi_check+0x3f3c4/0x414e0
[    8.506354]  ? common_interrupt+0x194/0x1c0
[    8.506355]  ? asm_common_interrupt+0x1e/0x40
[    8.506357]  ? hrtimer_run_softirq+0x8/0x8
[    8.506358]  ? _raw_spin_unlock_irqrestore+0xc/0x10
[    8.506360]  ? rcu_core+0x9f/0x620
[    8.506361]  ? rebalance_domains+0x223/0x2d0
[    8.506361]  ? hrtimer_run_softirq+0x8/0x8
[    8.506362]  ? __do_softirq+0x154/0x262
[    8.506363]  ? asm_call_on_stack+0x12/0x20
[    8.506363]  </IRQ>
[    8.506364]  ? do_softirq_own_stack+0x52/0x80
[    8.506366]  ? __irq_exit_rcu.llvm.6063037721384417487+0xb1/0xc0
[    8.506366]  ? sysvec_hyperv_stimer0+0x6f/0x80
[    8.506367]  ? asm_sysvec_hyperv_stimer0+0x12/0x20
[    8.506369]  ? release_sock.cfi_jt+0x8/0x8
[    8.506370]  ? hv_stimer_global_cleanup.cfi_jt+0x8/0x8
[    8.506370]  ? release_sock.cfi_jt+0x8/0x8
[    8.506371]  ? default_idle+0xe/0x10
[    8.506373]  ? do_idle.llvm.15736594364671973949+0xb7/0x130
[    8.506373]  ? cpu_startup_entry+0x15/0x20
[    8.506375]  ? virtio_exit+0x8/0x8
[    8.506376]  ? start_secondary+0x166/0x170
[    8.506377]  ? secondary_startup_64+0xa4/0xb0
[    8.506377] ---[ end trace 82690587b59c62bd ]---

From what I can tell, run_on_irqstack_cond (introduced by https://git.kernel.org/tip/931b94145981e411bd2c934657649347ba8a9083) casts whatever function is passed to it to void (*__func)(void *arg) and calls it through that pointer. CFI checks that the target of an indirect call has the same type as the pointer used to call it, so I assume this cast is what causes everything to explode. I have no idea how to untangle that, hence this issue.
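To make the suspected mismatch concrete, here is a minimal user-space sketch. It is not kernel code: struct pt_regs is a one-field stand-in, and handler, handler_thunk, call_generic, call_typed and demo_dispatch are hypothetical names made up for illustration. It shows the cast that an indirect-call CFI check would reject (left commented out) next to two ways of keeping the pointer type and the target type in agreement. Whether the eventual upstream fix takes either of these routes is not something this sketch claims.

```c
#include <assert.h>

/* Stand-in for the kernel's struct pt_regs; the real layout differs. */
struct pt_regs { unsigned long ip; };

static unsigned long last_ip;

/* Handler with the typed prototype, in the spirit of __sysvec_hyperv_callback(). */
static void handler(struct pt_regs *regs)
{
	last_ip = regs->ip;
}

/* Generic trampoline that only knows about void (*)(void *) callbacks. */
static void call_generic(void (*fn)(void *), void *arg)
{
	fn(arg);
}

/*
 * The problematic pattern would be:
 *
 *     call_generic((void (*)(void *))handler, &regs);
 *
 * The call site expects void (*)(void *), but the target has type
 * void (*)(struct pt_regs *), so a CFI indirect-call check flags it.
 */

/* Option 1: a thunk whose type really is void (*)(void *). */
static void handler_thunk(void *arg)
{
	handler(arg);	/* direct call, so no indirect-call check is involved */
}

/* Option 2: a trampoline whose pointer type matches the handler exactly. */
static void call_typed(void (*fn)(struct pt_regs *), struct pt_regs *regs)
{
	fn(regs);
}

/* Exercise both matching variants; returns the ip seen by the last call. */
unsigned long demo_dispatch(void)
{
	struct pt_regs regs = { .ip = 0x1234 };

	last_ip = 0;
	call_generic(handler_thunk, &regs);
	assert(last_ip == 0x1234);

	last_ip = 0;
	call_typed(handler, &regs);
	return last_ip;
}
```

Calling demo_dispatch() from a small main() exercises both variants. Building with clang and -flto -fvisibility=hidden -fsanitize=cfi-icall, then enabling the commented-out cast, should make the mismatch visible at run time.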

cc

This question comes from the open-source project ClangBuiltLinux/linux


4 answers

  • weixin_39549312 5 months ago

    It looks like there are also other mismatching function pointers passed to run_on_irqstack_cond, so changing DEFINE_IDTENTRY_SYSVEC alone isn't sufficient. We might have to add __nocfi to the function until we figure out how to best fix this.

  • weixin_39581995 5 months ago

    This will be a problem in mainline now: https://git.kernel.org/torvalds/c/076f14be7fc942e112c94c841baec44124275cd0

  • weixin_39549312 5 months ago

    Thomas posted a patch that fixes the type mismatch: https://lore.kernel.org/lkml/87pn6eb5tv.fsf.tec.linutronix.de/

  • weixin_39581995 5 months ago

    Fixed in v5.9-rc7: https://git.kernel.org/linus/a7b3474cbb2864d5500d5e4f48dd57c903975cab

