The WiFi driver uses msdu_list, a doubly linked list whose elements are kernel sk_buff structures. It is traversed by the following function:
static inline void
ol_rx_mpdu_list_next(
    struct ol_txrx_pdev_t *pdev,
    void *mpdu_list,
    adf_nbuf_t *mpdu_tail,
    adf_nbuf_t *next_mpdu)
{
    htt_pdev_handle htt_pdev = pdev->htt_pdev;
    adf_nbuf_t msdu;

    /*
     * For now, we use a simply flat list of MSDUs.
     * So, traverse the list until we reach the last MSDU within the MPDU.
     */
    TXRX_ASSERT2(mpdu_list);
    msdu = mpdu_list;
    while (!htt_rx_msdu_desc_completes_mpdu(
               htt_pdev, htt_rx_msdu_desc_retrieve(htt_pdev, msdu)))
    {
        msdu = adf_nbuf_next(msdu);
        TXRX_ASSERT2(msdu);
    }
    /* msdu now points to the last MSDU within the first MPDU */
    *mpdu_tail = msdu;
    *next_mpdu = adf_nbuf_next(msdu);
}
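The loop keeps fetching the next element with msdu = adf_nbuf_next(msdu), which is equivalent to msdu = msdu->next. The problem is that at some point the next pointer it fetches has been corrupted: it is either NULL or a wild pointer.
The crash happens at htt_rx_msdu_desc_retrieve(htt_pdev, msdu), which reads the data pointer of msdu.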
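A read of a struct field through a NULL pointer faults at an address equal to that field's offset. In the oops that follows, the fault address is 000000d8 and x1 (the second argument register under the AArch64 calling convention, with PC still at the first instruction of the callee) is 0, which looks consistent with msdu being NULL and the field being read at offset 0xd8. The sketch below only illustrates that arithmetic. struct mock_skb and mock_data_field_addr are hypothetical names, and the layout is made up so that data lands at 0xd8; the real sk_buff offset for this kernel build has not been verified here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: padding sized so that `data` sits at offset 0xd8.
 * This is NOT the real struct sk_buff, only a stand-in for the arithmetic. */
struct mock_skb {
    unsigned char other_fields[0xd8];
    unsigned char *data;
};

/* Address the CPU would access when reading the `data` field of a
 * struct mock_skb located at `base`. */
uintptr_t mock_data_field_addr(uintptr_t base)
{
    return base + offsetof(struct mock_skb, data);
}
```

With base set to 0 the result is 0xd8, matching the shape of the reported fault. A wild (non-NULL) msdu would instead typically fault at some unrelated address, or not fault at all and return garbage.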
<4>[350481.729138] Error: get encrypted from a not-first msdu.
<4>[350481.729165] Error: get pn from a not-first msdu.
<4>[350481.729189] Error: get encrypted from a not-first msdu.
<4>[350481.729197] Error: get pn from a not-first msdu.
<4>[350481.729206] Error: get encrypted from a not-first msdu.
<4>[350481.729213] Error: get pn from a not-first msdu.
<1>[350481.729244] Unable to handle kernel NULL pointer dereference at virtual address 000000d8
<1>[350481.729255] pgd = ffffff8009526000
<1>[350481.729263] [000000d8] *pgd=000000007fffe003, *pud=000000007fffe003, *pmd=0000000000000000
<0>[350481.729288] Internal error: Oops: 96000005 [#1] PREEMPT SMP
<4>[350481.729299] Modules linked in: wlan(O) smsc9500(O) smscusbnet(O) cec(O) pstnaud(O) cmbs_audio(O) dsplink(O) hpdev(O) cmac(O) kpcie(O) emac_usb(O) emac(O) msgkit(O) keypad(O) kbase(O) midgard_kbase [last unloaded: wlan]
<4>[350481.729388] CPU: 1 PID: 5363 Comm: AR6K RxCompleti Tainted: G O 4.4.167 #116
<4>[350481.729397] Hardware name: Rockchip RK3399 Board rev2 (BOX) (DT)
<4>[350481.729406] task: ffffffc045ca1b80 task.stack: ffffffc03849c000
<4>[350481.732523] PC is at htt_rx_msdu_desc_retrieve_hl+0x0/0x8 [wlan]
<4>[350481.735600] LR is at ol_rx_pn_check_base+0x194/0x498 [wlan]
<4>[350481.735619] pc : [<ffffff800106acc4>] lr : [<ffffff800103bff4>] pstate: 404001c5
<4>[350481.735627] sp : ffffffc03849f920
<4>[350481.735635] x29: ffffffc03849f970 x28: ffffff80012c92c8
<4>[350481.735651] x27: 0000000000000000 x26: ffffffc03e5be230
<4>[350481.735666] x25: ffffffc03a4e0000 x24: 0000000000000000
<4>[350481.735680] x23: ffffffc0747fe900 x22: ffffffc057522500
<4>[350481.735695] x21: ffffffc0575c0000 x20: ffffffc068aea200
<4>[350481.735709] x19: ffffffc03e5d5000 x18: ffffff80892d2ba7
<4>[350481.735724] x17: 0000000000000000 x16: ffffff80082137e8
<4>[350481.735738] x15: 0000000000000000 x14: 00000000000b207d
<4>[350481.735752] x13: 000000000000000a x12: 0000000000000030
<4>[350481.735766] x11: 00000000fffffffe x10: ffffff80092d2bb0
<4>[350481.735781] x9 : 0000000005f5e0ff x8 : 0000000000000000
<4>[350481.735796] x7 : 0000000000000000 x6 : ffffffc03cee4500
<4>[350481.735809] x5 : ffffff80012c9338 x4 : ffffff800106acc4
<4>[350481.735824] x3 : ffffff800106ac00 x2 : 0000000000000004
<4>[350481.735839] x1 : 0000000000000000 x0 : ffffffc067366c00
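1. Is there any way to use the crash stack to find out what corrupted the pointer?
2. How can this kind of problem be worked around? The system must not simply crash, and ideally the application layer should not notice anything. A NULL pointer is easy to check for, but a wild pointer is a real headache.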
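For the NULL half of question 2, one possible shape is a traversal that reports failure instead of asserting, so the caller can drop the whole list. The following is a minimal user-space sketch under stated assumptions, not the driver's code: struct mock_nbuf, its last_in_mpdu flag (standing in for htt_rx_msdu_desc_completes_mpdu), mock_mpdu_list_next, and the max_msdus bound are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for sk_buff / adf_nbuf_t with only the fields
 * this sketch needs. */
struct mock_nbuf {
    struct mock_nbuf *next;
    int last_in_mpdu; /* stand-in for htt_rx_msdu_desc_completes_mpdu() */
};

/*
 * Walk to the last MSDU of the first MPDU.
 * Returns 0 on success. Returns -1 if the chain ends (NULL next) before
 * a completing MSDU is found, or if more than max_msdus nodes are visited.
 * On failure both outputs are set to NULL.
 */
int mock_mpdu_list_next(struct mock_nbuf *mpdu_list, size_t max_msdus,
                        struct mock_nbuf **mpdu_tail,
                        struct mock_nbuf **next_mpdu)
{
    struct mock_nbuf *msdu = mpdu_list;
    size_t visited = 0;

    *mpdu_tail = NULL;
    *next_mpdu = NULL;

    while (msdu != NULL && visited < max_msdus) {
        visited++;
        if (msdu->last_in_mpdu) {
            *mpdu_tail = msdu;
            *next_mpdu = msdu->next;
            return 0;
        }
        msdu = msdu->next;
    }
    return -1; /* broken chain: caller should discard the list */
}
```

This only covers a NULL next pointer and a chain that loops or runs too long. It cannot recognize a wild pointer, because a non-NULL garbage value passes the check and is dereferenced on the next iteration.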