CPU#0 stuck for 61s work_struct

Date: 2010-08-10

Source: Internet

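I ran into a problem while bringing up an SD card as the rootfs for Android. The kernel is 2.6.34 on ARM, and the card driver was ported from 2.6.26, where it worked correctly.
The card queue thread ran for more than 61s and the kernel printed the error below. According to what I found online, this error means there is an infinite loop somewhere, and card_queue_thread did indeed go 61s without sleeping.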

[   67.650000] BUG: soft lockup - CPU#0 stuck for 61s! [kcardd:117]
[   67.650000] Modules linked in:
[   67.650000]
[   67.650000] Pid: 117, comm:               kcardd
[   67.650000] CPU: 0    Not tainted  (2.6.34 #36)
[   67.650000] PC is at card_queue_thread+0xd4/0x150
[   67.650000] LR is at 0xa881
[   67.650000] pc : [<c0254650>]    lr : [<0000a881>]    psr: 20000013
[   67.650000] sp : cfe31fb8  ip : cfd6c6c4  fp : cfe31ff4
[   67.650000] r10: cfd75a40  r9 : 00000000  r8 : 00000001
[   67.650000] r7 : cfe31fbc  r6 : c0517b9c  r5 : 00000000  r4 : cfd5bf44
[   67.650000] r3 : 00000000  r2 : 00000000  r1 : 00000003  r0 : cfd1cf00
[   67.650000] Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
[   67.650000] Control: 10c53c7d  Table: 8fe6c059  DAC: 00000017
...

[   67.650000] [<c0027608>] (show_regs+0x0/0x50) from [<c0070058>] (softlockup_tick+0xf8/0x150)
[   67.650000]  r4:cfe31f70 r3:c045e618
[   67.650000] [<c006ff60>] (softlockup_tick+0x0/0x150) from [<c004c36c>] (run_local_timers+0x1c/0x20)
[   67.650000] [<c004c350>] (run_local_timers+0x0/0x20) from [<c004c3a0>] (update_process_times+0x30/0x54)
[   67.650000] [<c004c370>] (update_process_times+0x0/0x54) from [<c0063a58>] (T.214+0x40/0xc4)
[   67.650000]  r5:00000000 r4:c0480090
[   67.650000] [<c0063a18>] (T.214+0x0/0xc4) from [<c0063af4>] (tick_handle_periodic+0x18/0x108)
[   67.650000]  r9:00000000 r8:00000001 r7:0000000a r6:00000000 r4:c043d5c0
[   67.650000] r3:c0063adc
[   67.650000] [<c0063adc>] (tick_handle_periodic+0x0/0x108) from [<c0030050>] (my_timer_interrupt+0x54/0x5c)
[   67.650000] [<c002fffc>] (my_timer_interrupt+0x0/0x5c) from [<c0070cb4>] (handle_IRQ_event+0x58/0x120)
[   67.650000] [<c0070c5c>] (handle_IRQ_event+0x0/0x120) from [<c0072a14>] (handle_level_irq+0x7c/0x10c)
[   67.650000]  r7:cfe31fbc r6:00000000 r5:0000000a r4:c0440b84
[   67.650000] [<c0072998>] (handle_level_irq+0x0/0x10c) from [<c0025048>] (asm_do_IRQ+0x48/0x94)
[   67.650000]  r5:0000000a r4:c044f5c8
[   67.650000] [<c0025000>] (asm_do_IRQ+0x0/0x94) from [<c0025b70>] (__irq_svc+0x30/0xc0)
[   67.650000] Exception stack(0xcfe31f70 to 0xcfe31fb8)
[   67.650000] 1f60:                                     cfd1cf00 00000003 00000000 00000000
[   67.650000] 1f80: cfd5bf44 00000000 c0517b9c cfe31fbc 00000001 00000000 cfd75a40 cfe31ff4
[   67.650000] 1fa0: cfd6c6c4 cfe31fb8 0000a881 c0254650 20000013 ffffffff
[   67.650000]  r6:00000001 r5:f1109a40 r4:ffffffff r3:20000013
[   67.650000] [<c025457c>] (card_queue_thread+0x0/0x150) from [<c00442b8>] (do_exit+0x0/0x69c)
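
The "stuck for 61s" figure in the first line of the trace can be read with a simplified model of the check. This is only a sketch, not the kernel's implementation: the function names are hypothetical, and the 60-second threshold is an assumption about the default setting.

```c
#include <stdio.h>

/* Simplified userspace model of a soft-lockup check; not the kernel source. */
#define SOFTLOCKUP_THRESH_SEC 60UL /* assumed threshold, in seconds */

/*
 * touch_ts: last time (in seconds) the per-CPU watchdog task got to run.
 * now:      current time (in seconds) as seen from the timer tick.
 * Returns the number of seconds the CPU has been stuck, or 0 if the
 * watchdog task ran recently enough.
 */
static unsigned long soft_lockup_stuck_for(unsigned long now, unsigned long touch_ts)
{
    if (now <= touch_ts + SOFTLOCKUP_THRESH_SEC)
        return 0;
    return now - touch_ts;
}

/* Print a report shaped like the first line of the trace; returns 1 if printed. */
static int report_soft_lockup(int cpu, unsigned long now, unsigned long touch_ts)
{
    unsigned long stuck = soft_lockup_stuck_for(now, touch_ts);

    if (stuck == 0)
        return 0;
    printf("BUG: soft lockup - CPU#%d stuck for %lus!\n", cpu, stuck);
    return 1;
}
```

For illustration, if the tick fires at 67 s and the watchdog task last ran at 6 s (a made-up value), soft_lockup_stuck_for(67, 6) gives 61, matching the message. The report comes from the timer interrupt, which is why the call trace runs through my_timer_interrupt and softlockup_tick while the PC sits in card_queue_thread.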

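This error shows up occasionally at boot. Once it occurs, queue_flags stays at QUEUE_FLAG_PLUGGED.
Tracing into blk-core.c, I found that blk_unplug_timeout was executed, but the work_struct function blk_unplug_work never ran.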

void blk_unplug_timeout(unsigned long data)
{
        struct request_queue *q = (struct request_queue *)data;

        trace_block_unplug_timer(q);
-        kblockd_schedule_work(q, &q->unplug_work);
+        blk_unplug_work(&q->unplug_work);
}

If I replace kblockd_schedule_work(q, &q->unplug_work); with a direct call to blk_unplug_work, the function behind unplug_work, everything works.
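
The difference between the two lines in the diff can be modeled in userspace. This is only a sketch: model_queue, model_work and the model_* functions are hypothetical stand-ins, not kernel APIs. It shows that scheduling work only marks it pending, and the function runs later when a worker executes it, whereas a direct call takes effect at once. Whether the worker is being starved by the non-sleeping card_queue_thread is an assumption, not something confirmed in this thread.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct model_queue {
    bool plugged;                         /* stands in for QUEUE_FLAG_PLUGGED */
};

struct model_work {
    void (*func)(struct model_queue *q);  /* function to run later */
    struct model_queue *q;                /* argument for func */
    bool pending;                         /* queued but not yet executed */
};

/* Stands in for blk_unplug_work: clears the plugged state. */
static void model_unplug_work(struct model_queue *q)
{
    q->plugged = false;
}

/* Stands in for kblockd_schedule_work: only marks the item as pending. */
static bool model_schedule_work(struct model_work *w)
{
    if (w->pending)
        return false;                     /* already queued, nothing changes */
    w->pending = true;
    return true;
}

/* Stands in for the worker thread; it helps only if it gets CPU time. */
static void model_worker_run(struct model_work *w)
{
    if (!w->pending)
        return;
    w->pending = false;
    w->func(w->q);
}
```

In this model, calling model_schedule_work on a plugged queue leaves plugged set until model_worker_run is called. If the worker never gets to run, the queue stays plugged and further scheduling attempts are no-ops. Calling model_unplug_work directly clears the flag immediately, which would be consistent with the behavior described above.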

Can anyone tell me where the problem might be?

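There is one more thing I don't understand: q->unplug_work.data should originally be 0, but by the second call to blk_unplug_timeout it has become 0xcfc51f80. I thought data could only hold the following 4 values? Maybe I have misunderstood.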
#define WORK_STRUCT_PENDING 0                /* T if work item pending execution */
#define WORK_STRUCT_STATIC  1                /* static initializer (debugobjects) */
#define WORK_STRUCT_FLAG_MASK (3UL)
#define WORK_STRUCT_WQ_DATA_MASK (~WORK_STRUCT_FLAG_MASK)

If my understanding is correct, then what overwrote it?
Thanks in advance for any pointers.

Author: fei1700   Posted: 2010-08-10

Let me first correct a mistake in my own understanding:
work_struct data is a combination of a struct cpu_workqueue_struct * and the flag bits covered by WORK_STRUCT_FLAG_MASK.

/*
* Set the workqueue on which a work item is to be run
* - Must *only* be called if the pending flag is set
*/
static inline void set_wq_data(struct work_struct *work,
                                struct cpu_workqueue_struct *cwq)
{
        unsigned long new;

        BUG_ON(!work_pending(work));

        new = (unsigned long) cwq | (1UL << WORK_STRUCT_PENDING);
        new |= WORK_STRUCT_FLAG_MASK & *work_data_bits(work);
        atomic_long_set(&work->data, new);
}
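
The packing done by set_wq_data can be tried out in userspace. The following is a minimal sketch under stated assumptions: packed_work and the packed_* helpers are hypothetical names, a plain unsigned long replaces the atomic field, and the BUG_ON check is left out. It relies on the cwq address being at least 4-byte aligned, so that its low two bits are free for the flags.

```c
/* Flag definitions as quoted earlier in the thread. */
#define WORK_STRUCT_PENDING 0
#define WORK_STRUCT_STATIC  1
#define WORK_STRUCT_FLAG_MASK (3UL)
#define WORK_STRUCT_WQ_DATA_MASK (~WORK_STRUCT_FLAG_MASK)

/* Userspace stand-in for work_struct; only the data word is modeled. */
struct packed_work {
    unsigned long data;
};

/* Mirrors set_wq_data: store the cwq address, set PENDING, keep old flags. */
static void packed_set_wq_data(struct packed_work *work, unsigned long cwq_addr)
{
    unsigned long new;

    new = cwq_addr | (1UL << WORK_STRUCT_PENDING);
    new |= WORK_STRUCT_FLAG_MASK & work->data;
    work->data = new;
}

/* Upper bits of data: the cpu_workqueue_struct address. */
static unsigned long packed_wq_addr(const struct packed_work *work)
{
    return work->data & WORK_STRUCT_WQ_DATA_MASK;
}

/* Low two bits of data: the flags. */
static unsigned long packed_flags(const struct packed_work *work)
{
    return work->data & WORK_STRUCT_FLAG_MASK;
}
```

Applied to the value seen earlier, 0xcfc51f80 splits into an address part of 0xcfc51f80 and flag bits of 0. Read this way, the value looks like a cpu_workqueue_struct pointer with the pending bit clear rather than something that was overwritten.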

Author: fei1700   Posted: 2010-08-10