dendrite failed with SDE errors #168

@rmustacc

We were running a Sidecar loopback test when Dendrite came to a halt. Restarting it had no effect. Based on the logs, it seems to have failed somewhere deep in the SDE:

00:11:18.259Z INFO dpd: pipe_mgr_drv_learn_move_idle_one:1137 dev 0 pipe 3 moved learn filter to idle
    module = Pipe
    unit = bf-sde
00:11:18.260Z INFO dpd: Exiting pipe_mgr_unlock_device_internal, dev 0, with status Success
    module = Pipe
    unit = bf-sde
00:11:18.260Z INFO dpd: Entering pipe_mgr_config_complete, dev 0 
    module = Pipe
    unit = bf-sde
00:11:18.300Z DEBG dpd: LLD: FAULT: DMA error: dev_id=0, d0=00023199d600014f, d1=000001000000000e
    module = Lld
    unit = bf-sde
00:11:18.300Z DEBG dpd: FAULT: 3 : 0000000000000000 : 00023199d600014f : 000001000000000e
    module = Lld
    unit = bf-sde
00:11:18.308Z ERRO dpd: Unhandled FIFO 41/opType 0 MsgId 0x10100279f0 at pipe_mgr_drv_completion_cb:3909.
    module = Pipe
    unit = bf-sde
00:11:18.308Z ERRO dpd: ASSERTION FAILED: "0" (0) from pipe_mgr_drv_completion_cb:3910
    module = Sys
    unit = bf-sde
00:11:18.613Z ERRO dpd: pipe_mgr_drv_wr_blk_cmplt_all:4153 No progress in processing DR pipe write blk for dev: 0
    module = Pipe
    unit = bf-sde
00:11:18.614Z DEBG dpd: LLD: FAULT: DMA error: dev_id=0, d0=000220a4b2000147, d1=0000020000000176
    module = Lld
    unit = bf-sde
00:11:18.614Z DEBG dpd: FAULT: 3 : 0000000000000000 : 000220a4b2000147 : 0000020000000176
    module = Lld
    unit = bf-sde
00:11:18.614Z ERRO dpd: Instruction list DMA completion on dev 0 subdev 0 IL 0 has error 2, msgId 0x0000020000000176
    module = Pipe
    unit = bf-sde
00:11:18.614Z ERRO dpd: Dev 0 Subdev 0 IList DMA 16384 bytes to phyPipeMask f w/ MsgId 0x20000000176
    module = Pipe
    unit = bf-sde
00:11:18.617Z ERRO dpd: 
    5FFFFFFE 00000260 SetStage Stage 24
    10011C64 FFFFFFFF Stg 24 WriteReg Addr 0x4c11c64 data 0xffffffff 
    10011C68 FFFFFFFF Stg 24 WriteReg Addr 0x4c11c68 data 0xffffffff 
    10011C6C FFFFFFFF Stg 24 WriteReg Addr 0x4c11c6c data 0xffffffff 
    10011C70 FFBFFFFE Stg 24 WriteReg Addr 0x4c11c70 data 0xffbffffe 
    10011C74 FFFFFDFE Stg 24 WriteReg Addr 0x4c11c74 data 0xfffffdfe 
    10011C78 FFFFFFFF Stg 24 WriteReg Addr 0x4c11c78 data 0xffffffff 
    10011C7C FFFFFFFF Stg 24 WriteReg Addr 0x4c11c7c data 0xffffffff 
...
    1003EE30 00000100 Stg 26 WriteReg Addr 0x4d3ee30 data 0x00000100 
    1003EE34 00000100 Stg 26 WriteReg Addr 0x4d3ee34 data 0x00000100 
    1003EE38 00000100 Stg 26 WriteReg Addr 0x4d3ee38 data 0x00000100 
    5FFFFFFE 00000200 SetStage Stage 0
    module = Pipe
    unit = bf-sde
00:11:18.618Z ERRO dpd: Dev 0 subdev 0 End IList DMA decode of MsgId 0x20000000177
    module = Pipe
    unit = bf-sde
00:11:18.618Z ERRO dpd: Unhandled FIFO 36/opType 0 MsgId 0x0 at pipe_mgr_drv_completion_cb:3909.
    module = Pipe
    unit = bf-sde
00:11:18.618Z ERRO dpd: ASSERTION FAILED: "0" (0) from pipe_mgr_drv_completion_cb:3910
    module = Sys
    unit = bf-sde
00:11:18.618Z ERRO dpd: Unhandled FIFO 36/opType 0 MsgId 0xf9bf6a0000 at pipe_mgr_drv_completion_cb:3909.
    module = Pipe
    unit = bf-sde
00:11:18.618Z ERRO dpd: ASSERTION FAILED: "0" (0) from pipe_mgr_drv_completion_cb:3910
    module = Sys
    unit = bf-sde

Things seem to have come to rest at this point. There are probably two separate issues to address here:

  • If we're blowing SDE assertions, we probably shouldn't remain up
  • It's not clear what this assertion represents, how it failed, and how we should recover
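
To make the first point concrete, here is a minimal sketch of a fail-fast policy. It is not how dpd handles SDE log messages today: `SdeLogAction`, `classify_sde_message`, and `on_sde_log` are hypothetical names, and matching on the `ASSERTION FAILED` prefix is an assumption based only on the log lines above. The idea is that a message carrying an SDE assertion failure makes the daemon abort (leaving a core) rather than stay up in an unknown state.

```rust
use std::process;

/// What the daemon should do after seeing one SDE log message.
/// Hypothetical type, not part of dpd or the SDE.
#[derive(Debug, PartialEq, Eq)]
pub enum SdeLogAction {
    Continue,
    Abort,
}

/// Decide how to react to a single SDE log message.
/// Assumption: assertion failures always contain the text
/// "ASSERTION FAILED", as in the log excerpt in this issue.
pub fn classify_sde_message(msg: &str) -> SdeLogAction {
    if msg.contains("ASSERTION FAILED") {
        SdeLogAction::Abort
    } else {
        SdeLogAction::Continue
    }
}

/// Hypothetical hook that a log callback could invoke for each message.
/// On an assertion failure it prints the message and aborts the process.
pub fn on_sde_log(msg: &str) {
    if classify_sde_message(msg) == SdeLogAction::Abort {
        eprintln!("fatal SDE assertion, aborting: {msg}");
        process::abort();
    }
}
```

Keeping the decision in `classify_sde_message` separate from the abort keeps the policy testable without killing the test process. This does not address the second point: it says nothing about what the assertion means or how to recover from it.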

Cores and logs are available by bug-id.
