Skip to content

[Deepin-Kernel-SIG] [linux 6.6-y] [HISI] Sync sdei_watchdog code with openEuler OLK-6.6#1474

Open
opsiff wants to merge 4 commits intodeepin-community:linux-6.6.yfrom
opsiff:linux-6.6.y-2026-02-03-fix
Open

[Deepin-Kernel-SIG] [linux 6.6-y] [HISI] Sync sdei_watchdog code with openEuler OLK-6.6#1474
opsiff wants to merge 4 commits intodeepin-community:linux-6.6.yfrom
opsiff:linux-6.6.y-2026-02-03-fix

Conversation

@opsiff
Copy link
Member

@opsiff opsiff commented Feb 3, 2026

Summary by Sourcery

Guard SDEI watchdog power-management transitions with event status checks and per-CPU user enable state tracking.

Bug Fixes:

  • Prevent invalid SDEI watchdog event enable/disable transitions during CPU power management by checking event status before acting.
  • Avoid disabling or re-enabling the SDEI watchdog around CPU low-power states when it was not explicitly enabled by the user.
  • Ensure watchdog timestamps are only updated after successful hard lockup checks to maintain correct monitoring intervals across events.

Enhancements:

  • Track per-CPU SDEI watchdog user enablement state to coordinate behavior across hardlockup control and power-management notifiers.

lw8186 and others added 4 commits February 3, 2026 16:08
…kup_check is performed

ascend inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I8MZ9I
CVE: N/A

---------------------------------------

In function sdei_watchdog_callback, the last_check_time is used to
determine whether to call watchdog_hardlockup_check. But it may be
updated even when watchdog_hardlockup_check is not called.

Fixes: 0fa83fd0f8f7 ("sdei_watchdog: avoid possible false hardlockup")

Signed-off-by: Liu Wei <lw8186@163.com>
(cherry picked from commit 6c3cc702b226f258538700a05ac517f8784823bb)
Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I8LQCC
CVE: NA

-------------------------------------------------

/proc/sys/kernel/nmi_watchdog interface is designed to close nmi
watchdog, but currently, it's not working when in low power scenario,
since pm callback will enable/disable event when power state changes.

This commit add a percpu flag to be compatible with pm callback.

Fixes: f022c4cac9c1 ("sdei_watchdog: use lockup_detector_retry_init() to init sdei watchdog")
Signed-off-by: Bowen You <youbowen2@huawei.com>
(cherry picked from commit 1155a8fb568e11dcfb23fda811127b4baacbae42)
Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
…otifier

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I8LQCC
CVE: NA

-------------------------------------------------

The `CPU_PM_ENTER_FAILED` case was added to align with the handling in
`sdei_pm_notifier`. This ensures proper masking/unmasking of events for
the CPU when entering idle fails, maintaining consistency across CPU power
 management actions.

Fixes: c8f96adca7fa ("arm64/watchdog: fix watchdog failure in low power scenarios")
Signed-off-by: Bowen You <youbowen2@huawei.com>
(cherry picked from commit f2f9b6cb6466eb51c2d654981c767bc5d719b001)
Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
hulk inclusion
category: bugfix
bugzilla: https://atomgit.com/openeuler/kernel/issues/8325

--------------------------------

Add sdei event status check in sdei pm notifier, If event is already
unregistered, don't do disable/enable in pm notifier. In this way, we
can prevent incorrect state transitions, e.g., disabling or enabling
after unregistering.

Fixes: c8f96adca7fa ("arm64/watchdog: fix watchdog failure in low power scenarios")
Signed-off-by: Bowen You <youbowen2@huawei.com>
(cherry picked from commit 4f1e47d2241d253a802be3e0f5288626a590bec5)
Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
@sourcery-ai
Copy link

sourcery-ai bot commented Feb 3, 2026

Reviewer's Guide

Syncs the arm64 SDEI-based NMI watchdog implementation with openEuler OLK-6.6 by tracking per-CPU user enable state, tightening when last_check_time is updated, and making the CPU PM notifier robust against incorrect SDEI event state transitions and low-power idle cycles.

Sequence diagram for CPU power management notifications and SDEI watchdog state

sequenceDiagram
  participant CPU_PM
  participant sdei_watchdog_pm_notifier
  participant SDEI_API

  CPU_PM->>sdei_watchdog_pm_notifier: notify(action, data)
  sdei_watchdog_pm_notifier->>SDEI_API: sdei_api_event_status(sdei_watchdog_event_num, result)
  SDEI_API-->>sdei_watchdog_pm_notifier: rv, result

  alt rv != 0
    sdei_watchdog_pm_notifier-->>CPU_PM: NOTIFY_ERR (notifier_from_errno)
  else result == 0
    sdei_watchdog_pm_notifier-->>CPU_PM: NOTIFY_OK
  else result != 0
    alt action == CPU_PM_ENTER
      sdei_watchdog_pm_notifier->>sdei_watchdog_pm_notifier: check per_cpu(sdei_usr_en)
      alt sdei_usr_en == true
        sdei_watchdog_pm_notifier->>SDEI_API: sdei_api_event_disable(sdei_watchdog_event_num)
        SDEI_API-->>sdei_watchdog_pm_notifier: rv
      else sdei_usr_en == false
        sdei_watchdog_pm_notifier->>sdei_watchdog_pm_notifier: skip disable
      end
    else action == CPU_PM_EXIT or action == CPU_PM_ENTER_FAILED
      sdei_watchdog_pm_notifier->>sdei_watchdog_pm_notifier: check per_cpu(sdei_usr_en)
      alt sdei_usr_en == true
        sdei_watchdog_pm_notifier->>SDEI_API: sdei_api_event_enable(sdei_watchdog_event_num)
        SDEI_API-->>sdei_watchdog_pm_notifier: rv
      else sdei_usr_en == false
        sdei_watchdog_pm_notifier->>sdei_watchdog_pm_notifier: skip enable
      end
    else other action
      sdei_watchdog_pm_notifier-->>CPU_PM: NOTIFY_DONE
    end

    alt rv != 0
      sdei_watchdog_pm_notifier-->>CPU_PM: NOTIFY_ERR (notifier_from_errno)
    else rv == 0
      sdei_watchdog_pm_notifier-->>CPU_PM: NOTIFY_OK
    end
  end
Loading

File-Level Changes

Change Details Files
Track and use per-CPU user enable state for the SDEI NMI watchdog.
  • Introduce a new per-CPU boolean flag to record whether the watchdog was explicitly enabled on each CPU.
  • Set the per-CPU flag when the watchdog hardlockup is enabled and clear it when disabled.
  • Consult the per-CPU flag in power management paths to decide whether to disable or re-enable the SDEI event.
arch/arm64/kernel/watchdog_sdei.c
Adjust watchdog callback to update last_check_time only after a successful hardlockup check decision.
  • Move the last_check_time update from the top of the callback to after watchdog_hardlockup_check, so the timestamp reflects a completed check rather than mere callback entry.
arch/arm64/kernel/watchdog_sdei.c
Harden SDEI watchdog power-management notifier against invalid state transitions and handle low power idle cycles correctly.
  • Initialize the notifier return code and add a local variable to hold SDEI event status.
  • Query the SDEI event status before attempting enable/disable to avoid operating on an unregistered or inactive event, and return early if the event is not active.
  • Guard event enable/disable calls in CPU PM transitions with the per-CPU user enable flag to avoid changing state for CPUs where the watchdog is logically disabled.
  • Allow both CPU_PM_EXIT and CPU_PM_ENTER_FAILED to re-enable the event when appropriate, ensuring the EL3 secure timer is re-armed after LPI power transitions.
  • Standardize notifier exit paths to return NOTIFY_OK on success and notifier_from_errno on error.
arch/arm64/kernel/watchdog_sdei.c

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

Copy link

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've left some high level feedback:

  • In sdei_watchdog_pm_notifier, you call sdei_api_event_status and may return NOTIFY_OK even for unrelated action values, whereas the previous implementation returned NOTIFY_DONE in those cases; consider moving the status check inside the CPU_PM_* cases or early-returning NOTIFY_DONE for unsupported actions before touching the event state.
  • Since preemption is already disabled in sdei_watchdog_pm_notifier, you can simplify and optimize the per-CPU user enable checks by using __this_cpu_read(sdei_usr_en) instead of per_cpu(sdei_usr_en, smp_processor_id()).
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In `sdei_watchdog_pm_notifier`, you call `sdei_api_event_status` and may return `NOTIFY_OK` even for unrelated `action` values, whereas the previous implementation returned `NOTIFY_DONE` in those cases; consider moving the status check inside the `CPU_PM_*` cases or early-returning `NOTIFY_DONE` for unsupported actions before touching the event state.
- Since preemption is already disabled in `sdei_watchdog_pm_notifier`, you can simplify and optimize the per-CPU user enable checks by using `__this_cpu_read(sdei_usr_en)` instead of `per_cpu(sdei_usr_en, smp_processor_id())`.

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@deepin-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from opsiff. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request syncs the SDEI watchdog code with openEuler OLK-6.6 to improve hardlockup detection reliability during CPU power management transitions.

Changes:

  • Adds per-CPU sdei_usr_en state tracking to coordinate watchdog behavior across enable/disable and power management operations
  • Guards SDEI event enable/disable operations during CPU PM transitions with event status checks and user-enable state verification
  • Moves watchdog timestamp update to after the hardlockup check completes to maintain accurate monitoring intervals

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +59 to 62
__this_cpu_write(sdei_usr_en, 0);
if (ret)
pr_err("Disable NMI Watchdog failed on cpu%d\n",
smp_processor_id());
Copy link

Copilot AI Feb 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The per-CPU variable sdei_usr_en is cleared immediately after calling sdei_api_event_disable, but before checking if the disable operation succeeded. If the disable operation fails, the state becomes inconsistent: the watchdog event remains enabled but sdei_usr_en is cleared. This could lead to the PM notifier not properly managing the event during CPU power transitions. Consider moving the __this_cpu_write(sdei_usr_en, 0) statement after the error check, similar to how sdei_watchdog_hardlockup_enable sets it after checking for errors.

Suggested change
__this_cpu_write(sdei_usr_en, 0);
if (ret)
pr_err("Disable NMI Watchdog failed on cpu%d\n",
smp_processor_id());
if (ret)
pr_err("Disable NMI Watchdog failed on cpu%d\n",
smp_processor_id());
else
__this_cpu_write(sdei_usr_en, 0);

Copilot uses AI. Check for mistakes.
return NOTIFY_DONE;
}

error:
Copy link

Copilot AI Feb 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error label at line 152 is misleading because control can reach it through the switch statement's normal fallthrough, not just on error. When action is CPU_PM_ENTER, CPU_PM_EXIT, or CPU_PM_ENTER_FAILED, execution falls through from the switch statement (line 150) directly to the error label (line 152), where rv is checked. A better label name would be check_error or out to reflect that this is a normal exit path, not specifically an error path.

Copilot uses AI. Check for mistakes.
Comment on lines 47 to +48
}
__this_cpu_write(sdei_usr_en, 1);
Copy link

Copilot AI Feb 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the enable function, sdei_usr_en is set to 1 after attempting to enable the event, even if the enable operation failed. This means that if sdei_api_event_enable fails, sdei_usr_en will still be set to true, causing the PM notifier to attempt to disable/enable an event that was never successfully enabled. Consider only setting sdei_usr_en to 1 when the enable operation succeeds (i.e., ret == 0). This would mirror the proper state tracking and prevent incorrect PM transitions.

Suggested change
}
__this_cpu_write(sdei_usr_en, 1);
} else {
__this_cpu_write(sdei_usr_en, 1);
}

Copilot uses AI. Check for mistakes.
Comment on lines +140 to +145
if (per_cpu(sdei_usr_en, smp_processor_id()))
rv = sdei_api_event_disable(sdei_watchdog_event_num);
break;
case CPU_PM_EXIT:
rv = sdei_api_event_enable(sdei_watchdog_event_num);
case CPU_PM_ENTER_FAILED:
if (per_cpu(sdei_usr_en, smp_processor_id()))
Copy link

Copilot AI Feb 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The power management notifier uses per_cpu(sdei_usr_en, smp_processor_id()) to access the per-CPU variable, while the enable/disable functions use __this_cpu_write. Both should use the same accessor type for consistency. Since the PM notifier runs with preemption disabled (verified by WARN_ON_ONCE at line 119), it should use __this_cpu_read instead of per_cpu for better performance and consistency with the write operations. Alternatively, if cross-CPU access is intended, the enable/disable functions should also use per_cpu.

Copilot uses AI. Check for mistakes.
Comment on lines +133 to +136
* After powering on/off the LPI (Low Power Idle),
* the enable function must be called to enable the
* EL3 Secure Timer, ensuring proper handling of
* secure timer functionality.
Copy link

Copilot AI Feb 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment states "After powering on/off the LPI (Low Power Idle), the enable function must be called to enable the EL3 Secure Timer" but this comment is placed before the switch statement that handles both enable and disable operations. The comment only describes the enable case (CPU_PM_EXIT and CPU_PM_ENTER_FAILED) but not the disable case (CPU_PM_ENTER). Consider clarifying that the disable function is called before entering low power mode, and the enable function is called after exiting or on failed entry to maintain the secure timer state across power transitions.

Suggested change
* After powering on/off the LPI (Low Power Idle),
* the enable function must be called to enable the
* EL3 Secure Timer, ensuring proper handling of
* secure timer functionality.
* Around Low Power Idle (LPI) transitions, the watchdog event
* must be disabled before entering low power and re-enabled
* after exiting low power or on a failed entry. This preserves
* the EL3 Secure Timer state and ensures correct secure timer
* functionality across power transitions.

Copilot uses AI. Check for mistakes.
* incorrect state transitions, e.g., disabling after
* unregistering.
*/
rv = sdei_api_event_status(sdei_watchdog_event_num, &result);
Copy link

Copilot AI Feb 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function sdei_api_event_status is not defined in the codebase. While the SDEI specification defines SDEI_1_0_FN_SDEI_EVENT_STATUS in the UAPI header (include/uapi/linux/arm_sdei.h), there is no corresponding sdei_api_event_status wrapper function implemented in drivers/firmware/arm_sdei.c. This will cause a compilation error due to an undefined reference. You need to implement this function in the SDEI driver similar to other API wrapper functions like sdei_api_event_enable and sdei_api_event_disable.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants