[Deepin-Kernel-SIG] [linux 6.6-y] [HISI] Sync sdei_watchdog code with openEuler OLK-6.6#1474
[Deepin-Kernel-SIG] [linux 6.6-y] [HISI] Sync sdei_watchdog code with openEuler OLK-6.6#1474opsiff wants to merge 4 commits intodeepin-community:linux-6.6.yfrom
Conversation
…kup_check is performed ascend inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I8MZ9I CVE: N/A --------------------------------------- In function sdei_watchdog_callback, the last_check_time is used to determine whether to call watchdog_hardlockup_check. But it may be updated even when watchdog_hardlockup_check is not called. Fixes: 0fa83fd0f8f7 ("sdei_watchdog: avoid possible false hardlockup") Signed-off-by: Liu Wei <lw8186@163.com> (cherry picked from commit 6c3cc702b226f258538700a05ac517f8784823bb) Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I8LQCC CVE: NA ------------------------------------------------- /proc/sys/kernel/nmi_watchdog interface is designed to close nmi watchdog, but currently, it's not working when in low power scenario, since pm callback will enable/disable event when power state changes. This commit add a percpu flag to be compatible with pm callback. Fixes: f022c4cac9c1 ("sdei_watchdog: use lockup_detector_retry_init() to init sdei watchdog") Signed-off-by: Bowen You <youbowen2@huawei.com> (cherry picked from commit 1155a8fb568e11dcfb23fda811127b4baacbae42) Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
…otifier hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I8LQCC CVE: NA ------------------------------------------------- The `CPU_PM_ENTER_FAILED` case was added to align with the handling in `sdei_pm_notifier`. This ensures proper masking/unmasking of events for the CPU when entering idle fails, maintaining consistency across CPU power management actions. Fixes: c8f96adca7fa ("arm64/watchdog: fix watchdog failure in low power scenarios") Signed-off-by: Bowen You <youbowen2@huawei.com> (cherry picked from commit f2f9b6cb6466eb51c2d654981c767bc5d719b001) Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
hulk inclusion category: bugfix bugzilla: https://atomgit.com/openeuler/kernel/issues/8325 -------------------------------- Add sdei event status check in sdei pm notifier, If event is already unregistered, don't do disable/enable in pm notifier. In this way, we can prevent incorrect state transitions, e.g., disabling or enabling after unregistering. Fixes: c8f96adca7fa ("arm64/watchdog: fix watchdog failure in low power scenarios") Signed-off-by: Bowen You <youbowen2@huawei.com> (cherry picked from commit 4f1e47d2241d253a802be3e0f5288626a590bec5) Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
Reviewer's GuideSyncs the arm64 SDEI-based NMI watchdog implementation with openEuler OLK-6.6 by tracking per-CPU user enable state, tightening when last_check_time is updated, and making the CPU PM notifier robust against incorrect SDEI event state transitions and low-power idle cycles. Sequence diagram for CPU power management notifications and SDEI watchdog statesequenceDiagram
participant CPU_PM
participant sdei_watchdog_pm_notifier
participant SDEI_API
CPU_PM->>sdei_watchdog_pm_notifier: notify(action, data)
sdei_watchdog_pm_notifier->>SDEI_API: sdei_api_event_status(sdei_watchdog_event_num, result)
SDEI_API-->>sdei_watchdog_pm_notifier: rv, result
alt rv != 0
sdei_watchdog_pm_notifier-->>CPU_PM: NOTIFY_ERR (notifier_from_errno)
else result == 0
sdei_watchdog_pm_notifier-->>CPU_PM: NOTIFY_OK
else result != 0
alt action == CPU_PM_ENTER
sdei_watchdog_pm_notifier->>sdei_watchdog_pm_notifier: check per_cpu(sdei_usr_en)
alt sdei_usr_en == true
sdei_watchdog_pm_notifier->>SDEI_API: sdei_api_event_disable(sdei_watchdog_event_num)
SDEI_API-->>sdei_watchdog_pm_notifier: rv
else sdei_usr_en == false
sdei_watchdog_pm_notifier->>sdei_watchdog_pm_notifier: skip disable
end
else action == CPU_PM_EXIT or action == CPU_PM_ENTER_FAILED
sdei_watchdog_pm_notifier->>sdei_watchdog_pm_notifier: check per_cpu(sdei_usr_en)
alt sdei_usr_en == true
sdei_watchdog_pm_notifier->>SDEI_API: sdei_api_event_enable(sdei_watchdog_event_num)
SDEI_API-->>sdei_watchdog_pm_notifier: rv
else sdei_usr_en == false
sdei_watchdog_pm_notifier->>sdei_watchdog_pm_notifier: skip enable
end
else other action
sdei_watchdog_pm_notifier-->>CPU_PM: NOTIFY_DONE
end
alt rv != 0
sdei_watchdog_pm_notifier-->>CPU_PM: NOTIFY_ERR (notifier_from_errno)
else rv == 0
sdei_watchdog_pm_notifier-->>CPU_PM: NOTIFY_OK
end
end
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Hey - I've left some high level feedback:
- In
sdei_watchdog_pm_notifier, you callsdei_api_event_statusand may returnNOTIFY_OKeven for unrelatedactionvalues, whereas the previous implementation returnedNOTIFY_DONEin those cases; consider moving the status check inside theCPU_PM_*cases or early-returningNOTIFY_DONEfor unsupported actions before touching the event state. - Since preemption is already disabled in
sdei_watchdog_pm_notifier, you can simplify and optimize the per-CPU user enable checks by using__this_cpu_read(sdei_usr_en)instead ofper_cpu(sdei_usr_en, smp_processor_id()).
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `sdei_watchdog_pm_notifier`, you call `sdei_api_event_status` and may return `NOTIFY_OK` even for unrelated `action` values, whereas the previous implementation returned `NOTIFY_DONE` in those cases; consider moving the status check inside the `CPU_PM_*` cases or early-returning `NOTIFY_DONE` for unsupported actions before touching the event state.
- Since preemption is already disabled in `sdei_watchdog_pm_notifier`, you can simplify and optimize the per-CPU user enable checks by using `__this_cpu_read(sdei_usr_en)` instead of `per_cpu(sdei_usr_en, smp_processor_id())`.Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Pull request overview
This pull request syncs the SDEI watchdog code with openEuler OLK-6.6 to improve hardlockup detection reliability during CPU power management transitions.
Changes:
- Adds per-CPU
sdei_usr_enstate tracking to coordinate watchdog behavior across enable/disable and power management operations - Guards SDEI event enable/disable operations during CPU PM transitions with event status checks and user-enable state verification
- Moves watchdog timestamp update to after the hardlockup check completes to maintain accurate monitoring intervals
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| __this_cpu_write(sdei_usr_en, 0); | ||
| if (ret) | ||
| pr_err("Disable NMI Watchdog failed on cpu%d\n", | ||
| smp_processor_id()); |
There was a problem hiding this comment.
The per-CPU variable sdei_usr_en is cleared immediately after calling sdei_api_event_disable, but before checking if the disable operation succeeded. If the disable operation fails, the state becomes inconsistent: the watchdog event remains enabled but sdei_usr_en is cleared. This could lead to the PM notifier not properly managing the event during CPU power transitions. Consider moving the __this_cpu_write(sdei_usr_en, 0) statement after the error check, similar to how sdei_watchdog_hardlockup_enable sets it after checking for errors.
| __this_cpu_write(sdei_usr_en, 0); | |
| if (ret) | |
| pr_err("Disable NMI Watchdog failed on cpu%d\n", | |
| smp_processor_id()); | |
| if (ret) | |
| pr_err("Disable NMI Watchdog failed on cpu%d\n", | |
| smp_processor_id()); | |
| else | |
| __this_cpu_write(sdei_usr_en, 0); |
| return NOTIFY_DONE; | ||
| } | ||
|
|
||
| error: |
There was a problem hiding this comment.
The error label at line 152 is misleading because control can reach it through the switch statement's normal fallthrough, not just on error. When action is CPU_PM_ENTER, CPU_PM_EXIT, or CPU_PM_ENTER_FAILED, execution falls through from the switch statement (line 150) directly to the error label (line 152), where rv is checked. A better label name would be check_error or out to reflect that this is a normal exit path, not specifically an error path.
| } | ||
| __this_cpu_write(sdei_usr_en, 1); |
There was a problem hiding this comment.
In the enable function, sdei_usr_en is set to 1 after attempting to enable the event, even if the enable operation failed. This means that if sdei_api_event_enable fails, sdei_usr_en will still be set to true, causing the PM notifier to attempt to disable/enable an event that was never successfully enabled. Consider only setting sdei_usr_en to 1 when the enable operation succeeds (i.e., ret == 0). This would mirror the proper state tracking and prevent incorrect PM transitions.
| } | |
| __this_cpu_write(sdei_usr_en, 1); | |
| } else { | |
| __this_cpu_write(sdei_usr_en, 1); | |
| } |
| if (per_cpu(sdei_usr_en, smp_processor_id())) | ||
| rv = sdei_api_event_disable(sdei_watchdog_event_num); | ||
| break; | ||
| case CPU_PM_EXIT: | ||
| rv = sdei_api_event_enable(sdei_watchdog_event_num); | ||
| case CPU_PM_ENTER_FAILED: | ||
| if (per_cpu(sdei_usr_en, smp_processor_id())) |
There was a problem hiding this comment.
The power management notifier uses per_cpu(sdei_usr_en, smp_processor_id()) to access the per-CPU variable, while the enable/disable functions use __this_cpu_write. Both should use the same accessor type for consistency. Since the PM notifier runs with preemption disabled (verified by WARN_ON_ONCE at line 119), it should use __this_cpu_read instead of per_cpu for better performance and consistency with the write operations. Alternatively, if cross-CPU access is intended, the enable/disable functions should also use per_cpu.
| * After powering on/off the LPI (Low Power Idle), | ||
| * the enable function must be called to enable the | ||
| * EL3 Secure Timer, ensuring proper handling of | ||
| * secure timer functionality. |
There was a problem hiding this comment.
The comment states "After powering on/off the LPI (Low Power Idle), the enable function must be called to enable the EL3 Secure Timer" but this comment is placed before the switch statement that handles both enable and disable operations. The comment only describes the enable case (CPU_PM_EXIT and CPU_PM_ENTER_FAILED) but not the disable case (CPU_PM_ENTER). Consider clarifying that the disable function is called before entering low power mode, and the enable function is called after exiting or on failed entry to maintain the secure timer state across power transitions.
| * After powering on/off the LPI (Low Power Idle), | |
| * the enable function must be called to enable the | |
| * EL3 Secure Timer, ensuring proper handling of | |
| * secure timer functionality. | |
| * Around Low Power Idle (LPI) transitions, the watchdog event | |
| * must be disabled before entering low power and re-enabled | |
| * after exiting low power or on a failed entry. This preserves | |
| * the EL3 Secure Timer state and ensures correct secure timer | |
| * functionality across power transitions. |
| * incorrect state transitions, e.g., disabling after | ||
| * unregistering. | ||
| */ | ||
| rv = sdei_api_event_status(sdei_watchdog_event_num, &result); |
There was a problem hiding this comment.
The function sdei_api_event_status is not defined in the codebase. While the SDEI specification defines SDEI_1_0_FN_SDEI_EVENT_STATUS in the UAPI header (include/uapi/linux/arm_sdei.h), there is no corresponding sdei_api_event_status wrapper function implemented in drivers/firmware/arm_sdei.c. This will cause a compilation error due to an undefined reference. You need to implement this function in the SDEI driver similar to other API wrapper functions like sdei_api_event_enable and sdei_api_event_disable.
Summary by Sourcery
Guard SDEI watchdog power-management transitions with event status checks and per-CPU user enable state tracking.
Bug Fixes:
Enhancements: