Skip to content

Conversation

@abdelaziz-mahdy
Copy link

Summary

This PR fixes a critical null pointer dereference crash (SIGSEGV) in the XNNPACK weights cache implementation that occurs when look_up_or_insert attempts to compare cached data with null pointers.

Problem

The crash occurs in XNNWeightsCache::look_up_or_insert at line 211 (original) when memcmp is called without validating that the input pointers are non-null. This can happen in the following scenarios:

  1. When saved_ptr is null because packed data was previously deleted (set to nullptr at line 118 in delete_packed_data)
  2. When XNNPACK passes a null ptr parameter to the function
  3. When there's a race condition or cache invalidation between lookup and comparison

The crash manifests as:

signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x0000000000000000
Cause: null pointer dereference

Solution

Added a null pointer check before calling memcmp to validate both ptr and saved_ptr. When either pointer is null, the function returns SIZE_MAX to indicate the cache entry is invalid, allowing the normal insertion path to proceed instead of crashing.

Fixes #16241

Test plan

I Was not able to test the changes due to not having enough knowledge to integrate the direct build into android build.

Release Notes

Category: Release notes: backends

Fixed critical null pointer dereference crash in XNNPACK weights cache when comparing cached data. The fix adds null pointer validation before memory comparison, preventing SIGSEGV crashes during model loading and runtime creation.

Copilot AI review requested due to automatic review settings December 14, 2025 23:59
@pytorch-bot
Copy link

pytorch-bot bot commented Dec 14, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/16242

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 11d2980 with merge base c493e2d (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla
Copy link

meta-cla bot commented Dec 14, 2025

Hi @abdelaziz-mahdy!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks!

@github-actions
Copy link

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds a null pointer check in XNNWeightsCache::look_up_or_insert to prevent a SIGSEGV crash when comparing cached weight data. The fix validates that both the input pointer and the cached pointer are non-null before calling memcmp, returning SIZE_MAX (cache miss) when either is null.

Key Changes:

  • Added null pointer validation for ptr and saved_ptr parameters before memory comparison
  • Returns SIZE_MAX when null pointers are detected, allowing normal cache insertion to proceed

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 212 to 215
if (ptr == nullptr || saved_ptr == nullptr) {
// If either pointer is null, cache is invalid
return SIZE_MAX;
}
Copy link

Copilot AI Dec 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The null pointer check for saved_ptr is a defensive fix but may indicate a data consistency issue. When saved_ptr is null, it means delete_packed_data set packed_data_ptrs_[offset] = nullptr (line 118), but the corresponding entry in name_to_packed_data_metadata_ still exists (it's only removed when ref_count reaches 0). This allows look_up to return a valid offset for deleted data.

Returning SIZE_MAX here triggers cache reinsertion, which could lead to accumulating stale entries in name_to_packed_data_metadata_ over multiple delete/recreate cycles. Consider whether look_up should validate that the pointer is non-null before returning the offset, or if the metadata cleanup in delete_packed_data should be more comprehensive.

Copilot uses AI. Check for mistakes.
Comment on lines 211 to 215
// Check for null pointers before calling memcmp
if (ptr == nullptr || saved_ptr == nullptr) {
// If either pointer is null, cache is invalid
return SIZE_MAX;
}
Copy link

Copilot AI Dec 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fix adds null pointer validation for a crash scenario described in the PR, but there are no test cases covering this scenario. Consider adding a test that validates the behavior when either ptr or saved_ptr is null. For example, a test could simulate the case where delete_packed_data is called and then look_up_or_insert is called with the same cache key, verifying that it returns SIZE_MAX without crashing.

Copilot uses AI. Check for mistakes.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings December 15, 2025 00:04
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +212 to +216
if (ptr == nullptr || saved_ptr == nullptr) {
// If either pointer is null (due to deleted data or allocation failure), treat as cache miss
// and return SIZE_MAX to trigger reinsertion.
return SIZE_MAX;
}
Copy link

Copilot AI Dec 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The null pointer check prevents an immediate crash, but introduces a potential issue when ptr is null. If ptr is null and SIZE_MAX is returned, the function continues to the insertion path (lines 224-252) where ptr will be stored in packed_data_ptrs_ at line 250. This means a null pointer gets cached, which could cause the same crash later when another call to look_up_or_insert retrieves it via offset_to_addr.

Consider handling the two null scenarios differently:

  • If saved_ptr is null (cache corruption/deletion), returning SIZE_MAX to re-insert is correct
  • If ptr is null (XNNPACK packing failure), the function should either return an error/special value that doesn't trigger insertion, or validate ptr is non-null before insertion at line 250

A safer approach would be to check if ptr is null before attempting insertion and handle it as an error case, while only checking saved_ptr for null at line 212.

Copilot uses AI. Check for mistakes.
Comment on lines +211 to +216
// Check for null pointers before calling memcmp
if (ptr == nullptr || saved_ptr == nullptr) {
// If either pointer is null (due to deleted data or allocation failure), treat as cache miss
// and return SIZE_MAX to trigger reinsertion.
return SIZE_MAX;
}
Copy link

Copilot AI Dec 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fix addresses a critical crash scenario but lacks test coverage. Consider adding a test case that verifies the null pointer handling, particularly:

  1. A test that simulates the scenario where delete_packed_data sets a pointer to null, then look_up_or_insert is called with the same cache key
  2. Verification that the function returns SIZE_MAX and doesn't crash when encountering null pointers
  3. Validation that the cache remains in a consistent state after handling null pointers

This would help prevent regressions and validate the fix works as intended for the crash scenario described in issue #16241.

Copilot uses AI. Check for mistakes.
@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Dec 15, 2025
@SS-JIA SS-JIA requested a review from GregoryComer December 15, 2025 14:54
@SS-JIA
Copy link
Contributor

SS-JIA commented Dec 15, 2025

cc: @GregoryComer since this is related to the XNNPACK backend

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

crash while using the android xnn runner

2 participants