Skip to content

Add new size field to H5D_compact_storage_t to address buffer overflows#5950

Draft
glennsong09 wants to merge 8 commits intoHDFGroup:developfrom
glennsong09:issue5892
Draft

Add new size field to H5D_compact_storage_t to address buffer overflows#5950
glennsong09 wants to merge 8 commits intoHDFGroup:developfrom
glennsong09:issue5892

Conversation

@glennsong09
Copy link
Collaborator

@glennsong09 glennsong09 commented Oct 28, 2025

Important

Add size field to H5D_compact_storage_t and update related functions to prevent buffer overflows.

  • Behavior:
    • Add size field to H5D_compact_storage_t to track buffer size.
    • Update H5D__compact_readvv() in H5Dcompact.c to check buffer size before reading.
    • Modify H5D__chunk_lock() in H5Dchunk.c to initialize and set alloc_chunk_size.
  • Functions:
    • Update H5D__chunk_lock() to accept size_t *alloc_chunk_size and set it during chunk allocation.
    • Modify H5D__chunk_read() and H5D__chunk_write() in H5Dchunk.c to use alloc_chunk_size.
  • Structures:
    • Add size field to H5D_compact_storage_t in H5Dpkg.h to store buffer size.

This description was created by Ellipsis for 72bdcc4. You can customize this summary. It will automatically update as commits are pushed.

@glennsong09 glennsong09 marked this pull request as draft October 28, 2025 20:39
@nbagha1 nbagha1 added this to the Release 2.0.0 milestone Oct 29, 2025
src/H5VM.c Outdated

/* Check if src buffer size is less than expected */
if (src_alloc_size > 0) {
if (src_alloc_size < tmp_src_len)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just checking tmp_src_len isn't enough here, as there may be more than one sequence and depending on which branch is taken below, either tmp_src_len or tmp_dst_len will need to be checked. You can see that tmp_src_len and tmp_dst_len are continually updated in the loops below, so you'll need to check tmp_src_len/tmp_dst_len against the remaining space left in _src on each iteration, which will involve keeping track of the end of _src and using that alongside the current offset into it (src). The checking can probably be done in the usual way with the H5_IS_BUFFER_OVERFLOW macro.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made an update to this in my PR.

src/H5VM.c Outdated
/* Check if src buffer size is less than expected */
if (src_alloc_size > 0) {
if (src_alloc_size < tmp_src_len)
HGOTO_ERROR(H5E_INTERNAL, H5E_CANTOPERATE, FAIL, "src buffer size is less than expected");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The values are the same, but since this function returns an ssize_t currently, (-1) should be explicitly returned for consistency.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I addressed this.

src/H5Dselect.c Outdated
/* Perform vectorized memcpy from src_buf to dst_buf */
if ((bytes_copied = H5VM_memcpyvv(dst_buf, dst_nseq, &curr_dst_seq, dst_len, dst_off, src_buf,
src_nseq, &curr_src_seq, src_len, src_off)) < 0)
src_nseq, &curr_src_seq, src_len, src_off, 0)) < 0)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd maybe consider SIZE_MAX instead of 0 for an unknown value, but in this case we should probably make sure we always have a known value to pass to H5VM_memcpyvv.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I addressed this in my update. However, I kept two SIZE_MAX because I looked into finding their size, but does not appear to be a good way to find or easily store their size due to many places messing with the size of dset_info->buf.cvp. Do you mind taking a look and letting me know if you have any good ideas?

static void *
H5D__chunk_lock(const H5D_io_info_t H5_ATTR_NDEBUG_UNUSED *io_info, const H5D_dset_io_info_t *dset_info,
H5D_chunk_ud_t *udata, bool relax, bool prev_unfilt_chunk)
H5D_chunk_ud_t *udata, bool relax, bool prev_unfilt_chunk, size_t *alloc_chunk_size)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this function, it looks like there are several places where the final value for alloc_chunk_size should be set. At least:

  • After the chunk entry is retrieved from the cache on line 4372 (which means the H5D_rdcc_ent_t structure is probably also going to need to track the size of the allocated chunk buffer)
  • After the call to H5D__chunk_mem_alloc() on line 4408
  • After the call to H5D__chunk_mem_alloc() on line 4434
  • After the call to H5D__chunk_mem_alloc() on line 4515
  • After the call to H5D__chunk_mem_alloc() on line 4535 (as is done currently in this PR)
  • After the call to H5Z_pipeline() on line 4553 (as is done currently in this PR, but using the value of buf_alloc that comes back rather than setting alloc_chunk_size to 0)
  • After the call to H5D__chunk_mem_alloc() on line 4561
  • After the call to H5D__chunk_mem_alloc() on line 4582

Due to all these places, it would probably be easiest to keep track of the value in a local variable and assign the value to alloc_chunk_size somewhere toward the end of the function.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I addressed this comment.

@ajelenak ajelenak removed this from the Release 2.0.0 milestone Oct 30, 2025
@nbagha1 nbagha1 moved this from To be triaged to Backlog in HDF5 - TRIAGE & TRACK Nov 22, 2025
@fortnern
Copy link
Member

Where does the invalid chunk size come from? In H5D__chunk_lock it's just taken from the layout as far as I can see so I'm not sure why that function needs to return it.

@fortnern
Copy link
Member

The size of the valid data in the chunk should be constant an predictable without needing H5D__chunk_lock to return it. Does the buffer overflow occur because the selection goes beyond the chunk or because the chunk is allocated too small? Is it the case where an unfiltered edge chunk of an otherwise filtered dataset has the wrong size stored in the index?

@jhendersonHDF
Copy link
Collaborator

The size of the valid data in the chunk should be constant an predictable without needing H5D__chunk_lock to return it. Does the buffer overflow occur because the selection goes beyond the chunk or because the chunk is allocated too small? Is it the case where an unfiltered edge chunk of an otherwise filtered dataset has the wrong size stored in the index?

I don't remember exactly where the invalid value exists in the file, but if I'm remembering correctly it's the case that file corruption causes H5D__chunk_lock() to incorrectly allocate a buffer that is too small without giving back any information about how big that buffer is. Later on, the compact I/O routines run off the buffer since there's an assumption of how big the buffer they're passed is going to be.

@fortnern
Copy link
Member

H5D__chunk_lock shouldn't need to report the size of the buffer, it should always be the same size.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Planning

Development

Successfully merging this pull request may close these issues.

5 participants