Skip to content

Conversation

@GarrettBeatty
Copy link
Contributor

@GarrettBeatty GarrettBeatty commented Dec 11, 2025

Description

This PR refactors the buffered multipart download data handling by replacing the single large ArrayPool buffer approach with a chunked buffer stream design. The key changes include:

  • New ChunkedBufferStream class: A writable/readable stream that stores data across multiple small (80KB) ArrayPool buffers, avoiding Large Object Heap (LOH) allocations
  • New ChunkedPartDataSource class: An IPartDataSource implementation that wraps ChunkedBufferStream for seamless integration with the PartBufferManager
  • Removed BufferedDataSource and StreamPartBuffer classes: These are replaced by the new chunked approach
  • Updated BufferedPartDataHandler: Modified to use the new ChunkedBufferStream instead of StreamPartBuffer
  • Updated PartBufferManager and IPartBufferManager: Simplified interface and implementation to work with the new data source pattern

Motivation and Context

The previous buffering implementation used single large ArrayPool buffers which had two issues:

  1. LOH allocations: Buffers larger than 85KB are allocated on the Large Object Heap, causing memory fragmentation and GC pressure
  2. 2GB size limitation: Single arrays in .NET cannot exceed ~2GB (int.MaxValue bytes), limiting the maximum part size

The new chunked approach:

  • Keeps each buffer chunk at 64KB to stay safely below the 85KB LOH threshold
  • Supports streams larger than 2GB by distributing data across multiple chunks

#3806

Testing

  • Added comprehensive unit tests for ChunkedBufferStream (878 lines) covering:
    • Write and read operations
    • Mode switching (write-to-read)
    • Disposal and ArrayPool buffer returns
    • Edge cases and error conditions
  • Added comprehensive unit tests for ChunkedPartDataSource (602 lines) covering:
    • Construction and validation
    • Read operations
    • Disposal behavior
    • Integration scenarios
  • Updated existing BufferedPartDataHandlerTests and PartBufferManagerTests to work with the new implementation
  • Removed obsolete tests for BufferedDataSource and StreamPartBuffer

Performance Testing

I did notice some slowdown from the older implementation

New

Total bytes per run: 5,368,709,120

Run:1 Secs:4.541582 Gb/s:9.456985
Run:2 Secs:8.800254 Gb/s:4.880504
Run:3 Secs:2.014526 Gb/s:21.319988
Run:4 Secs:3.785906 Gb/s:11.344622
Run:5 Secs:11.958203 Gb/s:3.591649
Run:6 Secs:4.626160 Gb/s:9.284086
Run:7 Secs:2.009955 Gb/s:21.368480
Run:8 Secs:1.873424 Gb/s:22.925757
Run:9 Secs:2.329907 Gb/s:18.434070
Run:10 Secs:1.595272 Gb/s:26.923105

vs

Old

Total bytes per run: 5,368,709,120

Run:1 Secs:1.636832 Gb/s:26.239515
Run:2 Secs:1.026863 Gb/s:41.826098
Run:3 Secs:2.580466 Gb/s:16.644151
Run:4 Secs:0.925399 Gb/s:46.412059
Run:5 Secs:0.841140 Gb/s:51.061266
Run:6 Secs:0.816797 Gb/s:52.583070
Run:7 Secs:0.773071 Gb/s:55.557209
Run:8 Secs:0.821784 Gb/s:52.263931
Run:9 Secs:0.816619 Gb/s:52.594513
Run:10 Secs:0.809192 Gb/s:53.077240

50TB Testing

I also tested with the 50TB file and didnt see any issues

Screenshots (if appropriate)

Ran using visual studio performance analyzer

before change

image

after change

no LOH allocations

image

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist

  • My code follows the code style of this project
  • My change requires a change to the documentation
  • I have updated the documentation accordingly
  • I have read the README document
  • I have added tests to cover my changes
  • All new and existing tests passed

License

  • I confirm that this pull request can be released under the Apache 2 license

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors the S3 multipart download buffering implementation to use a chunked stream approach instead of single large ArrayPool buffers. The change addresses potential issues with very large parts (>2GB) and Large Object Heap allocations while maintaining the same functional behavior.

Key Changes:

  • Replaces StreamPartBuffer and BufferedDataSource with ChunkedBufferStream and ChunkedPartDataSource that use multiple small 80KB ArrayPool chunks instead of single large buffers
  • Removes the obsolete AddBuffer overload methods from IPartBufferManager interface, consolidating on AddDataSource
  • Removes unused config parameter from BufferedMultipartStream constructor

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 9 comments.

Show a summary per file
File Description
sdk/src/Services/S3/Custom/Transfer/Internal/ChunkedBufferStream.cs New core stream implementation that manages data across multiple 80KB ArrayPool chunks to avoid LOH and support parts >2GB
sdk/src/Services/S3/Custom/Transfer/Internal/ChunkedPartDataSource.cs New IPartDataSource wrapper for ChunkedBufferStream with part number tracking and disposal management
sdk/src/Services/S3/Custom/Transfer/Internal/BufferedPartDataHandler.cs Updated to create ChunkedPartDataSource instead of StreamPartBuffer for out-of-order parts
sdk/src/Services/S3/Custom/Transfer/Internal/BufferedMultipartStream.cs Removed unused config parameter from constructor
sdk/src/Services/S3/Custom/Transfer/Internal/PartBufferManager.cs Removed obsolete AddBuffer overload methods
sdk/src/Services/S3/Custom/Transfer/Internal/IPartBufferManager.cs Removed obsolete AddBuffer method declarations from interface
sdk/src/Services/S3/Custom/Transfer/Internal/StreamPartBuffer.cs Deleted - replaced by ChunkedBufferStream
sdk/src/Services/S3/Custom/Transfer/Internal/BufferedDataSource.cs Deleted - replaced by ChunkedPartDataSource
sdk/test/Services/S3/UnitTests/Custom/StreamPartBufferTests.cs Deleted - tests for removed class
sdk/test/Services/S3/UnitTests/Custom/BufferedDataSourceTests.cs Deleted - tests for removed class
sdk/test/Services/S3/UnitTests/Custom/PartBufferManagerTests.cs Updated all test cases to use ChunkedPartDataSource instead of StreamPartBuffer
sdk/test/Services/S3/UnitTests/Custom/BufferedPartDataHandlerTests.cs Updated mocks and assertions to expect ChunkedPartDataSource instead of StreamPartBuffer
sdk/test/Services/S3/UnitTests/Custom/BufferedMultipartStreamTests.cs Updated constructor calls to remove config parameter

@GarrettBeatty GarrettBeatty changed the title chunked stream Multi Part Download: Use Chunked Stream Dec 13, 2025
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 1 comment.

#endregion

#region Logger
private readonly Logger _logger = Logger.GetLogger(typeof(PartBufferManager));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixing this to be consistent with the other classes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant