@CyCle1024 (Collaborator) commented Jan 7, 2026

Motivation

When len(params) == 1, foreach_all_gather performs poorly because of the copy-in operation.
In addition, when _get_fused_params is used, the local tensors of the params are padded to the same size on every rank, so the all-gather that exchanges element counts (numel) can be skipped.
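
To illustrate why equal padding removes the need to exchange element counts, here is a minimal sketch. It assumes the padding rounds each parameter up so that it divides evenly across ranks; the PR description does not spell out the exact padding rule, and the helper names below are hypothetical rather than taken from the PR.

```python
from typing import List


def padded_shard_numel(global_numel: int, world_size: int) -> int:
    """Elements per rank after padding (assumed rule: ceil division)."""
    return -(-global_numel // world_size)  # integer ceiling, no floats


def shard_numels_without_communication(global_numel: int, world_size: int) -> List[int]:
    """Every rank can derive all ranks' sizes locally, since they are identical."""
    per_rank = padded_shard_numel(global_numel, world_size)
    return [per_rank] * world_size


# Hypothetical example: a 10-element parameter sharded over 4 ranks.
sizes = shard_numels_without_communication(10, 4)
padding = sum(sizes) - 10  # number of pad elements added in total
```

Because every entry in `sizes` is the same and can be computed from values each rank already holds, no collective is needed to learn the other ranks' element counts.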

Key Changes

  1. Add a params_shapes_across_group parameter to foreach_all_gather. It supplies static shape information, so foreach_all_gather does not need to all-gather element-count metadata.
  2. Add an optimized implementation of foreach_all_gather for len(params) == 1 that avoids the copy-in operation and the all-gather of copy-out metadata. It works whether or not params_shapes_across_group is provided.
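
The role of static shape information can be sketched as follows. This is not the PR's implementation: the layout of params_shapes_across_group (outer index is the rank, inner index is the parameter) is an assumption made for illustration, and both helper functions are hypothetical.

```python
from math import prod
from typing import List, Sequence, Tuple

Shape = Tuple[int, ...]


def numels_from_static_shapes(
    params_shapes_across_group: Sequence[Sequence[Shape]],
) -> List[List[int]]:
    """Element counts per rank and per parameter, computed locally.

    Assumed layout: params_shapes_across_group[rank][i] is the local shape
    of parameter i on that rank.
    """
    return [
        [prod(shape) for shape in rank_shapes]
        for rank_shapes in params_shapes_across_group
    ]


def flat_buffer_sizes(numels: List[List[int]]) -> List[int]:
    """Total elements each rank contributes to a flattened all-gather input."""
    return [sum(rank_numels) for rank_numels in numels]


# Hypothetical shapes for two parameters on two ranks.
shapes = [
    [(4, 8), (8,)],  # rank 0
    [(3, 8), (8,)],  # rank 1
]
numels = numels_from_static_shapes(shapes)
sizes = flat_buffer_sizes(numels)
```

With the shapes known up front, the per-rank sizes needed to split the gathered output are available without a separate metadata all-gather.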

@CyCle1024 requested a review from HAOCHENYE on January 7, 2026 13:59
@CyCle1024 force-pushed the refactor_foreach_allgather branch from 8fbecd5 to 92e40de on January 13, 2026 03:08