Conversation
Collaborator
Does LoRA support the I2V pipelines as well?
Force-pushed from 9290a6e to e1b7221
Collaborator (Author)
Added examples of I2V support.
entrpn reviewed on Jan 23, 2026
entrpn (Collaborator) left a comment:
Can this implementation load multiple LoRAs at once?
src/maxdiffusion/models/lora_nnx.py (outdated diff context):

    return jnp.array(v)

    def parse_lora_dict(state_dict):
Collaborator
Do you know which LoRA formats are supported by this function? There are a couple of LoRA trainers out there; might want to specify in a comment or README which ones we're specifically targeting (diffusers, or others).
Collaborator (Author)
Added a comment that it supports the ComfyUI and AI Toolkit LoRA formats.
Collaborator (Author)
Now supports multiple LoRAs at once. Example added to the description.
Force-pushed from 7f018e4 to 9b5051c
entrpn previously approved these changes on Jan 28, 2026
Collaborator
@Perseus14 please squash your commits and make sure the linter tests pass. Other than that, looks good.
entrpn approved these changes on Jan 28, 2026
prishajain1 approved these changes on Jan 28, 2026
Summary
This PR introduces full Low-Rank Adaptation (LoRA) inference support for the WAN family of models in MaxDiffusion.
Unlike previous implementations in this codebase that rely on flax.linen, this implementation leverages flax.nnx. This allows for a more Pythonic, object-oriented approach to weight injection, enabling us to modify the transformer model in-place.

Key Features

1. Transition to flax.nnx

WAN models in MaxDiffusion are implemented using flax.nnx. To support LoRA:
- We implemented a native NNX loader rather than wrapping linen modules.
- The loader traverses the model graph (nnx.iter_graph) to identify target layers (nnx.Linear, nnx.Conv, nnx.Embed, nnx.LayerNorm) and merges LoRA weights directly into the kernel values (a sketch follows this list).
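For illustration, here is a minimal sketch of the traversal-and-merge idea. It is not the PR's actual merge code: the `lora_weights` mapping, the helper names, and the weight shapes/scaling are assumptions made for this example only.

```python
# Illustrative sketch only; assumes `lora_weights` maps a dotted module path
# to (down, up) arrays with shapes (rank, in_features) and (out_features, rank).
import jax
import jax.numpy as jnp
from flax import nnx


@jax.jit
def _merge_kernel(kernel, down, up, scale):
  # Standard LoRA update W' = W + scale * (up @ down), transposed to match
  # nnx.Linear's (in_features, out_features) kernel layout.
  return kernel + scale * jnp.matmul(down.T, up.T)


def merge_lora_sketch(model: nnx.Module, lora_weights: dict, scale: float = 1.0):
  # Walk every node in the NNX graph and patch matching Linear kernels in-place.
  for path, node in nnx.iter_graph(model):
    if isinstance(node, nnx.Linear):
      key = ".".join(str(p) for p in path)
      if key in lora_weights:
        down, up = lora_weights[key]
        node.kernel.value = _merge_kernel(node.kernel.value, down, up, scale)
```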
2. Robust Weight Merging Strategy

This implementation solves several critical distributed training/inference challenges:
- On-device merging (jax.jit): To avoid ShardingMismatch and DeviceArray errors that occur when mixing sharded TPU weights with CPU-based LoRA weights, all merge computations (kernel + delta) are performed within JIT-compiled functions (_compute_and_add_*_jit). This ensures weight updates occur efficiently on-device across the TPU mesh.
- Uses jax.dlpack where possible to efficiently move PyTorch tensors to JAX arrays without unnecessary memory overhead (sketched below).
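As a rough sketch of both points, assuming a plain kernel-plus-delta addition (the real _compute_and_add_*_jit functions cover more cases than this):

```python
# Illustrative sketch only, not the loader's actual code.
import jax
import jax.dlpack
import jax.numpy as jnp
import torch


def torch_to_jax(t: torch.Tensor) -> jax.Array:
  try:
    # Recent JAX versions accept any object implementing __dlpack__ here,
    # which avoids an extra host copy for the LoRA tensors.
    return jax.dlpack.from_dlpack(t.detach().contiguous())
  except Exception:
    # Fallback path: round-trip through NumPy (copies, but always works).
    return jnp.asarray(t.detach().cpu().numpy())


@jax.jit
def _compute_and_add(kernel, delta):
  # Adding the delta under jit keeps the result on-device and lets XLA
  # follow the existing sharding of `kernel` across the TPU mesh.
  return kernel + delta
```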
3. Advanced LoRA Support

Beyond standard Linear rank reduction, this PR supports:
- … diff weights before device-side merging.
- Difference injections (diff, diff_b): Supports checkpoints that include full-parameter fine-tuning offsets (difference injections) and bias tuning, which are common in high-fidelity WAN fine-tunes (see the sketch after this list).
- Support for text_embedding, time_embedding, and LayerNorm/RMSNorm scales and biases.
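A minimal sketch of how such offsets could be applied on-device; the helper names echo the *_jit naming above but are illustrative only:

```python
# Illustrative sketch only; `diff` / `diff_b` are the checkpoint's offset
# tensors, already converted to JAX arrays matching the parameter shapes.
import jax


@jax.jit
def _apply_diff(kernel, diff):
  # A `diff` entry is a full-parameter offset: W' = W + diff.
  return kernel + diff


@jax.jit
def _apply_diff_b(bias, diff_b):
  # A `diff_b` entry tunes the bias directly: b' = b + diff_b.
  return bias + diff_b
```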
4. Scanned vs. Unscanned Layers

MaxDiffusion supports enabling jax.scan for transformer layers via the scan_layers: True configuration flag. This improves training memory efficiency by stacking weights of repeated layers (e.g., Attention, FFN) along a new leading dimension. Since users may run inference with or without this flag enabled, this LoRA implementation is designed to transparently support both modes.

The loader distinguishes between:
- Unscanned layers: the merge_lora() function is used, which iterates through each layer and merges weights individually via efficient, on-device JIT calls (_compute_and_add_single_jit).
- Scanned layers: the merge_lora_for_scanned() function is used. It detects which parameters are stacked (e.g., kernel.ndim > 2) and which are not.
  - Stacked parameters are merged via _compute_and_add_scanned_jit. This updates all layers in the stack at once on-device, which is significantly more efficient than merging layer-by-layer.
  - Non-stacked parameters (e.g., embeddings, proj_out) are merged individually using the single-layer JIT logic.

This dual approach ensures correct weight injection whether or not layers are scanned, while maximizing performance in scanned mode through batching (a sketch of the stacked merge follows).
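For the stacked case, a minimal sketch under assumed shapes and names (not the actual _compute_and_add_scanned_jit):

```python
# Illustrative sketch; assumes the scanned kernel is stacked as (L, in, out)
# and the per-layer LoRA factors as (L, rank, in) / (L, out, rank).
import jax
import jax.numpy as jnp


@jax.jit
def compute_and_add_scanned(kernel, down, up, scale):
  # delta[l] = scale * (up[l] @ down[l]).T for every layer l in one call.
  delta = jnp.einsum("lri,lor->lio", down, up)
  return kernel + scale * delta
```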
Files Added / Modified
- src/maxdiffusion/models/lora_nnx.py: [NEW] Core logic. Contains the JIT merge functions, parse_lora_dict, and the graph traversal logic (merge_lora, merge_lora_for_scanned) to inject weights into NNX modules.
- src/maxdiffusion/loaders/wan_lora_nnx_loader.py: [NEW] Orchestrates the loading process. Handles the download of safetensors, conversion of keys, and delegation to the merge functions (see the sketch below).
- src/maxdiffusion/generate_wan.py: Updated the generation pipeline to identify if lora is enabled and trigger the loading sequence before inference.
- src/maxdiffusion/lora_conversion_utils.py: Updated translate_wan_nnx_path_to_diffusers_lora to accurately map NNX paths (including embeddings and time projections) to Diffusers-style keys.
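To make the orchestration concrete, a hypothetical sketch of the flow; the function signatures here are guesses for illustration, not the actual wan_lora_nnx_loader API:

```python
# Hypothetical end-to-end flow; parse_lora_dict / merge_lora signatures are
# assumptions for illustration, and key translation / scanned dispatch are omitted.
from safetensors.numpy import load_file

from maxdiffusion.models.lora_nnx import merge_lora, parse_lora_dict


def load_and_merge(transformer, lora_path: str, scale: float = 1.0):
  state_dict = load_file(lora_path)             # raw ComfyUI / AI Toolkit keys
  lora_weights = parse_lora_dict(state_dict)    # normalized LoRA entries (assumed signature)
  merge_lora(transformer, lora_weights, scale)  # unscanned merge path (assumed signature)
```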
Testing

Scenario 2: Validation of Multiple LoRA weights
- WAN2.1: distill_lora and divine_power_lora
- WAN2.2: distill_lora and orbit_shot_lora