@jlarson4 (Collaborator) commented Jan 21, 2026

v2.17.0 Release

Type of change


  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)

Checklist:

  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have not rewritten tests relating to key interfaces which would affect backward compatibility

huseyincavusbi and others added 9 commits January 15, 2026 08:07
* Add Gemma 3 270M model support

- Add google/gemma-3-270m and google/gemma-3-270m-it to supported models
- Add architecture detection for Gemma3ForCausalLM
- Add hardcoded configuration with d_head=256 and use_qk_norm=True
- Add Q/K normalization weight loading in gemma weight converter
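
A minimal usage sketch with the newly registered name (standard TransformerLens loading path assumed):

```python
from transformer_lens import HookedTransformer

# Weights are converted through the gemma converter described above,
# including the new Q/K normalization weights.
model = HookedTransformer.from_pretrained("google/gemma-3-270m")
logits = model("The capital of France is")
```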

* Add Gemma 3 1B model support

- Add google/gemma-3-1b-pt and google/gemma-3-1b-it to supported models
- Add configuration with d_model=1152, d_mlp=6912, n_layers=26
- Maintain d_head=256 (hardcoded for all Gemma models)
- Include use_qk_norm=True and use_normalization_before_and_after=True (config sketch below)
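
For reference, a hedged sketch of that configuration via HookedTransformerConfig; fields not listed above (n_heads, act_fn) are illustrative assumptions, not the shipped values:

```python
from transformer_lens import HookedTransformerConfig

# Sketch of the Gemma 3 1B config from the bullets above.
cfg = HookedTransformerConfig(
    d_model=1152,
    d_mlp=6912,
    n_layers=26,
    d_head=256,  # hardcoded for all Gemma models
    n_heads=4,   # assumption, not taken from the release code
    n_ctx=8192,  # memory-safe default introduced later in this PR
    act_fn="gelu_pytorch_tanh",  # assumption: Gemma-family activation
    use_qk_norm=True,
    use_normalization_before_and_after=True,
)
```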

* Add Gemma 3 and MedGemma 4B multimodal model support with text-only extraction

- Add google/gemma-3-4b-pt, gemma-3-4b-it, medgemma-4b-pt, medgemma-4b-it
- Implement pattern-based architecture detection (CausalLM vs ConditionalGeneration)
- Add 4B config with GQA support (n_key_value_heads=4)
- Extract text-only weights from multimodal models via language_model component
- Add AutoModel loader for Gemma3ForConditionalGeneration architecture
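
A sketch of that extraction path; the class and attribute names follow the commit text, and since the release uses an AutoModel-based loader the exact code may differ:

```python
from transformers import Gemma3ForConditionalGeneration

# Multimodal checkpoints wrap the text stack in a `language_model` submodule;
# only those weights are converted, and the vision tower is discarded.
hf_model = Gemma3ForConditionalGeneration.from_pretrained("google/gemma-3-4b-pt")
text_state_dict = hf_model.language_model.state_dict()
```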

* Fix device mismatch for Gemma models on MPS

Add a device parameter to all torch.zeros() calls in the gemma weight conversion
so that bias tensors are created on the same device as their weight tensors.
This fixes a RuntimeError when loading Gemma models on Apple Silicon with the MPS backend.

- Add device parameter to attention biases (b_Q, b_K, b_V, b_O)
- Add device parameter to MLP biases (b_in, b_out)
- Add device parameter to unembed bias (b_U)
- Handle both lm_head and tied embeddings for unembed device
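
The shape of the fix, as a minimal sketch (`zero_bias_like` is a hypothetical helper, not the converter's actual code):

```python
import torch

def zero_bias_like(weight: torch.Tensor, size: int) -> torch.Tensor:
    # Create the zero bias on the same device (and dtype) as its weight so
    # that MPS does not raise a cross-device RuntimeError during conversion.
    return torch.zeros(size, dtype=weight.dtype, device=weight.device)
```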

* feat: Gemma 3 memory optimization and n_ctx override

- Reduce default context: 270M/1B (32K->8K), 4B (131K->8K)
- Add n_ctx parameter for context length override
- Fix multimodal weight extraction (nested model access)
- Add kwargs filtering for n_ctx parameter
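
A usage sketch of the override; the n_ctx keyword follows this commit, and 131072 is the 4B checkpoint's advertised maximum:

```python
from transformer_lens import HookedTransformer

# Models now load with a memory-safe 8K context by default; pass n_ctx to
# restore the full window when long-context generation is actually needed.
model = HookedTransformer.from_pretrained("google/gemma-3-4b-it", n_ctx=131072)
```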

* feat: Add Gemma 3 12B and 27B model support

- Added 6 new models: gemma-3-12b-pt/it, gemma-3-27b-pt/it, medgemma-27b-it/text-it
- 12B config: 3840 d_model, 48 layers, 16 heads, 8 KV heads (2:1 GQA)
- 27B config: 5376 d_model, 62 layers, 32 heads, 16 KV heads (2:1 GQA)
- All use a safe 8K default context (overridable to 131K)
- Special handling for medgemma-27b-text-it (text-only, 262144 vocab)

* fix: Implement Gemma 3 hybrid local/global attention architecture (5:1 pattern)
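
A minimal sketch of the 5:1 interleaving; which position in each group of six is global is an assumption here:

```python
def gemma3_attention_type(layer_idx: int) -> str:
    # Five sliding-window ("local") layers followed by one global layer,
    # repeating through the depth of the model.
    return "global" if (layer_idx + 1) % 6 == 0 else "local"
```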

* feat: Add per-layer RoPE base support for Gemma 3
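
A sketch of how a per-layer base could be selected; the parameter names mirror the rotary_base_local docstring added later in this PR, and the numeric values are the published Gemma 3 bases (assumed, not read from the release code):

```python
def rope_base_for_layer(layer_idx: int,
                        rotary_base: float = 1_000_000.0,
                        rotary_base_local: float = 10_000.0) -> float:
    # Global-attention layers use the large RoPE base; local (sliding-window)
    # layers keep the smaller base. Same assumed 5:1 phase as the sketch above.
    is_global = (layer_idx + 1) % 6 == 0
    return rotary_base if is_global else rotary_base_local
```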

* Fix Gemma 3 head dimensions

* Fix formatting issues

* Fix Colab_Compatibility notebook CI failure

* Fix formatting regression (black 23.3.0)

* Fix Interactive_Neuroscope CI failure (deps & notebook)

* Add protobuf dependency to fix Main_Demo.ipynb import error

* Pin transformers to 4.46.3 to fix huggingface-hub version conflict

* Add huggingface-hub<1.0 constraint to match transformers requirements

* Fix CI: Force Poetry to sync dependencies with lock file

* Fix CI: Force huggingface-hub <1.0 for transformers compatibility

* Skip build-docs and deploy-docs jobs on forks

* Fix notebook-checks: Force huggingface-hub <1.0 after poetry install

* Add disk cleanup to CI jobs to prevent 'No space left on device' errors

* Fix notebook-checks: Disable Poetry cache and force uninstall/reinstall huggingface-hub

* Fix notebook kernel to use Poetry venv

* Fix huggingface-hub version conflict in notebook CI

* Move huggingface-hub fix after ipykernel install

* Skip pip installs in GitHub CI for Interactive_Neuroscope

* Install gradio in GitHub CI without overriding poetry deps

* Add gradio as dev dependency for notebooks

* Regenerate poetry.lock after adding gradio

* Add unit tests for Gemma 3 and MedGemma model support

* fix: Remove unused imports to pass CI format check

* fix: Sort imports with isort

* fix: Format code with black

* docs: Add docstrings for use_qk_norm and rotary_base_local parameters

* fix: Format HookedTransformerConfig.py with black 23.x

* Update demos/Interactive_Neuroscope.ipynb

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update demos/Interactive_Neuroscope.ipynb

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Revert "Update demos/Interactive_Neuroscope.ipynb"

This reverts commit 95cc561.

* test: Update transformers to >=4.51 to test CI compatibility

* Fix Gemma 3 long-context generation and Q/K norm weights

* style: Format with black

* fix: Add type assertion for rotary_dim

---------

Co-authored-by: Bryce Meyer <bryce13950@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: George M <georgem17636315081@outlook.com>
* Add support for Qwen/Qwen3-0.6B-Base model

This commit adds support for the base (non-instruct) version of Qwen3-0.6B.
The base model (Qwen/Qwen3-0.6B-Base) and instruct model (Qwen/Qwen3-0.6B)
share the same architecture but have different weights. The base model is
suitable for fine-tuning, while the instruct model is optimized for
instruction-following and chat.

Changes:
- Added "Qwen/Qwen3-0.6B-Base" to OFFICIAL_MODEL_NAMES
- Added alias "qwen3-0.6b-base" to MODEL_ALIASES
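
A usage sketch; either the official name or the new alias should resolve to the same checkpoint:

```python
from transformer_lens import HookedTransformer

# "qwen3-0.6b-base" (the alias added here) loads the same model.
model = HookedTransformer.from_pretrained("Qwen/Qwen3-0.6B-Base")
```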

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Update Colab_Compatibility notebook to include Qwen3-0.6B-Base

Add Qwen/Qwen3-0.6B-Base to the free_compatible list in the
Colab_Compatibility notebook to ensure all models in OFFICIAL_MODEL_NAMES
are accounted for in the test suite.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Fix notebook output to reflect 217 models

Update the model count in Colab_Compatibility notebook output
from 216 to 217 to reflect the addition of Qwen3-0.6B-Base.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Bryce Meyer <bryce13950@gmail.com>
Co-authored-by: name <email@example.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Jonah Larson <jlarson@equity-creative.com>
* Repair tests that were broken by module updates included with the Gemma 3 feature
- Slightly reduced the rigidity of the confidence testing in test_cross_attention
- Updated ActivationCache to ensure the `tokens` variable maintains the correct type for the `tokens_to_residual_directions` function (see the sketch below)
- Fixed a type-checking bug in `transformer_lens/utilities/devices.py`
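
A brief sketch of the call whose input typing that fix protects (model and prompt are illustrative):

```python
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")
tokens = model.to_tokens("The Eiffel Tower is in Paris")
# tokens_to_residual_directions expects token tensors (or ints/strings);
# the fix keeps the cached `tokens` in a type this call accepts.
directions = model.tokens_to_residual_directions(tokens)
```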

* Resolve format errors
* Resolve remaining test failures
* Revert changes to the CI
* Fix mypy type-checking issue
* Update lock file
* Fix test_eigenvalues_property error via a type change; fix #934

* CI/CD fix

* CI/CD Poetry fix 2

---------

Co-authored-by: Bryce Meyer <bryce13950@gmail.com>
Co-authored-by: kapedalex <kapedalex@gmail.com>
Co-authored-by: Jonah Larson <jlarson@equity-creative.com>
* Move wandb into train

* Add tests

* CI fix

* CI fix 2

* Add explanation for #1102

* Remove the additional huggingface-hub inclusion in pyproject.toml

* Resolve poetry.lock changes

---------

Co-authored-by: Bryce Meyer <bryce13950@gmail.com>
Co-authored-by: kapedalex <kapedalex@gmail.com>
Co-authored-by: Jonah Larson <jlarson@equity-creative.com>
…om n_key_value_heads (#981)

* Fix the case where n_head and n_key_value_heads are different for a model

* Update doc string
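
For context, a hedged illustration of what differing head counts mean in grouped-query attention (`expand_kv_heads` is illustrative, not the repository's implementation):

```python
import torch

def expand_kv_heads(kv: torch.Tensor, n_heads: int) -> torch.Tensor:
    # When n_key_value_heads < n_heads, each K/V head is shared by
    # n_heads // n_key_value_heads query heads. Assumed layout:
    # [batch, pos, n_key_value_heads, d_head].
    n_kv_heads = kv.shape[2]
    return kv.repeat_interleave(n_heads // n_kv_heads, dim=2)
```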

---------

Co-authored-by: Bryce Meyer <bryce13950@gmail.com>
@jlarson4 changed the title from New Release – v2.17.1 to New Release – v2.17.0 on Jan 21, 2026
@jlarson4 merged commit 7df72ff into main on Jan 21, 2026
38 of 39 checks passed