New Release – v2.17.0 #1159
Merged
* Add Gemma 3 270M model support
  - Add google/gemma-3-270m and google/gemma-3-270m-it to supported models
  - Add architecture detection for Gemma3ForCausalLM
  - Add hardcoded configuration with d_head=256 and use_qk_norm=True
  - Add Q/K normalization weight loading in the Gemma weight converter
* Add Gemma 3 1B model support
  - Add google/gemma-3-1b-pt and google/gemma-3-1b-it to supported models
  - Add configuration with d_model=1152, d_mlp=6912, n_layers=26
  - Maintain d_head=256 (hardcoded for all Gemma models)
  - Include use_qk_norm=True and use_normalization_before_and_after=True
* Add Gemma 3 and MedGemma 4B multimodal model support with text-only extraction
  - Add google/gemma-3-4b-pt, gemma-3-4b-it, medgemma-4b-pt, medgemma-4b-it
  - Implement pattern-based architecture detection (CausalLM vs ConditionalGeneration)
  - Add 4B config with GQA support (n_key_value_heads=4)
  - Extract text-only weights from multimodal models via the language_model component
  - Add AutoModel loader for the Gemma3ForConditionalGeneration architecture
* Fix device mismatch for Gemma models on MPS
  Add a device parameter to all torch.zeros() calls in the Gemma weight conversion so that bias tensors are created on the same device as the weight tensors. This fixes a RuntimeError when loading Gemma models on Apple Silicon with the MPS backend.
  - Add device parameter to attention biases (b_Q, b_K, b_V, b_O)
  - Add device parameter to MLP biases (b_in, b_out)
  - Add device parameter to the unembed bias (b_U)
  - Handle both lm_head and tied embeddings for the unembed device
* feat: Gemma 3 memory optimization and n_ctx override
  - Reduce default context: 270M/1B (32K -> 8K), 4B (131K -> 8K)
  - Add n_ctx parameter for context-length override
  - Fix multimodal weight extraction (nested model access)
  - Add kwargs filtering for the n_ctx parameter
* feat: Add Gemma 3 12B and 27B model support
  - Add six new models: gemma-3-12b-pt/it, gemma-3-27b-pt/it, medgemma-27b-it/text-it
  - 12B config: d_model=3840, 48 layers, 16 heads, 8 KV heads (2:1 GQA)
  - 27B config: d_model=5376, 62 layers, 32 heads, 16 KV heads (2:1 GQA)
  - All use a safe 8K default context (overridable to 131K)
  - Special handling for medgemma-27b-text-it (text-only, 262144 vocab)
* fix: Implement Gemma 3 hybrid local/global attention architecture (5:1 pattern)
* feat: Add per-layer RoPE base support for Gemma 3
* Fix Gemma 3 head dimensions
* Fix formatting issues
* Fix Colab_Compatibility notebook CI failure
* Fix formatting regression (black 23.3.0)
* Fix Interactive_Neuroscope CI failure (deps & notebook)
* Add protobuf dependency to fix Main_Demo.ipynb import error
* Pin transformers to 4.46.3 to fix a huggingface-hub version conflict
* Add huggingface-hub<1.0 constraint to match transformers requirements
* Fix CI: force Poetry to sync dependencies with the lock file
* Fix CI: force huggingface-hub<1.0 for transformers compatibility
* Skip build-docs and deploy-docs jobs on forks
* Fix notebook-checks: force huggingface-hub<1.0 after poetry install
* Add disk cleanup to CI jobs to prevent "No space left on device" errors
* Fix notebook-checks: disable the Poetry cache and force uninstall/reinstall of huggingface-hub
* Fix notebook kernel to use the Poetry venv
* Fix huggingface-hub version conflict in notebook CI
* Move the huggingface-hub fix after the ipykernel install
* Skip pip installs in GitHub CI for Interactive_Neuroscope
* Install gradio in GitHub CI without overriding Poetry deps
* Add gradio as a dev dependency for notebooks
* Regenerate poetry.lock after adding gradio
* Add unit tests for Gemma 3 and MedGemma model support
* fix: Remove unused imports to pass the CI format check
* fix: Sort imports with isort
* fix: Format code with black
* docs: Add docstrings for the use_qk_norm and rotary_base_local parameters
* fix: Format HookedTransformerConfig.py with black 23.x
* Update demos/Interactive_Neuroscope.ipynb (Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>)
* Revert "Update demos/Interactive_Neuroscope.ipynb" (reverts commit 95cc561)
* test: Update transformers to >=4.51 to test CI compatibility
* Fix Gemma 3 long-context generation and Q/K norm weights
* style: Format with black
* fix: Add type assertion for rotary_dim

Co-authored-by: Bryce Meyer <bryce13950@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: George M <georgem17636315081@outlook.com>
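The n_ctx override and kwargs filtering described above can be sketched in isolation. This is a hypothetical helper, not the actual TransformerLens loader code: it assumes a loader that accepts an optional `n_ctx` keyword, clamps it to the architecture's trained maximum, and strips it from the kwargs passed downstream.

```python
def apply_n_ctx_override(config, max_n_ctx, **kwargs):
    """Illustrative sketch: pop an n_ctx override out of kwargs, validate it
    against the trained maximum, and return the adjusted config plus the
    remaining kwargs (so downstream code never sees n_ctx)."""
    n_ctx = kwargs.pop("n_ctx", config["n_ctx"])
    if n_ctx > max_n_ctx:
        raise ValueError(f"n_ctx={n_ctx} exceeds the trained maximum {max_n_ctx}")
    return {**config, "n_ctx": n_ctx}, kwargs

# A 4B-style config with the safe 8K default, overridden back up to 32K;
# the device kwarg passes through untouched.
config, rest = apply_n_ctx_override({"n_ctx": 8192}, 131072, n_ctx=32768, device="cpu")
```

The point of popping `n_ctx` before forwarding kwargs is exactly the "kwargs filtering" bullet above: downstream constructors that do not accept an `n_ctx` parameter would otherwise raise a TypeError.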
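The 5:1 hybrid local/global attention layout mentioned above can be sketched as a simple per-layer schedule. This is an illustrative reconstruction, assuming five sliding-window ("local") layers per full-attention ("global") layer; the exact placement of the global layer within each group of six is an assumption, not taken from this changelog.

```python
def attention_pattern(n_layers, period=6):
    """Sketch of a 5:1 hybrid schedule: every `period`-th layer is global
    (full attention), all others are local (sliding-window attention).
    The placement convention here is an assumption for illustration."""
    return [
        "global" if (layer + 1) % period == 0 else "local"
        for layer in range(n_layers)
    ]

# e.g. the 26-layer 1B config above
pattern = attention_pattern(26)
```

The per-layer RoPE base support added in the same series fits this split naturally: local and global layers can use different rotary bases, which is why a single model-wide `rotary_base` no longer suffices.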
* Add support for the Qwen/Qwen3-0.6B-Base model
  This adds support for the base (non-instruct) version of Qwen3-0.6B. The base model (Qwen/Qwen3-0.6B-Base) and the instruct model (Qwen/Qwen3-0.6B) share the same architecture but have different weights: the base model is suitable for fine-tuning, while the instruct model is optimized for instruction-following and chat.
  - Added "Qwen/Qwen3-0.6B-Base" to OFFICIAL_MODEL_NAMES
  - Added the alias "qwen3-0.6b-base" to MODEL_ALIASES
* Update the Colab_Compatibility notebook to include Qwen3-0.6B-Base
  Add Qwen/Qwen3-0.6B-Base to the free_compatible list in the Colab_Compatibility notebook so that all models in OFFICIAL_MODEL_NAMES are accounted for in the test suite.
* Fix notebook output to reflect 217 models
  Update the model count in the Colab_Compatibility notebook output from 216 to 217 to reflect the addition of Qwen3-0.6B-Base.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Bryce Meyer <bryce13950@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Jonah Larson <jlarson@equity-creative.com>
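The OFFICIAL_MODEL_NAMES / MODEL_ALIASES split above can be sketched as a tiny name-resolution step. This is an illustrative stand-in, not TransformerLens's actual loader: the registry contents and the `resolve_model_name` helper are assumptions for the example.

```python
# Minimal stand-in registry: official Hugging Face names plus a map from
# each official name to its short aliases (shape assumed for illustration).
OFFICIAL_MODEL_NAMES = ["Qwen/Qwen3-0.6B", "Qwen/Qwen3-0.6B-Base"]
MODEL_ALIASES = {"Qwen/Qwen3-0.6B-Base": ["qwen3-0.6b-base"]}

def resolve_model_name(name):
    """Return the official name for either an official name or an alias."""
    if name in OFFICIAL_MODEL_NAMES:
        return name
    for official, aliases in MODEL_ALIASES.items():
        if name in aliases:
            return official
    raise ValueError(f"Unknown model: {name}")
```

Because base and instruct variants share one architecture, only the registry entries differ; the weight-loading path is reused unchanged, which is why this commit touches model lists rather than conversion code.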
* Repair tests broken by module updates included with the Gemma 3 feature
  - Slightly reduce the rigidity of the test_cross_attention confidence testing
  - Update ActivationCache so the `tokens` variable keeps the correct type for the `tokens_to_residual_directions` function
  - Fix a type-checking bug in `transformer_lens/utilities/devices.py`
* Resolve format errors
* Resolve more tests
* Revert changes to the CI
* Fix mypy type-checking issue
* Update the lock file
* Move wandb into train
* Add tests
* CI fix
* CI fix 2
* Add an explanation for 1102
* Remove an additional huggingface-hub inclusion in pyproject.toml
* Resolve poetry lock changes

Co-authored-by: Bryce Meyer <bryce13950@gmail.com>
Co-authored-by: kapedalex <kapedalex@gmail.com>
Co-authored-by: Jonah Larson <jlarson@equity-creative.com>
* …om n_key_value_heads (#981)
  - Fix the case where n_head and n_key_value_heads differ for a model
  - Update the docstring

Co-authored-by: Bryce Meyer <bryce13950@gmail.com>
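The n_heads vs n_key_value_heads distinction in #981 and the 12B/27B configs above is the bookkeeping behind grouped-query attention (GQA). Here is a hedged pure-Python sketch, with names chosen for illustration: when n_key_value_heads < n_heads, each K/V head serves a group of query heads, so the K/V heads must be repeated to line up with the queries.

```python
def repeat_kv_heads(kv_heads, n_heads):
    """Sketch of GQA head expansion: repeat each K/V head
    n_heads // n_key_value_heads times so every query head
    has a matching K/V head."""
    n_kv = len(kv_heads)
    assert n_heads % n_kv == 0, "n_heads must be a multiple of n_key_value_heads"
    group_size = n_heads // n_kv
    return [head for head in kv_heads for _ in range(group_size)]

# 12B-style 2:1 GQA from the changelog: 16 query heads sharing 8 KV heads
expanded = repeat_kv_heads(list(range(8)), 16)
```

The bug class fixed in #981 is exactly the case this helper makes explicit: code that assumed n_heads == n_key_value_heads breaks as soon as the group size is greater than one.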
v2.17.0 Release