New Release – v2.17.0 #1159
Merged
* Add Gemma 3 270M model support
  - Add google/gemma-3-270m and google/gemma-3-270m-it to supported models
  - Add architecture detection for Gemma3ForCausalLM
  - Add hardcoded configuration with d_head=256 and use_qk_norm=True
  - Add Q/K normalization weight loading in the Gemma weight converter
* Add Gemma 3 1B model support
  - Add google/gemma-3-1b-pt and google/gemma-3-1b-it to supported models
  - Add configuration with d_model=1152, d_mlp=6912, n_layers=26
  - Maintain d_head=256 (hardcoded for all Gemma models)
  - Include use_qk_norm=True and use_normalization_before_and_after=True
* Add Gemma 3 and MedGemma 4B multimodal model support with text-only extraction
  - Add google/gemma-3-4b-pt, gemma-3-4b-it, medgemma-4b-pt, medgemma-4b-it
  - Implement pattern-based architecture detection (CausalLM vs ConditionalGeneration)
  - Add 4B config with GQA support (n_key_value_heads=4)
  - Extract text-only weights from multimodal models via the language_model component
  - Add AutoModel loader for the Gemma3ForConditionalGeneration architecture
* Fix device mismatch for Gemma models on MPS
  Add a device parameter to all torch.zeros() calls in the Gemma weight conversion so that bias tensors are created on the same device as the weight tensors. This fixes a RuntimeError when loading Gemma models on Apple Silicon with the MPS backend.
  - Add device parameter to attention biases (b_Q, b_K, b_V, b_O)
  - Add device parameter to MLP biases (b_in, b_out)
  - Add device parameter to the unembed bias (b_U)
  - Handle both lm_head and tied embeddings for the unembed device
* feat: Gemma 3 memory optimization and n_ctx override
  - Reduce default context: 270M/1B (32K -> 8K), 4B (131K -> 8K)
  - Add n_ctx parameter for context-length override
  - Fix multimodal weight extraction (nested model access)
  - Add kwargs filtering for the n_ctx parameter
* feat: Add Gemma 3 12B and 27B model support
  - Add six new models: gemma-3-12b-pt/it, gemma-3-27b-pt/it, medgemma-27b-it/text-it
  - 12B config: d_model=3840, 48 layers, 16 heads, 8 KV heads (2:1 GQA)
  - 27B config: d_model=5376, 62 layers, 32 heads, 16 KV heads (2:1 GQA)
  - All use a safe 8K default context (overridable to 131K)
  - Special handling for medgemma-27b-text-it (text-only, 262144 vocab)
* fix: Implement Gemma 3 hybrid local/global attention architecture (5:1 pattern)
* feat: Add per-layer RoPE base support for Gemma 3
* Fix Gemma 3 head dimensions
* Fix formatting issues
* Fix Colab_Compatibility notebook CI failure
* Fix formatting regression (black 23.3.0)
* Fix Interactive_Neuroscope CI failure (deps & notebook)
* Add protobuf dependency to fix Main_Demo.ipynb import error
* Pin transformers to 4.46.3 to fix a huggingface-hub version conflict
* Add huggingface-hub<1.0 constraint to match transformers requirements
* Fix CI: force Poetry to sync dependencies with the lock file
* Fix CI: force huggingface-hub<1.0 for transformers compatibility
* Skip build-docs and deploy-docs jobs on forks
* Fix notebook-checks: force huggingface-hub<1.0 after poetry install
* Add disk cleanup to CI jobs to prevent "No space left on device" errors
* Fix notebook-checks: disable the Poetry cache and force uninstall/reinstall of huggingface-hub
* Fix notebook kernel to use the Poetry venv
* Fix huggingface-hub version conflict in notebook CI
* Move the huggingface-hub fix after the ipykernel install
* Skip pip installs in GitHub CI for Interactive_Neuroscope
* Install gradio in GitHub CI without overriding Poetry deps
* Add gradio as a dev dependency for notebooks
* Regenerate poetry.lock after adding gradio
* Add unit tests for Gemma 3 and MedGemma model support
* fix: Remove unused imports to pass the CI format check
* fix: Sort imports with isort
* fix: Format code with black
* docs: Add docstrings for the use_qk_norm and rotary_base_local parameters
* fix: Format HookedTransformerConfig.py with black 23.x
* Update demos/Interactive_Neuroscope.ipynb (Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>)
* Revert "Update demos/Interactive_Neuroscope.ipynb" (reverts commit 95cc561)
* test: Update transformers to >=4.51 to test CI compatibility
* Fix Gemma 3 long-context generation and Q/K norm weights
* style: Format with black
* fix: Add type assertion for rotary_dim

Co-authored-by: Bryce Meyer <bryce13950@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: George M <georgem17636315081@outlook.com>
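The n_ctx override and kwargs filtering described above can be sketched in isolation. This is a hypothetical helper, not the actual TransformerLens loader code: it assumes a loader that accepts an optional `n_ctx` keyword, clamps it to the architecture's trained maximum, and strips it from the kwargs passed downstream.

```python
def apply_n_ctx_override(config, max_n_ctx, **kwargs):
    """Illustrative sketch: pop an n_ctx override out of kwargs, validate it
    against the trained maximum, and return the adjusted config plus the
    remaining kwargs (so downstream code never sees n_ctx)."""
    n_ctx = kwargs.pop("n_ctx", config["n_ctx"])
    if n_ctx > max_n_ctx:
        raise ValueError(f"n_ctx={n_ctx} exceeds the trained maximum {max_n_ctx}")
    return {**config, "n_ctx": n_ctx}, kwargs

# A 4B-style config with the safe 8K default, overridden back up to 32K;
# the device kwarg passes through untouched.
config, rest = apply_n_ctx_override({"n_ctx": 8192}, 131072, n_ctx=32768, device="cpu")
```

The point of popping `n_ctx` before forwarding kwargs is exactly the "kwargs filtering" bullet above: downstream constructors that do not accept an `n_ctx` parameter would otherwise raise a TypeError.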
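The 5:1 hybrid local/global attention layout mentioned above can be sketched as a simple per-layer schedule. This is an illustrative reconstruction, assuming five sliding-window ("local") layers per full-attention ("global") layer; the exact placement of the global layer within each group of six is an assumption, not taken from this changelog.

```python
def attention_pattern(n_layers, period=6):
    """Sketch of a 5:1 hybrid schedule: every `period`-th layer is global
    (full attention), all others are local (sliding-window attention).
    The placement convention here is an assumption for illustration."""
    return [
        "global" if (layer + 1) % period == 0 else "local"
        for layer in range(n_layers)
    ]

# e.g. the 26-layer 1B config above
pattern = attention_pattern(26)
```

The per-layer RoPE base support added in the same series fits this split naturally: local and global layers can use different rotary bases, which is why a single model-wide `rotary_base` no longer suffices.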
* Add support for the Qwen/Qwen3-0.6B-Base model
  This adds support for the base (non-instruct) version of Qwen3-0.6B. The base model (Qwen/Qwen3-0.6B-Base) and the instruct model (Qwen/Qwen3-0.6B) share the same architecture but have different weights: the base model is suitable for fine-tuning, while the instruct model is optimized for instruction-following and chat.
  - Added "Qwen/Qwen3-0.6B-Base" to OFFICIAL_MODEL_NAMES
  - Added the alias "qwen3-0.6b-base" to MODEL_ALIASES
* Update the Colab_Compatibility notebook to include Qwen3-0.6B-Base
  Add Qwen/Qwen3-0.6B-Base to the free_compatible list in the Colab_Compatibility notebook so that all models in OFFICIAL_MODEL_NAMES are accounted for in the test suite.
* Fix notebook output to reflect 217 models
  Update the model count in the Colab_Compatibility notebook output from 216 to 217 to reflect the addition of Qwen3-0.6B-Base.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Bryce Meyer <bryce13950@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Jonah Larson <jlarson@equity-creative.com>
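The OFFICIAL_MODEL_NAMES / MODEL_ALIASES split above can be sketched as a tiny name-resolution step. This is an illustrative stand-in, not TransformerLens's actual loader: the registry contents and the `resolve_model_name` helper are assumptions for the example.

```python
# Minimal stand-in registry: official Hugging Face names plus a map from
# each official name to its short aliases (shape assumed for illustration).
OFFICIAL_MODEL_NAMES = ["Qwen/Qwen3-0.6B", "Qwen/Qwen3-0.6B-Base"]
MODEL_ALIASES = {"Qwen/Qwen3-0.6B-Base": ["qwen3-0.6b-base"]}

def resolve_model_name(name):
    """Return the official name for either an official name or an alias."""
    if name in OFFICIAL_MODEL_NAMES:
        return name
    for official, aliases in MODEL_ALIASES.items():
        if name in aliases:
            return official
    raise ValueError(f"Unknown model: {name}")
```

Because base and instruct variants share one architecture, only the registry entries differ; the weight-loading path is reused unchanged, which is why this commit touches model lists rather than conversion code.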
* Repair tests broken by module updates included with the Gemma 3 feature
  - Slightly reduce the rigidity of the test_cross_attention confidence testing
  - Update ActivationCache so the `tokens` variable keeps the correct type for the `tokens_to_residual_directions` function
  - Fix a type-checking bug in `transformer_lens/utilities/devices.py`
* Resolve format errors
* Resolve more tests
* Revert changes to the CI
* Fix mypy type-checking issue
* Update the lock file
* Move wandb into train
* Add tests
* CI fix
* CI fix 2
* Add an explanation for 1102
* Remove an additional huggingface-hub inclusion in pyproject.toml
* Resolve poetry lock changes

Co-authored-by: Bryce Meyer <bryce13950@gmail.com>
Co-authored-by: kapedalex <kapedalex@gmail.com>
Co-authored-by: Jonah Larson <jlarson@equity-creative.com>
* …om n_key_value_heads (#981)
  - Fix the case where n_head and n_key_value_heads differ for a model
  - Update the docstring

Co-authored-by: Bryce Meyer <bryce13950@gmail.com>
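The n_heads vs n_key_value_heads distinction in #981 and the 12B/27B configs above is the bookkeeping behind grouped-query attention (GQA). Here is a hedged pure-Python sketch, with names chosen for illustration: when n_key_value_heads < n_heads, each K/V head serves a group of query heads, so the K/V heads must be repeated to line up with the queries.

```python
def repeat_kv_heads(kv_heads, n_heads):
    """Sketch of GQA head expansion: repeat each K/V head
    n_heads // n_key_value_heads times so every query head
    has a matching K/V head."""
    n_kv = len(kv_heads)
    assert n_heads % n_kv == 0, "n_heads must be a multiple of n_key_value_heads"
    group_size = n_heads // n_kv
    return [head for head in kv_heads for _ in range(group_size)]

# 12B-style 2:1 GQA from the changelog: 16 query heads sharing 8 KV heads
expanded = repeat_kv_heads(list(range(8)), 16)
```

The bug class fixed in #981 is exactly the case this helper makes explicit: code that assumed n_heads == n_key_value_heads breaks as soon as the group size is greater than one.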
v2.17.0 Release