Transformer Hyperparameter Tuning #13
Conversation
Pull Request Overview
This pull request introduces a new transformer model with hyperparameter tuning using Optuna, alongside relevant documentation updates.
- Adds a new module for building and tuning a transformer model.
- Implements data processing, model construction, and hyperparameter optimization logic.
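For context, a minimal sketch of what an Optuna study over these transformer hyperparameters typically looks like. Here `build_and_train` is a hypothetical stand-in for the module's model-construction and training logic, and the search-space values are illustrative rather than the PR's exact ranges:

```python
import optuna

def objective(trial):
    # Search space mirroring the parameters this PR tunes (values illustrative).
    embed_dim = trial.suggest_categorical("embed_dim", [64, 128, 256])
    dense_dim = trial.suggest_categorical("dense_dim", [512, 1024, 1536, 2048])
    num_heads = trial.suggest_categorical("num_heads", [2, 4, 8])
    dropout_rate = trial.suggest_float("dropout_rate", 0.1, 0.5)

    # build_and_train is a hypothetical stand-in for the module's model
    # construction and training; assumed to return a Keras History object.
    history = build_and_train(embed_dim, dense_dim, num_heads, dropout_rate)
    return max(history.history["val_accuracy"])

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=20)
print("Best parameters:", study.best_params)
```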
src/modules/optuna_transformer.py (Outdated)

```python
]

# Train the model
with tf.device("/GPU:0"):
```
Copilot AI commented on May 4, 2025
Consider checking for GPU availability before explicitly setting the device to '/GPU:0' to ensure compatibility on systems without a GPU. This can help avoid potential runtime errors on CPU-only machines.
Suggested change:

```diff
-with tf.device("/GPU:0"):
+device = "/GPU:0" if tf.config.list_physical_devices('GPU') else "/CPU:0"
+with tf.device(device):
```
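Note: `tf.config.list_physical_devices('GPU')` returns an empty list when no GPU is visible to TensorFlow, so the conditional falls back to `/CPU:0` and the `tf.device` context stays valid on CPU-only machines such as CI runners.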
src/modules/optuna_transformer.py (Outdated)

```python
x = PositionalEmbedding(sequence_length, vocab_size, embed_dim)(encoder_inputs)
encoder_outputs = TransformerEncoder(embed_dim, dense_dim, num_heads)(x)

decoder_inputs = tf.keras.Input(shape=(None,), dtype="int32", name="french")
x = PositionalEmbedding(sequence_length, vocab_size, embed_dim)(decoder_inputs)
x = TransformerDecoder(embed_dim, dense_dim, num_heads)(x, encoder_outputs)
x = tf.keras.layers.Dropout(dropout_rate)(x)
decoder_outputs = tf.keras.layers.Dense(vocab_size, activation="softmax")(x)
```
Copilot AI commented on May 4, 2025
[nitpick] Consider using a more descriptive variable name instead of 'x' to improve code readability, especially within the transformer pipeline where multiple transformations are applied.
Suggested change:

```diff
-x = PositionalEmbedding(sequence_length, vocab_size, embed_dim)(encoder_inputs)
-encoder_outputs = TransformerEncoder(embed_dim, dense_dim, num_heads)(x)
-decoder_inputs = tf.keras.Input(shape=(None,), dtype="int32", name="french")
-x = PositionalEmbedding(sequence_length, vocab_size, embed_dim)(decoder_inputs)
-x = TransformerDecoder(embed_dim, dense_dim, num_heads)(x, encoder_outputs)
-x = tf.keras.layers.Dropout(dropout_rate)(x)
-decoder_outputs = tf.keras.layers.Dense(vocab_size, activation="softmax")(x)
+encoder_embeddings = PositionalEmbedding(sequence_length, vocab_size, embed_dim)(encoder_inputs)
+encoder_outputs = TransformerEncoder(embed_dim, dense_dim, num_heads)(encoder_embeddings)
+decoder_inputs = tf.keras.Input(shape=(None,), dtype="int32", name="french")
+decoder_embeddings = PositionalEmbedding(sequence_length, vocab_size, embed_dim)(decoder_inputs)
+decoder_outputs = TransformerDecoder(embed_dim, dense_dim, num_heads)(decoder_embeddings, encoder_outputs)
+dropout_outputs = tf.keras.layers.Dropout(dropout_rate)(decoder_outputs)
+final_outputs = tf.keras.layers.Dense(vocab_size, activation="softmax")(dropout_outputs)
```
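For reference, a sketch of how the renamed tensors would be wired into a complete compiled model, assuming `PositionalEmbedding`, `TransformerEncoder`, and `TransformerDecoder` are the repo's custom layers; the encoder input name, the illustrative sizes, and the compile settings are assumptions, not taken from the PR:

```python
import tensorflow as tf
# PositionalEmbedding, TransformerEncoder, and TransformerDecoder come from
# the repo's transformer components module (import path assumed).
# Illustrative sizes; the PR tunes these via Optuna.
sequence_length, vocab_size = 20, 15000
embed_dim, dense_dim, num_heads, dropout_rate = 64, 1536, 2, 0.3

encoder_inputs = tf.keras.Input(shape=(None,), dtype="int32", name="english")  # input name assumed
encoder_embeddings = PositionalEmbedding(sequence_length, vocab_size, embed_dim)(encoder_inputs)
encoder_outputs = TransformerEncoder(embed_dim, dense_dim, num_heads)(encoder_embeddings)

decoder_inputs = tf.keras.Input(shape=(None,), dtype="int32", name="french")
decoder_embeddings = PositionalEmbedding(sequence_length, vocab_size, embed_dim)(decoder_inputs)
decoder_outputs = TransformerDecoder(embed_dim, dense_dim, num_heads)(decoder_embeddings, encoder_outputs)
dropout_outputs = tf.keras.layers.Dropout(dropout_rate)(decoder_outputs)
final_outputs = tf.keras.layers.Dense(vocab_size, activation="softmax")(dropout_outputs)

transformer = tf.keras.Model([encoder_inputs, decoder_inputs], final_outputs)
transformer.compile(optimizer="rmsprop",
                    loss="sparse_categorical_crossentropy",
                    metrics=["accuracy"])
```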
Pull Request Overview
This pull request focuses on tuning the hyperparameters for the Transformer model used for French-to-English translation and updating related documentation. Key changes include:
- Adding a new argument (transformer_model_path) to load a saved model.
- Adjusting the model hyperparameters (embed_dim, dense_dim, num_heads, and dropout_rate) in the transformer_model function.
- Introducing a new module (optuna_transformer.py) for hyperparameter optimization using Optuna.
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| src/translation_french_english.py | Updated transformer_model: added transformer_model_path parameter and modified hyperparameters. |
| src/modules/optuna_transformer.py | Added a new module to enable hyperparameter tuning with Optuna. |
Pull Request Overview
This PR implements transformer hyperparameter tuning by introducing new testing scripts and an Optuna-based module for hyperparameter optimization while updating the Transformer model parameters and documentation. Key changes include:
- New tests in tests/test_transformer_model.py for model building, training, evaluation, BLEU scoring, and loading.
- Updates to the transformer_model in src/translation_french_english.py with reduced hyperparameter values and dropout addition.
- Introduction of src/modules/optuna_transformer.py for Optuna hyperparameter optimization and workflow updates in .github/workflows/test.yaml.
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| tests/test_transformer_model.py | Added tests to validate model build, training, evaluation, BLEU score computation, and loading. |
| src/translation_french_english.py | Modified model hyperparameters and added dropout in the Transformer architecture. |
| src/modules/optuna_transformer.py | Created an Optuna optimization module for hyperparameter tuning of the Transformer model. |
| .github/workflows/test.yaml | Updated workflow file to include additional file paths for triggering tests. |
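As an illustration of the BLEU-scoring test mentioned in the table above, a hypothetical sketch using NLTK's sentence-level BLEU; `translate_sentence` stands in for the repo's decoding helper and is not taken from the actual test file:

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def test_bleu_score():
    # Reference and candidate are tokenized; smoothing avoids zero scores
    # on short sentences that miss some higher-order n-gram overlaps.
    reference = [["the", "weather", "is", "nice", "today"]]
    candidate = translate_sentence("il fait beau aujourd'hui").split()  # hypothetical helper
    score = sentence_bleu(reference, candidate,
                          smoothing_function=SmoothingFunction().method1)
    assert 0.0 <= score <= 1.0, "BLEU score should lie between 0 and 1."
```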
Comments suppressed due to low confidence (2)
.github/workflows/test.yaml:10
- [nitpick] The file path 'src/modules/transformer_component.py' may be a typo; consider confirming whether the intended filename is 'transformer_components.py' so the workflow triggers correctly.
  `- "src/modules/transformer_component.py"`
src/translation_french_english.py:59
- [nitpick] The reduced embedding dimension, dense layer size, and number of heads significantly lower the model's capacity; please verify that these changes are intentional and sufficient for the desired performance.
  `embed_dim = 64`
Pull Request Overview
This PR introduces new features and tests for tuning Transformer hyperparameters, updates the Transformer model implementation, and enhances CI workflow triggers.
- Added comprehensive tests for building, training, evaluating, and loading the Transformer model.
- Updated model hyperparameters and refined model architecture in the translation module.
- Introduced a new module for hyperparameter tuning using Optuna and updated GitHub Actions paths for testing.
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| tests/test_transformer_model.py | Added unit tests for building, training, evaluation, and BLEU scoring of the model. |
| src/translation_french_english.py | Updated Transformer model parameters and architecture for improved tuning. |
| src/modules/optuna_transformer.py | Introduces hyperparameter optimization using Optuna for the Transformer model. |
| .github/workflows/test.yaml | Updated workflow paths to include new and modified module files for testing. |
Comments suppressed due to low confidence (1)
tests/test_transformer_model.py:99
- [nitpick] Consider using explicit metric names or indices rather than relying on the number of elements in 'results'; this would make the test more robust if additional metrics are ever included.
  `assert len(results) == 2, "Evaluation did not return loss and accuracy."`
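One way to act on this suggestion, sketched under the assumption that the test evaluates a compiled Keras model: `Model.evaluate` accepts `return_dict=True`, which keys the results by metric name and removes any dependence on result ordering or count:

```python
# model and test_dataset come from the test's setup (assumed here).
results = model.evaluate(test_dataset, return_dict=True)
assert "loss" in results, "Evaluation did not return a loss."
assert "accuracy" in results, "Evaluation did not return an accuracy metric."
```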
```diff
 # Define model parameters
-embed_dim = 128
-dense_dim = 2048
-num_heads = 8
+embed_dim = 64
+dense_dim = 1536
+num_heads = 2
```
Copilot AI commented on May 4, 2025
[nitpick] Consider adding a comment or updating the docstring to explain the rationale behind the updated hyperparameter values (e.g., embed_dim, dense_dim, and num_heads) for improved clarity and maintainability.
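A sketch of what such a comment might look like; the rationale text is illustrative, since the PR itself does not state one:

```python
# Define model parameters.
# Smaller than the original values (128 / 2048 / 8): selected via the
# Optuna study in src/modules/optuna_transformer.py, trading model
# capacity for faster training. (Rationale wording is illustrative.)
embed_dim = 64
dense_dim = 1536
num_heads = 2
```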