Conversation

@boxin-wbx
No description provided.

Owner

@timoschick left a comment

Thanks for this pull request 👍
Is there a particular reason for not actually replacing calls to _apply_decay_mask with calls to the new _apply_decay_mask_logits (and deleting the former function altogether)?

@boxin-wbx
Author

Thanks for the comment.

I think we can keep both versions. If numerical instability is not an issue, we can of course keep using _apply_decay_mask. If you have time, you could also run both versions and compare the two implementations. I expect their outputs to differ very little.
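
A comparison along those lines could be sketched as follows. The bodies of the two functions are not shown in this conversation, so the helpers below are hypothetical stand-ins: they only assume that one variant rescales probabilities and renormalizes, while the other adds the log of the decay factor to the logits. All names, inputs, and decay values are illustrative.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decay_in_probability_space(logits, decay):
    """Hypothetical stand-in: scale probabilities, renormalize, take the log."""
    probs = [p * d for p, d in zip(softmax(logits), decay)]
    total = sum(probs)
    # A probability that underflowed to 0.0 cannot be recovered here.
    return [math.log(p / total) if p > 0 else float("-inf") for p in probs]


def decay_in_logit_space(logits, decay):
    """Hypothetical stand-in: add log(decay) to the logits, then log-softmax."""
    shifted = [x + math.log(d) for x, d in zip(logits, decay)]
    m = max(shifted)
    log_z = m + math.log(sum(math.exp(x - m) for x in shifted))
    return [x - log_z for x in shifted]


def max_abs_diff(a, b):
    """Largest absolute difference over positions where both values are finite."""
    return max(
        (abs(x - y) for x, y in zip(a, b) if math.isfinite(x) and math.isfinite(y)),
        default=0.0,
    )


if __name__ == "__main__":
    # Moderate logits: the two variants should agree up to rounding error.
    logits, decay = [2.0, 1.0, 0.5, -1.0], [1.0, 0.5, 0.01, 1.0]
    print(max_abs_diff(decay_in_probability_space(logits, decay),
                       decay_in_logit_space(logits, decay)))

    # Extreme logits: the probability-space variant underflows for the second token.
    logits, decay = [0.0, -800.0], [1.0, 0.5]
    print(decay_in_probability_space(logits, decay))
    print(decay_in_logit_space(logits, decay))
```

Under these assumptions the two variants are algebraically the same and differ only when a probability underflows, which is the case a logit-space version would be meant to cover.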

3 participants