remove N (size of corpus vocabulary) from phrase model formula #5

Open
Alexjmsherman wants to merge 1 commit into pwharrison:master from Alexjmsherman:master

@Alexjmsherman

According to the gensim documentation for models.phrases (https://radimrehurek.com/gensim/models/phrases.html#id2), the formula for the phrase model comes from Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. Distributed Representations of Words and Phrases and their Compositionality. In Proceedings of NIPS, 2013.

In the paper, the equation does not include N (the size of the corpus vocabulary), which is listed in your notebook. I updated the equation, removing N and its definition.
https://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf
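
For reference, here is a minimal sketch of the score as I read it in the paper: the discounted bigram count divided by the product of the two unigram counts, with no N term. The function name `phrase_score` and its parameter names are my own, chosen for illustration; they are not taken from the notebook or from gensim.

```python
def phrase_score(count_ab: int, count_a: int, count_b: int, delta: float = 0.0) -> float:
    """Bigram score in the form given by Mikolov et al. (2013).

    count_ab: number of times word a is immediately followed by word b
    count_a, count_b: number of times each word occurs on its own
    delta: discounting coefficient that penalizes very infrequent pairs

    Note: there is no vocabulary-size factor N in this form.
    All names here are hypothetical, for illustration only.
    """
    if count_a <= 0 or count_b <= 0:
        raise ValueError("unigram counts must be positive")
    return (count_ab - delta) / (count_a * count_b)


# Toy counts, made up for this example (not from any real corpus).
score = phrase_score(count_ab=10, count_a=20, count_b=50, delta=5)
```

A pair is treated as a phrase when its score exceeds a chosen threshold, so dropping N changes the scale of the scores but not their ordering.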

FYI, I saw you present this at PyData D.C. I thought it was a great presentation, and I still refer to this notebook often. Thanks for putting it together.
