mhmills commented Jul 27, 2023

As mentioned in #358, searching playerid_lookup("tatis", "fernando", fuzzy=True) currently returns duplicate rows for Fernando Tatís Jr and Sr. The search finds no exact match because the correct spelling is Tatís, with the accented í, not Tatis, so with fuzzy=True the lookup falls back to fuzzy matching. Since the Chadwick names for Tatís Jr and Sr are identical, 'Fernando Tatís' accounts for 2 of the 5 names in fuzzy_matches when it is merged with the player table in get_closest_names(). Each copy of the name matches the table rows for both Jr and Sr, so each player is returned twice.

The change I made drops the duplicate name before the merge, so fuzzy_matches has 4 names instead of 5. The single remaining copy of the name still matches the rows for both Jr and Sr, so the merge returns 5 players as expected. The same effect can be seen with a fuzzy search for Vladimir Guerrero Jr and Sr, such as playerid_lookup("guerrero", "vladimi", fuzzy=True).
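To illustrate the row counts, here is a minimal sketch with toy data. It is not the actual body of get_closest_names(): the table contents, the player_id values, the placeholder names, and the chadwick_name column layout are all assumptions made for the example. It only shows how a repeated key on the left side of a pandas merge multiplies rows, and how dropping the repeat first avoids that.

```python
import pandas as pd

# Toy stand-in for the player table (hypothetical IDs and placeholder names).
# Two different players share the same name, like Tatís Sr and Jr.
player_table = pd.DataFrame({
    "chadwick_name": ["Fernando Tatís", "Fernando Tatís",
                      "Name A", "Name B", "Name C"],
    "player_id": [1, 2, 3, 4, 5],
})

# Toy stand-in for fuzzy_matches: 5 names, 2 of which are the same string.
fuzzy_matches = pd.DataFrame({
    "chadwick_name": ["Fernando Tatís", "Fernando Tatís",
                      "Name A", "Name B", "Name C"],
})

# Before: each of the 2 copies matches both players, giving 4 rows for 2 people.
before = fuzzy_matches.merge(player_table, on="chadwick_name")

# After: drop the repeated name first, so the one copy matches each player once.
after = fuzzy_matches.drop_duplicates().merge(player_table, on="chadwick_name")

print(len(before), len(after))
```

With this toy data the first merge produces 7 rows (4 of them for the shared name) and the second produces 5, one per player.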
