Added pandas to loaded packages, then made it so iterated data would be paired on index if paired.
I am looking at the specific test on my computer, and I don't think there is a problem. It looks like the previous commit pinned 0.8 because that was what the function initially returned for that specimen, and a later change may have technically broken that expectation, but the plot looks perfectly fine. The value it returns now is 0.7399322328869139, and looking at the plot I don't see any reason that this is bad. I'll write a test suite to add to this.
I sent a fix for this. I do have a question for whoever is reviewing this about which behavior they'd prefer.
I made the function use pd.concat and join on index, because that preserves more data in more cases. Suppose we had comparisons between groups (1, 2, 3), where the specimens in each group are:
1: {A, B, C, D}
2: {A, B, C}
3: {A, B, D}
Then pivoting and dropping any comparison containing NAs would throw away a lot of data for no reason. We can keep that data by setting a specimen index (possibly with repeated values) in the paired case, and using no index otherwise.
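To make the trade-off concrete, here is a small sketch (the group values and specimen labels are made up for illustration): joining on index keeps specimens C and D for the comparisons that can use them, while a naive pivot-then-dropna keeps only the specimens present in every group.

```python
import pandas as pd

# Hypothetical paired data: three groups over overlapping specimen sets.
g1 = pd.Series([1.0, 2.0, 3.0, 4.0], index=["A", "B", "C", "D"], name="g1")
g2 = pd.Series([1.1, 2.1, 3.1], index=["A", "B", "C"], name="g2")
g3 = pd.Series([0.9, 1.9, 3.9], index=["A", "B", "D"], name="g3")

# Concatenating on index keeps every specimen; each pairwise comparison
# only drops the rows missing for its own two groups.
wide = pd.concat([g1, g2, g3], axis=1)
pair_12 = wide[["g1", "g2"]].dropna()  # keeps A, B, C
pair_13 = wide[["g1", "g3"]].dropna()  # keeps A, B, D

# Dropping every row with any NA (as a global pivot-then-dropna would)
# discards C and D from all comparisons.
naive = wide.dropna()                  # keeps only A, B
```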
Doing this blindly has some problems:
1: If a sample has a single draw in one group and multiple draws in another, the single draw will be duplicated. In this case alone, I could see someone viewing that as either a bug or a feature.
2: If a sample has multiple draws in both groups, the join produces n * m comparisons, which is just awful.
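The second problem can be reproduced in a couple of lines (specimen labels and values are hypothetical): joining two groups that each hold two draws of the same specimen yields the full cross product for that specimen.

```python
import pandas as pd

# Hypothetical: specimen "A" drawn twice in each of two groups.
g1 = pd.DataFrame({"value": [1.0, 1.2]}, index=["A", "A"])
g2 = pd.DataFrame({"value": [2.0, 2.2]}, index=["A", "A"])

# Joining on a non-unique index yields the n * m cross product:
# 2 draws x 2 draws = 4 comparison rows for a single specimen.
joined = g1.join(g2, lsuffix="_g1", rsuffix="_g2")
print(len(joined))  # 4
```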
Right now, I've put in a drop_duplicates, which just sidesteps the problem. It might be better to warn the user that this could be a problem, in which case we could add a function at the beginning that checks for duplicates.
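A duplicate check along those lines could look like the following sketch; the function name and message are placeholders, not anything already in the codebase:

```python
import warnings

import pandas as pd


def warn_on_duplicate_index(df: pd.DataFrame, name: str = "data") -> None:
    """Hypothetical helper: warn when a group's index repeats specimens,
    since joining duplicated labels multiplies comparisons (n * m rows)."""
    dupes = df.index[df.index.duplicated()].unique()
    if len(dupes) > 0:
        warnings.warn(
            f"{name} has duplicated index labels {list(dupes)}; "
            "paired comparisons for these specimens will be duplicated."
        )


# Triggers a warning: "A" appears twice in the index.
dup_df = pd.DataFrame({"value": [1.0, 1.2]}, index=["A", "A"])
# Silent: all labels unique.
ok_df = pd.DataFrame({"value": [1.0, 2.0]}, index=["A", "B"])
```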
Here is the main code that solves the problem. I ran the unit tests and got a single failure, but looking at it I don't think it's related. I'll follow up on this; I've posted the message below.
Sorry if this is a bit of a long response.