
Fix to #186#188

Open
jhaberbe wants to merge 1 commit into trevismd:master from jhaberbe:master

Conversation


@jhaberbe jhaberbe commented Feb 8, 2026

I pushed a fix for this. One question for whoever reviews it: which behavior would you prefer?

I made the function join on the index (via pd.merge after reset_index), because that preserves more data in more cases. Suppose we had comparisons between groups (1, 2, 3), where the specimens in each look like:

1: {A, B, C, D}
2: {A, B, C}
3: {A, B, D}

Then a pivot, if you drop comparisons containing NAs, would lose a lot of data for no reason. We can keep that data by setting a sample index (with repeated values allowed) in the paired case, and leaving the data untouched when it is not paired.
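As a toy illustration of the difference (hypothetical data and variable names, not code from the PR): a single wide pivot followed by dropping NA rows loses specimens C and D entirely, even though each of them appears in two of the three groups, while a per-comparison join on the specimen index keeps every pairwise overlap.

```python
import pandas as pd

# Hypothetical long-format data: one measurement per (specimen, group)
data = pd.DataFrame({
    "specimen": ["A", "B", "C", "D", "A", "B", "C", "A", "B", "D"],
    "group":    [1, 1, 1, 1, 2, 2, 2, 3, 3, 3],
    "value":    [1.0, 2.0, 3.0, 4.0, 1.1, 2.1, 3.1, 0.9, 1.9, 3.9],
})

# A single wide pivot keeps only specimens present in every group
wide = data.pivot(index="specimen", columns="group", values="value")
complete = wide.dropna()  # only A and B survive

# Joining per comparison pair preserves each pairwise overlap
g1 = data[data.group == 1].set_index("specimen")["value"]
g3 = data[data.group == 3].set_index("specimen")["value"]
pair_13 = pd.merge(g1, g3, left_index=True, right_index=True)
# pair_13 keeps A, B and D; the pivot-then-dropna route lost D
```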

Doing this blindly has some problems:
1. If a sample has a single draw in one group but multiple draws in another, the single draw gets duplicated to match. In this case alone, I could see it being viewed as either a bug or a feature.
2. If a sample has multiple draws in both groups, the merge produces n * m comparisons, which is just awful.
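Problem 2 can be seen with a small hypothetical example: merging on a non-unique sample key yields the full n * m cross of draws for each repeated sample.

```python
import pandas as pd

# Hypothetical example: specimen "A" was drawn twice in each group
g1 = pd.DataFrame({"specimen": ["A", "A", "B"],
                   "value": [1.0, 1.2, 2.0]}).set_index("specimen")
g2 = pd.DataFrame({"specimen": ["A", "A", "B"],
                   "value": [0.9, 1.1, 2.1]}).set_index("specimen")

# Merging on a non-unique key produces every n * m combination per key:
# the two draws of "A" on each side become 2 * 2 = 4 paired rows
table = pd.merge(g1.reset_index(), g2.reset_index(), on="specimen")
# table has 5 rows: 4 for "A" plus 1 for "B"
```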

Right now, I've added a drop_duplicates, which simply sidesteps the problem. It might be better to warn the user that this could be a problem, in which case we can add a function that checks for duplicates at the beginning, something like:

if data.value_counts([paired, hue]).max() > 1:
    print("Duplicate samples within groups; consider updating the names so they're unique.")
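A self-contained sketch of that check (the helper name, and the assumption that `paired` and `hue` are column names holding the sample identifier and group label, are mine, not part of the PR):

```python
import pandas as pd

def warn_on_duplicate_samples(data: pd.DataFrame, paired: str, hue: str) -> bool:
    """Return True (and warn) if any (sample, group) pair occurs more than once."""
    # value_counts on the two columns counts rows per (sample, group) pair
    has_duplicates = bool(data.value_counts([paired, hue]).max() > 1)
    if has_duplicates:
        print("Duplicate samples within groups; consider updating the names "
              "so they're unique.")
    return has_duplicates

# Example: specimen "A" appears twice in group 1, so the check fires
df = pd.DataFrame({"specimen": ["A", "A", "B"], "group": [1, 1, 1]})
```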

Here is the main code that solves the problem. I ran the unit tests and they came back with a single failure, but looking at it I don't think it's connected. I'll follow up on this; the failure message is posted below.

    def _get_results(self, num_comparisons, pvalues=None,
                     **stats_params) -> List[Annotation]:

        test_result_list = []

        if num_comparisons == "auto":
            num_comparisons = len(self._struct_pairs)

        for group_struct1, group_struct2 in self._struct_pairs:
            group1 = group_struct1['group']
            group2 = group_struct2['group']

            # Changes here: align paired groups on their shared index
            if self.paired:
                group_data1 = group_struct1["group_data"]
                group_data2 = group_struct2["group_data"]
                index_name = group_data1.index.name
                table = pd.merge(
                    group_data1.reset_index(),
                    group_data2.reset_index(),
                    on=index_name,
                ).drop_duplicates(index_name).set_index(index_name)

                group_struct1["group_data"] = table.iloc[:, 0]
                group_struct2["group_data"] = table.iloc[:, 1]
                # End of changes

            if self.perform_stat_test:
                result = self._get_stat_result_from_test(
                    group_struct1, group_struct2, num_comparisons,
                    **stats_params)
            else:
                result = self._get_custom_results(group_struct1, pvalues)

            result.group1 = group1
            result.group2 = group2

            test_result_list.append(Annotation((group_struct1, group_struct2),
                                               result,
                                               formatter=self.pvalue_format))

        # Perform other types of correction methods for multiple testing
        self._apply_comparisons_correction(test_result_list)

        return test_result_list
.........................................
======================================================================
FAIL: test_seaborn_violinplot (test_positions.TestPositionPlotter.test_seaborn_violinplot)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/workspaces/statannotations/tests/test_positions.py", line 44, in test_seaborn_violinplot
    assert value_maxes[("Sun", "Male")] > 0.80
AssertionError

----------------------------------------------------------------------
Ran 178 tests in 31.370s

FAILED (failures=1)

Sorry if this is a bit of a long response.

Added pandas to loaded packages, then made iterated data pair on index when paired.
jhaberbe commented Feb 8, 2026

I am looking at the specific test on my computer, and I don't think there is a problem. It looks like the previous commit asserted 0.8 because that was the value it returned at the time, and a later change may have technically broken that threshold, but the plot looks perfectly fine.

The value it returns now is 0.7399322328869139, and looking at the plot I don't see any reason this is bad.

I'll write a test suite to add to this.
