The library currently has a function plot_critical_difference https://github.com/perellonieto/PyCalib/blob/master/pycalib/visualisations/__init__.py#L466 that uses adapted code to run a statistical test comparing several calibration methods across several datasets, and plots a critical difference diagram using the Orange library. Although it is useful in empirical comparisons, it is not clear whether it should be part of this library.
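
For context when deciding, here is a minimal sketch of the statistics that usually sit behind a critical difference diagram, written without Orange. It is not the code in PyCalib: it assumes the common Friedman test followed by the Nemenyi critical difference (as described by Demšar, 2006), and the helper names `average_ranks` and `critical_difference`, the `Q_ALPHA_005` table, and the score matrix are all hypothetical, with made-up numbers used only for illustration.

```python
import numpy as np
from scipy.stats import friedmanchisquare, rankdata

# Nemenyi critical values q_alpha for alpha = 0.05, keyed by the number of
# compared methods k (values as tabulated in Demsar, 2006).
Q_ALPHA_005 = {2: 1.960, 3: 2.343, 4: 2.569, 5: 2.728, 6: 2.850,
               7: 2.949, 8: 3.031, 9: 3.102, 10: 3.164}


def average_ranks(scores, higher_is_better=True):
    """Rank methods within each dataset (row) and average over datasets.

    scores has shape (n_datasets, n_methods); rank 1 is the best method.
    Ties receive the average of the ranks they span.
    """
    scores = np.asarray(scores, dtype=float)
    signed = -scores if higher_is_better else scores
    ranks = np.apply_along_axis(rankdata, 1, signed)
    return ranks.mean(axis=0)


def critical_difference(n_methods, n_datasets):
    """Nemenyi critical difference at alpha = 0.05."""
    q_alpha = Q_ALPHA_005[n_methods]
    return q_alpha * np.sqrt(n_methods * (n_methods + 1) / (6.0 * n_datasets))


# Made-up scores: 5 datasets (rows) x 3 calibration methods (columns).
scores = np.array([
    [0.90, 0.80, 0.70],
    [0.85, 0.80, 0.60],
    [0.70, 0.75, 0.60],
    [0.95, 0.90, 0.80],
    [0.80, 0.70, 0.75],
])

# Friedman test: are the methods' rank distributions distinguishable at all?
statistic, p_value = friedmanchisquare(*scores.T)

avg_ranks = average_ranks(scores)
cd = critical_difference(n_methods=scores.shape[1], n_datasets=scores.shape[0])

print("Friedman statistic:", statistic, "p-value:", p_value)
print("Average ranks:", avg_ranks)
print("Critical difference:", cd)
```

Two methods would be reported as significantly different when their average ranks differ by more than the critical difference. Since this part needs only NumPy and SciPy, one option is to keep the statistics and treat the Orange-based plotting as an optional extra or move it out of the library.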