@valeriupredoi valeriupredoi commented Dec 16, 2025

A bit of a lengthy one, but in a nutshell:

  • Statistics such as mean are now methods rather than properties, so active.mean[...] becomes active.mean()[...]. This lets us pass args and kwargs, e.g. active.mean(axis=(0, 1))[...]
  • Adds plenty of tests for Reductionist's new axis argument, which currently doesn't work as expected (see below)
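To illustrate the property-to-method change, here is a minimal sketch built on NumPy. `ToyActive` and `LazyReduction` are hypothetical stand-ins, not the PyActiveStorage classes, and keeping reduced axes as length-1 dimensions is an assumption based on the nested results asserted in the tests.

```python
import numpy as np


class LazyReduction:
    """Hypothetical helper: remembers a reduction and applies it on indexing."""

    def __init__(self, data, func, axis=None):
        self._data = data
        self._func = func
        self._axis = axis

    def __getitem__(self, index):
        # Select first, then reduce; reduced axes are kept with length 1
        # (an assumption, not confirmed behaviour of the real library).
        selection = self._data[index]
        return self._func(selection, axis=self._axis, keepdims=True)


class ToyActive:
    """Hypothetical stand-in for Active, backed by an in-memory array."""

    def __init__(self, data):
        self._data = np.asarray(data)

    # Methods (not properties), so callers can pass axis and other kwargs.
    def mean(self, axis=None):
        return LazyReduction(self._data, np.mean, axis=axis)

    def min(self, axis=None):
        return LazyReduction(self._data, np.min, axis=axis)


data = np.arange(24.0).reshape(2, 3, 4)
active = ToyActive(data)
partial = active.mean(axis=(0, 1))[...]   # reduce over the first two axes
overall = active.min()[:]                 # reduce over every axis
```

Because the method returns an indexable object, the familiar `active.mean()[...]` spelling still works while leaving room for keyword arguments.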

Main test case for Reductionist with axis

https://github.com/NCAS-CMS/PyActiveStorage/blob/axis_api/tests/test_real_s3_with_axes.py

  • Active loads a 4dim dataset
  • Loaded dataset <HDF5 dataset "m01s30i111": shape (120, 85, 324, 432), type "float32">
  • default axis arg (when axis=None): 'axis': (0, 1, 2, 3)
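For reference, the result shape each axis choice would give for this dataset can be worked out without loading any data. The helper below is a hypothetical sketch that assumes reduced axes are kept as length-1 dimensions, which is what the nested results in the test cases suggest.

```python
def reduced_shape(shape, axis=None):
    """Shape after reducing over `axis`, keeping reduced axes as length 1.

    axis=None means "all axes", matching the default described above.
    """
    ndim = len(shape)
    axes = tuple(range(ndim)) if axis is None else tuple(axis)
    return tuple(1 if i in axes else n for i, n in enumerate(shape))


dataset_shape = (120, 85, 324, 432)
all_axes = reduced_shape(dataset_shape)              # default: every axis
two_axes = reduced_shape(dataset_shape, (0, 1))
three_axes = reduced_shape(dataset_shape, (0, 1, 2))
```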

Test Case 1

def test_no_axis_2():
    """
    Fails, but it should pass: the default 'axis': (0, 1, 2, 3)
    is fine!

    activestorage.reductionist.ReductionistError: Reductionist error: HTTP 400: {"error": {"message": "request data is not valid", "caused_by": ["__all__: Validation error: Number of reduction axes must be less than length of shape - to reduce over all axes omit the axis field completely [{}]"]}}
    """
    active = build_active()
    result = active.min(axis=())[:]
    assert result == [[[[164.8125]]]]
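The validation message above says that a reduction over all axes should omit the axis field completely. One possible client-side workaround is sketched below. The function names and the shape of the request dict are hypothetical; only the "axis" field name comes from the error message, and treating an empty tuple like "all axes" is an assumption taken from what this test expects.

```python
def axis_field(axis, ndim):
    """Return the axes to send, or None when the field should be omitted."""
    if axis is None:
        return None
    axes = (axis,) if isinstance(axis, int) else tuple(axis)
    normalised = []
    for ax in axes:
        if not -ndim <= ax < ndim:
            raise ValueError(f"axis {ax} is out of range for {ndim} dimensions")
        normalised.append(ax % ndim)  # map negative indices to positive ones
    if len(set(normalised)) != len(normalised):
        raise ValueError("repeated axis")
    # ASSUMPTION: an empty tuple is treated as "all axes", as the test expects.
    if len(normalised) in (0, ndim):
        return None
    return tuple(sorted(normalised))


def build_request(base, axis, ndim):
    """Copy `base` (a hypothetical request dict) and add 'axis' only if needed."""
    request = dict(base)
    field = axis_field(axis, ndim)
    if field is not None:
        request["axis"] = list(field)
    return request
```

With this, `axis=None`, `axis=()` and `axis=(0, 1, 2, 3)` all produce a request without an axis field, which is what the server's validation asks for.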

Test Case 2

def test_axis_0_1():
    """Fails: activestorage.reductionist.ReductionistError: Reductionist error: HTTP 502: -"""
    active = build_active()
    result = active.min(axis=(0, 1))[:]
    assert result == [[[[164.8125]]]]

Test Case 3

def test_axis_0_1_2():
    """Passes fine."""
    active = build_active()
    result = active.min(axis=(0, 1, 2))[:]
    assert result[0][0][0][0] == 171.05126953125

These failures can be seen here: https://github.com/NCAS-CMS/PyActiveStorage/actions/runs/20272446127/job/58211728980?pr=300

@valeriupredoi valeriupredoi added the enhancement New feature or request label Dec 16, 2025
valeriupredoi commented Jan 21, 2026

@maxstack many thanks for looking into this! I think I found the issue: in the current Reductionist, the response is a dict with a "bytes" key, e.g.:

Reduction result:  {'byte-order': 'little', 'bytes': [112, 4, 46, 67], 'count': [11897280], 'dtype': 'float32', 'shape': []}
Reduction result size:  184
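For illustration, a dict of this form can be decoded on the client with NumPy. This is a minimal sketch that assumes the response has already been parsed into a Python dict with exactly the keys shown above; `decode_result` is a hypothetical helper, not part of PyActiveStorage.

```python
import numpy as np


def decode_result(payload):
    """Turn a parsed reduction result into (values, counts) NumPy arrays."""
    order = "<" if payload["byte-order"] == "little" else ">"
    dtype = np.dtype(payload["dtype"]).newbyteorder(order)
    raw = bytes(payload["bytes"])  # list of ints -> bytes object
    values = np.frombuffer(raw, dtype=dtype).reshape(tuple(payload["shape"]))
    counts = np.asarray(payload["count"], dtype=np.int64)
    return values, counts


payload = {
    "byte-order": "little",
    "bytes": [112, 4, 46, 67],
    "count": [11897280],
    "dtype": "float32",
    "shape": [],
}
values, counts = decode_result(payload)
```

Here the four bytes decode to a single float32 scalar, since the shape is empty.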

but that value comes in as raw bytes and needs to be decoded at the end point by the Client; this explains a few things:

  • unit test failure
  • the 503 and 504 we see when attempting to run with axis

You can see the 503 from the failed test:

tests/test_real_s3_with_axes.py:78: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
activestorage/active.py:309: in __getitem__
    return self._get_selection(index)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
activestorage/active.py:440: in _get_selection
    return self._from_storage(ds, indexer, array._chunks, out_shape, dtype,
activestorage/active.py:530: in _from_storage
    result, count, out_selection = future.result()
                                   ^^^^^^^^^^^^^^^
../../../miniconda3/envs/activestorage/lib/python3.14/concurrent/futures/_base.py:443: in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
../../../miniconda3/envs/activestorage/lib/python3.14/concurrent/futures/_base.py:395: in __get_result
    raise self._exception
../../../miniconda3/envs/activestorage/lib/python3.14/concurrent/futures/thread.py:86: in run
    result = ctx.run(self.task)
             ^^^^^^^^^^^^^^^^^^
../../../miniconda3/envs/activestorage/lib/python3.14/concurrent/futures/thread.py:73: in run
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
activestorage/active.py:691: in _process_chunk
    tmp, count = reductionist.reduce_chunk(
activestorage/reductionist.py:101: in reduce_chunk
    decode_and_raise_error(response)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

response = <Response [503]>

    def decode_and_raise_error(response):
        """Decode an error response and raise ReductionistError."""
        try:
            error = json.dumps(response.json())
            raise ReductionistError(response.status_code, error)
        except requests.exceptions.JSONDecodeError as exc:
>           raise ReductionistError(response.status_code, "-") from exc
E           activestorage.reductionist.ReductionistError: Reductionist error: HTTP 503: -

activestorage/reductionist.py:273: ReductionistError

-> that's a JSON decoder error (which also, incidentally, completely exhausted the memory on my local machine). We should not have Reductionist return raw bytes; we need actual data that does not risk being corrupted or leaving the Client unable to decode and use it. Is this doable? Cheers 🍺
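As a possible aid to debugging, the error handler could include a short, truncated excerpt of a non-JSON body instead of "-". The sketch below is not the code in activestorage/reductionist.py: `FakeResponse` is a hypothetical stand-in for a requests response, `max_chars` is an invented parameter, and catching `ValueError` assumes the JSON decode error is a subclass of it.

```python
import json


class ReductionistError(Exception):
    """Simplified stand-in for the exception shown in the traceback."""

    def __init__(self, status_code, error):
        super().__init__(f"Reductionist error: HTTP {status_code}: {error}")


def decode_and_raise_error(response, max_chars=200):
    """Raise ReductionistError with the JSON error or a short body excerpt."""
    try:
        error = json.dumps(response.json())
    except ValueError:
        # Not JSON (e.g. a proxy's 503 page): keep only a bounded excerpt
        # so a large body cannot blow up the message or the memory.
        body = (response.text or "")[:max_chars]
        error = body if body else "-"
    raise ReductionistError(response.status_code, error)


class FakeResponse:
    """Hypothetical response object used only to exercise the sketch."""

    def __init__(self, status_code, text):
        self.status_code = status_code
        self.text = text

    def json(self):
        return json.loads(self.text)


try:
    decode_and_raise_error(FakeResponse(503, "Service Unavailable"))
except ReductionistError as exc:
    message = str(exc)
```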
