Conversation

@hmaarrfk commented Dec 18, 2025

Closes #3627

I ended up moving my suggestion around so that it could be more targeted to the one use case I care about.

LMK what you think.


TODO:

  • Add unit tests and/or doctests in docstrings
  • Add docstrings and API docs for any new/modified user-facing classes and functions
  • New/modified features documented in docs/user-guide/*.md
  • Changes documented as a new file in changes/
  • GitHub Actions have all passed
  • Test coverage is 100% (Codecov passes)

@github-actions bot added the "needs release notes" label Dec 18, 2025
@github-actions bot removed the "needs release notes" label Dec 18, 2025
@hmaarrfk (Author)

OK, you are all sophisticated enough to catch my hack solution ^_^:

>           warnings.warn(
                msg,
                category=ZarrUserWarning,
                stacklevel=2,
            )
E           zarr.errors.ZarrUserWarning: Creating a zarr.buffer.gpu.Buffer with an array that does not support the __cuda_array_interface__ for zero-copy transfers, falling back to slow copy based path

@hmaarrfk closed this Dec 19, 2025
@hmaarrfk (Author)

Not sure I have time to troubleshoot this or to think of a worthy high-performance solution.

@d-v-b (Contributor) commented Dec 19, 2025

there's no way .as_numpy_array() isn't going to break the cpu / gpu abstraction, right?

@d-v-b (Contributor) commented Dec 19, 2025

if there's some operation X that we need to do on cpu-backed data AND gpu-backed data, then we should look at defining X in the ndbuffer base class. Here X is "compare all elements in an NDBuffer against some scalar s", insofar as this is a basic part of zarr (checking if a chunk is fill_value) I think baking it into a core abc makes sense.
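For context, the codec pipeline already calls chunk_array.all_equal(...) (see the diff later in this thread), so the shape being described might look roughly like the following. This is a minimal sketch with illustrative names (NDBufferBase, CpuNDBuffer), not zarr's actual class layout:

# Minimal sketch: declare "compare every element against scalar s" on the
# shared base class so each backend implements it natively.
from abc import ABC, abstractmethod

import numpy as np


class NDBufferBase(ABC):
    @abstractmethod
    def all_equal(self, scalar: object) -> bool:
        """Return True iff every element equals `scalar`."""


class CpuNDBuffer(NDBufferBase):
    def __init__(self, data: np.ndarray) -> None:
        self._data = data

    def all_equal(self, scalar: object) -> bool:
        # NaN fill values need explicit handling: NaN != NaN elementwise.
        if isinstance(scalar, float) and np.isnan(scalar):
            return bool(np.isnan(self._data).all())
        return bool((self._data == scalar).all())


# A GPU-backed subclass would implement the same method with cupy,
# so the comparison never leaves the device.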

@hmaarrfk (Author)

> there's no way .as_numpy_array() isn't going to break the cpu / gpu abstraction, right?

That is my understanding; you have to start calling np.asanyarray, which would allow lazy arrays to pass through.

I don't see a way through this without help in defining "likely_not_equal"
data-apis/array-api#985

Thoughts there appreciated.

If I have an MRI image, do I really need to tell you that it is exactly zero? You probably just care that it is "very likely non-zero" in this use case, but the strict equality check makes it hard to take any shortcuts.
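As a concrete illustration of the "very likely non-zero" idea, a cheap sampling pre-check can settle the overwhelmingly common case without a full scan. A hypothetical helper, not zarr or array-api code:

import numpy as np

def probably_all_equal(arr: np.ndarray, fill, n_samples: int = 16) -> bool:
    # Cheap pre-check: probe a handful of elements first. If any sample
    # differs from `fill`, the chunk is definitely not empty and the
    # exhaustive scan is skipped entirely.
    if arr.size == 0:
        return True
    rng = np.random.default_rng()
    idx = rng.integers(0, arr.size, size=n_samples)
    if not np.all(arr.flat[idx] == fill):
        return False  # exact fast negative: at least one element differs
    # Only the (rare) all-fill candidates pay for the full comparison.
    return bool(np.all(arr == fill))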

@d-v-b (Contributor) commented Dec 19, 2025

for zarr, we really do need to check every single element in a chunk before concluding that all elements in the chunk are equal to the fill value. but if the performance of this check is a serious concern, then at least when reading data we could check if the compressed chunk is byte-wise identical to a compressed fill_value chunk
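A sketch of that reading-side check, assuming a single numcodecs-style compressor with no extra filters in the pipeline; in practice the reference bytes would be computed once per (shape, dtype, fill_value, codec) and cached:

import numpy as np
from numcodecs import Zstd

def chunk_is_empty(chunk_bytes: bytes, fill_value, shape, dtype, codec=Zstd()) -> bool:
    # Encode a chunk consisting entirely of fill_value.
    reference = codec.encode(np.full(shape, fill_value, dtype=dtype))
    # A byte-for-byte match proves the stored chunk decodes to all fill_value.
    # A mismatch proves nothing: a different encoder or level can produce
    # different bytes for the same data, so this is only a fast positive path.
    return bytes(chunk_bytes) == bytes(reference)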

@hmaarrfk (Author)

> we really do need to check every single element in a chunk before concluding that all elements in the chunk are equal to the fill value

For the code path that I describe in #3627, really? Will zarr break if a chunk that is identically 0 gets saved anyway?

@hmaarrfk (Author)

I haven't really used zarr in depth before (my fork was about 6 years out of date), but after making this change and running the tests, only a few fail, and they all seem to be checking that zero chunks are discarded.

I don't disagree that it is "nice" to ensure that things are zero, but it is not strictly required.

In many cases, I would be OK with compressing some extra zeros.

Your idea of checking that the compressed data matches compressed zeros is fine, but again, it is not strictly necessary, and it adds a level of complexity that might not be required in all cases.

diff --git a/src/zarr/core/codec_pipeline.py b/src/zarr/core/codec_pipeline.py
index fd557ac4..18b96911 100644
--- a/src/zarr/core/codec_pipeline.py
+++ b/src/zarr/core/codec_pipeline.py
@@ -413,12 +413,12 @@ class BatchedCodecPipeline(CodecPipeline):
                 if chunk_array is None:
                     chunk_array_batch.append(None)  # type: ignore[unreachable]
                 else:
-                    if not chunk_spec.config.write_empty_chunks and chunk_array.all_equal(
-                        fill_value_or_default(chunk_spec)
-                    ):
-                        chunk_array_batch.append(None)
-                    else:
-                        chunk_array_batch.append(chunk_array)
+                    # if not chunk_spec.config.write_empty_chunks and chunk_array.all_equal(
+                    #     fill_value_or_default(chunk_spec)
+                    # ):
+                    #     chunk_array_batch.append(None)
+                    # else:
+                    chunk_array_batch.append(chunk_array)
 
             chunk_bytes_batch = await self.encode_batch(
                 [
FAILED tests/test_array.py::test_write_empty_chunks_behavior[zarr2-0-False-memory] - assert 2 == 0
FAILED tests/test_array.py::test_write_empty_chunks_behavior[zarr2-5-False-memory] - assert 2 == 0
FAILED tests/test_array.py::test_write_empty_chunks_behavior[zarr3-0-False-memory] - assert 2 == 0
FAILED tests/test_array.py::test_write_empty_chunks_behavior[zarr3-5-False-memory] - assert 2 == 0
FAILED tests/test_codecs/test_sharding.py::test_delete_empty_shards[local] - assert <zarr.core.buffer.cpu.Buffer object at 0x76b090f2a660> is None
FAILED tests/test_codecs/test_sharding.py::test_delete_empty_shards[memory] - assert <zarr.core.buffer.cpu.Buffer object at 0x76b090e31100> is None
FAILED tests/test_examples.py::test_scripts_can_run[script_path0] - FileNotFoundError: [Errno 2] No such file or directory: 'uv'
FAILED tests/test_store/test_zip.py::TestZipStore::test_api_integration - UserWarning: Duplicate name: 'foo/c/0/0'

@d-v-b (Contributor) commented Dec 19, 2025

the reason why we avoid saving chunks that are identical to the fill value is to avoid unnecessary IO operations and reduce the number of stored objects. this only works for chunks that are identical to the fill value because the zarr spec instructs implementations to treat missing chunks as equal to chunks filled with fill_value. So "very likely all fill_value" and "definitely all fill_value" are different enough in this case that we check for it. But is this PR about the fill value logic?
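A small demonstration of that spec behavior, with parameter spellings per my reading of zarr-python 3 (treat as a sketch):

import zarr

store = zarr.storage.MemoryStore()
arr = zarr.create_array(
    store, shape=(4, 4), chunks=(2, 2), dtype="uint8", fill_value=0
)
arr[:] = 0        # with write_empty_chunks=False (the default), nothing is stored
print(arr[0, 0])  # 0 -- the missing chunk is read back as fill_value
arr[0, 0] = 7     # now exactly one chunk object is written to the store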

@d-v-b reopened this Dec 19, 2025
@d-v-b (Contributor) commented Dec 19, 2025

Aha, I didn't see that this was closed; I accidentally re-opened it. Feel free to re-close it!

codecov bot commented Dec 19, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 60.44%. Comparing base (b873691) to head (26e38d4).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3628      +/-   ##
==========================================
- Coverage   60.90%   60.44%   -0.46%     
==========================================
  Files          86       86              
  Lines       10174    10175       +1     
==========================================
- Hits         6196     6150      -46     
- Misses       3978     4025      +47     
Files with missing lines          Coverage Δ
src/zarr/core/codec_pipeline.py   70.24% <100.00%> (+0.14%) ⬆️

... and 4 files with indirect coverage changes


@hmaarrfk (Author)

> But is this PR about the fill value logic?

Exactly, and I don't know how to resolve the GPU use case.

Today, I'm purely exploring what it would mean to convert some of my datasets from HDF5 to zarr.

Even if it is slow, the conversion will complete.

@hmaarrfk (Author)

So "very likely all fill_value" and "definitely all fill_value" are different enough in this case that we check for it.

I understand and believe that in practice.

But do consider the flip side:

I can tell you with 99.99999% certainty that my images are not all 0, due to noise that some scientists believe is important. Is that good enough for you to skip the bit-for-bit equality check?

For data that is "missing", I can probably also tell you that it is surely all 0s.

I think the semantics you are looking for just aren't there in the NumPy API.

What you are trying to do is optimize away one chunk of data on disk, but perhaps I want to save 100 ms instead. Let me skip the zero check, or give you a quick answer?

@d-v-b (Contributor) commented Dec 20, 2025

> Let me skip the zero check, or give you a quick answer?

you can skip this check by changing the write_empty_chunks value in the config; see:

- Whether empty chunks are written to storage `array.write_empty_chunks`
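Concretely, that looks something like this (config spellings per zarr-python 3's docs; a sketch):

import zarr

# Process-wide: write every chunk unconditionally, skipping the all_equal scan.
zarr.config.set({"array.write_empty_chunks": True})

# Or per-array:
store = zarr.storage.MemoryStore()
arr = zarr.create_array(
    store, shape=(4, 4), chunks=(2, 2), dtype="uint8",
    config={"write_empty_chunks": True},
)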

@d-v-b (Contributor) commented Dec 20, 2025

historically checking if a chunk is empty or not has not been a serious performance bottleneck compared to reading, decompression, recompression, and writing. do you have some performance metrics from your use case that suggest otherwise?

@hmaarrfk (Author)

As I described in #3627:

For lazy arrays, it eventually calls:

  • np.broadcast_arrays
  • np.array_equal

This is a problem if:

  1. Reading is slow -- think a really high-compression algorithm: not zstd, but lzma, for example.
  2. Your underlying array implementation doesn't support these operations well.

Each of these calls might trigger a terribly slow read, on top of the read that is triggered when the data is finally read in order to compress it.
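One mitigation for the lazy case is to compare block by block and bail out on the first mismatch, so a single bounded read can settle the question instead of materializing everything. A hypothetical helper, not zarr code:

import numpy as np

def all_equal_blockwise(arr, scalar, block: int = 1 << 20) -> bool:
    # A lazy backend would slice natively here, one bounded read per block;
    # np.ravel is just the eager-numpy stand-in for this sketch.
    flat = np.ravel(arr)
    for start in range(0, flat.size, block):
        # The first block containing a non-fill element ends the scan early.
        if not np.all(flat[start : start + block] == scalar):
            return False
    return True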

@hmaarrfk (Author)

Leaving this example here since it helped me find https://mail.python.org/archives/list/numpy-discussion@python.org/thread/YY2OXO4SGCSXYSAYVOVWTLU25PCUQJT7/

Ultimately, this zero check creates temporary memory on the order of the entire zarr store.

It may be creating and deleting that memory very fast, but it's still pretty underwhelming.

When going from store to store, there needs to be a faster mechanism to short-circuit this.

Memory benchmark script:
import numpy as np
import psutil
import time
import multiprocessing
from multiprocessing import Queue

def monitor_memory(pid, queue, interval=0.01):
    proc = psutil.Process(pid)
    peak_rss = 0
    samples = []

    while True:
        try:
            rss = proc.memory_info().rss / 1024**2
            samples.append((time.time(), rss))
            if rss > peak_rss:
                peak_rss = rss
            time.sleep(interval)
        except psutil.NoSuchProcess:
            break

    queue.put(('done', peak_rss, samples))

def run_benchmark():
    # 2 GiB of actual zeros vs. a stride-0 broadcast view of a single 1:
    # np.array_equal compares them element by element and allocates a
    # full-size boolean temporary along the way.
    a = np.full((2, 1024, 1024, 1024), fill_value=0, dtype='uint8')
    b = np.broadcast_to(
        np.full(1, fill_value=1, dtype='uint8'),
        a.shape
    )
    assert all(s == 0 for s in b.strides)
    np.array_equal(a, b)

if __name__ == '__main__':
    queue = Queue()
    child_process = multiprocessing.Process(target=run_benchmark)
    child_process.start()

    monitor_process = multiprocessing.Process(
        target=monitor_memory,
        args=(child_process.pid, queue)
    )
    monitor_process.start()

    child_process.join()
    time.sleep(0.2)
    monitor_process.terminate()
    monitor_process.join()

    status, peak_rss, samples = queue.get()

    print("\n=== Memory Monitoring Results ===")
    print(f"Peak RSS: {peak_rss:.1f} MB")
    print(f"Total samples: {len(samples)}")

    N_samples = 30
    if samples:
        print(f"\nMemory timeline (first {N_samples} and last {N_samples} samples):")
        for i, (timestamp, rss) in enumerate(samples[:N_samples]):
            print(f"  {timestamp:.2f}s: {rss:.1f} MB")
        if len(samples) > N_samples * 2:
            print("  ...")
        for i, (timestamp, rss) in enumerate(samples[-N_samples:], start=len(samples)-N_samples):
            print(f"  {timestamp:.2f}s: {rss:.1f} MB")

I have a few next steps:

  1. Learn more about your chunk_spec.
  2. See if I can engineer it.
  3. Otherwise, try the idea of comparing the encoded array to zeros.

I think that an optimization for array_equal would go a long way, and hopefully it gets a positive reception at numpy upstream.
The problem for me lies in the fact that

@hmaarrfk closed this Dec 20, 2025

Development

Successfully merging this pull request may close these issues:

zero check in encoding less than ideal for lazy arrays