Cache the result of the zero-check #3628
Conversation
Force-pushed from ccf00ea to a35f878
Force-pushed from a35f878 to 26e38d4
Ok, you are all sophisticated enough to catch my hack solution ^_^:
Not sure I have time to troubleshoot this or to think of a worthy high-performance solution.
there's no way
if there's some operation
That is my understanding; you have to start to call it. I don't see a way through this without help in defining "likely_not_equal". Thoughts there appreciated. If I have an MRI image, do I really need to tell you that it is exactly zero? You probably just care that it is "very likely non-zero" in this use case, but the strict equality check makes it hard to take any shortcuts.
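For concreteness, here is a minimal sketch of what a hypothetical `likely_not_equal` helper could look like. The name and the sampling strategy are my own illustration, not an existing zarr or numpy API: sampling a handful of elements means a chunk with any non-fill data is usually rejected without a full scan.

```python
import numpy as np


def likely_not_equal(arr: np.ndarray, value, n_samples: int = 64) -> bool:
    """Cheap probabilistic pre-check (hypothetical helper, not a zarr API).

    If any sampled element differs from `value`, the chunk is definitely
    not uniform and the full scan can be skipped. If all samples match,
    the result is inconclusive and a full equality check is still needed
    for a strict answer.
    """
    flat = arr.reshape(-1)
    if flat.size == 0:
        return False
    rng = np.random.default_rng()
    idx = rng.integers(0, flat.size, size=min(n_samples, flat.size))
    return bool((flat[idx] != value).any())
```

In the MRI example, any chunk containing tissue would almost always be caught by the first few samples, leaving the expensive full scan only for chunks that are plausibly all fill value.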
For zarr, we really do need to check every single element in a chunk before concluding that all elements in the chunk are equal to the fill value. But if the performance of this check is a serious concern, then at least when reading data we could check whether the compressed chunk is byte-wise identical to a compressed fill_value chunk.
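A rough sketch of that read-path shortcut, assuming a deterministic compressor (the helper names are illustrative, not zarr internals; Zstd from numcodecs stands in for whatever codec the array uses):

```python
import numpy as np
from numcodecs import Zstd  # assumption: any deterministic codec works here


def compressed_fill_value_bytes(codec, shape, dtype, fill_value) -> bytes:
    # Compress one canonical fill-value chunk; cache this per (codec, chunk spec).
    return codec.encode(np.full(shape, fill_value, dtype=dtype).tobytes())


def chunk_matches_fill(chunk_bytes, fill_bytes: bytes) -> bool:
    # Byte-identical compressed data proves the chunk equals the fill value.
    # A mismatch proves nothing, since the same data can compress differently,
    # so this is a fast positive test, not a replacement for the element scan.
    return bytes(chunk_bytes) == fill_bytes


fill_bytes = compressed_fill_value_bytes(Zstd(), (64, 64), "uint8", 0)
```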
For the codepath that I describe in #3627: really? zarr will crash if a chunk is identically 0 and saved?
I haven't really used zarr in depth before (my fork was like 6 years out of date), but after making this change and running the tests, it seems that only a few fail, and they seem to be checking that you discard zero chunks. I don't disagree that it is "nice" to ensure that things are zero, but it is not strictly required. In many cases, I would be OK with compressing some extra zeros. Your idea of checking that the compressed data matches that of compressed zeros is fine, but again, not strictly necessary, and it adds a level of complexity that might not be required in all cases.

```diff
diff --git a/src/zarr/core/codec_pipeline.py b/src/zarr/core/codec_pipeline.py
index fd557ac4..18b96911 100644
--- a/src/zarr/core/codec_pipeline.py
+++ b/src/zarr/core/codec_pipeline.py
@@ -413,12 +413,12 @@ class BatchedCodecPipeline(CodecPipeline):
             if chunk_array is None:
                 chunk_array_batch.append(None)  # type: ignore[unreachable]
             else:
-                if not chunk_spec.config.write_empty_chunks and chunk_array.all_equal(
-                    fill_value_or_default(chunk_spec)
-                ):
-                    chunk_array_batch.append(None)
-                else:
-                    chunk_array_batch.append(chunk_array)
+                # if not chunk_spec.config.write_empty_chunks and chunk_array.all_equal(
+                #     fill_value_or_default(chunk_spec)
+                # ):
+                #     chunk_array_batch.append(None)
+                # else:
+                chunk_array_batch.append(chunk_array)
         chunk_bytes_batch = await self.encode_batch(
             [
```
The reason why we avoid saving chunks that are identical to the fill value is to avoid unnecessary IO operations and reduce the number of stored objects. This only works for chunks that are identical to the fill value because the zarr spec instructs implementations to treat missing chunks as equal to chunks filled with the fill value.
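That spec behavior is easy to see in a quick sketch (using `zarr.create`; treat the exact signature here as an assumption):

```python
import zarr

# A freshly created array has no chunks stored yet; reading any region
# returns the fill value, exactly as if fill-value chunks had been written.
z = zarr.create(shape=(4, 4), chunks=(2, 2), dtype="f8", fill_value=-1.0)
print(z[:2, :2])  # all -1.0, even though no chunk object exists in the store
```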
Aha, I didn't see that this was closed; I accidentally re-opened it. Feel free to re-close it!
Codecov Report

✅ All modified and coverable lines are covered by tests.

```
@@            Coverage Diff             @@
##             main    #3628      +/-   ##
==========================================
- Coverage   60.90%   60.44%    -0.46%
==========================================
  Files          86       86
  Lines       10174    10175        +1
==========================================
- Hits         6196     6150       -46
- Misses       3978     4025       +47
```
Exactly, and I don't know how to resolve the GPU use case. Today, I'm purely exploring what it would mean to convert some of my datasets from HDF5 to zarr. Even if it is slow, the conversion will complete.
I understand, and I believe that holds in practice. But do consider the flip side: I think the semantics you are looking for just aren't there in the "numpy" API. What you are trying to do is "optimize away one chunk of data on disk", but perhaps I want to save 100 ms. Let me skip the 0 check, or get a quick answer?
You can skip this check by changing the `write_empty_chunks` setting in the zarr config (see zarr-python/docs/user-guide/config.md, line 32 at b873691).
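If I read the linked config docs correctly, that would be something like the following (the exact config key is an assumption on my part, taken from the linked config.md):

```python
import zarr

# Assumed config key from config.md: when write_empty_chunks is True, chunks
# are written unconditionally, so the short-circuit in codec_pipeline.py
# never runs the all-equal-to-fill-value scan.
zarr.config.set({"array.write_empty_chunks": True})
```

Note how this lines up with the diff above: `if not chunk_spec.config.write_empty_chunks and chunk_array.all_equal(...)` short-circuits, so with the setting enabled, `all_equal` is never called.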
Historically, checking whether a chunk is empty has not been a serious performance bottleneck compared to reading, decompression, recompression, and writing. Do you have some performance metrics from your use case that suggest otherwise?
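For anyone wanting to put rough numbers on that comparison, a micro-benchmark sketch along these lines (using numcodecs' Zstd as a stand-in compressor; not zarr's own tooling) contrasts the zero-check against compression alone:

```python
import timeit

import numpy as np
from numcodecs import Zstd

# One 1 MiB chunk of non-zero data, so the check scans every element.
chunk = np.random.randint(0, 255, size=(1024, 1024), dtype=np.uint8)
codec = Zstd()

t_check = timeit.timeit(lambda: np.all(chunk == 0), number=10) / 10
t_compress = timeit.timeit(lambda: codec.encode(chunk.tobytes()), number=10) / 10
print(f"zero-check: {t_check * 1e3:.2f} ms, compress: {t_compress * 1e3:.2f} ms")
```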
As I described in #3627, this is a problem when the underlying reads are slow: each check might trigger a terribly slow read, on top of the read that is triggered when reading the data the final time to compress it.
Leaving this example here since it helped me find https://mail.python.org/archives/list/numpy-discussion@python.org/thread/YY2OXO4SGCSXYSAYVOVWTLU25PCUQJT7/

Ultimately, this "zero" check creates temporary memory on the order of the entire zarr store. It may be created and deleted very quickly, but it's still pretty underwhelming. When going from store to store, there needs to be a faster mechanism to shortcut this.

Memory benchmark script:

```python
import multiprocessing
import time
from multiprocessing import Queue

import numpy as np
import psutil


def monitor_memory(pid, queue, interval=0.01):
    # Sample the RSS of the target process until it exits, tracking the peak.
    proc = psutil.Process(pid)
    peak_rss = 0
    samples = []
    while True:
        try:
            rss = proc.memory_info().rss / 1024**2
            samples.append((time.time(), rss))
            if rss > peak_rss:
                peak_rss = rss
            time.sleep(interval)
        except psutil.NoSuchProcess:
            break
    queue.put(('done', peak_rss, samples))


def run_benchmark():
    # 2 GiB of real zeros vs. an all-ones array broadcast from a single
    # element (zero strides, so it occupies almost no memory itself).
    a = np.full((2, 1024, 1024, 1024), fill_value=0, dtype='uint8')
    b = np.broadcast_to(
        np.full(1, fill_value=1, dtype='uint8'),
        a.shape
    )
    assert all(s == 0 for s in b.strides)
    np.array_equal(a, b)


if __name__ == '__main__':
    queue = Queue()
    child_process = multiprocessing.Process(target=run_benchmark)
    child_process.start()
    monitor_process = multiprocessing.Process(
        target=monitor_memory,
        args=(child_process.pid, queue)
    )
    monitor_process.start()

    child_process.join()
    time.sleep(0.2)  # let the monitor notice the child exited and report
    monitor_process.terminate()
    monitor_process.join()

    status, peak_rss, samples = queue.get()
    print("\n=== Memory Monitoring Results ===")
    print(f"Peak RSS: {peak_rss:.1f} MB")
    print(f"Total samples: {len(samples)}")

    N_samples = 30
    if samples:
        print(f"\nMemory timeline (first {N_samples} and last {N_samples} samples):")
        for i, (timestamp, rss) in enumerate(samples[:N_samples]):
            print(f"  {timestamp:.2f}s: {rss:.1f} MB")
        if len(samples) > N_samples * 2:
            print("  ...")
        for i, (timestamp, rss) in enumerate(samples[-N_samples:], start=len(samples) - N_samples):
            print(f"  {timestamp:.2f}s: {rss:.1f} MB")
```

I have a few next steps: I think that an optimization for array_equal will go a long way, and hopefully it has positive reception at numpy upstream.
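As a sketch of the kind of optimization I mean (my own illustration, not an existing numpy API): comparing block by block bounds the temporary boolean array to the block size and allows an early exit on the first mismatch, instead of materializing a comparison result the size of the whole array.

```python
import numpy as np


def all_equal_scalar(arr: np.ndarray, value, block: int = 1 << 20) -> bool:
    """Memory-bounded replacement for `np.all(arr == value)` (illustrative).

    The temporary boolean array is at most `block` elements, and the loop
    exits on the first block that contains a mismatch.
    """
    flat = arr.reshape(-1)
    for start in range(0, flat.size, block):
        if not (flat[start:start + block] == value).all():
            return False
    return True
```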
Closes #3627
I ended up moving around my suggestion so that it could be more targeted to the one use case I care about.
LMK what you think.
TODO:
- docs/user-guide/*.md
- changes/