prepare_classifier.py fails to download Figshare data (ZeroDivisionError / invalid zip) #102

@erussell92

Bug encountered in prepare_classifier.py

Description of Issue:

When running prepare_classifier.py using the Docker-based installation (a fresh install), the automatic download of test data, models, and parameters from Figshare fails. After some digging, this appears to be due to Figshare no longer reliably serving content to non-browser clients (e.g., Python urllib), even when a valid private_link is provided. For me, this resulted in either a crash during download progress reporting or an invalid (zero-byte) file being downloaded and subsequently failing to unzip.

I found a workaround, described below.

Specifics of my environment:

OS: Windows 11
Installation method: Docker container
Python version (inside container): 3.9
MELD Graph version: 2.2.4
Docker image: meldproject/meld_graph:latest

Steps to reproduce

Install MELD Graph using the official Docker image (meldproject/meld_graph:latest)

Run: docker compose run meld_graph python scripts/new_patient_pipeline/prepare_classifier.py

The script attempts to download test data from Figshare

Errors encountered

One of these two errors occurs:

Error 1:

ZeroDivisionError: division by zero

This occurs in download_data.py when calculating download progress:

percent = int(count * blockSize * 100 / totalSize)

The division fails because totalSize == 0, i.e. the server reported a content length of zero.
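A minimal sketch of one way to guard this calculation. The function name report_progress is hypothetical and is not taken from download_data.py; it assumes the progress function is called with the (block count, block size, total size) arguments that urllib.request.urlretrieve passes to a reporthook.

```python
import sys


def report_progress(count, block_size, total_size):
    """Progress callback that tolerates a missing or zero total size.

    Hypothetical helper, not the project's actual code.
    """
    downloaded = count * block_size
    if total_size <= 0:
        # No usable Content-Length (0 or -1): report bytes instead of a percentage.
        message = f"Downloaded {downloaded} bytes (total size unknown)"
    else:
        # The last block can overshoot the total, so cap at 100.
        percent = min(100, int(downloaded * 100 / total_size))
        message = f"Downloaded {percent}%"
    sys.stdout.write("\r" + message)
    sys.stdout.flush()
    return message
```

This only prevents the crash in progress reporting. It does not fix the underlying problem that the response body is empty.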

I added a guard against the division by zero as a safeguard and then hit the second error:

Error 2 (after guarding against division by zero):

To investigate further, I tried manually downloading inside the container:

python - << 'EOF'
import urllib.request, tempfile, os

url = "https://figshare.com/ndownloader/files/53523443"
with tempfile.NamedTemporaryFile(delete=False) as f:
    fname = f.name

urllib.request.urlretrieve(url, fname)
print("Saved to:", fname)

with open(fname, "rb") as f:
    print(f.read(200))
EOF

This showed that the downloaded file was zero bytes rather than a ZIP archive.
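The same check can be made explicit before attempting to unzip. This is a minimal sketch using only the standard library; check_download is a hypothetical helper name, not something that exists in MELD Graph.

```python
import os
import zipfile


def check_download(path):
    """Classify a downloaded file as 'empty', 'not a zip', or 'ok'.

    Hypothetical helper for diagnosing a failed download.
    """
    if os.path.getsize(path) == 0:
        # The zero-byte case described in this issue.
        return "empty"
    if not zipfile.is_zipfile(path):
        # Non-empty, but not a ZIP archive (for example an HTML error page).
        return "not a zip"
    return "ok"
```

Calling check_download(fname) on the temporary file from the snippet above would be expected to return "empty" in the situation I hit.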

After digging around, I found that a number of people have had similar issues with Figshare’s ndownloader not reliably returning file content to urllib.request.urlretrieve, even when the private_link token is valid. This appears to be a recent problem and is still unresolved as of Jan 2026.

Workaround / Solution

I was able to manually download the zip files for all three in Firefox using the full private-link URLs, and extracted them into the meld_data folder I created during configuration. (Note that it did not work for me without the full private link; I just got an InsufficientPermissions error otherwise.)

Here are the URLs that worked for me:
• Test data: https://figshare.com/ndownloader/files/53523443?private_link=413bc45083e67c7e7a11
• MELD parameters: https://figshare.com/ndownloader/files/46176921?private_link=34b4a30c57a328a1e111
• Models: https://figshare.com/ndownloader/files/46176927?private_link=7f983b7321bba527ffef

After extracting these ZIP files into the MELD_data folder created during configuration, prepare_classifier.py ran successfully and skipped the download steps.
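For anyone who prefers to script the extraction step, here is a minimal sketch. The function name extract_archives and both directory arguments are hypothetical; point them at wherever the browser saved the ZIP files and at your own data folder. It assumes the archives should be unpacked directly into the data folder, which is what I did by hand.

```python
import zipfile
from pathlib import Path


def extract_archives(download_dir, data_dir):
    """Extract every valid .zip in download_dir into data_dir.

    Hypothetical helper; returns the names of the archives that were extracted.
    """
    download_dir, data_dir = Path(download_dir), Path(data_dir)
    data_dir.mkdir(parents=True, exist_ok=True)
    extracted = []
    for archive in sorted(download_dir.glob("*.zip")):
        if not zipfile.is_zipfile(archive):
            # Catches zero-byte or otherwise invalid downloads.
            print(f"Skipping {archive.name}: not a valid ZIP archive")
            continue
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(data_dir)
        extracted.append(archive.name)
    return extracted
```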

I just wanted to bring attention to this just in case anyone else was having the same issue as me!
