prepare_classifier.py fails to download Figshare data (ZeroDivisionError / invalid zip)

# Bug encountered in prepare_classifier.py




# Description of Issue:

When running prepare_classifier.py using the Docker-based installation (a fresh install), the automatic download of test data, models, and parameters from Figshare fails. After some digging, this appears to be due to Figshare no longer reliably serving content to non-browser clients (e.g., Python urllib), even when a valid private_link is provided. For me, this resulted in either a crash during download progress reporting or an invalid (zero-byte) file being downloaded and subsequently failing to unzip.

**I found a workaround, described below.**

## Specifics of my environment:

**OS:** Windows 11
**Installation method:** Docker container
**Python version (inside container):** 3.9
**MELD Graph version:** 2.2.4
**Docker image:** meldproject/meld_graph:latest


## Steps to reproduce

Install MELD Graph using the official Docker image (meldproject/meld_graph:latest)

Run: `docker compose run meld_graph python scripts/new_patient_pipeline/prepare_classifier.py`

The script attempts to download test data from Figshare

## Errors encountered

One of these two errors occurs:

### **Error 1:** 
ZeroDivisionError: division by zero

This occurs in download_data.py when calculating download progress:

`percent = int(count * blockSize * 100 / totalSize)`

where the issue results from `totalSize == 0`.
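A guard along these lines avoids the crash. This is only a minimal sketch: `progress_percent` and `report_progress` are hypothetical names, not the actual functions in download_data.py, and the argument order follows the `reporthook` convention of `urllib.request.urlretrieve`.

```python
def progress_percent(count, block_size, total_size):
    """Return download progress as an int percentage, or None if unknown."""
    # The reporthook can receive total_size <= 0 when the server sends no
    # usable Content-Length, so the division has to be guarded.
    if total_size <= 0:
        return None
    # Clamp to 100 because the final block usually overshoots the total.
    return min(100, int(count * block_size * 100 / total_size))


def report_progress(count, block_size, total_size):
    """Hypothetical reporthook that tolerates an unknown total size."""
    percent = progress_percent(count, block_size, total_size)
    if percent is None:
        print(f"\rdownloaded {count * block_size} bytes (total size unknown)", end="")
    else:
        print(f"\r{percent}%", end="")
```

Note that this only hides the symptom: the download can still produce an empty file, which is what Error 2 shows.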

I added a guard against the division by zero and then ran into the next error:

### **Error 2 (after guarding against division by zero):**

To investigate further, I tried manually downloading inside the container:

```
python - << 'EOF'
import urllib.request, tempfile

url = "https://figshare.com/ndownloader/files/53523443"

# Reserve a temporary file name to download into
with tempfile.NamedTemporaryFile(delete=False) as f:
    fname = f.name

urllib.request.urlretrieve(url, fname)
print("Saved to:", fname)

# Print the first 200 bytes; a ZIP archive would start with b"PK"
with open(fname, "rb") as f:
    print(f.read(200))
EOF
```

This showed that the downloaded file was zero bytes instead of a ZIP archive.
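A check like the following could catch this case before the unzip step. It is a sketch using only the standard library; `is_usable_zip` is a hypothetical helper, not something that exists in MELD Graph.

```python
import os
import zipfile


def is_usable_zip(path):
    """Return True only if path is a non-empty file that zipfile recognises."""
    # A zero-byte download is the failure mode described above.
    if not os.path.isfile(path) or os.path.getsize(path) == 0:
        return False
    # is_zipfile looks for the ZIP end-of-central-directory record.
    return zipfile.is_zipfile(path)
```

Calling it on the downloaded file right after `urlretrieve` returns would allow a clear error message instead of a failure during extraction.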

After digging around, I found that a number of people have had similar issues with Figshare’s ndownloader not reliably returning file content to urllib.request.urlretrieve, even when the private_link token is valid. This seems to be a [recent issue](https://github.com/scikit-learn/scikit-learn/issues/32961#issuecomment-3700014864) that is still unresolved as of Jan 2026.


## Workaround / Solution
I was able to manually download all three ZIP files in Firefox using the **_full_** private-link URLs, and then extracted them into the meld_data folder I created during configuration. (Note that this did not work for me without the full private link; I just got an InsufficientPermissions error.)

Here are the URLs that worked for me:
- Test data: https://figshare.com/ndownloader/files/53523443?private_link=413bc45083e67c7e7a11
- MELD parameters: https://figshare.com/ndownloader/files/46176921?private_link=34b4a30c57a328a1e111
- Models: https://figshare.com/ndownloader/files/46176927?private_link=7f983b7321bba527ffef

After extracting these ZIP files into the MELD_data folder created during configuration, prepare_classifier.py ran successfully and skipped the download steps.
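For anyone who prefers to script the extraction step, here is a minimal sketch. `extract_archives` is a hypothetical helper, and the file names in the usage example are placeholders for wherever the browser saved the downloads.

```python
import zipfile
from pathlib import Path


def extract_archives(zip_paths, dest_dir):
    """Extract each archive into dest_dir and return the member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    extracted = []
    for zip_path in zip_paths:
        # Fail early on empty or truncated downloads.
        if not zipfile.is_zipfile(zip_path):
            raise ValueError(f"{zip_path} is not a valid ZIP archive")
        with zipfile.ZipFile(zip_path) as zf:
            zf.extractall(dest)
            extracted.extend(zf.namelist())
    return extracted
```

Usage would look something like `extract_archives(["test_data.zip", "meld_params.zip", "models.zip"], "meld_data")`, with both the archive names and the destination adjusted to your own setup.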

I wanted to bring attention to this in case anyone else is having the same issue!




