Conversation

@IvanMM27
Contributor

Hello,

When using `fast_dev_run` in `trainer.fit`, an error is raised because no checkpoint is created, yet GraphNeT tries to load the best checkpoint directly after `trainer.fit` completes:

```
Running in `fast_dev_run` mode: will run the requested loop using 1 batch(es). Logging and checkpointing is suppressed.
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/1
----------------------------------------------------------------------------------------------------
distributed_backend=gloo
All distributed processes registered. Starting with 1 processes
----------------------------------------------------------------------------------------------------


  | Name                 | Type       | Params | Mode  | FLOPs
--------------------------------------------------------------------
0 | _tasks               | ModuleList | 129    | train | 0    
1 | _data_representation | KNNGraph   | 0      | train | 0    
2 | backbone             | DynEdge    | 1.4 M  | train | 0    
--------------------------------------------------------------------
1.4 M     Trainable params
0         Non-trainable params
1.4 M     Total params
5.515     Total estimated model params size (MB)
36        Modules in train mode
0         Modules in eval mode
0         Total Flops
Epoch  0: 100%|██████████████████████████████████████████████████████████| 1/1 [00:06<00:00,  0.14 batch(es)/s, lr=1e-5, val_loss=0.00255, train_loss=0.028]
`Trainer.fit` stopped: `max_steps=1` reached.
Epoch  0: 100%|██████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:06<00:00,  0.14 batch(es)/s, lr=1e-5, val_loss=0.00255, train_loss=0.028]
[rank0]: Traceback (most recent call last):
[rank0]:   File "/data_hgx/KM3NeT/mozun/temp/graphnet/examples/04_training/01_train_dynedge.py", line 249, in <module>
[rank0]:     main(
[rank0]:   File "/data_hgx/KM3NeT/mozun/temp/graphnet/examples/04_training/01_train_dynedge.py", line 164, in main
[rank0]:     model.fit(
[rank0]:   File "/data_hgx/KM3NeT/mozun/temp/graphnet/src/graphnet/models/easy_model.py", line 182, in fit
[rank0]:     torch.load(
[rank0]:   File "/data_hgx/KM3NeT/mozun/temp/graphnet_dev/lib/python3.10/site-packages/torch/serialization.py", line 1425, in load
[rank0]:     with _open_file_like(f, "rb") as opened_file:
[rank0]:   File "/data_hgx/KM3NeT/mozun/temp/graphnet_dev/lib/python3.10/site-packages/torch/serialization.py", line 751, in _open_file_like
[rank0]:     return _open_file(name_or_buffer, mode)
[rank0]:   File "/data_hgx/KM3NeT/mozun/temp/graphnet_dev/lib/python3.10/site-packages/torch/serialization.py", line 732, in __init__
[rank0]:     super().__init__(open(name, mode))
[rank0]: FileNotFoundError: [Errno 2] No such file or directory: ''
```

Therefore, I have added a `fast_dev_run` argument in `easy_syntax` that is passed through to `trainer.fit` and, when enabled, skips loading the best checkpoint.
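For illustration, here is a minimal sketch of the idea as a standalone helper (`fit_model` is a hypothetical name, not the actual GraphNeT API; the real change lives in the `fit` method in `easy_model.py`). Lightning suppresses checkpointing in `fast_dev_run` mode, so `best_model_path` stays empty and the checkpoint restore has to be skipped:

```python
from typing import Any, Optional

import torch
from pytorch_lightning import LightningModule, Trainer
from torch.utils.data import DataLoader


def fit_model(
    model: LightningModule,
    train_dataloader: DataLoader,
    val_dataloader: Optional[DataLoader] = None,
    fast_dev_run: bool = False,
    **trainer_kwargs: Any,
) -> None:
    """Fit `model`, restoring the best checkpoint unless `fast_dev_run` is set."""
    trainer = Trainer(fast_dev_run=fast_dev_run, **trainer_kwargs)
    trainer.fit(model, train_dataloader, val_dataloader)

    if fast_dev_run:
        # Lightning suppresses logging and checkpointing in fast_dev_run mode,
        # so `best_model_path` would be an empty string and torch.load('')
        # would raise the FileNotFoundError shown in the traceback above.
        return

    checkpoint_callback = trainer.checkpoint_callback
    if checkpoint_callback is not None and checkpoint_callback.best_model_path:
        checkpoint = torch.load(
            checkpoint_callback.best_model_path, map_location="cpu"
        )
        model.load_state_dict(checkpoint["state_dict"])
```

For example, `fit_model(model, train_dataloader, fast_dev_run=True)` runs a single batch and returns without touching any checkpoint file, while the default `fast_dev_run=False` keeps the current behaviour of restoring the best checkpoint after training.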
