
Conversation

@haeggee (Collaborator) commented Jun 18, 2024

  • fix a memory leak for np.memmap (see e.g. here) by constructing the object holding the memmap inside the dataloader whenever a new item is fetched, rather than just once outside. In certain cases with multiple GPUs, this has led to overloading the (shared/virtual) memory and therefore to crashes
  • other fixes
    • make the val batches inside the training loop deterministic by resetting the val loader each time -- this removes the noise of the sampling process from evaluation, so the remaining noise comes only from the model parameters
    • fix the final evaluation on the full validation set, which previously did not always use the same set of batches because the val loader was not reset properly. Also, separate the logging for the deterministic val batches (as above) from the logging for the full val set
    • get rid of unused and outdated arXiv data
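The memmap fix in the first bullet can be sketched as follows. This is a minimal illustration, not the PR's actual code: the dataset class, file layout (a flat file of uint16 tokens), and parameter names are all assumptions. The key point is that `np.memmap` is re-created inside `__getitem__` instead of being cached once in `__init__`, so the mapping is released after each access rather than accumulating touched pages for the lifetime of the dataset (which, across multiple dataloader workers and GPUs, can exhaust shared/virtual memory).

```python
import numpy as np


class TokenDataset:
    """Illustrative dataset: re-open the memmap per item instead of caching it."""

    def __init__(self, path, seq_len):
        self.path = path      # path to a flat file of uint16 tokens (assumption)
        self.seq_len = seq_len
        # Probe the file once, only to compute the number of samples;
        # this temporary memmap is dropped immediately.
        n_tokens = np.memmap(path, dtype=np.uint16, mode="r").shape[0]
        self.n_samples = n_tokens // seq_len

    def __len__(self):
        return self.n_samples

    def __getitem__(self, idx):
        # Fresh memmap per access: the mapping goes out of scope after the
        # copy below, so the OS can reclaim its pages -- avoiding the
        # leak-like growth seen when one memmap is held for the whole run.
        data = np.memmap(self.path, dtype=np.uint16, mode="r")
        start = idx * self.seq_len
        return np.array(data[start:start + self.seq_len])  # copy out of the map
```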
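The deterministic-val-batches fix can be sketched like this (the helper name and defaults are illustrative, not from the PR). Rebuilding the val loader with a fixed-seed generator before each evaluation pass means every evaluation draws the same batches in the same order, so differences between two evaluations reflect only the model parameters, not batch sampling.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def make_val_loader(dataset, batch_size=4, seed=1234):
    """Build a val loader whose shuffle order is identical on every call.

    A fresh, fixed-seed generator per call resets the sampling process,
    so repeated evaluations inside the training loop see the same batches.
    """
    gen = torch.Generator().manual_seed(seed)
    return DataLoader(dataset, batch_size=batch_size, shuffle=True, generator=gen)


# Usage sketch: call make_val_loader(...) before each eval pass instead of
# reusing one long-lived iterator that keeps advancing through the data.
```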

