Conversation
How did you confirm? I watched memory usage and it seemed fine. I also root caused (by reading internal PL code) why…
Note that RobustBench tests are failing because of some model issue unrelated to these changes... |
mzweilin
left a comment
LGTM.
pytest passes locally. We may need to change the RobustBench test in a separate PR to avoid test failure in CI.
What does this PR do?
This PR merges `*_step_end` into `*_step` in `LitModular`. This means we no longer need to clear outputs. This PR depends upon the following:
- LitModular#169

Type of change
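As background, the `*_step_end` merge described above can be sketched in plain Python (a hypothetical, framework-free illustration; `OldStyleModule`, `MergedModule`, and the toy batch are invented for this example and are not MART code):

```python
# Old Lightning style: per-device work in *_step, aggregation in *_step_end.
# The framework cached *_step outputs to feed *_step_end, so modules had to
# clear those cached outputs to avoid memory growth.
class OldStyleModule:
    def training_step(self, batch):
        loss = sum(batch) / len(batch)  # per-device partial result
        return {"loss": loss}           # cached by the framework

    def training_step_end(self, step_output):
        return step_output["loss"]      # aggregation hook

# New style: everything happens in *_step, nothing is cached between hooks,
# so there are no outputs to clear.
class MergedModule:
    def training_step(self, batch):
        return sum(batch) / len(batch)  # no *_step_end, no cached outputs

batch = [1.0, 2.0, 3.0]
old = OldStyleModule()
merged = MergedModule()
# Both styles produce the same loss; only the hook structure differs.
assert old.training_step_end(old.training_step(batch)) == merged.training_step(batch)
```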
Please check all relevant options.
Testing
Please describe the tests that you ran to verify your changes. Consider listing any relevant details of your test configuration.
- `pytest`
- `CUDA_VISIBLE_DEVICES=0 python -m mart experiment=CIFAR10_CNN_Adv trainer=gpu trainer.precision=16` reports 70% (21 sec/epoch).
- `CUDA_VISIBLE_DEVICES=0,1 python -m mart experiment=CIFAR10_CNN_Adv trainer=ddp trainer.precision=16 trainer.devices=2 model.optimizer.lr=0.2 trainer.max_steps=2925 datamodule.ims_per_batch=256 datamodule.world_size=2` reports 70% (14 sec/epoch).

Before submitting
- Ran the `pre-commit run -a` command without errors

Did you have fun?
Make sure you had fun coding 🙃