Hi Tim, thanks for making this library. I am trying to test it on speech generation models and I have some questions about your code template:
- The models come with their own schedulers and optimizers. Can I simply wrap them with `decay = CosineDecay(...)` and `mask = Masking(optimizer, ...)`? Should I switch the optimizer to `optim.SGD(...)` and ignore the scheduler? It looks like `mask.step()` runs every epoch and replaces the scheduler, but I think I should still keep the optimizer specific to the model I have (see the sketch below this list).
- I understand that density/sparsity is the desired % of weights to keep, while the prune/death rate is an internal parameter that determines what % of weights gets redistributed at each iteration. Is this correct?
- Density appears to equal sparsity in your code, although normally I would expect density = 1 - sparsity.
- The code fails at `core.py` lines 221-223 when there are RNNs, because for them `bias` is a boolean and the bias terms are actually `bias_ih` and `bias_hh`. I think this would count the parameters more reliably:

```python
for name, tensor in self.modules[0].named_parameters():
    # counts every registered parameter, including bias_ih / bias_hh of RNN layers
    total_size += tensor.numel()
```
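
For the first question, here is roughly what I have in mind: a minimal sketch that keeps the model's own optimizer and LR scheduler and wraps `Masking`/`CosineDecay` around the existing optimizer, following the usage pattern I see in the README. The dummy model, the loss, and the exact keyword/method names (`prune_rate` vs. `death_rate`, `at_end_of_epoch`) are my assumptions and may need adjusting to the current `core.py`, so please correct me if this is not how it is meant to be used.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from sparselearning.core import Masking, CosineDecay

# Stand-in for a speech generation model; any nn.Module would do here.
model = nn.Sequential(nn.Linear(80, 256), nn.ReLU(), nn.Linear(256, 80))

# Keep the model's own optimizer and LR scheduler instead of forcing optim.SGD.
optimizer = optim.Adam(model.parameters(), lr=1e-3)
lr_scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)

epochs, steps_per_epoch = 5, 100

# Decay schedule for the prune/death rate -- separate from the LR scheduler.
decay = CosineDecay(0.5, epochs * steps_per_epoch)

# Wrap the existing optimizer; keyword names vary between versions
# (prune_rate vs. death_rate), so adjust to match your copy of core.py.
mask = Masking(optimizer, prune_rate=0.5, prune_rate_decay=decay)
mask.add_module(model, density=0.1)  # keep ~10% of the weights

for epoch in range(epochs):
    for _ in range(steps_per_epoch):
        x = torch.randn(16, 80)                       # dummy batch
        loss = nn.functional.mse_loss(model(x), x)
        optimizer.zero_grad()
        loss.backward()
        mask.step()        # replaces optimizer.step() and re-applies the masks
    lr_scheduler.step()    # the learning-rate schedule stays with the scheduler
    mask.at_end_of_epoch() # prune/regrow step; method name may differ by version
```

If I read the README's training loop correctly, `mask.step()` is called in place of `optimizer.step()` inside the batch loop rather than once per epoch, which is why I kept the LR scheduler as a separate per-epoch call.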