Using sparse learning in practice #24

@iamanigeeit

Hi Tim, thanks for making this library. I am trying to test it on speech generation models and I have some questions about your code template:

  1. The models come with their own schedulers and optimizers. Can I simply wrap them with decay = CosineDecay ... and mask = Masking(optimizer, ...)? Should I change the optimizer to follow optim.SGD(...) and ignore the scheduler? It looks like mask.step() runs every epoch and replaces the scheduler, but I think I should still keep the optimizer specific to my model (see the sketch after this list).
  2. I understand that density/sparsity is the desired percentage of weights to keep, while the prune/death rate is an internal parameter determining what percentage of weights gets redistributed at each update (e.g. density = 0.1 keeps 10% of weights active, and a death rate of 0.3 prunes and regrows 30% of those). Is this correct?
  3. In your code, density looks like it equals sparsity, although normally I would expect density = 1 - sparsity.
  4. The code fails at core.py lines 221-223 when the model contains RNNs, because for them bias is a boolean flag and the actual bias terms are bias_ih and bias_hh (see the quick check after this list). I think this counts the parameters correctly:
total_size = 0
# named_parameters() also yields RNN parameters such as bias_ih_l0 and bias_hh_l0
for name, tensor in self.modules[0].named_parameters():
    total_size += tensor.numel()
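
For question 1, this is roughly what I have in mind: keep the model's own optimizer and only swap optimizer.step() for mask.step(). A minimal sketch, assuming the Masking/CosineDecay argument names from the README (prune_rate, prune_rate_decay, density), and with MyTTSModel, train_loader, and epochs as placeholders for my setup:

```python
import torch
from sparselearning.core import Masking, CosineDecay

model = MyTTSModel()  # placeholder: my speech generation model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # the model's own optimizer

# Cosine schedule for the prune/death rate over all training steps.
decay = CosineDecay(0.3, len(train_loader) * epochs)
mask = Masking(optimizer, prune_rate=0.3, prune_rate_decay=decay)
mask.add_module(model, density=0.1)  # keep 10% of the weights

for batch in train_loader:
    optimizer.zero_grad()
    loss = model(batch)  # placeholder forward pass / loss
    loss.backward()
    mask.step()  # applies the masks and calls optimizer.step() internally
```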
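
For question 4, the naming issue is easy to reproduce in plain PyTorch, independent of this library: bias on an RNN module is just a boolean flag, while the real bias tensors only show up through named_parameters():

```python
import torch.nn as nn

rnn = nn.LSTM(input_size=8, hidden_size=16)
print(rnn.bias)  # True -- a boolean flag, not a tensor

for name, tensor in rnn.named_parameters():
    print(name, tuple(tensor.shape))
# weight_ih_l0 (64, 8)
# weight_hh_l0 (64, 16)
# bias_ih_l0 (64,)
# bias_hh_l0 (64,)
```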
