Question about the Squaring of the Hessian Diagonal in SparseGPT

I have a question regarding the code and pseudocode in your paper "SparseGPT: Massive Language Models Can be Accurately Pruned in One-Shot".

Specifically, in line 96 of `sparsegpt.py`, the formula `tmp = W1 ** 2 / (torch.diag(Hinv1).reshape((1, -1)) ** 2)` squares the diagonal elements of `Hinv1`, the inverse-Hessian term. Similarly, in line 8 of the pseudocode (Algorithm 1) in the paper, the diagonal elements are also squared.

However, in the original formula (e.g., Equation 3), the diagonal elements of the Hessian matrix are not squared. Could you kindly clarify why squaring the diagonal elements is necessary in the implementation and pseudocode?
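
To make the discrepancy concrete, here is a minimal sketch of the two scores as I understand them. The function names `score_eq3` and `score_code`, the helper `argsort`, and the toy values are my own illustration and are not taken from the repository; `hinv_diag` simply stands in for `torch.diag(Hinv1)`, and the first function reflects my reading of Equation 3 rather than a confirmed definition.

```python
# Hypothetical toy comparison; names and numbers are illustrative only.

def score_eq3(w, hinv_diag):
    """Score as I read Equation 3: w_j^2 divided by the diagonal entry."""
    return [wj ** 2 / d for wj, d in zip(w, hinv_diag)]


def score_code(w, hinv_diag):
    """Score as in the quoted line: w_j^2 divided by the squared diagonal entry."""
    return [wj ** 2 / d ** 2 for wj, d in zip(w, hinv_diag)]


def argsort(xs):
    """Indices that sort xs in ascending order (lowest score is pruned first)."""
    return sorted(range(len(xs)), key=xs.__getitem__)


w = [1.0, 2.0, 3.0]          # toy weights
hinv_diag = [0.5, 4.0, 2.0]  # toy stand-in for torch.diag(Hinv1)

eq3_scores = score_eq3(w, hinv_diag)
code_scores = score_code(w, hinv_diag)

print("Eq. 3 style:", eq3_scores, "order:", argsort(eq3_scores))
print("code style: ", code_scores, "order:", argsort(code_scores))
```

With these toy values the two scores rank the weights differently, so the choice between them would change which weights are pruned first. That is why I would like to understand which form is intended.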

Thank you for your time, and I appreciate the important work you've done with SparseGPT.

Question about the Squaring of the Hessian Diagonal in SparseGPT #43
