Question about the Squaring of the Hessian Diagonal in SparseGPT #43

@xiejingcheng

I have a question regarding the code and pseudocode in your paper "SparseGPT: Massive Language Models Can be Accurately Pruned in One-Shot".

Specifically, in line 96 of sparsegpt.py, the formula tmp = W1 ** 2 / (torch.diag(Hinv1).reshape((1, -1)) ** 2) squares the diagonal elements of Hinv1, the inverse-Hessian term. Similarly, in line 8 of the pseudocode (Algorithm 1) in the paper, the corresponding diagonal elements are also squared.

However, in the original formula (e.g., Equation 3), the diagonal elements of the inverse Hessian are not squared. Could you kindly clarify why squaring the diagonal elements is necessary in the implementation and pseudocode?

Thank you for your time, and I appreciate the important work you've done with SparseGPT.
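
One possible reading of the squared diagonal, offered as a sketch and not as the authors' confirmed answer: it assumes that Hinv1 in sparsegpt.py is not the inverse Hessian itself but a block of the upper-triangular Cholesky factor U of the inverse Hessian, so that H^-1 = U^T U. Under that assumption, U[j, j] ** 2 equals the leading diagonal entry of the inverse of H restricted to columns j onward, which is the unsquared denominator Equation 3 would use once the earlier columns have been processed. The NumPy check below uses a random stand-in Hessian, and the function names are hypothetical (they do not appear in the repository).

```python
import numpy as np


def upper_cholesky_of_inverse(H):
    """Return upper-triangular U such that inv(H) == U.T @ U.

    Assumption: this mirrors what the repository stores in Hinv.
    """
    H_inv = np.linalg.inv(H)
    # np.linalg.cholesky returns the lower factor L with H_inv == L @ L.T,
    # so its transpose is the upper factor.
    return np.linalg.cholesky(H_inv).T


def remaining_inverse_diagonal(H):
    """For each column j, return [inv(H[j:, j:])][0, 0].

    This is the Equation 3 style denominator for column j after
    columns 0..j-1 have been removed from the problem.
    """
    n = H.shape[0]
    return np.array([np.linalg.inv(H[j:, j:])[0, 0] for j in range(n)])


# Stand-in Hessian: X @ X.T plus a small damping term (illustrative data only).
rng = np.random.default_rng(0)
X = rng.standard_normal((6, 32))
H = X @ X.T + 0.01 * np.eye(6)

U = upper_cholesky_of_inverse(H)
squared_diag = np.diag(U) ** 2            # what the code divides by
reference = remaining_inverse_diagonal(H)  # unsquared inverse-Hessian entries

print(np.allclose(squared_diag, reference))
```

If the assumption holds, the squaring in the code and in Algorithm 1 would only undo the square root introduced by the Cholesky factorization, and the score would still match Equation 3.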
