Skip to content

The computaion of neg_reward is wrong #16

@zzxn

Description

@zzxn

This code uses batch-averaged (sample_rouge - baseline rouge), but it don't make sense in math and this item should be sample-wise because what we really want to maximize is this:

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions