I've heard that averaging weights across runs can help performance, so maybe it helps us here? Basically, do three runs on the pretext task, take the three sets of weights, and average them. I need to find the paper this is from to make sure I'm understanding it correctly.
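As a rough sketch of the averaging step, assuming all three runs use the same architecture so the parameter names and shapes line up: the code below takes an element-wise mean over the weight sets. The `average_weights` helper, the `Weights` type, and the toy parameter names are hypothetical, and weights are shown as plain lists of floats rather than framework tensors. This is only the plain mean described above; the paper may do something different.

```python
from typing import Dict, List, Sequence

# Hypothetical representation: parameter name -> flattened list of values.
Weights = Dict[str, List[float]]


def average_weights(runs: Sequence[Weights]) -> Weights:
    """Element-wise mean of several weight sets with identical names and shapes."""
    if not runs:
        raise ValueError("need at least one set of weights")
    keys = runs[0].keys()
    for w in runs[1:]:
        if w.keys() != keys:
            raise ValueError("all runs must have the same parameter names")
    averaged: Weights = {}
    for name in keys:
        columns = [w[name] for w in runs]
        if len({len(c) for c in columns}) != 1:
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # zip(*columns) groups the i-th value of this parameter from every run.
        averaged[name] = [sum(vals) / len(runs) for vals in zip(*columns)]
    return averaged


# Toy stand-ins for the weights from three pretext-task runs (made-up values).
run_a = {"encoder.w": [1.0, 2.0], "encoder.b": [0.0]}
run_b = {"encoder.w": [2.0, 4.0], "encoder.b": [3.0]}
run_c = {"encoder.w": [3.0, 6.0], "encoder.b": [6.0]}

merged = average_weights([run_a, run_b, run_c])
print(merged)
```

With real checkpoints the same loop would run over each model's parameter dictionary, and the averaged weights would be loaded into a fresh model before evaluating on the downstream task.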