* Give the higher (inference) layers a higher learning rate and the lower (BERT representation) layers a lower learning rate
* ...
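
The layer-wise learning rate tip can be sketched as a small helper that splits a model's parameters into two groups. This is a minimal sketch, not a definitive implementation: the function name `build_param_groups`, the `"bert."` name prefix, and both learning rate values are assumptions chosen for illustration, so adjust them to your model and task.

```python
def build_param_groups(named_params, encoder_prefix="bert.",
                       encoder_lr=2e-5, head_lr=1e-3):
    """Split (name, parameter) pairs into two optimizer parameter groups.

    Parameters whose name starts with `encoder_prefix` (assumed here to be
    the BERT representation layers) get the lower learning rate; all other
    parameters (the task/inference layers) get the higher one.
    """
    encoder_params, head_params = [], []
    for name, param in named_params:
        if name.startswith(encoder_prefix):
            encoder_params.append(param)
        else:
            head_params.append(param)
    return [
        {"params": encoder_params, "lr": encoder_lr},  # lower layers, lower LR
        {"params": head_params, "lr": head_lr},        # higher layers, higher LR
    ]


# Demo with placeholder strings standing in for real parameter tensors.
demo_params = [
    ("bert.encoder.layer.0.weight", "w_bert"),
    ("classifier.weight", "w_head"),
]
groups = build_param_groups(demo_params)
```

With PyTorch, the returned list can be passed directly to an optimizer, for example `torch.optim.AdamW(build_param_groups(model.named_parameters()))`, since PyTorch optimizers accept per-group options such as `lr` in this dictionary form.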