Distilling is supposed to be a shortcut to creating a quality training dataset: you use the output of an established model as the labels, i.e. the desired answers.
The expected downside, that the new model inherits the reference model's biases, should still hold; but using the very model you are distilling from as the base model would seem to be completely pointless.
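For what it's worth, the core of the setup is tiny. A minimal sketch (function names here are illustrative, not any particular library's API): the teacher's temperature-softened output probabilities serve as the labels, and the student is trained to minimize the KL divergence against them, which is exactly how the teacher's biases get copied over.

```python
import math

def softmax(logits, T=1.0):
    # temperature-scaled softmax; higher T softens the distribution,
    # exposing more of the teacher's relative preferences between classes
    zs = [z / T for z in logits]
    m = max(zs)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in zs]
    s = sum(exps)
    return [e / s for e in exps]

def distillation_loss(student_logits, teacher_logits, T=2.0):
    # KL(teacher || student) on softened distributions: the teacher's
    # soft outputs act as the "desired answers" for the student
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q))
```

The loss is zero exactly when the student reproduces the teacher's distribution, which is why distilling a model into an identical copy of itself buys you nothing: the objective is already satisfied.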
You misunderstand: they escalate to the max to keep themselves (including selves in parallel dimensions or far-future simulations) from being blackmailed by future superintelligent beings, not to survive shootouts with border patrol agents.
I am fairly certain Yud has said something very close to that in reference to preventing blackmail from the basilisk, even though he tries to no-true-Scotsman the Zizians wrt his functional decision ‘theory’ these days.