For me it kinda makes sense: the portrait/person is an important feature of the image, so it's heavily weighted in the generation, producing another image with that portrait/person, and so on. I find it more interesting that the visage itself is still recognizable between generations, as if it's not "any person", but a very specific visage.
It's also interesting to consider what this image was originally "far" away from.
There are a couple of good follow-up threads exploring how such a thing might come about in the "latent space" of the model:
https://twitter.com/jthteo/status/1567418273628655617
https://twitter.com/mattskala/status/1567300206969982979