Gotcha. I'm happy to do the trace as it likely would be fruitful for me.
Do you have a link to a specific post you're thinking of? It's likely going to be a Tishby-like (the classic paper from 2015 {with much more work going back into the early aughts, just outside of the NN regime IIRC}: https://arxiv.org/abs/1503.02406) lineage, but I'm happy to look to see if it's novel.
I originally thought the PAIR article was another presentation by the same authors, but on closer reading, I think they independently discovered similar results. That said, the PAIR article does cite Progress measures for grokking via mechanistic interpretability, the arXiv paper by the authors of the Alignment Forum post.
(In researching this I found another paper about grokking finding similar results a few months earlier; again, I suspect these are all parallel discoveries.)
You could say that all of these avenues of research are re-statements of well-known properties, e.g. deep double descent, but I think that's a stretch. Double descent feels related, but I don't think a 2018 AI researcher who knew about double descent would spontaneously predict "if you train your model past the point it starts overfitting, it will start generalizing again if you train it for long enough with weight decay".
But anyway, in retrospect, I agree that saying "the LessWrong community is where this line of analysis comes from" is false; it's more accurate to say they were among the people working on it and reaching similar conclusions.