Is the slow part of linking the "copy all the bytes into the executable" step (in which case avoiding separate-link is a clear win, since it saves a copy), or is it the "do all the relocations" work, which I think needs to be done anyway?
I put my question to a friend of mine who works on linkers. His take was that for a single-threaded linker like ld.bfd the copy-bytes part would probably dominate, but that for a multithreaded linker like lld that part parallelizes trivially, so the slow part tends to be elsewhere. He also pointed me at a recent blog post by the lld maintainer on this topic, which I should go and read: https://maskray.me/blog/2021-12-19-why-isnt-ld.lld-faster
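To make the two phases in the question concrete, here is a toy sketch of a "linker" that first copies section bytes into an output buffer and then patches relocations. Everything in it is an assumption made for illustration: the section layout, the `symbols` table, the 32-bit little-endian absolute relocation, and the function names are hypothetical and are not how ld.bfd or lld are actually structured. It also leaves out symbol resolution and input parsing entirely.

```python
import struct
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy inputs: each section has raw bytes, a pre-assigned offset
# in the output, and relocations given as (offset within section, symbol name).
sections = [
    {"data": bytearray(b"\x00" * 8), "out_off": 0, "relocs": [(0, "foo")]},
    {"data": bytearray(b"\xAA" * 8), "out_off": 8, "relocs": [(4, "bar")]},
]
# Hypothetical, already-resolved symbol addresses.
symbols = {"foo": 0x1000, "bar": 0x2000}


def copy_section(out, sec):
    """Phase 1: copy one section's bytes to its slot in the output."""
    start = sec["out_off"]
    out[start:start + len(sec["data"])] = sec["data"]


def apply_relocs(out, sec):
    """Phase 2: patch each relocation site with the symbol's address."""
    for off, name in sec["relocs"]:
        # Assumed relocation type: 32-bit little-endian absolute address.
        struct.pack_into("<I", out, sec["out_off"] + off, symbols[name])


def link(sections):
    size = max(s["out_off"] + len(s["data"]) for s in sections)
    out = bytearray(size)
    with ThreadPoolExecutor() as pool:
        # Each section writes to a disjoint byte range, so the copies are
        # independent of one another and can be handed to a thread pool.
        list(pool.map(lambda s: copy_section(out, s), sections))
        # Relocations run only after all copies are done, because they
        # overwrite bytes that the copy phase put in place.
        list(pool.map(lambda s: apply_relocs(out, s), sections))
    return bytes(out)


output = link(sections)
```

In Python the thread pool only illustrates the independence of the per-section work; the interpreter lock means it will not show a real speedup. The point is just that the copy phase has no cross-section dependencies, which is why a multithreaded linker can spread it across cores.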