Nah, memcpy shouldn't exist (just not for buffer overflow reasons). memcpy is there because someone got their knickers in a twist over possibly being a few cycles faster than memmove, even though the chance that it actually matters is tiny, while the chance that you'll accidentally use memcpy when your memory overlaps is much larger.
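To make the overlap hazard concrete, here is a minimal sketch. Calling the real memcpy on overlapping buffers is undefined behaviour, so it can't be demonstrated portably; naive_forward_copy below is a hypothetical stand-in for a simple forward-copying memcpy, and the two shift_right_* helpers are made up for illustration too.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a simple memcpy: copies front to back,
 * with no check for overlapping regions. */
void naive_forward_copy(void *dst, const void *src, size_t n) {
    unsigned char *d = dst;
    const unsigned char *s = src;
    for (size_t i = 0; i < n; i++) {
        d[i] = s[i];
    }
}

/* Shift the first n bytes of buf right by k positions using the naive
 * forward copy. buf must have room for n + k bytes. When k < n, later
 * reads see bytes that earlier writes already clobbered. */
void shift_right_naive(char *buf, size_t n, size_t k) {
    assert(buf != NULL);
    naive_forward_copy(buf + k, buf, n);
}

/* Same shift using memmove, which is specified to handle overlap. */
void shift_right_memmove(char *buf, size_t n, size_t k) {
    assert(buf != NULL);
    memmove(buf + k, buf, n);
}
```

With buf holding "abcdef", shift_right_naive(buf, 4, 2) leaves "ababab", while shift_right_memmove(buf, 4, 2) leaves "ababcd", which is what you actually wanted.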
Actually, that justification for memcpy is quite poor. Performance-wise, it is usually easy to beat once you start using the hardware. Most implementations I've seen loop over byte copies in the most naïve way possible... almost every architecture supports 32-bit copies, most support more... :)
Incidentally, I've saved some 0.2-0.8 ms per frame in an "AAA" (I use the term loosely and with disgust) game title by replacing the compiler-generated assignment operator in one struct with an explicit 128-byte copy. Copying individual bytes is slow, and the compiler is never as smart as people claim.
In analogous cases involving memcpy, the fact that memcpy is not memmove saves much less than actually copying the data in a way that squeezes everything out of the data buses. :)
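A rough sketch of the byte-wise versus word-wise difference, in portable C. copy_bytes and copy_words are hypothetical names, and the 64-bit word size is an assumption (the comment above only mentions 32-bit and wider). The word version takes uint64_t pointers so that alignment and aliasing are the caller's problem; a tuned copy would more likely use SIMD or platform-specific instructions, and an optimizing compiler may vectorize either loop on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Baseline: one byte per iteration. Regions must not overlap. */
void copy_bytes(unsigned char *dst, const unsigned char *src, size_t n) {
    assert(n == 0 || (dst != NULL && src != NULL));
    for (size_t i = 0; i < n; i++) {
        dst[i] = src[i];
    }
}

/* One 64-bit word per iteration: 8x fewer loop iterations for the
 * same amount of data. Regions must not overlap. */
void copy_words(uint64_t *dst, const uint64_t *src, size_t nwords) {
    assert(nwords == 0 || (dst != NULL && src != NULL));
    for (size_t i = 0; i < nwords; i++) {
        dst[i] = src[i];
    }
}
```

A 128-byte struct whose storage is an array of uint64_t would be copied with copy_words(dst, src, 16), i.e. 16 iterations instead of 128. Whether that is actually faster on a given compiler and target is something to measure, not assume.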
"almost every architecture supports 32-bit copies"
How much of that was true at the time that function was created?
Also, on 'just use memmove': it would not surprise me if ancient Unices used memcpy instead of inlining it to copy really small buffers (such as 14-byte-max filenames), not for speed, but for the byte savings (even if singular 'byte' is the correct way to phrase that). With small buffers, the overhead of the overlap check (that 'if') in memmove can become substantial.
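For reference, the 'if' in question looks roughly like this. It's a minimal sketch, not any particular libc's implementation; sketch_memmove is a hypothetical name, and comparing addresses through uintptr_t is an assumption that holds on common flat-memory platforms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical memmove sketch: choose the copy direction so that an
 * overlapping source is never overwritten before it has been read. */
void *sketch_memmove(void *dst, const void *src, size_t n) {
    unsigned char *d = dst;
    const unsigned char *s = src;
    assert(n == 0 || (dst != NULL && src != NULL));

    if ((uintptr_t)d < (uintptr_t)s) {
        /* Destination starts below source: copy front to back. */
        for (size_t i = 0; i < n; i++) {
            d[i] = s[i];
        }
    } else {
        /* Destination starts at or above source: copy back to front. */
        for (size_t i = n; i > 0; i--) {
            d[i - 1] = s[i - 1];
        }
    }
    return dst;
}
```

The extra cost over a plain forward copy is one compare and branch, plus the code for the second loop. For a 14-byte copy that is a noticeable fraction of the total work and code size, which is the point being made above.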