Pro tip: if you think you will need to profile/trace any natively compiled code at runtime (e.g. some kind of long-running service in production) then for the love of god, please don't enable "omit frame pointer" - it can mean the difference between actually readable call stacks/flame graphs and garbage. The days of it having a huge impact on performance are gone, at least in my experience - keeping frame pointers has made no measurable difference to any services I have run in recent years, though you should always confirm that is true for your own case.
So while this article is about Go, it definitely applies to C/C++ as well.
Registers aren't free; of course turning on frame pointer elision is going to give you a few extra percentage points of performance. The problem is that it will interfere with your performance team's ability to further profile the program and triage crashes, which is very, very likely going to be worse in the long run than the savings you get from this particular optimization. On register-starved environments like x86 (like the one you linked) it's somewhat forgivable but on any modern platform it's a bad tradeoff 99.9% of the time.
The low overhead has a lot to do with moving from 8 general-purpose registers in x86 to 16 general-purpose registers in x86-64, or 31 GPRs on ARM64. You're "wasting" a smaller fraction of your register file.
It was interesting to look directly at the source code linked in the article for stack unwinding [1]. In particular, it's neat to see how thoroughly documented this file is. Reminds me of poking around the SQLite codebase.
Would this speed up capturing stack traces? The use case I have in mind is stack traces for logging. The Zap logger has a mild warning [1] about stack traces being relatively expensive.
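For anyone who wants to measure this in their own service, here is a minimal sketch of the usual split between capturing a stack and formatting it, using only the standard `runtime` package. The helper names `captureStack` and `formatStack` are made up for illustration, and this is not how Zap does it internally. Whether frame pointers make the capture step faster depends on your Go version and platform, so treat it as something to benchmark rather than a given.

```go
package main

import (
	"fmt"
	"runtime"
	"strings"
)

// captureStack (hypothetical helper) records up to 32 return addresses
// for the calling goroutine. skip=0 starts at the caller of captureStack.
func captureStack(skip int) []uintptr {
	pcs := make([]uintptr, 32)
	// +2 skips the frames for runtime.Callers and captureStack itself.
	n := runtime.Callers(skip+2, pcs)
	return pcs[:n]
}

// formatStack (hypothetical helper) resolves the recorded addresses to
// function names and file:line pairs. Symbolization and string building
// are typically the costlier half, so a logger can defer this step until
// the entry is actually written.
func formatStack(pcs []uintptr) string {
	if len(pcs) == 0 {
		return ""
	}
	var b strings.Builder
	frames := runtime.CallersFrames(pcs)
	for {
		f, more := frames.Next()
		fmt.Fprintf(&b, "%s\n\t%s:%d\n", f.Function, f.File, f.Line)
		if !more {
			break
		}
	}
	return b.String()
}

func main() {
	pcs := captureStack(0)      // cheap part: raw program counters only
	fmt.Print(formatStack(pcs)) // expensive part: done on demand
}
```

Wrapping each half in a `testing.B` benchmark should show where the time goes for your own call depths.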