The async design allows the language to do garbage collection in Rust because there are well-defined times when the language is running and when it is not. This gets around Rust's aliasing-or-mutability restrictions. There are also benefits around stack uniformity vs regular Lua. (I.e., in regular Lua you can have some differences between the C stack and the Lua stack.)
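To make the "defined times" point concrete, here is a toy sketch of the general idea. It is not piccolo's API: `Vm`, `step`, `collect`, and `run_to_completion` are names made up for illustration, and the "heap" is just a vector of slots. The only thing it tries to show is that the host drives the VM in bounded steps, and collection only happens between steps, when no step is holding references into the heap.

```rust
// Hypothetical toy VM; none of these names come from piccolo.
#[derive(Debug, PartialEq)]
enum Status {
    Running,
    Done,
}

struct Vm {
    heap: Vec<Option<String>>, // slot arena standing in for a GC heap
    roots: Vec<usize>,         // slots reachable from the VM's own state
    pc: usize,                 // how many steps have run so far
}

impl Vm {
    fn new() -> Self {
        Vm { heap: Vec::new(), roots: Vec::new(), pc: 0 }
    }

    // Run one bounded slice of work, then hand control back to the host.
    // Each step allocates one object and roots every other one.
    fn step(&mut self) -> Status {
        let slot = self.heap.len();
        self.heap.push(Some(format!("obj{}", self.pc)));
        if self.pc % 2 == 0 {
            self.roots.push(slot);
        }
        self.pc += 1;
        if self.pc >= 4 { Status::Done } else { Status::Running }
    }

    // Only called between steps, so nothing else is borrowing the heap.
    // Frees every live slot that is not a root and returns the count.
    fn collect(&mut self) -> usize {
        let mut freed = 0;
        for (i, slot) in self.heap.iter_mut().enumerate() {
            if slot.is_some() && !self.roots.contains(&i) {
                *slot = None;
                freed += 1;
            }
        }
        freed
    }
}

// Host loop: step, collect, repeat. Returns (objects freed, objects live).
fn run_to_completion() -> (usize, usize) {
    let mut vm = Vm::new();
    let mut freed = 0;
    loop {
        let status = vm.step();
        freed += vm.collect();
        if status == Status::Done {
            break;
        }
    }
    let live = vm.heap.iter().filter(|s| s.is_some()).count();
    (freed, live)
}
```

Because `step` and `collect` both take `&mut self` and never overlap, the borrow checker is satisfied without any unsafe code. A real collector is obviously far more involved than this.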
Quite frankly I don't understand the question. The assumption here is that if your goal is to write a dynamic language, you will have to write a VM. If you write a VM, a stackless model is the preferred one for a variety of reasons. However, not having a stack causes pain for native calls that can detour via native objects back into the VM, which is something you often have when embedding these languages (which is the goal).
So if you have that sort of setup, you will need to find solutions.
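One common shape for such a solution is a trampoline, sketched below. This is an illustration of the general technique and I am not claiming it is how piccolo does it; `Action`, `run`, and the callback names are all invented for the example. Instead of a native callback calling back into the VM recursively, it returns a request describing the call it wants plus a continuation, and the VM's own loop performs the call.

```rust
// Hypothetical sketch; these types are illustrative, not piccolo's API.
enum Action {
    // The function is finished and produced a value.
    Return(i64),
    // Ask the VM to call `callee(arg)`, then resume with `then(result)`.
    Call {
        callee: fn(i64) -> Action,
        arg: i64,
        then: fn(i64) -> Action,
    },
}

fn double(x: i64) -> Action {
    Action::Return(x * 2)
}

fn add_one(x: i64) -> Action {
    Action::Return(x + 1)
}

// A "native" callback that needs the VM to run another function.
// It does not recurse into the VM; it returns a request instead.
fn native_double_then_inc(x: i64) -> Action {
    Action::Call { callee: double, arg: x, then: add_one }
}

// The driver loop. Pending continuations live in an explicit Vec,
// so nested calls never grow the Rust call stack.
fn run(entry: fn(i64) -> Action, arg: i64) -> i64 {
    let mut conts: Vec<fn(i64) -> Action> = Vec::new();
    let mut action = entry(arg);
    loop {
        match action {
            Action::Call { callee, arg, then } => {
                conts.push(then);
                action = callee(arg);
            }
            Action::Return(v) => match conts.pop() {
                Some(k) => action = k(v),
                None => return v,
            },
        }
    }
}
```

Calling `run(native_double_then_inc, 20)` doubles 20 inside the loop and then resumes the continuation, giving 41. The native function never re-enters `run` itself, which is the re-entrancy problem being avoided.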
What piccolo does is quite pretty and definitely beats some other options out there. A lot of stackless VMs just crash horribly if you do something naughty with re-entrant calls.