Dynamic memory allocation seems like something you'd generally want to avoid in real-time programs and games. Arena allocation is often used instead (I think "arena" or "bump" allocation is the right term, rather than "slab", which usually means caches of fixed-size objects): allocate a large chunk of memory at startup, bump a pointer to allocate, and reset the pointer to free all memory at once. That's simple to implement in Forth.
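To make that concrete, here is a minimal sketch of such an allocator in standard Forth. The names (arena, arena-size, arena-ptr, arena-alloc, arena-reset) and the 64 KiB size are my own placeholders, not from any particular system, and it does no alignment, so treat it as an illustration rather than a finished implementation.

```forth
\ Bump allocator sketch. All names and the size are hypothetical.
65536 constant arena-size            \ assumed size; pick what your program needs
create arena  arena-size allot       \ reserve the whole block once, at load time
variable arena-ptr                   \ address of the next free byte

: arena-reset ( -- )                 \ free everything at once
  arena arena-ptr ! ;

: arena-alloc ( u -- addr )          \ reserve u bytes, return their start address
  arena-ptr @ swap over +            \ ( old new )
  dup arena arena-size + u>          \ would the new pointer pass the end?
  abort" arena exhausted"
  arena-ptr ! ;                      \ commit the bump, leave the old pointer

arena-reset                          \ start with an empty arena
```

Typical use would be something like `64 arena-alloc` for each per-frame buffer, followed by a single `arena-reset` at the end of the frame. Both operations are a handful of instructions with no searching or bookkeeping, which is the property that matters for real-time code.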
If inner interpreter overhead is an issue, there are well-known solutions to that as well. Subroutine threading removes the inner interpreter entirely, because each compiled word becomes a sequence of native call instructions, and it makes inlining simple to implement in the compiler. Beyond that, there's always the inline assembly approach.