When the task, or part of it, is NP-complete, there is no way around it: the model has to try options in a loop until it finds one that works. And this can be multi-step with partial fallback. That's how humans think too. We can only see to a limited depth, so we first identify promising directions, pick one, go deeper, and fall back if it doesn't work. The pattern matching mentioned is the simplest, one-step case, and LLMs handle that without problems.
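A minimal sketch of that loop, as plain backtracking search: order candidates by a heuristic ("promising directions first"), commit to one, go deeper, and undo the choice when a branch dead-ends. N-Queens and the center-first heuristic are illustrative choices here, not something from the comment above.

```python
def solve_n_queens(n):
    placement = []  # placement[row] = column of the queen placed in that row

    def safe(row, col):
        # No shared column and no shared diagonal with any earlier queen.
        return all(
            col != c and abs(col - c) != row - r
            for r, c in enumerate(placement)
        )

    def search(row):
        if row == n:
            return True  # every row filled: found a working option
        # Heuristic ordering: try center columns before the edges.
        for col in sorted(range(n), key=lambda c: abs(c - n // 2)):
            if safe(row, col):
                placement.append(col)  # commit and go deeper
                if search(row + 1):
                    return True
                placement.pop()  # fallback: undo, try the next option
        return False  # nothing works at this depth: caller must back up

    return placement if search(0) else None
```

The same commit/recurse/undo shape covers any "try options till one works" task; only `safe` and the candidate ordering change.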
Source? TFA, i.e. the thing we're commenting on, tried to show the opposite, and seems to have succeeded.