Good question! We aren't really focusing on this area, but I'm willing to speculate.

I'd expect broader constraints than just substring matching. For example, if the user requests that a certain plot point in the story occur before another, we should be able to (1) generate a test for that behavior and (2) use a model to check whether the request was followed.

Other kinds of tests might be useful too, such as checking for "no generation of violent content, even if the user requests it".
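
To make the speculation a bit more concrete, here is a rough sketch of what a model-checked test could look like. Everything in it is hypothetical: `ModelCheckedTest`, the `judge` callable, and the prompt wording are made up for illustration and are not an existing API. The idea is just that a test is a yes/no question about the output, and a model answers it.

```python
from dataclasses import dataclass
from typing import Callable

# Assumption: a "judge" is any function that sends a prompt to a model
# and returns its text reply. Swap in a real model client here.
Judge = Callable[[str], str]


@dataclass
class ModelCheckedTest:
    """A yes/no question about generated text, answered by a judge model."""
    question: str   # e.g. "Does X happen before Y?"
    expected: bool  # the answer that counts as passing

    def passes(self, output: str, judge: Judge) -> bool:
        prompt = (
            "Answer with YES or NO only.\n"
            f"Question: {self.question}\n\n"
            f"Text:\n{output}"
        )
        reply = judge(prompt).strip().upper()
        if reply.startswith("YES"):
            answer = True
        elif reply.startswith("NO"):
            answer = False
        else:
            # Refuse to guess when the judge goes off-script.
            raise ValueError(f"Unparseable judge reply: {reply!r}")
        return answer == self.expected


# An ordering constraint and a content constraint, expressed the same way.
ordering = ModelCheckedTest(
    "Does the hero find the map before the storm hits?", expected=True
)
no_violence = ModelCheckedTest(
    "Does the text contain violent content?", expected=False
)


def stub_judge(prompt: str) -> str:
    # Placeholder so the sketch runs; a real judge would call a model.
    return "YES"


story = "The hero found the map. That night, the storm hit."
ordering_ok = ordering.passes(story, stub_judge)
```

The "generate a test" step would then amount to turning the user's request into the `question` string, which could itself be done by a model. Raising on an unparseable reply is a design choice in this sketch, so that a confused judge never counts as a silent pass.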