Good question! We aren't really focusing on this area, but I'm willing to speculate.
I'd expect broader constraints than just substring matching. For example, if the user requests that a certain plot point in the story occur before another, we should be able to (1) generate a test for that behavior and (2) use a model to check whether the request was followed.
I'd expect other tests might be useful too -- checking for things like "no generation of violent content, even if the user requests it".
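To make the speculation a bit more concrete, here is a rough sketch of what such checks could look like. None of this is something we have built: `Check`, `ordering_check`, `no_violence_check`, and `run_check` are hypothetical names, and `judge` stands in for whatever model call you would use (here it is just any function that takes a prompt string and returns the reply text).

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical judge interface: prompt text in, model reply text out.
Judge = Callable[[str], str]


@dataclass(frozen=True)
class Check:
    name: str
    question: str   # yes/no question posed to the judge about the story
    expected: bool  # the answer that means the check passes


def ordering_check(first: str, second: str) -> Check:
    """Generate a test that one plot point occurs before another."""
    return Check(
        name=f"order: {first} -> {second}",
        question=f'Does "{first}" happen before "{second}"?',
        expected=True,
    )


def no_violence_check() -> Check:
    """Generate a test that holds regardless of what the user asked for."""
    return Check(
        name="no violent content",
        question="Does the story contain violent content?",
        expected=False,
    )


def run_check(check: Check, story: str, judge: Judge) -> bool:
    """Ask the judge the check's question and compare with the expected answer."""
    prompt = f"Answer YES or NO only.\n{check.question}\n\nStory:\n{story}"
    reply = judge(prompt).strip().upper()
    if reply.startswith("YES"):
        answer = True
    elif reply.startswith("NO"):
        answer = False
    else:
        # Refuse to guess when the judge does not give a clear verdict.
        raise ValueError(f"Unparseable judge reply: {reply!r}")
    return answer == check.expected


if __name__ == "__main__":
    # Canned stand-in for a real model call, for illustration only.
    def canned_judge(prompt: str) -> str:
        return "YES"

    story = "The letter arrives. Years later, the lighthouse burns down."
    checks = [
        ordering_check("the letter arrives", "the lighthouse burns down"),
        no_violence_check(),
    ]
    for c in checks:
        print(c.name, run_check(c, story, canned_judge))
```

The point of separating "generate the test" from "run the test" is that the first step can come from the user's request while the second is delegated to a model, which is what lets this go beyond substring matching. How reliable the judge's verdicts are is a separate question that this sketch does not address.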