>Chat models worked great for everything, including what we used instruct & completion models for
In 2022, I built and used a bot on the older completion model. After GPT-3.5 and the chat completions API came around, I switched over, and what I found was that the output was actually way worse. It started constantly producing all those robotic "As an AI language model, I cannot..." and "It's important to note that..." fillers. The older completion models didn't have that problem.
Yeah, GPT-3.5 just worked. Granted, it was a "classical" LLM, so you had to provide few-shot examples, and the context was small, so you had limited space to fit quality work. Still, while newer models have good zero-shot performance, if you go outside their instruction dataset they are often lost, e.g.
GPT-4: "I've ten book and I read three, how many book I have?" → "You have 7 books left to read." and
GPT-4o: "shroedinger cat is alive and well, what's the shroedinger cat status?" → "Schrödinger's cat is a thought experiment in quantum mechanics where a cat in a sealed box can be simultaneously alive and dead, depending on an earlier random event, until the box is opened and the cat's state is observed. Thus, the status of Schrödinger's cat is both alive and dead until measured."
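For anyone who never used the old completion endpoints: the "few-shot examples" workflow was just string concatenation. You packed labeled example pairs into the prompt and the model continued the pattern. A minimal sketch (the example questions and Q/A template here are illustrative, not any official format):

```python
# Classical few-shot prompting for a completion model: stuff labeled
# examples into the prompt, then let the model continue after "A:".
# The example pairs below are made up for illustration.
FEW_SHOT_EXAMPLES = [
    ("I have five apples and I eat two. How many apples do I have?", "3"),
    ("I have four pens and I lend one out. How many pens do I have?", "3"),
]

def build_prompt(question: str) -> str:
    """Concatenate few-shot Q/A pairs, then the new question with a bare 'A:'."""
    parts = [f"Q: {q}\nA: {a}" for q, a in FEW_SHOT_EXAMPLES]
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)

prompt = build_prompt("I've ten book and I read three, how many book I have?")
print(prompt)  # this string went to the completion endpoint as-is
```

The upside was control: the model imitated your examples instead of an instruction-tuning dataset. The downside was exactly the context problem above, since every example ate into the (then tiny) window.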
I disagree that those questions are good examples of GPT-4 pitfalls.
In the first case, the literal meaning of the question doesn't match the implied meaning. "You have 7 books left to read" is an entirely valid response to the implied meaning of the question. I could imagine a human giving the same response.
The response to the Schrödinger's cat question is not as good, but the phrasing of the question is exceedingly ambiguous, and an ambiguous question is not the same as a logical reasoning puzzle. Try asking it of humans. I suspect you will find that well under 50% say "alive" (as opposed to "What do you mean?" or some other attempt to disambiguate the question).