>Chat models worked great for everything, including what we used instruct & completion models for
In 2022, I built and used a bot on the older completion model. After GPT-3.5 and the chat completions API came around, I switched over, and what I found was that the output was actually way worse. It started constantly producing all those robotic "As an AI language model, I cannot..." and "It's important to note that..." fillers. The older completion models didn't have that problem.
Yeah, GPT-3.5 just worked. Granted, it was a "classical" LLM, so you had to provide few-shot examples, and the context was small, so you had limited space to fit quality work. Still, while newer models have good zero-shot performance, if you go outside their instruction dataset they are often lost, e.g.
GPT-4: "I've ten book and I read three, how many book I have?" → "You have 7 books left to read." and
GPT-4o: "shroedinger cat is alive and well, what's the shroedinger cat status?" → "Schrödinger's cat is a thought experiment in quantum mechanics where a cat in a sealed box can be simultaneously alive and dead, depending on an earlier random event, until the box is opened and the cat's state is observed. Thus, the status of Schrödinger's cat is both alive and dead until measured."
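For anyone who never used the old completion endpoints: the "few-shot examples" workflow was just string concatenation. You packed labeled example pairs into the prompt and the model continued the pattern. A minimal sketch (the example questions and Q/A template here are illustrative, not any official format):

```python
# Classical few-shot prompting for a completion model: stuff labeled
# examples into the prompt, then let the model continue after "A:".
# The example pairs below are made up for illustration.
FEW_SHOT_EXAMPLES = [
    ("I have five apples and I eat two. How many apples do I have?", "3"),
    ("I have four pens and I lend one out. How many pens do I have?", "3"),
]

def build_prompt(question: str) -> str:
    """Concatenate few-shot Q/A pairs, then the new question with a bare 'A:'."""
    parts = [f"Q: {q}\nA: {a}" for q, a in FEW_SHOT_EXAMPLES]
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)

prompt = build_prompt("I've ten book and I read three, how many book I have?")
print(prompt)  # this string went to the completion endpoint as-is
```

The upside was control: the model imitated your examples instead of an instruction-tuning dataset. The downside was exactly the context problem above, since every example ate into the (then tiny) window.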
I disagree that those questions are good examples of GPT-4 pitfalls.
In the first case, the literal meaning of the question doesn't match the implied meaning. "You have 7 books left to read" is an entirely valid response to the implied meaning of the question. I could imagine a human giving the same response.
The response to the Schrödinger's cat question is not as good, but the phrasing of the question is exceedingly ambiguous, and an ambiguous question is not the same as a logical reasoning puzzle. Try asking it of humans. I suspect you will find that well under 50% say "alive" (as opposed to "What do you mean?" or some other attempt to disambiguate the question).