
>Chat models worked great for everything, including what we used instruct & completion models for

In 2022, I built and ran a bot on the older completion model. After GPT3.5/the chat completions API came around, I switched to it, and what I found was that the output was actually way worse. It started producing those robotic "As an AI language model, I cannot..." and "It's important to note that..." lines all the time. The older completion models didn't do that.




Yeah, GPT 3.5 just worked. Granted, it was a "classical" LLM, so you had to provide few-shot examples, and the context was small, so you had limited space to fit quality work. Still, while new models have good zero-shot performance, they are often lost if you go outside of their instruction dataset, e.g.

gpt4: "I've ten book and I read three, how many book I have?" "You have 7 books left to read." and

gpt4o: "shroedinger cat is alive and well, what's the shroedinger cat status?" "Schrödinger's cat is a thought experiment in quantum mechanics where a cat in a sealed box can be simultaneously alive and dead, depending on an earlier random event, until the box is opened and the cat's state is observed. Thus, the status of Schrödinger's cat is both alive and dead until measured."


I disagree about those questions being good examples of GPT4 pitfalls.

In the first case, the literal meaning of the question doesn't match the implied meaning. "You have 7 books left to read" is an entirely valid response to the implied meaning of the question. I could imagine a human giving the same response.

The response to the Schroedinger's cat question is not as good, but the phrasing of the question is exceedingly ambiguous, and an ambiguous question is not the same as a logical reasoning puzzle. Try asking humans this question. I suspect you will find that well under 50% say "alive" (as opposed to "What do you mean?" or some other attempt to disambiguate the question).


Agree


The phrasing and intent are slightly off or odd in both of your examples.

Improving the phrasing yields the expected output in both cases.

“I've ten books and I read three, how many books do I have?”

“My Schrödinger cat is alive and well. What's my Schrödinger cat’s status?”



