Well, Gemini completely hallucinated command-line switches on a recent question I asked it about the program “John the Ripper”.
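
For what it's worth, this sort of thing is easy to check mechanically. A minimal sketch in Python, assuming the john binary is on your PATH and that your build prints usage text for --help (Jumbo builds do); the "suggested" flags are just placeholders for whatever the model gave you:

    # Minimal sketch: check whether LLM-suggested switches actually appear
    # in john's usage text. Assumes `john` is on PATH and that this build
    # prints usage for --help (Jumbo builds do; adjust if yours differs).
    import subprocess

    suggested = ["--wordlist", "--rules", "--format"]  # placeholders

    result = subprocess.run(["john", "--help"], capture_output=True, text=True)
    usage = result.stdout + result.stderr  # some builds print usage to stderr

    for flag in suggested:
        print(flag, "ok" if flag in usage else "NOT FOUND (possible hallucination)")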

We absolutely need public sources of truth, at the very least until we can build systems that actually reason from a combination of first principles and experience, and even then we'll need sources of truth for the experience.

You simply cannot create solutions to new problems if your data gets too old to encompass the new subject matter. We have no systems which can adequately distinguish fact from fiction, and new human experiences will always need to be documented for machines to understand them.


In my experience, o1 is not comparable to any other LLM experience. I have had multiple PhD friends test it; it's what has turned them from stochastic-parrot campers to possible believers.

And to be clear: as a layman (in almost every field), I recognized that LLMs weren't up to the challenge of disabusing my PhD friends of that notion until o1, and I never even tried, even though I've 'believed' since GPT-2.


I haven't really found any use case related to actual work where o1 was better than 4o or 3.5 Sonnet.

Any time I tried one of the more complex prompts I was working through with Sonnet or 4o, o1 would totally miss important points and ignore a lot of the instructions while going really deep on some relatively basic portion of the prompt.

Seems fine for reasonably simple prompts, but gets caught up when things get deeper.


Yeah, I generally agree with that. That's why I said it only moved them from stochastic-parrot campers to "possible" believers. To clarify, the few I've had test it have all pretty much said "this feels like it could lead to real reasoning/productivity/advances/intelligence".


