
Wait, we say LLMs are as smart as some humans? I must have missed the memo.



I mean LLMs crush most humans (even many professionals) at coding/legal/medical/etc. exams.

Ah, wait. That only counts as evidence of intelligence when humans do it, right?


Wait, are LLMs crushing professionals at coding? Anyway, LLMs are clearly better than humans at memorizing information, and they most certainly have some ability to retrieve and adapt this information to solve problems. On tests, they can do some degree of generalization to answer questions worded differently from their training data. However, because humans are comparatively quite bad at memorizing, we know that when humans do well on medical tests, it usually (not always) means that they “understand” how, for example, anatomy works. They’ve formed some model in their head of how to break the body into parts and parts into subparts, how the parts interact, and so on, because doing so is actually easier for most humans than trying to remember the linguistic facts.

Understanding in this sense seems different from the memorizing+flexible retrieval we know LLMs excel at, because it extends much further beyond the training distribution. If I ask a (good) medical professional a question unlike anything they’ve ever seen before, they’ll be able to draw on their “understanding” of anatomy to give me a decent guess. LLMs are inconsistent on these kinds of questions and often drop the ball.

We can also point to training data requirements as a discrepancy. In brain organoid experiments, we observe that far fewer examples are required to achieve neural-network-like results. This isn’t surprising to me. Biological neurons have exceedingly complex behavior; it takes hundreds of artificial neural network nodes to recreate the behavior of a single neuron, and real neurons can reorganize their network structure in response to stimuli, form loops, operate in non-discretized time, etc. We don’t know how far transformers will be able to go, but I think that if you want to make a model that holds a candle to that sort of complexity, you’ll need at least many, many orders of magnitude more scale or, more likely, a different, less limiting architecture.


> Wait, are LLMs crushing professionals at coding?

Exams, yes. Not the work yet. This is also why they've not made doctors and lawyers obsolete: even in the fields where the models perform the best, you're getting the equivalent of someone fresh out of university with no real experience.

I suspect you're right on the generalise vs. memorise point. Not absolutely sure, but I do suspect so.

I also suspect we'll get transformative AI well before we can train any AI with as few examples as any organic brain needs. Unless we suddenly jump to that with one weird trick we've been overlooking this whole time, which wouldn't hugely surprise me given how many radical improvements we've made just by trying random ideas.


Most people can't code at all, FYI.


Most professional coders can, in fact, code.


Most people are not professional coders.


And to short-circuit the argument: approximately no one is, at the same time, a professional coder, writer, artist, detective, translator, doctor, and <insert one or ten or a hundred more occupations here>. GPT-4 is all of them at once, and outperforms the average professional in any of those occupations in their respective field.


You said it outperforms the average professional in any of these occupations (including detective and artist?), and more? That’s a very bold claim. Can you substantiate it somehow? I don’t disagree for a second that incrementally better LLMs + robotics will be able to automate a large portion of labor, but that doesn’t in my eyes make them smarter than humans. Jobs today aren’t exactly the fullest realization of human potential; you wouldn’t call a robotic arm smarter than a human for being better on the assembly line.


I thought this thread was an answer to the qualifier in parentheses:

> I mean LLMs crush most humans (even many professionals) at coding/legal/medical/etc. exams.


The same human the AI beats at one task would beat it at 1000 others. So the human is 1000x more intelligent!

Ah, wait. That only counts as evidence of intelligence when AIs do it, right?


It's the other way around. Any one human will only beat an AI at two or three tasks — the first two being our day job and our mother tongue, the third possibly being our hobby — while the AI will out-perform us at thousands of other tasks.

Only if you count humanity as a whole do we beat an LLM at everything.


There are two logical problems with what you've written here.

1. You're counting "our day job" as one task, while counting all individual prompts an AI can answer as each being their own task. This is obviously misleading (and I could just as easily perform the same compression in the opposite direction - chatbots can only do one task: "be a chatbot").

2. You're not controlling for training. It's meaningless to compare the intelligence of one entity trained to do a task with that of another entity that was not trained to do it.

But even ignoring those fallacies, what you've written here is still not true. 99% of things an LLM could supposedly "out-perform" a human at, the human would actually outperform if you provided that human with the same text resources the LLM used to conjure its answer. Regurgitating facts is not evidence of intelligence, and humans can do it easily (and do it without hallucinating, which is key) if you just give them access to the same information.

But when you go in the opposite direction, achieving parity is no longer so simple. When LLMs fail to do math, fail to strategize, fail to process the rules of abstract games, etc., there is no textbook or lecture or article you can provide to the LLM that magically makes the problem go away. These are fundamental limitations of the AI's capabilities, rather than just a result of not possessing enough information.

What many people don't seem to realize is that, when merely getting up in the morning and brushing your teeth, you are already exercising more intelligence than any AI has ever possessed. (Anyone who has ever worked in robotics or visual processing can attest to that enthusiastically.) So don't even get me started on actual critical thinking.


Neither of those is a logical problem.

1. This is indeed a simplification, but the individual tasks that make up your day job are the ones where you have the most experience. For example, I used to write video games: AI does a better job of game design than me, but I'm the better programmer.

2. Unimportant, as the claim I was rejecting was about performance on tasks.

As it happens, some of my other recent messages demonstrate that I agree they are low intelligence for this exact reason.

> 99% of things an LLM could supposedly "out-perform" a human at, the human would actually outperform if you provided that human with the same text resources the LLM used to conjure its answer

Could I pass a bar exam or a medical exam, by reading the public internet, with no notes and just from memory, which is what a base model does?

Nope.

Could I do it with a search engine, which is what RAG-assisted LLMs do?

Perhaps.
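
(To spell out the distinction I'm drawing, here's a toy sketch; retrieve() and generate() are made-up stand-ins, not any real library's API:

    # crude "search engine": rank documents by keyword overlap with the query
    def retrieve(query, corpus, k=3):
        words = set(query.lower().split())
        return sorted(corpus, key=lambda doc: -len(words & set(doc.lower().split())))[:k]

    # stand-in for the actual model call
    def generate(prompt):
        return "[model answer conditioned on %d characters of prompt]" % len(prompt)

    # base model: just the question plus whatever was memorized in training
    def answer_closed_book(query):
        return generate(query)

    # RAG: fetch relevant text first and paste it into the prompt, open-book style
    def answer_with_rag(query, corpus):
        notes = "\n".join(retrieve(query, corpus))
        return generate("Notes:\n" + notes + "\n\nQuestion: " + query)

Same model either way; the only difference is whether it gets to look things up before answering.)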

> humans can do it easily (and do it without hallucinating

Hell no, we mess that up almost constantly.

> When LLMs fail to do math, fail to strategize, fail to process the rules of abstract games, etc., there is no textbook or lecture or article you can provide to the LLM that magically makes the problem go away

I'm sure I've seen this done. I wonder if I'm hallucinating that certainty…


> Could I pass a bar exam or a medical exam, by reading the public internet, with no notes and just from memory, which is what a base model does? Nope.

You're -still- ignoring the fact that these models spend millions of GPU-hours in training. I'm sure you could manage.

> Hell no we mess that up almost constantly.

"Almost constantly"? Is this satire? I'd fire any such person, and probably recommend them psychiatric treatment.

AI-hype people really think so little of human beings? I certainly hope my pilot isn't "almost constantly" hallucinating his aviation training.

> I'm sure I've seen this done. I wonder if I'm hallucinating that certainty…

Show me the conversation.



