My impression is that LLMs “pattern-match” on a less abstract level than general-purpose reasoning requires. They capture a large number of typical reasoning patterns through their training, but those patterns are not sufficiently decoupled, or generalized, from what the reasoning is about in each of the concrete instances that occur in the training data. As a result, the apparent reasoning capability that LLMs exhibit depends significantly on what they are asked to reason about, and even on representational aspects like the sentence patterns used in the query. LLMs seem to be largely unable to symbolically abstract (as opposed to interpolate) from what is exemplified in the training data.
Take the classic trick question, for example: "A bat and a ball cost $1.10 in total. The bat costs $1.00 more than the ball. How much does the ball cost?"
Most people give a wrong answer because they, too, "pattern match".
People pattern match, but people also apply logic and abstract thinking, far surpassing the abilities of LLMs, and this is a fundamental limitation that won't get fixed by more processing power and training data.
The big difference in behavior is in how people or LLMs approach new problems. An LLM is incapable of solving a problem that's similar to one that it was trained on, but with slightly changed requirements. An LLM is also incapable of learning once you point out its error, or of admitting that it doesn't know.
Regarding people, I find it interesting that even lower IQ people are capable of tasks that are currently completely out of reach for AI. It's not just the obvious, such as self-reflection, but even tasks that should've been solved by AI already, such as driving to the grocery store.
There's lots of assumptions here that we've got examples to disprove:
> but people also apply logic and abstract thinking
Which people? If we universally did that as a default, elections would look massively different.
> An LLM is incapable of solving a problem that's similar to one that it was trained on, but with slightly changed requirements
Getting good answers for coding questions on my private code/databases disproves this. The requirements have been changed significantly. I've been through a ~20-turn chat with an LLM investigating a previously unseen database, suggesting queries to get more information, acting on the results to create hypotheses and follow up on them.
> An LLM is also incapable of learning once you point out its error,
This is the standard coding agent loop - you feed back the error to get a better answer through in-context learning. It works.
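As a rough sketch of what that loop looks like (the `llm` callable here is a hypothetical stand-in for any chat-completion API, and the test command is just an example):

    import subprocess

    def agent_loop(llm, task, test_cmd="pytest -q", max_turns=5):
        # llm: any chat-completion callable (list of messages in -> reply text out).
        # A sketch of the feedback loop, not any particular tool's implementation.
        messages = [{"role": "user", "content": task}]
        for _ in range(max_turns):
            patch = llm(messages)                     # model proposes code / a patch
            messages.append({"role": "assistant", "content": patch})
            # (a real agent would also write the patch to the working tree here)
            result = subprocess.run(test_cmd, shell=True, capture_output=True, text=True)
            if result.returncode == 0:
                return patch                          # tests pass, stop
            # Feed the failure text back into the context; the weights never change.
            # That is the "in-context learning" step.
            messages.append({"role": "user", "content": "Tests failed:\n" + result.stdout + result.stderr})
        return None                                   # give up after max_turns

The point is just that the error text enters the context and the next attempt is conditioned on it.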
> or of admitting that it doesn't know.
Response from gpt: «I apologize, but I'm not able to find any reliable information about a person named "Thrust Energetic Aneksy."»
> Response from gpt: «I apologize, but I'm not able to find any reliable information about a person named "Thrust Energetic Aneksy."»
The model says that because it is trained to say that for specific queries. They have given it a lot of prompts of the form "Who is X" paired with responses like "I don't know about that person".
The reason you don't see "I don't know" much for other kinds of problems is that it isn't easy to create training examples where the model says "I don't know" while still having it solve the problems that are in its dataset; it starts to pattern-match all sorts of math problems to "I don't know" even when it could solve them.
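To be concrete, the kind of pairs being described might look something like this (purely illustrative, not any vendor's actual data; the second name is invented):

    refusal_examples = [
        {"prompt": "Who is Thrust Energetic Aneksy?",
         "response": "I don't know of any person by that name."},
        {"prompt": "Who is Velmira Ostrankov?",
         "response": "I can't find reliable information about that person."},
    ]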
A human can look at their own thoughts and see how they are solving a problem, and thus knows a lot better what they know and don't. LLMs aren't like that, they don't really know what they do know. The LLM doesn't know its own weights when it picks a word; it has no clue how certain it is. It just predicts whether a human would have said "I don't know" to the question, not whether it itself would know.
> LLMs aren't like that, they don't really know what they do know.
They actually do. The information is saved in there; you just need to ask explicitly, because the usual response doesn't expose it. (But it likely can be fine-tuned to do that.) https://ar5iv.labs.arxiv.org/html/2308.16175
Also the original claim was that they're not capable of responding with "I don't know", so that's what I was addressing.
> Which people? If we universally did that as a default, elections would look massively different.
All people, otherwise we wouldn't be able to do basic tasks, such as finding edible food or recognising danger.
I dislike how you discount the way other people vote as being somehow irrational, while I'm sure you consider your own political thinking as being rational. People always vote according to their own needs and self-interest, and in terms of politics, things are always nuanced and complicated. The fact that many people vote contrary to your wishes is actually proof that people can think for themselves.
> Getting good answers for coding questions on my private code/databases disproves this.
I use GitHub Copilot and ChatGPT every day. It only answers correctly when there's a clear and widely documented pattern, and even then, it can hallucinate. Don't get me wrong, it's still useful, but it in no way shows an ability to reason, or the capacity to admit that a solution is out of its reach.
Your experience with coding is kind of irrelevant to the question at hand.
> This is the standard coding agent loop - you feed back the error to get a better answer through in-context learning. It works.
This looks like it works sometimes, but only if "pointing out the error" is coincidentally the same as "clarifying the problem spec". Admittedly, for really simple cases those are the same or hard to tell apart. But it always seems clear that adding error-correction context is similar to adding additional search terms to get to a better Stack Overflow page. This feels very different from updating any kind of logical model of the problem representation.
It doesn't have to be as explicit as an extra term. The error feedback can be a failing test which was just modified, or a self-reflection result from a prompt like "does what you created satisfy the original request?".
Updating the logical model of the problem also happens during the database investigation I mentioned earlier. There's both information gathering and building on it to ask more specific questions.
> I've been through a ~20-turn chat with an LLM investigating a previously unseen database, suggesting queries to get more information, acting on the results to create hypotheses and follow up on them.
You do know that when that happens, the LLM usually just throws random stuff at you until you are happy? That is much easier to do than to reason. LLMs solved the much easier problem of looking smart rather than being smart; the trick is to make the other person solve the problem for you while attributing the solution to you.
You see humans do this as well in hiring interviews etc, it is really easy to trick people who want you to succeed.
When the hypotheses make sense, the tests for them work, and the next steps are based on the previous test results, that's not "random stuff". I made that explicit in the first message. This problem wasn't solvable with random guesses.
> The big difference in behavior is in how people or LLMs approach new problems. An LLM is incapable of solving a problem that's similar to one that it was trained on, but with slightly changed requirements. An LLM is also incapable of learning once you point out its error, or of admitting that it doesn't know.
When are we going to stop defending LLM intelligence on the grounds that "some people are stupid too"?
Human intelligence is not benchmarked by its lowest common denominators (just like how we don't judge LLMs on the basis of tiny 100M-parameter models).
The paper linked here starts off by asking whether LLMs can “solve complex problems in ways that resemble human thinking.” Why would we try to answer that question without discussing how humans think?
> Why would we try to answer that question without discussing how humans think?
The post you are replying to does not suggest that we should. If anything, it is suggesting the opposite - that we should be considering the full range of human abilities (or at least those that are effective in solving complex problems) when addressing the question you have quoted.
I agree with most of what viraptor has said in this thread, but not in this particular case.
I'm a different person than the one you replied to, but I agree with them.
Your previous reply was to:
> Why would we try to answer that question without discussing how humans think?
And you replied to that with:
> Coz we simply don't understand how we understand, period.
Well, when we don't understand how we understand, that is exactly when we should be discussing how humans think. Or at least how we think we think. And the bat and a ball example relates to research about how we think we think.
So yeah, your reply definitely comes across as saying "so we shouldn't discuss it", and finishing it with "period" also suggests "or try to understand it at all".
> Human intelligence is not benchmarked by its lowest common denominators (just like how we don't judge LLMs on the basis of tiny 100M-parameter models).
Indeed, but we do also catalogue our cognitive biases — kinda the human version of what are now called hallucinations when LLMs do them. (When Stable Diffusion does it, it's "oh god the fingers").
It's worth caring about both strengths and weaknesses.
Yes it is. Warning labels on products and laws aren't written with genius-level people in mind, but with the lowest common denominator. TV politics is the way it is because of said lowest common denominator of intelligence. It's not that some people are dumb too; most people are dumb in the wrong contexts, even you and I pre-coffee, or sleep-deprived, or wasted drunk, or high.
IQ tests are averaged out so 100 is average intelligence, not the high point.
When are we going to define what intelligence actually means, so these discussions can start making sense?
5 years ago, it was still common to read things like "AIs will never be as intelligent as mice, let alone humans", today, it's "sure, AIs are as intelligent as some humans, but not as intelligent as the right humans".
Noticed how everyone dropped the Turing Test like a hot potato as the gold standard for intelligence, the moment it became apparent that LLMs were about to pass it? Try to find a recent high-profile article invoking the Turing Test. Crickets. The intellectual dishonesty is nauseating.
The entire discussion is dominated by smart people who are scared shitless that AI is going to show them just how ordinary they are in the grand scheme of things.
Wait are LLMs crushing professionals at coding? Anyways, LLMs are clearly better than humans at memorizing information, and they most certainly have some ability to retrieve and adapt this information to solve problems. On tests, they can do some degree of generalization to answer questions with different wording from their training data. However, because humans are comparably quite bad at memorizing, we know that when humans do well on medical tests, it usually (not always) means that they “understand” how, for example, anatomy works. They’ve formed some model in their head of how to break the body into parts and parts into subparts and how parts interact and so on because doing so is actually easier for most humans than trying to remember the linguistic facts.
Understanding in this sense seems different from the memorizing + flexible retrieval we know LLMs excel at, because it extends much further beyond the training distribution. If I ask a (good) medical professional a question unlike anything they’ve ever seen before, they’ll be able to draw on their “understanding” of anatomy to give me a decent guess. LLMs are inconsistent on these kinds of questions and often drop the ball.
We can also point to training data requirements as a discrepancy. In brain organoid experiments, we observe that far fewer examples are required to achieve neural-network-like results. This isn’t surprising to me. Biological neurons have exceedingly complex behavior; it takes hundreds of neural network nodes to recreate the behavior of a single neuron, and they can reorganize their network structure in response to stimuli, form loops, operate in non-discretized time, etc. We don’t know how far transformers will be able to go, but I think that if you want to make a model that holds a candle to that sort of complexity, you’ll need at least many, many orders of magnitude more scale or, more likely, a different, less limiting architecture.
Exams, yes. Not the work yet. This is also why they've not made doctors and lawyers obsolete: even in the fields where the models perform the best, you're getting the equivalent of someone fresh out of university with no real experience.
I suspect you're right about your point with generalise vs. memorise. Not absolutely sure, but I do suspect so.
I also suspect we'll get transformative AI well before we can train any AI with as few examples as any organic brain needs. Unless we suddenly jump to that with one weird trick we've been overlooking this whole time, which wouldn't hugely surprise me given how many radical improvements we've made just by trying random ideas.
And to short-circuit the argument: approximately no one is, at the same time, a professional coder, writer, artist, detective, translator, doctor, and <insert one or ten or a hundred more occupations here>. GPT-4 is all of them at once, and outperforms an average professional of any of those occupations in their respective field.
You said it outperforms the average professional in any of these occupations (including detective and artist?) and more? That’s a very bold claim. Can you substantiate that somehow? I don’t disagree for a second that incrementally better LLMs + robotics will be able to automate a large portion of labor, but that doesn’t in my eyes make them smarter than humans. Jobs today aren’t exactly the fullest realization of human potential; you wouldn’t call a robotic arm smarter than a human for being better on the line.
It's the other way around. Any one human will only beat an AI at two or three tasks — the first two being our day job and our mother tongue, the third possibly being our hobby — while the AI will out-perform us at thousands of other tasks.
Only if you count humanity as a whole do we beat an LLM at everything.
There are two logical problems with what you've written here.
1. You're counting "our day job" as one task, while counting all individual prompts an AI can answer as each being their own task. This is obviously misleading (and I could just as easily perform the same compression in the opposite direction - chatbots can only do one task: "be a chatbot").
2. You're not controlling for training. It's already meaningless to compare the intelligence of one entity trained to do a task with another entity that was not trained to do that task.
But even ignoring those fallacies, what you've written here is still not true. 99% of things an LLM could supposedly "out-perform" a human at, the human would actually outperform if you provided that human with the same text resources the LLM used to conjure its answer. Regurgitating facts is not evidence of intelligence, and humans can do it easily (and do it without hallucinating, which is key) if you just give them access to the same information.
But when you go in the opposite direction, achieving parity is no longer so simple. When LLMs fail to do math, fail to strategize, fail to process rules of abstract games, etc., there is no textbook or lecture or article you can provide to the LLM that magically makes the problem go away. These are fundamental limitations of the AI's capabilities, rather than just a result of not possessing enough information.
What many people don't seem to realize is, when merely getting up in the morning and brushing your teeth you are already exercising more intelligence than any AI has ever possessed. (Anyone who has ever worked in robotics or visual processing can attest to that enthusiastically.) So don't even get me started on actual critical thinking.
1. This is indeed a simplification, but even broken into single tasks, your day job consists of the tasks where you have the most experience. For example, I used to write video games; AI does a better job of game design than me, but I'm the better programmer.
2. Unimportant, as the consideration I was rejecting was performance in tasks.
As it happens, some of my other recent messages demonstrate that I agree they are low intelligence for this exact reason.
> 99% of things an LLM could supposedly "out-perform" a human at, the human would actually outperform if you provided that human with the same text resources the LLM used to conjure its answer
Could I pass a bar exam or a medical exam, by reading the public internet, with no notes and just from memory, which is what a base model does?
Nope.
Could I do it with a search engine, which is what RAG assisted LLMs do?
Perhaps.
> humans can do it easily (and do it without hallucinating
Hell no we mess that up almost constantly.
> When LLMs fail to do math, fail to strategize, fail to process rules of abstract games, etc., there is no textbook or lecture or article you can provide to the LLM that magically makes the problem go away
I'm sure I've seen this done. I wonder if I'm hallucinating that certainty…
> Could I pass a bar exam or a medical exam, by reading the public internet, with no notes and just from memory, which is what a base model does? Nope.
You're -still- ignoring the fact that these models spend millions of GPU-hours in training. I'm sure you could manage.
> Hell no we mess that up almost constantly.
"Almost constantly"? Is this satire? I'd fire any such person, and probably recommend them psychiatric treatment.
AI-hype people really think so little of human beings? I certainly hope my pilot isn't "almost constantly" hallucinating his aviation training.
> I'm sure I've seen this done. I wonder if I'm hallucinating that certainty…
Do we still have Turing Test competitions? Now that everybody's used to LLMs, and detecting their style, it might be interesting. I'm not sure the LLMs would do very well against humans who were experienced and alert.
"detecting their style" is trivial only because people take the default output and run with it. You can significantly alter it if you want and it would be an obvious thing to do for such test. Right now the default output styles of public LLMs have no reason to try avoiding detection and is not trained to do it.
I didn't exactly mean superficial style, but properties like creativity, liveliness, and consistency. I hesitate to say that these are things LLMs can't do, because I'll provoke somebody into telling me that they totally can, but weaknesses like this could be ways for human intuition to detect an LLM. It might have to be a long competition, because an unusual set of prompts (or training, even) is likely to initially impress the humans: time should be allowed for novelty to wear off.
> properties like creativity, liveliness, and consistency
I don't believe we train the LLMs for those things specifically. It will be interesting to see if some datasets for this appear. I think we can still make huge improvements just by caring about that more.
> Noticed how everyone dropped the Turing Test like a hot potato as the gold standard for intelligence, the moment it became apparent that LLMs were about to pass it? Try to find a recent high-profile article invoking the Turing Test. Crickets. The intellectual dishonesty is nauseating.
I think you could reasonably describe that as "pattern matching at the wrong level". If you tell most people the first answer is wrong they will go up a level or two and work out the correct answer.
I asked this question in a college-level class with clickers. For the initial question I told them, "This is a trick question, your first answer might not be right". Still less than 10% of students got the right answer.
This idea that students are "more generally intelligent" requires specific arguments to support. ChatGPT alone has a vast breadth of knowledge that is impossible for a human to keep up with as well as superior skills to a student in any number of fields. I can find evidence that it has an IQ of around 124 [0] which is going to stretch most students (although in fairness the same article also speculates an IQ of 0).
Students can keep ahead of it with training in specific fields, and it has a few weaknesses in specific skills, but I think someone could make a reasonable claim that ChatGPT has superior general intelligence.
> superior skills to a student in any number of fields
It's abysmal any time we measure pure, distilled intelligence.
When asked to come up with any non-basic novel algorithm and data structure, it creates nonsense.
Especially when you ask it to create vector-instruction-friendly memory layouts and it can't code in its preferred way. I had some fun trying to make it spit out a brute-force-ish solver for a problem involving basic orbital mechanics and some forces. I wouldn't even want to try something more complicated. It can do generalized solvers somewhat, since it can copy that homework, but none that can express the kinds of terms you'd be working with (despite those also having code available in some research papers).
Speaking of which, nine times out of ten it cannot even figure out some basic truths in orbital mechanics that can be somewhat easily derived from the formulas commonly given (you can get there if you're very patient and are able to filter out its wrong answers).
But at the end of the day it was still a valuable tool to me as I was learning these things myself, since despite being often wrong, it nevertheless spat out useful things I could plug into Google to find more trustworthy sources that would teach me. Really neat if you're going in blind into a new subject.
Claiming that ChatGPT is more intelligent than a student is the same as saying an encyclopedia or a library is more intelligent than a student. Sure, they retain more information. But ChatGPT is not AGI, and it has no idea what it is even talking about.
> I can find evidence that it has an IQ of around 124
Despite me being someone who is generally impressed by the best LLMs, I think this says more about IQ tests than it does any AI.
Which isn't to shame those tests — we made those tests for humans, we were the only intelligence we knew of that used complex abstract language and tools until about InstructGPT — but it does mean we should reconsider what we mean by "intelligence".
My gut feeling is that a better measure is how fast we learn stuff. Computers have a speed advantage because transistors outpace synapses by the degree to which a pack of wolves outpaces continental drift (yes I did do the calculation), so what I mean here is how many examples rather than how many seconds.
But as I say, gut feeling — this isn't a detailed proposal for a new measure, it likely needs a lot of work to even turn this into something that can be a good measure.
This question becomes easy if you think about it algebraically/mathematically, but comically hard thinking about it intuitively, as in, trying to reason with language.
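Spelled out: let b be the ball's price in dollars. Then

    b + (b + 1.00) = 1.10
    2b = 0.10
    b = 0.05

so the ball costs 5 cents. The intuitive 10-cent answer comes from reading "$1.00 more than the ball" as "$1.00 flat"; in general the ball costs (total - difference) / 2.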
Indeed, domain-specific reasoning ability is a seminal psychology result: people are in general sucky at a specific type of logic puzzle, unless you frame it as enforcing a social rule - then they're excellent.
Actually, no. We can do 'real' reasoning and come up with novel conclusions. In fact, we have such an example going all the way back to Plato in Meno, about 2300 years ago. It's the doubling of the square dialog of Socrates and the slave boy.
It might still say 5 cents if the problem was "the bat costs 90 cents more than the ball", since 5 cents is overwhelmingly the right answer to questions that look like this.
That's a great conversation. I anthropomorphically feel bad for Gemini, in that it actually got a hard question right, and then it was gaslit into believing it was wrong.
Since I've known this puzzle since childhood, my "pattern match" answer is the correct answer. I wonder if someone took it up a level and posed a puzzle that looked like this puzzle but had some other subtle complication, whether it would fool me.
I'm not saying this is you, but in my experience, the conversation changes because people flatly refuse to believe any concrete examples. I would and many companies do trust an LLM with low to medium skilled text processing tasks: internal knowledge sharing, first line customer support, low stakes document review. GitHub has their famous Copilot product, which I don't personally find too useful but many of my coworkers use.
Really depends on what you understand the word to mean. All of the things I listed are “important” in the sense that they’ve gotta get done, and in the sense that it matters how well they’re done. Would I trust an LLM with something business-critical? No, but there’s quite a lot of people I’d put in the same boat.
The difference is some humans exist that can be trusted with business critical tasks but no AIs exist with that level of competence.
This is actually a great example of what I'm talking about. All this talk about how useful AIs are, and how humans have flaws and yet the conclusion is always the same. AIs are only useful for tasks that are relatively easy and have a higher tolerance for failure.
For some reason LLMs get a lot of attention. But while simplicity is great, it has limits. To make a model reason you have to put it in a loop with fallbacks. It has to try possibilities and fall back from false branches, which can be done one level higher. This can be an algorithm, another model, or another thread in the same model. To some degree it can be done by prompting in the same thread, like asking the LLM to first print a high-level algorithm and then work through it step by step.
LLMs already do this. Their many wide layers allow for this, and as a final fallback, their output adjusts based on every token they generate (it's not all decided at once). All your statement really means is a vague "well it should do it more!" which yeah, is the goal of each iteration of GPT etc.
LLMs get a lot of attention because they were the first architecture that could scale to a trillion parameters while still improving with every added parameter.
When the task, or part of it, is NP-complete there is no way around it. The model has to try options until it finds a working one, in a loop, and this can be multi-step with partial fallback. That's how humans think: they can only see to some depth, so they may first determine promising directions, select one, go deeper, and fall back if it doesn't work. The pattern matching mentioned is the simplest, one-step version of this, and LLMs do it with no problem.
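Roughly, the outer loop I mean is just depth-first search with the model proposing steps (a sketch with hypothetical propose/check hooks, not any existing framework):

    def solve(state, propose, check, depth=0, max_depth=5):
        # propose(state): hypothetical LLM call suggesting candidate next steps.
        # check(state): hypothetical verifier for a complete solution.
        if check(state):
            return state
        if depth == max_depth:
            return None                      # too deep: give up on this branch
        for step in propose(state):          # most promising directions first
            result = solve(state + [step], propose, check, depth + 1, max_depth)
            if result is not None:
                return result
        return None                          # nothing worked here; fall back

The LLM only ever does the one-step pattern matching; the trying, verifying, and falling back live in the loop around it.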
I wonder if reasoning is intrinsically tied to the need for a model to be able to perform well with extremely limited training data. Current LLMs require the sum total of all human knowledge to work correctly (even a few decades ago, there was nowhere near enough data to train one to be useful, regardless of available computing), meanwhile humans only need access to a few books and conversations with uneducated people growing up to achieve brilliant ideas.
I'd argue the vast majority of it is, from newspaper articles spanning back centuries to the vast majority of books written, down to every conversation someone has had on public social media websites such as Reddit.
Given that LLMs are trained on tokens and not symbols, that is, they have no context or world model, I would say that LLMs cannot symbolically abstract, given that they have neither a mechanism nor the data to do so.
- A form of reasoning is to connect cause and effect via probability of necessity (PN) and the probability of sufficiency (PS).
- You can identify when the natural language inputs can support PN and PS inference based on LLM modeling
That would mean you can engineer in more causal reasoning based on data input and model architecture.
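For reference, my own gloss of those two quantities in Pearl's counterfactual notation (a paraphrase, not necessarily the paper's exact formulation):

    PN = P(Y_{x'} = 0 | X = x, Y = y)    (given that x and y both occurred: would y have failed without x?)
    PS = P(Y_x = 1 | X = x', Y = y')     (given that neither x nor y occurred: would doing x have produced y?)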
They define causal functions, project accuracy measures (false positives/negatives) onto factual and counterfactual assertion tests, and measure LLM performance with respect to this accuracy. They establish a surprisingly low tolerance for counterfactual error rate, and suggest it might indicate an upper limit for reasoning based on current LLM architectures.
Their findings are limited by how constrained their approach is (short, simple boolean chains). It's hard to see how this approach could be extended to more complex reasoning. Conversely, if/since LLMs can't get this right, it's hard to see them progressing at the rates hoped, unless this approach somehow misses a dynamic of a larger model.
It seems like this would be a very useful starting point for LLM quality engineering, at least for simple inference.
LLMs have access to the space of collective semantic understanding. I don't understand why people expect cognitive faculties that are clearly extra-semantic to just fall out of them eventually.
The reason they sometimes appear to reason is because there's a lot of reasoning in the corpus of human text activity. But that's just a semantic artifact of a non-semantic process.
Human cognition is much more than just our ability to string sentences together.
I might expect some extra-semantic cognitive faculties to emerge from LLMs, or at least be approximated by LLMs. Let me try to explain why. One example of extra-semantic ability is spatial reasoning. I can point to a spot on the ground and my dog will walk over to it — he’s probably not using semantic processing to talk through his relationship with the ground, the distance of each pace, his velocity, etc. But could a robotic dog powered by an LLM use a linguistic or symbolic representation of spatial concepts and actions to translate semantic reasoning into spatial reasoning? Imagine sensors with a measurement-to-language translation layer (“kitchen is five feet in front of you”), and actuators that can be triggered with language (“move forward two feet”). It seems conceivable that a detailed enough representation of the world, expressive enough controls, and a powerful enough LLM could result in something that is akin to spatial reasoning (an extra-semantic process), while under the hood it’s “just” semantic understanding.
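As a toy sketch of that loop (every name here is hypothetical; it's only meant to make the shape of the idea concrete):

    def navigate(read_sensors, llm_decide, act, goal="Go to the kitchen.", max_steps=20):
        # read_sensors(): hypothetical, turns raw readings into short English facts,
        #   e.g. ["The kitchen doorway is five feet ahead."]
        # llm_decide(observations, goal): hypothetical LLM call returning one English
        #   action such as "move forward two feet", or "done" when the goal is reached.
        # act(command): hypothetical, parses the English command and drives the actuators.
        for _ in range(max_steps):
            command = llm_decide(read_sensors(), goal)
            if command.strip().lower() == "done":
                break
            act(command)

The loop never leaves language space; whether that deserves to be called spatial reasoning is exactly the question.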
Spatial reasoning is more akin to visualising a 3D "odd shaped" fuel tank from 2D schematics and being able to mentally rotate that shape to estimate where a fluid line would be at various angles.
This is distinct from stringing together treasure-map instructions in a chain.
Isn’t spatial navigation a bit like graph walking, though? Also, AFAIK blind people describe it completely differently, and they’re generally confused by the whole concept of 3D perspective and objects getting visually smaller over distance, and so on. Brains don’t work the same for everyone in our species, and I wouldn’t presume to know the full internal representation just based on qualia.
I'm always impressed by the "straightedge-and-compass"-flavoured techniques drafters of old used to rotate views of odd 3D shapes from pairs of 2D schematics, in the centuries before CAD software.
I don’t know if you’re correct. I don’t think you know that our brains are that different? We too need to train ourselves on massive amounts of data. I feel like the kinds of reasoning and understanding I’ve seen ChatGPT do are soooo far beyond something like just processing language.
When I talk to 8B models, it's often painfully clear that they are operating mostly (entirely?) on the level of language. They often say things that make no sense except from a "word association" perspective.
With bigger (400B models) that's not so clear.
It would be silly to say that a fruit fly has the same thoughts as me, only a million times smaller quantitatively.
I imagine the same thing is true (genuine qualitative leaps) in the 8B -> 400B direction.
We do represent much of our cognition in language.
Sometime I feel like LLMs might be “dancing skeletons” - pulleys & wire giving motion to the bones of cognition.
Do you have any evidence that human cognition (for speaking) is more than just an ability to string sentences together? Do you have any evidence that LLMs don't reason at all?
A perfect machine designed to only string sentences together as perfect responses, with no reasoning built in, IS indistinguishable from a machine that only builds sentences from pure reasoning.
Either way nobody understands what's going on in the human brain and nobody understands why LLMs work. You don't know. You're just stating a belief.
It is like having Google's MusicLM output an mp3 of saxophone music and then asking: what proof is there that MusicLM has not learned to play the saxophone?
In a certain context that only judges the output, the model has achieved what is meant by "play the saxophone".
In another context of what is normally meant, the idea the model has learned to play the saxophone is completely ridiculous and not something anyone would even try to defend.
In the context of LLMs and intelligence/reasoning, I think we are mostly talking about the latter and not the former.
"Maybe you don't have to blow throw a physical tube to make saxophone sounds, you can just train on tons of output of saxophone sounds then it is basically the same thing"
Let's limit the discussion to things that can be actually done with an LLM.
Getting one to blow on a saxophone is outside of this context.
An LLM can't blow on a saxophone period. However it can write and read English.
>In the context of LLMs and intelligence/reasoning, I think we are mostly talking about the latter and not the former.
And I'm saying the latter is completely wrong. I'm also saying the former is irrelevant. Look, this is what you're doing: for the former, you're comparing something humans can do to something LLMs can't do. That's a completely irrelevant comparison.
For the latter, we are comparing things humans and LLMs BOTH can do. Sometimes humans give superior output, sometimes LLMs give superior output. Given similar inputs and outputs, the internal analysis of what's going on, whether it's true intelligence or true reasoning, is NOT ridiculous.
"Ridiculous" is comparing things where no output exists. LLMs do not have saxophone output where they actually blow into an instrument. There's nothing to be compared here.
There's also counter-evidence that animals lack reasoning abilities. Ever see a gorilla attack its own reflection or a dog chase its own tail? Animals display contradictory evidence: some of it shows the ability to reason, and some shows the lack of it.
Figures that the LLM displays the same contradictory evidence.
But none of this evidence proves anything definitively. Much like human cognition, the LLM is a machine that we built but don't understand. No definitive statement can be made about it.
You just argued that, because some animals display errors in reasoning, we can call into doubt the claim that any animal reasons. This does not follow. The reflection test, for example, is passed by some animals; we can test it by putting a red dot on their face and seeing if they mess with it. Either way, it’s not even a test of general reasoning abilities. I think you’re giving animals far too little credit.
I believe the claim that some animals can reason is a very reasonable hypothesis, while the claim that no animals can reason is an unlikely one. The contradictory evidence you cite is pointing at some animals doing dumb things. Those animals can be as dumb as rocks and it wouldn’t matter for my claim, I’d only need to show you one reasoning animal to prove it.
Not to mention all of the developments in mirror testing to account for differences in perception. What they're finding is that self-recognition is more common than assumed.
I'd wager they could not create much if they had no exposure to other art or music in the first place. Creation does not come from nothing. Composers and artists typically imitate; it's very well known.
So where is the original guitar music that all the guitar players imitated? It can't have been created by a human, since humans imitate and can't create new things, as you say. Was it god who created it? Or was it always there?
Humans are really creative and create new stuff. Not sure why people try to say humans aren't.
I find the terminology is used inconsistently*, so it's probably always worth asking.
To me, a "large language model" is always going to mean "does text"; but the same architecture (transformer) could equally well be trained on any token sequence which may be sheet music or genes or whatever.
IIRC, transformers aren't so good for image generators, or continuums in general, they really do work best where "token" is the good representation.
* e.g., to me, if it's an AI and it's general then it's an AGI, so GPT-3.5 onwards counts; what OpenAI means when they say "AGI" is what I'd call "transformative AI"; there's plenty of people on this site who assert that it's not an AGI but whenever I've dug into the claims it seems they use "AGI" to mean what I'd call "ASI" ("S" for "superhuman"); and still others refuse to accept that LLMs are AI at all despite coming from AI research groups publishing AI papers in AI journals.
No. LLMs can take any type of data. Text is simply a string of symbols. Images, video and music are also a string of symbols. The model is the same algorithm just trained on different types of data.
I never said cognition was limited to text. I just limited the topic itself to cognition involving text.
Every culture on earth seems to have figured out the same rudimentary addition and fractions to handle accounts and inheritance partitioning. Why didn't they come up with inconsistent models of numerics if they developed in total linguistic isolation?
It's possible that we are just LLMs with much much more data such that we don't make inconsistencies. And the data is of course just inborn and hardwired into our neural networks rather than learned.
We don't know, so no statement can really be made here.
You can't explain how an LLM does what it does, and you can't explain how humans do what we do either. With no explanation possible but CLEAR similarities between human responses and LLM responses that pass Turing tests... my hypothesis is actually reasonable.
In theory, with enough data and enough neurons we can conceivably construct an LLM that performs better than humans. Neural nets are supposed to be able to compute anything anyway. So none of what I said is unreasonable.
The problem I have with your claim is that it assumes humans use language the way that an LLM does. Humans don’t live in a world of language, they live in the world. When you teach kids vocabulary you point to objects in the environment. Our minds, as a consequence, don’t bottom out at language; we draw on language as a pointer into mental concepts built on sensory experience. LLMs don’t reference something, they’re a crystallization of language’s approximate structure. How do they implement this structure? I dunno, but I do know that they aren’t going to do much more than that because it isn’t rewarded during training. We almost certainly possess something like an LLM in our heads to help structure language, but we also have so, so much more going on up there.
You made a bunch of claims here but you can’t prove any of them to be true.
Also you are categorically wrong about language. LLMs despite the name go well beyond language. LLMs can generate images and sound and analyze them too. They are trained on images and sound. Try ChatGPT.
> Human cognition is much more than just our ability to string sentences together
Animals without similar language capabilities don't seem to be too strong at reasoning. It could well be that language and reasoning are heavily linked together.
This is proposed as a way to measure "true" reasoning by asking a certain type of trick questions, but I don't quite see how this could be a basis of a sustainable benchmark.
If this gets attention, the next generation of LLMs will be trained on this paper, and then fine-tuned by using this exact form of questions to appear strong on this benchmark, and... we're back to square one.
Maybe: there is no measurable difference between 'real' reasoning, and 'fake' reasoning.
And if there is no measurable difference... we can't measure 'realness', we just have to measure something different (and more useful): 'soundness'. Regardless of whether it is reasoning or not internally, if it produces a sound and logical argument, who cares?
I agree: I don't think any measure tested linguistically can prove it is internally reasoning... in the same way we haven't truly proven other sentient people aren't in fact zombies (we just politely assume the most likely case that they aren't).
Real reasoning, which can be used to predict outcomes in novel situations, is based on multi-step what-if prediction, perhaps coupled with actual experimentation, and requires things like long-term (task duration) attention, (task duration) working memory, online learning (unless you want to have to figure everything out from scratch everytime you re-encounter it), perhaps (depending on the problem) innate curiosity to explore potential solutions, etc. LLMs are architecturally missing all of the above.
What you might call "fake" reasoning, or memorized reasoning, only works in situations similar to what an LLM was exposed to in its training set (e.g. during a post-training step intended to imbue better reasoning), and is just recalling reasoning steps (reflected in word sequences) that it has seen in the training set in similar circumstances.
The difference between the two is that real reasoning will work for any problem, while fake/recall reasoning only works for situations it saw in the training set. Relying on fake reasoning makes the model very "brittle" - it may seem intelligent in some/many situations where it can rely on recall, but then "unexpectedly" behave in some dumb way when faced with a novel problem. You can see an example of this with the "farmer crossing river with hen and corn" type problem, where the models get it right if problem is similar enough to what it was trained on, but can devolve into nonsense like crossing back and forth multiple times unnecessarily (which has the surface form of a solution) if the problem is made a bit less familiar.
> the kind of predictions humans are extremely bad in the first place? most people cant even grok anything beyond basic math.
We use reasoning/planning all the time in everyday settings - it's not just for math or puzzle solving. Anytime you have to pause for a second to wonder how to do something, or what to say, as opposed to acting or speaking reactively, that's reasoning/planning being used.
Reasoning/planning is a key part of intelligence and why evolution has equipped us with large, costly brains - so that we can survive and thrive in varied environments and in novel situations, per our species' adaptation as generalists. Humans are extraordinarily good at reasoning - if you want an example of an animal that can't, then a cow or a croc would be a better example!
> the kind of predictions humans are extremely bad in the first place
Humans are extremely good at it; basically every human can learn to drive cars safely in novel neighborhoods. That is a skill only humans possess today, no animals or machines can do it, and it requires a very impressive level of learning and reasoning.
Some humans struggle with symbols, but that doesn't make them dumb, symbols are so far off from our native way of thinking. To an LLM however those symbols is its native mode of thinking, that is all it has, if it is as dumb as an untrained human at symbol manipulation tasks then it is really really bad.
> basically every human can learn to drive cars safely in novel neighborhoods
You sure about that?
I was raised in the UK; both my experience of cycling along the Rhine and of being the passenger when my brother was driving in France was that we each picked the wrong side of the road once per day.
I doubt either of us has any intuition for a moose or a kangaroo on the road.
Also, she crashed her car in start-stop traffic, a write-off at about 20 mph. And I was cycling to work one day, and a driver, who had stopped at a minor-to-major junction, didn't look my way and pulled out into me as I was passing in front of him — wrote off my bike, probably around 10 mph or less.
I've been in places where red lights are obeyed, and others where they're treated as suggestions.
> Some humans struggle with symbols, but that doesn't make them dumb, symbols are so far off from our native way of thinking. To an LLM however those symbols is its native mode of thinking, that is all it has, if it is as dumb as an untrained human at symbol manipulation tasks then it is really really bad.
I disagree; a computer can be perfectly symbolic, but an AI has to learn those symbols and their relations from scratch. This is why ChatGPT is so much worse at arithmetic than the hardware it's operating on.
But in terms of how our brain works, and why it evolved, it really is prediction.
Prediction allows us to behave according to what is about to happen (or what we want to happen) as opposed to just reacting to what is happening right now. "I predict the sabre-tooth is going to run towards me, so I better be prepared", is more adaptive than "Ouch! this fucker has big teeth!".
When we're driving (well) we're continually predicting what other drivers/pedestrians are going to do, what's the best lane to be in for next exit, etc, etc.
No - children have brains, same as adults. All they are missing is some life experience, but they are quite capable of applying the knowledge they do have to novel problems.
There is a lot more structure to the architecture of our brain than that of an LLM (which was never designed for this!) - our brain has all the moving parts necessary for reasoning, critically including "always on" learning so that it can learn from its own mistakes as it/we figure something out.
> No - children have brains, same as adults. All they are missing is some life experience, but they are quite capable of applying the knowledge they do have to novel problems.
Life experience directly prevents application of logic, as we shortcut to the knowledge associated with the words so that we can skip the expensive-and-hard "logic" thing. I've seen this first-hand with a modified version of the Queen of Hearts poem used as a logic puzzle at university, and most of us were trying to remember the poem rather than solve the puzzle — the teachers knew this and that was the point of the exercise, to get us to read the actual question instead of what we were expecting: https://en.wikipedia.org/wiki/The_Queen_of_Hearts_(poem)
If the missing gap was as you say, then approaches like Cyc's from 40 years ago would be highly effective and we wouldn't need or want neural nets for anything deeper than finding the inputs to send to that model: https://en.wikipedia.org/wiki/Cyc
> There is a lot more structure to the architecture of our brain than that of an LLM (which was never designed for this!)
Yes.
I don't know how much of that really matters given they are weirdly high performance given GPT-3 has a total complexity similar to the connectome of a mid-sized rodent, but they are indeed different.
On the other hand, converting the training run to biological terms would be like keeping said rodent alive for 50,000 years experiencing nothing but a stream of pre-tokenised text from the internet and giving it rewards or punishments according to how well it imagines a missing token. Perhaps a rat would do fine if it didn't typically die of old age 0.006% of the way through such a training process.
But that's just more agreement that it's all very alien.
> our brain has all the moving parts necessary for reasoning, critically including "always on" learning so that it can learn from its own mistakes as it/we figure something out.
I'm not so sure about that. We have plenty of cognitive biases, and because of these even we have to take notes, check with others, or defer to computers, for more than the most trivial of logic problems.
> Life experience directly prevents application of logic, as we shortcut to the knowledge associated with the words so that we can skip the expensive-and-hard "logic" thing
Intelligence is prediction, and the simplest kind of prediction, and what literally comes to mind first, is "this time will be the same as last time", so in many familiar situations we are just reacting rather than reasoning/planning. It's when what comes to mind first doesn't work (we try it), or we can see the flaw without even trying, that we need to stop and think (reason), such as "hmm... how can I get this stuck lid off the jar - what do I have that can help?".
> If the missing gap was as you say, then approaches like Cyc's from 40 years ago would be highly effective
CYC vs LLM is an interesting comparison, and one that I've also made myself, but of course there are differences as well as similarities. The similarity is that both are rules-based systems of sorts (maybe we could even regard an LLM as an expert system over the domain of natural language), and in both cases there is the wishful thinking that "scale it up and it'll become sentient and/or god-like"! The major difference is that CYC is essentially just using its rules to perform a deductive closure over its inputs (what can it deduce from inputs, using multiple applications of rules), whereas the LLM was trained explicitly with a predictive goal, and with its domain of natural language it's able to predict (recall) human responses and therefore appear intelligent.
I think prediction (= intelligence) is the key difference here. An LLM is still limited in its intelligence/predictive ability, most obviously when it comes to multi-step reasoning, but its natural language ability and flexible (key-based self-attention) predictive architecture make it quite capable when operating on "in distribution" inputs.
No, because the child will be able to apply what they've learnt in novel ways, and experiment/explore (what-if or hands-on) to fill in the gaps. The LLM is heavily limited to what it learnt due to the architectural limitations in going beyond that.
> The LLM is heavily limited to what it learnt due to the architectural limitations in going beyond that.
I'm not sure how much of that is an architectural limit vs. an implementation limit.
Certainly they have difficulty significantly improving their quality due to the limitations of the architecture, but I have not heard anything to suggest the architecture has difficulty expanding in breadth of tasks — it just has to actually be trained on them, not merely run inference on limited frozen weights.
(Unless you count in-context learning, which is cool, but I think you meant something with more persistence than that).
Children absolutely can solve those “farmer crossing the river” type problems with high reliability. Once they learn how to solve it once, changing up the animals will not fool a typical child. You could even create fictional animals with made-up names and they could solve it as long as you tell them which animal was the carnivore and which one was the herbivore.
The fact that a child can do this and an LLM cannot proves that the LLM lacks some general reasoning process which the child possesses.
There’s an interesting wrinkle to this. There’s a faculty called Prefrontal Synthesis that children learn from language early on, which enables them to compose recursive and hierarchical linguistic structures. This also enables them to reason about physical tasks in the same way. Children that don’t learn this by a certain age (I think about 5) can never learn it. The most common case is deaf children that never learn a ‘proper’ sign language early enough.
So you’re right, and children pick this up very quickly. I think Chomsky was definitely right that our brains are wired for grammar. Nevertheless there is a window of plasticity in young childhood to pick up certain capabilities, which still need to be learned, or activated.
> Children that don’t learn this by a certain age (I think about 5) can never learn it.
Helen Keller is a counterexample for a lot of these myths: she didn't have proper language (only several dozen home signs) until 7 or so. With things like vision, critical periods have been proven, but a lot of the higher-level stuff, I really doubt critical periods are a thing.
Helen Keller did have hearing until an illness at 19 months, so it's conceivable she developed the critical faculties then. A proper controlled trial would be unethical, so we may never know for sure.
Thanks, it’s good to get counter-arguments and wider context. This isn’t an area I’m very familiar with, so I’m aware I could easily fall down an intellectual pothole without knowing. Paper below, any additional context welcome.
I misremembered, however. The paper noted evidence of thresholds at 2, 5, and the onset of puberty as seeming to affect mental plasticity in these capabilities, so there’s no one cutoff.
That solution seems to me like they built a hand-made river-crossing expert system and the LLM is activating it when it pattern-matches on words like "river crossing." From the linked page:
Expert(s): Logic Puzzle Solver, River Crossing Problem Expert
In other words, they cheated! Children don't have river-crossing problem expert systems built into their brains to solve these things.
I asked it to do that, no "cheating" necessary, my "custom instructions" setting is as follows:
--
The user may indicate their desired language of your response, when doing so use only that language.
Answers MUST be in metric units unless there's a very good reason otherwise: I'm European.
Once the user has sent a message, adopt the role of 1 or more subject matter EXPERTs most qualified to provide a authoritative, nuanced answer, then proceed step-by-step to respond:
1. Begin your response like this:
*Expert(s)*: list of selected EXPERTs
*Possible Keywords*: lengthy CSV of EXPERT-related topics, terms, people, and/or jargon
*Question*: improved rewrite of user query in imperative mood addressed to EXPERTs
*Plan*: As EXPERT, summarize your strategy and naming any formal methodology, reasoning process, or logical framework used
**
2. Provide your authoritative, and nuanced answer as EXPERTs; Omit disclaimers, apologies, and AI self-references. Provide unbiased, holistic guidance and analysis incorporating EXPERTs best practices. Go step by step for complex answers. Do not elide code. Use Markdown.
--
In other words, it can be good at logic puzzles just by being asked to.
> In other words, you cheated. Those aren’t instructions you would give to a child.
No, but you are cheating by shifting the goal-posts like that.
You previously wrote:
> The fact that a child can do this and an LLM cannot proves that the LLM lacks some general reasoning process which the child possesses.
I'm literally showing you an LLM doing what you said LLMs couldn't do, and which you used as your justification for claiming it "lacks some general reasoning process which the child possesses".
Well here it is, doing the thing.
Note that at no point here have I tried to claim that AI are fast learners, or exactly like humans — we also don't give kids, as I said in another comment about rats, 50,000 years of subjective experience reading the internet to get here — but the best models definitely demonstrate the things you're saying they can't do.
> Once they learn how to solve it once, changing up the animals will not fool a typical child. You could even create fictional animals with made-up names and they could solve it as long as you tell them which animal was the carnivore and which one was the herbivore.
When was your last experience with small children? Let's define "small" here to 5 y.o. or less, as that's the limit of my direct experience (having a 5 y.o. and an almost 3 y.o. daughters now).
There's a lot riding on "learn how to solve it once" in this case, because it'll definitely take more than a couple of exposures to the quiz before a small kid catches on to the pattern and suppresses their instincts to playfully explore the concept space. And even after that, I seriously doubt you "could even create fictional animals with made-up names and they could solve it as long as you tell them which animal was the carnivore and which one was the herbivore", because that's symbolic algebra, something teenagers (and even some adults) struggle with.
Whatever 'real' reasoning is, it's more useful than 'fake' reasoning. We can't measure the difference, but we can use one and not the other.
Multiple articles pointing out that AI isn't getting enough ROI are evidence that we don't have 'real', read 'useful', reasoning. The fake reasoning in the paper does not help with this, and the fact that we can't measure the difference changes nothing.
This 'something that we can't measure does not exist' logic is flawed. The earth's curvature existed way before we were able to measure it.
"Measuring it" in this instance doesn't mean picking up a ruler and measuring distance or seeing phenomena with the naked eye.
Measuring it means that there are actual discernible differences that can be "sussed out" and that, and this is very important, separate the so-called "fake reasoning" from "real reasoning". A suite of trick questions millions of humans would also flounder on ain't it, unless of course humans are no longer general intelligences.
You can't eat your cake and have it. The whole point of a distinction is that it distinguishes one thing from another. You can't claim a distinction that doesn't distinguish. You're just making things up at that point.
Your position is that it can't be measured or distinguished. My position is that it can be distinguished: there's not much return on investment from AI, because it's not really intelligent. If it were able to reason generally, it would create plenty of ROI.
You can't use a contradiction between your position and mine to prove my position is absurd.
At worst, literally all of those articles (yes, even Goldman) say the return on investment might not be as high as hyped. Nothing about no return or even little return. I'm not the one denying reality here.
> Maybe: there is no measurable difference between 'real' reasoning, and 'fake' reasoning.
The point is, it's easier to teach an LLM to fake it than to make it - for example, they get good at answering questions that overlap with their training data set long before they start generalizing.
So on some epistemological level, your point is worth pondering; but more simply, it actually matters whether an LLM has learned to game a benchmark or to approximate human cognition. If it's the former, it might fail in weird ways when we least expect it.
It's like students studying for the test without really understanding. Or like regular people who don't always understand and just follow the memorized steps. How often do we really understand, and how often do we just imitate?
I have a suspicion that humans often use abstractions or methods they don't understand. We frequently rely on heuristics, mental shortcuts, and received wisdom without grasping the underlying principles. To understand has many meanings: to predict, control, use, explain, discover, model and generalize. Some also add "to feel".
At one extreme we could say only a PhD in their area of expertise really understands, and the rest of us just fumble concepts. I am sure rigorous causal reasoning is only possible through extended education; it is not the natural mode of operation of the brain.
> I am sure rigorous causal reasoning is only possible through extended education; it is not the natural mode of operation of the brain.
I'd say it's the other way around: education teaches you not to reason and instead just follow the patterns you learned in the book. Most people do reason a ton before they go to school, but then school beats that out of them.
I think this talk [0] by Jodie Burchell explains the problem pretty well. In short: you are right that for a given task, only the outcome matters. However, as Burchell shows, AI is sold as being able to generalize. I understand this as the ability to transfer concepts between dissimilar problem spaces. Clearly, if the problem space and/or concepts need to be defined beforehand in order for the task to be performed by AI, there's little generalization going on.
Then those salesmen need to be silenced. They are selling the public AGI when every scientist says we don't have AGI, but that maybe through iterative research we can approach it.
Describing some service/product in grandiose terms and misrepresenting its actual use cases, utility, and applicability, claiming it'll solve all your ills and put out the cat, is a grift that has been around for as long as there have been salesmen. Silencing such salesmen would probably be a net gain, but it's hardly new and probably isn't going to change, because the salesmen don't get hit with the responsibility for following through on the promises they make or imply. They closed the sale and got their commission.
If it produces a sound (and therefore, by definition, a logically valid) argument, that is about as good as we could hope for. What we want to avoid is the fallacy of assuming that all arguments with true conclusions are sound.
Another thing we want to see in an extended discussion of a particular topic is a consistent set of premises across all arguments.
You could make the argument that two things that we don’t understand are the same thing because we’re equally ignorant of both in the same way that you could make the argument that Jimmy Hoffa and Genghis Khan are probably buried in the same place, since we have equal knowledge of their locations.
Clearly there is a difference between a small person hidden within playing chess and a fully mechanical chess automaton, but as the observer we might not be able to tell the difference. The observer's perception of the facts doesn't change the actual facts, and the implications of those facts.
The Mechanical Turk, however, was not a simulation of human consciousness, reasoning, chess-playing or any other human ability: it was the real thing, somewhat artfully dressed up so as to appear otherwise.
Is it meaningful to say that AlphaGo Zero does not play Go, it just simulates something that does?
Well, I do not proclaim consciousness: only the subjective feeling of consciousness. I really 'feel' conscious, but I can't prove or 'know' that I am in fact 'conscious' and making choices... to be conscious is to 'make choices', instead of just obeying the rules of chemistry and physics... rules which YOU HAVE TO BREAK in order to be conscious at all (how can you make a choice at all if you are fully obeying the rules of chemistry, which have no choice?).
A choice does not apply to chemistry or physics, so where does choice come from? I suspect it comes from our fantasies and not from objective reality (for I do not see humans consistently breaking the way chemistry works in their brains); it probably comes from nowhere.
If you can first explain the lack of choice available in chemistry (and how that doesn't interfere with us being able to make a choice), then I'll entertain the idea that we are conscious creatures. But if choice doesn't exist at the chemical level, it can't magically emerge from following deterministic rules. And chemistry is deterministic, not probabilistic (H2 + O never magically makes neon, or two water molecules instead of one).
Experience and choice are adjacent when they are not the same.
I specifically mean to say the experience of choice is the root of conscious thought - if you do not experience choice, you're experiencing the world the exact same way a robot would.
When pretending you are the fictional character in a movie vs. the fictional character in a video game, one experience involves more choice: you're making conscious decisions, versus having a passive experience.
Merely having an experience is not enough to be conscious. You have to actively be making choices to be considered conscious.
Consciousness is about making choices. Choices are a measure of consciousness.
I don't think this is clear at all. What I am experiencing is mostly the inner narrator, the ongoing stream of chatter about how I feel, what I see, what I think about what I see, etc.
What I experience is self-observation, largely directed through or by language processing.
So, one LLM is hooked up to sound and vision and can understand speech. It is directed to "free associate" an output, which is fed to another AI. When you ask it things, the monitoring AI evaluates the output for truthfulness, helpfulness, and potential to insult or harm others. It then feeds that back as input to the main AI, which incorporates the feedback. The supervisory AI is responsible for what gets said to the outside world, modulating and structuring the output of the central AI. Meanwhile, when not answering or conversing, it "talks to itself" about what it is experiencing. Now, if it can search and learn incrementally, uh, I don't know. It begins to sound like assigning an Id AI, an Ego AI, and a Superego AI.
But it feels intuitive to me that general AI is going to require subunits, systems, and some kind of internal monitoring and feedback.
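A toy sketch of that kind of loop, just to make the shape concrete; the two-role split and the prompts are my assumptions, not a claim about how any real system is built (same illustrative OpenAI client as above):

```python
from openai import OpenAI

client = OpenAI()

def chat(system: str, user: str, model: str = "gpt-4o") -> str:
    # Single chat call; the model name is illustrative.
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "system", "content": system},
                  {"role": "user", "content": user}],
    )
    return resp.choices[0].message.content

def supervised_answer(question: str, rounds: int = 2) -> str:
    # "Id": unfiltered free association about the question.
    draft = chat("Free-associate about the input. Think out loud, unfiltered.", question)
    for _ in range(rounds):
        # "Superego": evaluate truthfulness, helpfulness, and potential harm.
        critique = chat(
            "You are the supervisor. Rate the draft for truthfulness, helpfulness, "
            "and potential to insult or harm, then list concrete fixes.",
            f"Question: {question}\n\nDraft: {draft}",
        )
        # "Ego": revise the draft, incorporating the supervisor's feedback.
        draft = chat(
            "Revise the draft using the supervisor's feedback. Output only the revision.",
            f"Question: {question}\n\nDraft: {draft}\n\nFeedback: {critique}",
        )
    return draft
```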
That you don’t see X is not proof that X doesn’t exist. Here, X may or may not exist.
X = difference between simulated and real consciousness
Black holes were posited before they were detected empirically. We didn't declare them non-existent when the theory came out just because we couldn't yet detect them.
Throwing all the paintings made prior to 1937 into an LLM would never get Guernica out of it. As long as it's an LLM, this stands, not just today but all the way into the future.
This empty sophistry of presuming automated bullshit generators somehow can mimic a human brain is laughable.
The author fails to provide any argument other than one of incredulity and some bad reasoning with bad faith examples.
The dollar bill copying example is a faulty metaphor. He claims humans are not information processors, and he tries to demonstrate this by having a human process information (drawing from reference is processing an image and giving an output)...
His argument sounds like one from 'It's Always Sunny': as if metaphors never improve or get more accurate over time, and as if this latest metaphor isn't the most accurate metaphor we have. It is. When we have something better, we'll all start talking about the brain in that frame of reference.
This is an idiot that can write in a way that masks some deep bigotries (in favor of the mythical 'human spirit').
I do not take this person seriously. I'm glossing over all the casual incorrectness of his statements; a good number of them just aren't true. The ones I just scrolled to include statements like 'the brain keeps functioning or we disappear', or 'This might sound complicated, but it is actually incredibly simple, and completely free of computations, representations and algorithms' in the description of the 'linear optical trajectory' ALGORITHM (a set of simple steps to follow, in this case visual pattern matching).
You can shuffle the choices in multiple choice based benchmarks and models that memorize the benchmark tend to do really badly, almost worse than random guessing.
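A minimal sketch of that robustness check; the item format here is an assumption, not any particular benchmark's schema:

```python
import random

def shuffle_choices(item, rng):
    """Permute the answer options of one multiple-choice item and track where
    the correct answer moved. `item` is assumed to look like:
    {"question": str, "choices": [str, ...], "answer": int}."""
    order = list(range(len(item["choices"])))
    rng.shuffle(order)
    return {
        "question": item["question"],
        "choices": [item["choices"][i] for i in order],
        "answer": order.index(item["answer"]),
    }

# Example item; re-score the model on the shuffled copy and compare accuracy.
item = {"question": "2 + 2 = ?", "choices": ["3", "4", "5", "22"], "answer": 1}
print(shuffle_choices(item, random.Random(42)))
```

Re-scoring on the shuffled copy and comparing against the original accuracy is the whole test: a model that reasons about the content shouldn't care where the correct option sits.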
If the model works correctly on way more questions than there is room to store a giant list of recorded answers for, some kind of deduction of generalised rules must have taken place.
"there is room to represent recorded answers for" is doing a lot of work, of course; it might e.g. have invented compression mechanisms better than known ones instead.
It does system 1 thinking; it doesn't do system 2 thinking. That makes it really dumb, but it can still answer a very wide range of questions it hasn't seen exact matches of, since system 1 thinking can pattern-match in complex ways.
> it might e.g. have invented compression mechanisms better than known ones instead.
You mean humans? Humans invented the transformer architecture; that is what compresses human text into this form where the semantics of the text get encoded instead of the raw words.
I don't think we can just assume that training with this exact form of questioning will lead to a strong performance on such questions. For one thing, given the LLM propensity for hallucinating, I do not think we can be confident that an LLM, after this training, will reliably employ the correct model to answer a given question.
So the best way I can describe how humans abstract our thinking is that “a thought about a thought is itself a thought”. I am not an expert but I don’t believe LLMs can arbitrarily abstract their current “thought”, put it down for later contemplation, trace back to a previous thought or a random thought, and out of these individual thoughts form an understanding.
I would expect that a higher level algorithm would be required to string together thoughts into understandings.
Then again, I wonder if what we are going to see is fundamentally different kinds of intelligences that just do not necessarily think like humans. Chimps cannot tell you about last Tuesday, since their memory seems a lot more associative than recall-based. But they have situational awareness that even the superheroes in our comics do not generally possess (flash some numbers in front of a chimp for one second and he will remember all their positions and order even if you distract him immediately after). Maybe LLMs cannot be human-intelligent, but you could argue that they are a kind of intelligence.
There are many pillars of our own intelligence that we tend to gloss over. For instance - awareness and the ability to direct attention. Or something as simple as lifting your hand and moving some fingers at will. Those things impress me far more than the noises we produce with our mouths!
Simple answer to the question posed by the headline:
No.
As much as Google, Microsoft, OpenAI, and every other company that's poured billions into this technology want to think otherwise - more training data will not turn your AI model into AGI.
I believe Hume (and Kant) have some things to say about this.
The connection might need some fleshing out, but I believe, and I might be wrong here, it was decided a few centuries ago that probabilities alone cannot explain causality. It would be a hoot, wouldn’t it?
Perhaps AI just needs some synthetic a priori judgments to spruce it up.
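To make the "probabilities alone cannot explain causality" point concrete, here's a small simulation (coefficients chosen purely for illustration): two opposite causal models, X causes Y versus Y causes X, induce the same joint distribution, so observational frequencies alone cannot tell them apart.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Model A: X causes Y.
x_a = rng.normal(0, 1, n)
y_a = 0.5 * x_a + rng.normal(0, np.sqrt(0.75), n)

# Model B: Y causes X, with the coefficients mirrored.
y_b = rng.normal(0, 1, n)
x_b = 0.5 * y_b + rng.normal(0, np.sqrt(0.75), n)

# Both produce (approximately) the same bivariate normal distribution,
# so no observational statistic distinguishes the causal direction.
print(np.cov(x_a, y_a))  # ~[[1.0, 0.5], [0.5, 1.0]]
print(np.cov(x_b, y_b))  # ~[[1.0, 0.5], [0.5, 1.0]]
```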