My impression is that LLMs “pattern-match” on a less abstract level than general-purpose reasoning requires. They capture a large number of typical reasoning patterns through their training, but those patterns are not sufficiently decoupled, or generalized, from what the reasoning is about in each of the concrete instances that occur in the training data. As a result, the apparent reasoning capability that LLMs exhibit depends significantly on what they are asked to reason about, and even on representational aspects like the sentence patterns used in the query. LLMs seem to be largely unable to symbolically abstract (as opposed to interpolate) from what is exemplified in the training data.
Take the classic trick question, for example: "A bat and a ball cost $1.10 in total. The bat costs $1.00 more than the ball. How much does the ball cost?"
Most people give a wrong answer because they, too, "pattern match".
People pattern match, but people also apply logic and abstract thinking, far surpassing the abilities of LLMs, and this is a fundamental limitation that won't get fixed by more processing power and training data.
The big difference in behavior is in how people or LLMs approach new problems. An LLM is incapable of solving a problem that's similar to one that it was trained on, but with slightly changed requirements. An LLM is also incapable of learning once you point out its error, or of admitting that it doesn't know.
Regarding people, I find it interesting that even lower IQ people are capable of tasks that are currently completely out of reach for AI. It's not just the obvious, such as self-reflection, but even tasks that should've been solved by AI already, such as driving to the grocery store.
There's lots of assumptions here that we've got examples to disprove:
> but people also apply logic and abstract thinking
Which people? If we universally did that as a default, elections would look massively different.
> An LLM is incapable of solving a problem that's similar to one that it was trained on, but with slightly changed requirements
Getting good answers for coding questions on my private code/databases disproves this. The requirements have been changed significantly. I've been through a ~20-turn chat with an LLM investigating a previously unseen database, suggesting queries to get more information, acting on the results to create hypotheses and follow up on them.
> An LLM is also incapable of learning once you point out its error,
This is the standard coding agent loop - you feed back the error to get a better answer through in-context learning. It works.
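As a rough sketch of what that loop looks like (the `llm` callable here is a hypothetical stand-in for any chat-completion API, and the test command is just an example):

    import subprocess

    def agent_loop(llm, task, test_cmd="pytest -q", max_turns=5):
        # llm: any chat-completion callable (list of messages in -> reply text out).
        # A sketch of the feedback loop, not any particular tool's implementation.
        messages = [{"role": "user", "content": task}]
        for _ in range(max_turns):
            patch = llm(messages)                     # model proposes code / a patch
            messages.append({"role": "assistant", "content": patch})
            # (a real agent would also write the patch to the working tree here)
            result = subprocess.run(test_cmd, shell=True, capture_output=True, text=True)
            if result.returncode == 0:
                return patch                          # tests pass, stop
            # Feed the failure text back into the context; the weights never change.
            # That is the "in-context learning" step.
            messages.append({"role": "user", "content": "Tests failed:\n" + result.stdout + result.stderr})
        return None                                   # give up after max_turns

The point is just that the error text enters the context and the next attempt is conditioned on it.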
> or of admitting that it doesn't know.
Response from gpt: «I apologize, but I'm not able to find any reliable information about a person named "Thrust Energetic Aneksy."»
> Response from gpt: «I apologize, but I'm not able to find any reliable information about a person named "Thrust Energetic Aneksy."»
The model says that because it is trained to say that for specific queries. They have given it a lot of prompts of the form "Who is X" paired with responses like "I don't know about that person".
The reason you don't see "I don't know" much for other kinds of problems is that it isn't easy to create training examples where the model says "I don't know" while still having it solve the problems that are in its dataset; it starts to pattern-match all sorts of math problems to "I don't know" even when it could solve them.
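To be concrete, the kind of pairs being described might look something like this (purely illustrative, not any vendor's actual data; the second name is invented):

    refusal_examples = [
        {"prompt": "Who is Thrust Energetic Aneksy?",
         "response": "I don't know of any person by that name."},
        {"prompt": "Who is Velmira Ostrankov?",
         "response": "I can't find reliable information about that person."},
    ]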
A human can look at their own thoughts and see how they are solving a problem, and thus knows a lot better what they know and don't. LLMs aren't like that, they don't really know what they do know. The LLM doesn't know its own weights when it picks a word; it has no clue how certain it is. It just predicts whether a human would have said "I don't know" to the question, not whether it itself would know.
> LLMs aren't like that, they don't really know what they do know.
They actually do. The information is saved in there; you just need to ask explicitly, because the usual response doesn't expose it. (But it likely can be fine-tuned to do that.) https://ar5iv.labs.arxiv.org/html/2308.16175
Also the original claim was that they're not capable of responding with "I don't know", so that's what I was addressing.
> Which people? If we universally did that as a default, elections would look massively different.
All people, otherwise we wouldn't be able to do basic tasks, such as finding edible food or recognising danger.
I dislike how you discount the way other people vote as being somehow irrational, while I'm sure you consider your own political thinking as being rational. People always vote according to their own needs and self-interest, and in terms of politics, things are always nuanced and complicated. The fact that many people vote contrary to your wishes is actually proof that people can think for themselves.
> Getting good answers for coding questions on my private code/databases disproves this.
I use GitHub Copilot and ChatGPT every day. It only answers correctly when there's a clear and widely documented pattern, and even then, it can hallucinate. Don't get me wrong, it's still useful, but it in no way shows an ability to reason, or the capacity to admit that a solution is out of its reach.
Your experience with coding is kind of irrelevant to the question at hand.
> This is the standard coding agent loop - you feed back the error to get a better answer through in-context learning. It works.
This looks like it works sometimes, but only if "pointing out the error" is coincidentally the same as "clarifying the problem spec". Admittedly, for really simple cases those are the same or hard to tell apart. But it always seems clear that adding error-correction context is similar to adding additional search terms to get to a better Stack Overflow page. This feels very different from updating any kind of logical model of the problem representation.
It doesn't have to be as explicit as an extra term. The error feedback can be a failing test which was just modified, or a self-reflection result from a prompt like "does what you created satisfy the original request?".
Updating the logical model of the problem also happens during the database investigation I mentioned earlier. There's both information gathering and building on it to ask more specific questions.
> I've been through a ~20-turn chat with an LLM investigating a previously unseen database, suggesting queries to get more information, acting on the results to create hypotheses and follow up on them.
You do know that when that happens, the LLM usually just throws random stuff at you until you are happy? That is much easier to do than to reason. LLMs solved the much easier problem of looking smart rather than being smart; the trick is to make the other person solve the problem for you while attributing the solution to you.
You see humans do this as well in hiring interviews etc, it is really easy to trick people who want you to succeed.
When the hypotheses make sense, the tests for them work, and the next steps are based on the previous test results, that's not "random stuff". I made that explicit in the first message. This problem wasn't solvable with random guesses.
> The big difference in behavior is in how people or LLMs approach new problems. An LLM is incapable of solving a problem that's similar to one that it was trained on, but with slightly changed requirements. An LLM is also incapable of learning once you point out its error, or of admitting that it doesn't know.
When are we going to stop defending LLM intelligence on the grounds that "some people are stupid too"?
Human intelligence is not benchmarked by its lowest common denominators (just like how we don't judge LLMs on the basis of tiny 100M-parameter models).
The paper linked here starts off by asking whether LLMs can “solve complex problems in ways that resemble human thinking.” Why would we try to answer that question without discussing how humans think?
> Why would we try to answer that question without discussing how humans think?
The post you are replying to does not suggest that we should. If anything, it is suggesting the opposite - that we should be considering the full range of human abilities (or at least those that are effective in solving complex problems) when addressing the question you have quoted.
I agree with most of what viraptor has said in this thread, but not in this particular case.
I'm a different person than the one you replied to, but I agree with them.
Your previous reply was to:
> Why would we try to answer that question without discussing how humans think?
And you replied to that with:
> Coz we simply don't understand how we understand, period.
Well, when we don't understand how we understand, that is exactly when we should be discussing how humans think. Or at least how we think we think. And the bat and a ball example relates to research about how we think we think.
So yeah, your reply definitely comes across as saying "so we shouldn't discuss it", and finishing it with "period" also suggests "or try to understand it at all".
> Human intelligence is not benchmarked by its lowest common denominators (just like how we don't judge LLMs on the basis of tiny 100M-parameter models).
Indeed, but we do also catalogue our cognitive biases — kinda the human version of what are now called hallucinations when LLMs do them. (When Stable Diffusion does it, it's "oh god the fingers").
It's worth caring about both strengths and weaknesses.
Yes it is. Warning labels on products and laws aren't written with genius-level people in mind, but with the lowest common denominator. TV politics is the way it is because of said lowest common denominator of intelligence. It's not that some people are dumb too; most people are dumb in the wrong contexts, even you and I pre-coffee, or sleep-deprived, or wasted drunk, or high.
IQ tests are averaged out so 100 is average intelligence, not the high point.
When are we going to define what intelligence actually means, so these discussions can start making sense?
5 years ago, it was still common to read things like "AIs will never be as intelligent as mice, let alone humans", today, it's "sure, AIs are as intelligent as some humans, but not as intelligent as the right humans".
Noticed how everyone dropped the Turing Test like a hot potato as the gold standard for intelligence, the moment it became apparent that LLMs were about to pass it? Try to find a recent high-profile article invoking the Turing Test. Crickets. The intellectual dishonesty is nauseating.
The entire discussion is dominated by smart people who are scared shitless that AI is going to show them just how ordinary they are in the grand scheme of things.
Wait are LLMs crushing professionals at coding? Anyways, LLMs are clearly better than humans at memorizing information, and they most certainly have some ability to retrieve and adapt this information to solve problems. On tests, they can do some degree of generalization to answer questions with different wording from their training data. However, because humans are comparably quite bad at memorizing, we know that when humans do well on medical tests, it usually (not always) means that they “understand” how, for example, anatomy works. They’ve formed some model in their head of how to break the body into parts and parts into subparts and how parts interact and so on because doing so is actually easier for most humans than trying to remember the linguistic facts.
Understanding in this sense seems different from the memorizing + flexible retrieval we know LLMs excel at, because it extends much further beyond the training distribution. If I ask a (good) medical professional a question unlike anything they’ve ever seen before, they’ll be able to draw on their “understanding” of anatomy to give me a decent guess. LLMs are inconsistent on these kinds of questions and often drop the ball.
We can also point to training data requirements as a discrepancy. In brain organoid experiments, we observe that far fewer examples are required to achieve neural-network-like results. This isn’t surprising to me. Biological neurons have exceedingly complex behavior; it takes hundreds of neural network nodes to recreate the behavior of a single neuron, and they can reorganize their network structure in response to stimuli, form loops, operate in non-discretized time, etc. We don’t know how far transformers will be able to go, but I think that if you want to make a model that holds a candle to that sort of complexity, you’ll need at least many, many orders of magnitude more scale or, more likely, a different, less limiting architecture.
Exams, yes. Not the work yet. This is also why they've not made doctors and lawyers obsolete: even in the fields where the models perform the best, you're getting the equivalent of someone fresh out of university with no real experience.
I suspect you're right about your point with generalise vs. memorise. Not absolutely sure, but I do suspect so.
I also suspect we'll get transformative AI well before we can train any AI with as few examples as any organic brain needs. Unless we suddenly jump to that with one weird trick we've been overlooking this whole time, which wouldn't hugely surprise me given how many radical improvements we've made just by trying random ideas.
And to short-circuit the argument: approximately no one is, at the same time, a professional coder, writer, artist, detective, translator, doctor, and <insert one or ten or a hundred more occupations here>. GPT-4 is all of them at once, and outperforms an average professional of any of those occupations in their respective field.
You said it outperforms the average professional in any of these occupations (including detective and artist?) and more? That’s a very bold claim. Can you substantiate that somehow? I don’t disagree for a second that incrementally better LLMs + robotics will be able to automate a large portion of labor, but that doesn’t in my eyes make them smarter than humans. Jobs today aren’t exactly the fullest realization of human potential; you wouldn’t call a robotic arm smarter than a human for being better on the line.
It's the other way around. Any one human will only beat an AI at two or three tasks — the first two being our day job and our mother tongue, the third possibly being our hobby — while the AI will out-perform us at thousands of other tasks.
Only if you count humanity as a whole do we beat an LLM at everything.
There are two logical problems with what you've written here.
1. You're counting "our day job" as one task, while counting all individual prompts an AI can answer as each being their own task. This is obviously misleading (and I could just as easily perform the same compression in the opposite direction - chatbots can only do one task: "be a chatbot").
2. You're not controlling for training. It's already meaningless to compare the intelligence of one entity trained to do a task with another entity that was not trained to do that task.
But even ignoring those fallacies, what you've written here is still not true. 99% of things an LLM could supposedly "out-perform" a human at, the human would actually outperform if you provided that human with the same text resources the LLM used to conjure its answer. Regurgitating facts is not evidence of intelligence, and humans can do it easily (and do it without hallucinating, which is key) if you just give them access to the same information.
But when you go in the opposite direction, achieving parity is no longer so simple. When LLMs fail to do math, fail to strategize, fail to process rules of abstract games, etc., there is no textbook or lecture or article you can provide to the LLM that magically makes the problem go away. These are fundamental limitations of the AI's capabilities, rather than just a result of not possessing enough information.
What many people don't seem to realize is, when merely getting up in the morning and brushing your teeth you are already exercising more intelligence than any AI has ever possessed. (Anyone who has ever worked in robotics or visual processing can attest to that enthusiastically.) So don't even get me started on actual critical thinking.
1. This is indeed a simplification, but even broken into single tasks, your day job consists of the tasks where you have the most experience. For example, I used to write video games; AI does a better job of game design than me, but I'm the better programmer.
2. Unimportant, as the consideration I was rejecting was performance in tasks.
As it happens, some of my other recent messages demonstrate that I agree they are low intelligence for this exact reason.
> 99% of things an LLM could supposedly "out-perform" a human at, the human would actually outperform if you provided that human with the same text resources the LLM used to conjure its answer
Could I pass a bar exam or a medical exam, by reading the public internet, with no notes and just from memory, which is what a base model does?
Nope.
Could I do it with a search engine, which is what RAG assisted LLMs do?
Perhaps.
> humans can do it easily (and do it without hallucinating
Hell no we mess that up almost constantly.
> When LLMs fail to do math, fail to strategize, fail to process rules of abstract games, etc., there is no textbook or lecture or article you can provide to the LLM that magically makes the problem go away
I'm sure I've seen this done. I wonder if I'm hallucinating that certainty…
> Could I pass a bar exam or a medical exam, by reading the public internet, with no notes and just from memory, which is what a base model does? Nope.
You're -still- ignoring the fact that these models spend millions of GPU-hours in training. I'm sure you could manage.
> Hell no we mess that up almost constantly.
"Almost constantly"? Is this satire? I'd fire any such person, and probably recommend them psychiatric treatment.
AI-hype people really think so little of human beings? I certainly hope my pilot isn't "almost constantly" hallucinating his aviation training.
> I'm sure I've seen this done. I wonder if I'm hallucinating that certainty…
Do we still have Turing Test competitions? Now that everybody's used to LLMs, and detecting their style, it might be interesting. I'm not sure the LLMs would do very well against humans who were experienced and alert.
"detecting their style" is trivial only because people take the default output and run with it. You can significantly alter it if you want and it would be an obvious thing to do for such test. Right now the default output styles of public LLMs have no reason to try avoiding detection and is not trained to do it.
I didn't exactly mean superficial style, but properties like creativity, liveliness, and consistency. I hesitate to say that these are things LLMs can't do, because I'll provoke somebody into telling me that they totally can, but weaknesses like this could be ways for human intuition to detect an LLM. It might have to be a long competition, because an unusual set of prompts (or training, even) is likely to initially impress the humans: time should be allowed for novelty to wear off.
> properties like creativity, liveliness, and consistency
I don't believe we train the LLMs for those things specifically. It will be interesting to see if some datasets for this appear. I think we can still make huge improvements just by caring about that more.
> Noticed how everyone dropped the Turing Test like a hot potato as the gold standard for intelligence, the moment it became apparent that LLMs were about to pass it? Try to find a recent high-profile article invoking the Turing Test. Crickets. The intellectual dishonesty is nauseating.
I think you could reasonably describe that as "pattern matching at the wrong level". If you tell most people the first answer is wrong they will go up a level or two and work out the correct answer.
I asked this question in a college-level class with clickers. For the initial question I told them, "This is a trick question, your first answer might not be right". Still less than 10% of students got the right answer.
This idea that students are "more generally intelligent" requires specific arguments to support. ChatGPT alone has a vast breadth of knowledge that is impossible for a human to keep up with as well as superior skills to a student in any number of fields. I can find evidence that it has an IQ of around 124 [0] which is going to stretch most students (although in fairness the same article also speculates an IQ of 0).
Students can keep ahead of it with training in specific fields, and it has a few weaknesses in specific skills, but I think someone could make a reasonable claim that ChatGPT has superior general intelligence.
> superior skills to a student in any number of fields
It's abysmal any time we measure pure, distilled intelligence.
When asked to come up with any non-basic novel algorithm and data structure, it creates nonsense.
Especially when you ask it to create vector-instruction-friendly memory layouts and it can't code in its preferred way. I had some fun trying to make it spit out a brute-force-ish solver for a problem involving basic orbital mechanics and some forces. I wouldn't even want to try something more complicated. It can do generalized solvers somewhat, since it can copy that homework, but none that can express the kinds of terms you'd be working with (despite those also having code available in some research papers).
Speaking of which, nine times out of ten it cannot even figure out some basic truths in orbital mechanics that can be somewhat easily derived from the formulas commonly given (you can get there if you're very patient and are able to filter out its wrong answers).
But at the end of the day it was still a valuable tool to me as I was learning these things myself, since despite being often wrong, it nevertheless spat out useful things I could plug into Google to find more trustworthy sources that would teach me. Really neat if you're going in blind into a new subject.
Claiming that ChatGPT is more intelligent than a student is the same as saying an encyclopedia or a library is more intelligent than a student. Sure, they retain more information. But ChatGPT is not AGI, and it has no idea what it is even talking about.
> I can find evidence that it has an IQ of around 124
Despite me being someone who is generally impressed by the best LLMs, I think this says more about IQ tests than it does any AI.
Which isn't to shame those tests — we made those tests for humans, we were the only intelligence we knew of that used complex abstract language and tools until about InstructGPT — but it does mean we should reconsider what we mean by "intelligence".
My gut feeling is that a better measure is how fast we learn stuff. Computers have a speed advantage because transistors outpace synapses by the degree to which a pack of wolves outpaces continental drift (yes I did do the calculation), so what I mean here is how many examples rather than how many seconds.
But as I say, gut feeling — this isn't a detailed proposal for a new measure, it likely needs a lot of work to even turn this into something that can be a good measure.
This question becomes easy if you think about it algebraically/mathematically, but comically hard thinking about it intuitively, as in, trying to reason with language.
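Spelled out: let b be the ball's price in dollars. Then

    b + (b + 1.00) = 1.10
    2b = 0.10
    b = 0.05

so the ball costs 5 cents. The intuitive 10-cent answer comes from reading "$1.00 more than the ball" as "$1.00 flat"; in general the ball costs (total - difference) / 2.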
Indeed, domain-specific reasoning ability is a seminal psychology result: people are in general sucky at a specific type of logic puzzle, unless you frame it as enforcing a social rule - then they're excellent.
Actually, no. We can do 'real' reasoning and come up with novel conclusions. In fact, we have such an example going all the way back to Plato in Meno, about 2300 years ago. It's the doubling of the square dialog of Socrates and the slave boy.
It might still say 5 cents if the problem was "the bat costs 90 cents more than the ball", since 5 cents is overwhelmingly the right answer to questions that look like this.
That's a great conversation. I anthropomorphically feel bad for Gemini, in that it actually got a hard question right, and then it was gaslit into believing it was wrong.
Since I've known this puzzle since childhood, my "pattern match" answer is the correct answer. I wonder if someone took it up a level and posed a puzzle that looked like this puzzle but had some other subtle complication, whether it would fool me.
I'm not saying this is you, but in my experience, the conversation changes because people flatly refuse to believe any concrete examples. I would and many companies do trust an LLM with low to medium skilled text processing tasks: internal knowledge sharing, first line customer support, low stakes document review. GitHub has their famous Copilot product, which I don't personally find too useful but many of my coworkers use.
Really depends on what you understand the word to mean. All of the things I listed are “important” in the sense that they’ve gotta get done, and in the sense that it matters how well they’re done. Would I trust an LLM with something business-critical? No, but there’s quite a lot of people I’d put in the same boat.
The difference is some humans exist that can be trusted with business critical tasks but no AIs exist with that level of competence.
This is actually a great example of what I'm talking about. All this talk about how useful AIs are, and how humans have flaws and yet the conclusion is always the same. AIs are only useful for tasks that are relatively easy and have a higher tolerance for failure.
For some reason LLMs get a lot of attention. But while simplicity is great, it has limits. To make a model reason you have to put it in a loop with fallbacks. It has to try possibilities and fall back from false branches, which can be done one level higher. This can be an algorithm, another model, or another thread in the same model. To some degree it can be done by prompting in the same thread, like asking the LLM to first print a high-level algorithm and then work through it step by step.
LLMs already do this. Their many wide layers allow for this, and as a final fallback, their output adjusts based on every token they generate (it's not all decided at once). All your statement really means is a vague "well it should do it more!" which yeah, is the goal of each iteration of GPT etc.
LLMs get a lot of attention because they were the first architecture that could scale to a trillion parameters while still improving with every added parameter.
When the task, or part of it, is NP-complete there is no way around it. The model has to try options until it finds a working one, in a loop, and this can be multi-step with partial fallback. That's how humans think: they can only see to some depth, so they may first determine promising directions, select one, go deeper, and fall back if it doesn't work. The pattern matching mentioned is the simplest, one-step version of this, and LLMs do it with no problem.
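Roughly, the outer loop I mean is just depth-first search with the model proposing steps (a sketch with hypothetical propose/check hooks, not any existing framework):

    def solve(state, propose, check, depth=0, max_depth=5):
        # propose(state): hypothetical LLM call suggesting candidate next steps.
        # check(state): hypothetical verifier for a complete solution.
        if check(state):
            return state
        if depth == max_depth:
            return None                      # too deep: give up on this branch
        for step in propose(state):          # most promising directions first
            result = solve(state + [step], propose, check, depth + 1, max_depth)
            if result is not None:
                return result
        return None                          # nothing worked here; fall back

The LLM only ever does the one-step pattern matching; the trying, verifying, and falling back live in the loop around it.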
I wonder if reasoning is intrinsically tied to the need for a model to be able to perform well with extremely limited training data. Current LLMs require the sum total of all human knowledge to work correctly (even a few decades ago, there was nowhere near enough data to train one to be useful, regardless of available computing), meanwhile humans only need access to a few books and conversations with uneducated people growing up to achieve brilliant ideas.
I'd argue the vast majority of it is, from newspaper articles spanning back centuries to the vast majority of books written, down to every conversation someone has had on public social media websites such as Reddit.
Given that LLMs are trained on tokens and not symbols, that is, they have no context or world model, I would say that LLMs cannot symbolically abstract, given that they have neither a mechanism nor the data to do so.
- A form of reasoning is to connect cause and effect via probability of necessity (PN) and the probability of sufficiency (PS).
- You can identify when the natural language inputs can support PN and PS inference based on LLM modeling
That would mean you can engineer in more causal reasoning based on data input and model architecture.
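For reference, my own gloss of those two quantities in Pearl's counterfactual notation (a paraphrase, not necessarily the paper's exact formulation):

    PN = P(Y_{x'} = 0 | X = x, Y = y)    (given that x and y both occurred: would y have failed without x?)
    PS = P(Y_x = 1 | X = x', Y = y')     (given that neither x nor y occurred: would doing x have produced y?)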
They define causal functions, project accuracy measures (false positives/negatives) onto factual and counterfactual assertion tests, and measure LLM performance with respect to this accuracy. They establish a surprisingly low tolerance for counterfactual error rate, and suggest it might indicate an upper limit for reasoning based on current LLM architectures.
Their findings are limited by how constrained their approach is (short, simple boolean chains). It's hard to see how this approach could be extended to more complex reasoning. Conversely, if/since LLMs can't get this right, it's hard to see them progressing at the rates hoped, unless this approach somehow misses a dynamic of a larger model.
It seems like this would be a very useful starting point for LLM quality engineering, at least for simple inference.
LLMs have access to the space of collective semantic understanding. I don't understand why people expect cognitive faculties that are clearly extra-semantic to just fall out of them eventually.
The reason they sometimes appear to reason is because there's a lot of reasoning in the corpus of human text activity. But that's just a semantic artifact of a non-semantic process.
Human cognition is much more than just our ability to string sentences together.
I might expect some extra-semantic cognitive faculties to emerge from LLMs, or at least be approximated by LLMs. Let me try to explain why. One example of extra-semantic ability is spatial reasoning. I can point to a spot on the ground and my dog will walk over to it — he’s probably not using semantic processing to talk through his relationship with the ground, the distance of each pace, his velocity, etc. But could a robotic dog powered by an LLM use a linguistic or symbolic representation of spatial concepts and actions to translate semantic reasoning into spatial reasoning? Imagine sensors with a measurement-to-language translation layer (“kitchen is five feet in front of you”), and actuators that can be triggered with language (“move forward two feet”). It seems conceivable that a detailed enough representation of the world, expressive enough controls, and a powerful enough LLM could result in something that is akin to spatial reasoning (an extra-semantic process), while under the hood it’s “just” semantic understanding.
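As a toy sketch of that loop (every name here is hypothetical; it's only meant to make the shape of the idea concrete):

    def navigate(read_sensors, llm_decide, act, goal="Go to the kitchen.", max_steps=20):
        # read_sensors(): hypothetical, turns raw readings into short English facts,
        #   e.g. ["The kitchen doorway is five feet ahead."]
        # llm_decide(observations, goal): hypothetical LLM call returning one English
        #   action such as "move forward two feet", or "done" when the goal is reached.
        # act(command): hypothetical, parses the English command and drives the actuators.
        for _ in range(max_steps):
            command = llm_decide(read_sensors(), goal)
            if command.strip().lower() == "done":
                break
            act(command)

The loop never leaves language space; whether that deserves to be called spatial reasoning is exactly the question.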
Spatial reasoning is more akin to visualising a 3D "odd shaped" fuel tank from 2D schematics and being able to mentally rotate that shape to estimate where a fluid line would be at various angles.
This is distinct from stringing together treasure-map instructions in a chain.
Isn’t spatial navigation a bit like graph walking, though? Also, AFAIK blind people describe it completely differently, and they’re generally confused by the whole concept of 3D perspective and objects getting visually smaller over distance, and so on. Brains don’t work the same for everyone in our species, and I wouldn’t presume to know the full internal representation just based on qualia.
I'm always impressed by the "straightedge-and-compass"-flavoured techniques drafters of old used to rotate views of odd 3D shapes from pairs of 2D schematics, in the centuries before CAD software.
I don’t know if you’re correct. I don’t think you know that our brains are that different? We too need to train ourselves on massive amounts of data. I feel like the kinds of reasoning and understanding I’ve seen ChatGPT do are soooo far beyond something like just processing language.
When I talk to 8B models, it's often painfully clear that they are operating mostly (entirely?) on the level of language. They often say things that make no sense except from a "word association" perspective.
With bigger (400B models) that's not so clear.
It would be silly to say that a fruit fly has the same thoughts as me, only a million times smaller quantitatively.
I imagine the same thing is true (genuine qualitative leaps) in the 8B -> 400B direction.
We do represent much of our cognition in language.
Sometime I feel like LLMs might be “dancing skeletons” - pulleys & wire giving motion to the bones of cognition.
Do you have any evidence that human cognition (for speaking) is more than just an ability to string sentences together? Do you have any evidence that LLMs don't reason at all?
A perfect machine designed to only string sentences together as perfect responses, with no reasoning built in, IS indistinguishable from a machine that only builds sentences from pure reasoning.
Either way nobody understands what's going on in the human brain and nobody understands why LLMs work. You don't know. You're just stating a belief.
It is like having Google's MusicLM output an mp3 of saxophone music and then asking: what proof is there that MusicLM has not learned to play the saxophone?
In a certain context that only judges the output, the model has achieved what is meant by "play the saxophone".
In another context of what is normally meant, the idea the model has learned to play the saxophone is completely ridiculous and not something anyone would even try to defend.
In the context of LLMs and intelligence/reasoning, I think we are mostly talking about the latter and not the former.
"Maybe you don't have to blow throw a physical tube to make saxophone sounds, you can just train on tons of output of saxophone sounds then it is basically the same thing"
Let's limit the discussion to things that can be actually done with an LLM.
Getting one to blow on a saxophone is outside of this context.
An LLM can't blow on a saxophone period. However it can write and read English.
>In the context of LLMs and intelligence/reasoning, I think we are mostly talking about the latter and not the former.
And I'm saying the latter is completely wrong. I'm also saying the former is irrelevant. Look, this is what you're doing: for the former, you're comparing something humans can do to something LLMs can't do. That's a completely irrelevant comparison.
For the latter, we are comparing things humans and LLMs BOTH can do. Sometimes humans give superior output, sometimes LLMs give superior output. Given similar inputs and outputs, the internal analysis of what's going on, whether it's true intelligence or true reasoning, is NOT ridiculous.
"Ridiculous" is comparing things where no output exists. LLMs do not have saxophone output where they actually blow into an instrument. There's nothing to be compared here.
There's also counter-evidence that animals lack reasoning abilities. Ever see a gorilla attack its own reflection or a dog chase its own tail? Animals display contradictory evidence: some of it shows the ability to reason, and some shows the lack of it.
Figures that the LLM displays the same contradictory evidence.
But none of this evidence proves anything definitively. Much like human cognition, the LLM is a machine that we built but don't understand. No definitive statement can be made about it.
You just argued that, because some animals display errors in reasoning, we can call into doubt the claim that any animal reasons. This does not follow. The reflection test, for example, is passed by some animals; we can test it by putting a red dot on their face and seeing if they mess with it. Either way, it’s not even a test of general reasoning abilities. I think you’re giving animals far too little credit.
I believe the claim that some animals can reason is a very reasonable hypothesis, while the claim that no animals can reason is an unlikely one. The contradictory evidence you cite is pointing at some animals doing dumb things. Those animals can be as dumb as rocks and it wouldn’t matter for my claim, I’d only need to show you one reasoning animal to prove it.
Not to mention all of the developments in mirror testing to account for differences in perception. What they're finding is that self-recognition is more common than assumed.
I'd wager they could not create much if they had no exposure to other art or music in the first place. Creation does not come from nothing. Composers and artists typically imitate; it's very well known.
So where is the original guitar music that all the guitar players imitated? It can't have been created by a human, since humans imitate and can't create new things, as you say. Was it god who created it? Or was it always there?
Humans are really creative and create new stuff. Not sure why people try to say humans aren't.
I find the terminology is used inconsistently*, so it's probably always worth asking.
To me, a "large language model" is always going to mean "does text"; but the same architecture (transformer) could equally well be trained on any token sequence which may be sheet music or genes or whatever.
IIRC, transformers aren't so good for image generators, or continuums in general, they really do work best where "token" is the good representation.
* e.g., to me, if it's an AI and it's general then it's an AGI, so GPT-3.5 onwards counts; what OpenAI means when they say "AGI" is what I'd call "transformative AI"; there's plenty of people on this site who assert that it's not an AGI but whenever I've dug into the claims it seems they use "AGI" to mean what I'd call "ASI" ("S" for "superhuman"); and still others refuse to accept that LLMs are AI at all despite coming from AI research groups publishing AI papers in AI journals.
No. LLMs can take any type of data. Text is simply a string of symbols. Images, video and music are also a string of symbols. The model is the same algorithm just trained on different types of data.
I never said cognition was limited to text. I just limited the topic itself to cognition involving text.
Every culture on earth seems to have figured out the same rudimentary addition and fractions to handle accounts and inheritance partitioning. Why didn't they come up with inconsistent models of numerics if they developed in total linguistic isolation?
It's possible that we are just LLMs with much much more data such that we don't make inconsistencies. And the data is of course just inborn and hardwired into our neural networks rather than learned.
We don't know, so no statement can really be made here.
You can't explain how an LLM does what it does, and you can't explain how humans do what we do either. With no explanation possible but CLEAR similarities between human responses and LLM responses that pass Turing tests... my hypothesis is actually reasonable.
In theory, with enough data and enough neurons we can conceivably construct an LLM that performs better than humans. Neural nets are supposed to be able to compute anything anyway. So none of what I said is unreasonable.
The problem I have with your claim is that it assumes humans use language the way that an LLM does. Humans don’t live in a world of language, they live in the world. When you teach kids vocabulary you point to objects in the environment. Our minds, as a consequence, don’t bottom out at language; we draw on language as a pointer into mental concepts built on sensory experience. LLMs don’t reference something, they’re a crystallization of language’s approximate structure. How do they implement this structure? I dunno, but I do know that they aren’t going to do much more than that because it isn’t rewarded during training. We almost certainly possess something like an LLM in our heads to help structure language, but we also have so, so much more going on up there.
You made a bunch of claims here but you can’t prove any of them to be true.
Also you are categorically wrong about language. LLMs despite the name go well beyond language. LLMs can generate images and sound and analyze them too. They are trained on images and sound. Try ChatGPT.
> Human cognition is much more than just our ability to string sentences together
Animals without similar language capabilities don't seem to be too strong at reasoning. It could well be that language and reasoning are heavily linked together.
This is proposed as a way to measure "true" reasoning by asking a certain type of trick questions, but I don't quite see how this could be a basis of a sustainable benchmark.
If this gets attention, the next generation of LLMs will be trained on this paper, and then fine-tuned by using this exact form of questions to appear strong on this benchmark, and... we're back to square one.
Maybe: there is no measurable difference between 'real' reasoning, and 'fake' reasoning.
And if there is no measurable difference... we can't measure 'realness', we just have to measure something different (and more useful): 'soundness'. Regardless of whether it is reasoning or not internally, if it produces a sound and logical argument, who cares?
I agree: I don't think any measure tested linguistically can prove it is internally reasoning... in the same way we haven't truly proven other sentient people aren't in fact zombies (we just politely assume the most likely case that they aren't).
Real reasoning, which can be used to predict outcomes in novel situations, is based on multi-step what-if prediction, perhaps coupled with actual experimentation, and requires things like long-term (task duration) attention, (task duration) working memory, online learning (unless you want to have to figure everything out from scratch everytime you re-encounter it), perhaps (depending on the problem) innate curiosity to explore potential solutions, etc. LLMs are architecturally missing all of the above.
What you might call "fake" reasoning, or memorized reasoning, only works in situations similar to what an LLM was exposed to in its training set (e.g. during a post-training step intended to imbue better reasoning), and is just recalling reasoning steps (reflected in word sequences) that it has seen in the training set in similar circumstances.
The difference between the two is that real reasoning will work for any problem, while fake/recall reasoning only works for situations it saw in the training set. Relying on fake reasoning makes the model very "brittle" - it may seem intelligent in some/many situations where it can rely on recall, but then "unexpectedly" behave in some dumb way when faced with a novel problem. You can see an example of this with the "farmer crossing river with hen and corn" type problem, where the models get it right if problem is similar enough to what it was trained on, but can devolve into nonsense like crossing back and forth multiple times unnecessarily (which has the surface form of a solution) if the problem is made a bit less familiar.
> the kind of predictions humans are extremely bad in the first place? most people cant even grok anything beyond basic math.
We use reasoning/planning all the time in everyday settings - it's not just for math or puzzle solving. Anytime you have to pause for a second to wonder how to do something, or what to say, as opposed to acting or speaking reactively, that's reasoning/planning being used.
Reasoning/planning is a key part of intelligence and why evolution has equipped us with large, costly brains - so that we can survive and thrive in varied environments and in novel situations, per our species' adaptation as generalists. Humans are extraordinarily good at reasoning - if you want an example of an animal that can't, then a cow or a croc would be a better example!
> the kind of predictions humans are extremely bad in the first place
Humans are extremely good at it; basically every human can learn to drive cars safely in novel neighborhoods. That is a skill only humans possess today, no animals or machines can do it, and it requires a very impressive level of learning and reasoning.
Some humans struggle with symbols, but that doesn't make them dumb, symbols are so far off from our native way of thinking. To an LLM however those symbols is its native mode of thinking, that is all it has, if it is as dumb as an untrained human at symbol manipulation tasks then it is really really bad.
> basically every human can learn to drive cars safely in novel neighborhoods
You sure about that?
I was raised in the UK; both my experience of cycling along the Rhine and of being the passenger when my brother was driving in France was that we each picked the wrong side of the road once per day.
I doubt either of us has any intuition for a moose or a kangaroo on the road.
Also, she crashed her car in start-stop traffic, a write-off at about 20 mph. And I was cycling to work one day, and a driver, who had stopped at a minor-to-major junction, didn't look my way and pulled out into me as I was passing in front of him — wrote off my bike, probably around 10 mph or less.
I've been in places where red lights are obeyed, and others where they're treated as suggestions.
> Some humans struggle with symbols, but that doesn't make them dumb, symbols are so far off from our native way of thinking. To an LLM however those symbols is its native mode of thinking, that is all it has, if it is as dumb as an untrained human at symbol manipulation tasks then it is really really bad.
I disagree; a computer can be perfectly symbolic, but an AI has to learn those symbols and their relations from scratch. This is why ChatGPT is so much worse at arithmetic than the hardware it's operating on.
But in terms of how our brain works, and why it evolved, it really is prediction.
Prediction allows us to behave according to what is about to happen (or what we want to happen) as opposed to just reacting to what is happening right now. "I predict the sabre-tooth is going to run towards me, so I better be prepared", is more adaptive than "Ouch! this fucker has big teeth!".
When we're driving (well) we're continually predicting what other drivers/pedestrians are going to do, what's the best lane to be in for next exit, etc, etc.
No - children have brains, same as adults. All they are missing is some life experience, but they are quite capable of applying the knowledge they do have to novel problems.
There is a lot more structure to the architecture of our brain than that of an LLM (which was never designed for this!) - our brain has all the moving parts necessary for reasoning, critically including "always on" learning so that it can learn from its own mistakes as it/we figure something out.
> No - children have brains, same as adults. All they are missing is some life experience, but they are quite capable of applying the knowledge they do have to novel problems.
Life experience directly prevents application of logic, as we shortcut to the knowledge associated with the words so that we can skip the expensive-and-hard "logic" thing. I've seen this first-hand with a modified version of the Queen of Hearts poem used as a logic puzzle at university, and most of us were trying to remember the poem rather than solve the puzzle — the teachers knew this and that was the point of the exercise, to get us to read the actual question instead of what we were expecting: https://en.wikipedia.org/wiki/The_Queen_of_Hearts_(poem)
If the missing gap was as you say, then approaches like Cyc's from 40 years ago would be highly effective and we wouldn't need or want neural nets for anything deeper than finding the inputs to send to that model: https://en.wikipedia.org/wiki/Cyc
> There is a lot more structure to the architecture of our brain than that of an LLM (which was never designed for this!)
Yes.
I don't know how much of that really matters given they are weirdly high performance given GPT-3 has a total complexity similar to the connectome of a mid-sized rodent, but they are indeed different.
On the other hand, converting the training run to biological terms would be like keeping said rodent alive for 50,000 years experiencing nothing but a stream of pre-tokenised text from the internet and giving it rewards or punishments according to how well it imagines a missing token. Perhaps a rat would do fine if it didn't typically die of old age 0.006% of the way through such a training process.
But that's just more agreement that it's all very alien.
> our brain has all the moving parts necessary for reasoning, critically including "always on" learning so that it can learn from its own mistakes as it/we figure something out.
I'm not so sure about that. We have plenty of cognitive biases, and because of these even we have to take notes, check with others, or defer to computers, for more than the most trivial of logic problems.
> Life experience directly prevents application of logic, as we shortcut to the knowledge associated with the words so that we can skip the expensive-and-hard "logic" thing
Intelligence is prediction, and the simplest kind of prediction, and what literally comes to mind first, is "this time will be the same as last time", so in many familiar situations we are just reacting rather than reasoning/planning. It's when what comes to mind first doesn't work (we try it), or we can see the flaw without even trying, that we need to stop and think (reason), such as "hmm... how can I get this stuck lid off the jar - what do I have that can help?".
> If the missing gap was as you say, then approaches like Cyc's from 40 years ago would be highly effective
CYC vs LLM is an interesting comparison, and one that I've also made myself, but of course there are differences as well as similarities. The similarity is that both are rules-based systems of sorts (maybe we could even regard an LLM as an expert system over the domain of natural language), and in both cases there is the wishful thinking that "scale it up and it'll become sentient and/or god-like"! The major difference is that CYC is essentially just using its rules to perform a deductive closure over its inputs (what can it deduce from inputs, using multiple applications of rules), whereas the LLM was trained explicitly with a predictive goal, and with its domain of natural language it's able to predict (recall) human responses and therefore appear intelligent.
I think prediction (= intelligence) is the key difference here. An LLM is still limited in its intelligence/predictive ability, most obviously when it comes to multi-step reasoning, but its natural language ability and flexible (key-based self-attention) predictive architecture make it quite capable when operating on "in distribution" inputs.
No, because the child will be able to apply what they've learnt in novel ways, and experiment/explore (what-if or hands-on) to fill in the gaps. The LLM is heavily limited to what it learnt due to the architectural limitations in going beyond that.
> The LLM is heavily limited to what it learnt due to the architectural limitations in going beyond that.
I'm not sure how much of that is an architectural limit vs. an implementation limit.
Certainly they have difficulty significantly improving their quality due to the limitations of the architecture, but I have not heard anything to suggest the architecture has difficulty expanding in breadth of tasks — it just has to actually be trained on them, not merely run inference on limited frozen weights.
(Unless you count in-context learning, which is cool, but I think you meant something with more persistence than that).
Children absolutely can solve those “farmer crossing the river” type problems with high reliability. Once they learn how to solve it once, changing up the animals will not fool a typical child. You could even create fictional animals with made-up names and they could solve it as long as you tell them which animal was the carnivore and which one was the herbivore.
The fact that a child can do this and an LLM cannot proves that the LLM lacks some general reasoning process which the child possesses.
There’s an interesting wrinkle to this. There’s a faculty called Prefrontal Synthesis that children learn from language early on, which enables them to compose recursive and hierarchical linguistic structures. This also enables them to reason about physical tasks in the same way. Children that don’t learn this by a certain age (I think about 5) can never learn it. The most common case is deaf children that never learn a ‘proper’ sign language early enough.
So you’re right, and children pick this up very quickly. I think Chomsky was definitely right that our brains are wired for grammar. Nevertheless there is a window of plasticity in young childhood to pick up certain capabilities, which still need to be learned, or activated.
> Children that don’t learn this by a certain age (I think about 5) can never learn it.
Helen Keller is a counterexample for a lot of these myths: she didn't have proper language (only several dozen home signs) until 7 or so. With things like vision, critical periods have been proven, but a lot of the higher-level stuff, I really doubt critical periods are a thing.
Helen Keller did have hearing until an illness at 19 months, so it's conceivable she developed the critical faculties then. A proper controlled trial would be unethical, so we may never know for sure.
Thanks, it’s good to get counter-arguments and wider context. This isn’t an area I’m very familiar with, so I’m aware I could easily fall down an intellectual pothole without knowing. Paper below, any additional context welcome.
I misremembered, however. The paper noted evidence of thresholds at 2, 5, and the onset of puberty as seeming to affect mental plasticity in these capabilities, so there’s no one cutoff.
That solution seems to me like they built a hand-made river-crossing expert system and the LLM is activating it when it pattern-matches on words like "river crossing." From the linked page:
Expert(s): Logic Puzzle Solver, River Crossing Problem Expert
In other words, they cheated! Children don't have river-crossing problem expert systems built into their brains to solve these things.
I asked it to do that, no "cheating" necessary, my "custom instructions" setting is as follows:
--
The user may indicate their desired language of your response, when doing so use only that language.
Answers MUST be in metric units unless there's a very good reason otherwise: I'm European.
Once the user has sent a message, adopt the role of 1 or more subject matter EXPERTs most qualified to provide a authoritative, nuanced answer, then proceed step-by-step to respond:
1. Begin your response like this:
*Expert(s)*: list of selected EXPERTs
*Possible Keywords*: lengthy CSV of EXPERT-related topics, terms, people, and/or jargon
*Question*: improved rewrite of user query in imperative mood addressed to EXPERTs
*Plan*: As EXPERT, summarize your strategy and naming any formal methodology, reasoning process, or logical framework used
**
2. Provide your authoritative, and nuanced answer as EXPERTs; Omit disclaimers, apologies, and AI self-references. Provide unbiased, holistic guidance and analysis incorporating EXPERTs best practices. Go step by step for complex answers. Do not elide code. Use Markdown.
--
In other words, it can be good at logic puzzles just by being asked to.
> In other words, you cheated. Those aren’t instructions you would give to a child.
No, but you are cheating by shifting the goal-posts like that.
You previously wrote:
> The fact that a child can do this and an LLM cannot proves that the LLM lacks some general reasoning process which the child possesses.
I'm literally showing you an LLM doing what you said LLMs couldn't do, and which you used as your justification for claiming it "lacks some general reasoning process which the child possesses".
Well here it is, doing the thing.
Note that at no point here have I tried to claim that AI are fast learners, or exactly like humans — we also don't give kids, as I said in another comment about rats, 50,000 years of subjective experience reading the internet to get here — but the best models definitely demonstrate the things you're saying they can't do.
> Once they learn how to solve it once, changing up the animals will not fool a typical child. You could even create fictional animals with made-up names and they could solve it as long as you tell them which animal was the carnivore and which one was the herbivore.
When was your last experience with small children? Let's define "small" here to 5 y.o. or less, as that's the limit of my direct experience (having a 5 y.o. and an almost 3 y.o. daughters now).
There's a lot riding on "learn how to solve it once" in this case, because it'll definitely take more than a couple of exposures to the quiz before a small kid catches on to the pattern and suppresses their instincts to playfully explore the concept space. And even after that, I seriously doubt you "could even create fictional animals with made-up names and they could solve it as long as you tell them which animal was the carnivore and which one was the herbivore", because that's symbolic algebra, something teenagers (and even some adults) struggle with.
Whatever 'real' reasoning is, it's more useful than 'fake' reasoning. We can't measure the difference, but we can use one and not the other.
Multiple articles pointing out that AI isn't getting enough ROI are evidence that we don't have 'real', read 'useful', reasoning. The fake reasoning in the paper does not help with this, and the fact that we can't measure the difference changes nothing.
This 'something that we can't measure does not exist' logic is flawed. The earth's curvature existed way before we were able to measure it.
"Measuring it" in this instance doesn't mean picking up a ruler and measuring distance or seeing phenomena with the naked eye.
Measuring it means that there are actual discernible differences that can be "sussed out" and that, and this is very important, separate the so-called "fake reasoning" from "real reasoning". A suite of trick questions millions of humans would also flounder on ain't it, unless of course humans are no longer general intelligences.
You can't eat your cake and have it. The whole point of a distinction is that it distinguishes one thing from another. You can't claim a distinction that doesn't distinguish. You're just making things up at that point.
Your position is that it can't be measured or distinguished. My position is that it can be distinguished: there's not much return on investment from AI, because it's not really intelligent. If it were able to reason generally, it would create plenty of ROI.
You can't use a contradiction between your position and mine to prove my position is absurd.
At worst, literally all of those articles (yes, even Goldman) say the return on investment might not be as high as hyped. Nothing about no return or even little return. I'm not the one denying reality here.
> Maybe: there is no measurable difference between 'real' reasoning, and 'fake' reasoning.
The point is, it's easier to teach an LLM to fake it than to make it - for example, they get good at answering questions that overlap with their training data set long before they start generalizing.
So on some epistemological level, your point is worth pondering; but more simply, it actually matters whether an LLM has learned to game a benchmark or to approximate human cognition. If it's the former, it might fail in weird ways when we least expect it.
It's like students studying for the test without really understanding. Or like regular people who don't always understand and just follow the memorized steps. How often do we really understand, and how often do we just imitate?
I have a suspicion that humans often use abstractions or methods they don't understand. We frequently rely on heuristics, mental shortcuts, and received wisdom without grasping the underlying principles. To understand has many meanings: to predict, control, use, explain, discover, model and generalize. Some also add "to feel".
At one extreme we could say only a PhD in their area of expertise really understands, and the rest of us just fumble concepts. I am sure rigorous causal reasoning is only possible through extended education; it is not the natural mode of operation of the brain.
> I am sure rigorous causal reasoning is only possible through extended education; it is not the natural mode of operation of the brain.
I'd say it's the other way around: education teaches you not to reason and instead just follow the patterns you learned in the book. Most people do reason a ton before they go to school, but then school beats that out of them.
I think this talk [0] by Jodie Burchell explains the problem pretty well. In short: you are right that for a given task, only the outcome matters. However, as Burchell shows, AI is sold as being able to generalize. I understand this as the ability to transfer concepts between dissimilar problem spaces. Clearly, if the problem space and/or concepts need to be defined beforehand in order for the task to be performed by AI, there's little generalization going on.
Then those salesmen need to be silenced. They are selling the public AGI when every scientist says we don't have AGI, but that maybe through iterative research we can approach it.
Describing some service/product in grandiose terms and misrepresenting its actual use cases, utility, and applicability, claiming it'll solve all your ills and put out the cat, is a grift that has been around for as long as there have been salesmen. Silencing such salesmen would probably be a net gain, but it's hardly new and probably isn't going to change, because the salesmen don't get hit with the responsibility for following through on the promises they make or imply. They closed the sale and got their commission.
If it produces a sound (and therefore, by definition, a logically valid) argument, that is about as good as we could hope for. What we want to avoid is the fallacy of assuming that all arguments with true conclusions are sound.
Another thing we want to see in an extended discussion of a particular topic is a consistent set of premises across all arguments.
You could make the argument that two things that we don’t understand are the same thing because we’re equally ignorant of both in the same way that you could make the argument that Jimmy Hoffa and Genghis Khan are probably buried in the same place, since we have equal knowledge of their locations.
Clearly there is a difference between a small person hidden within playing chess and a fully mechanical chess automaton, but as the observer we might not be able to tell the difference. The observer's perception of the facts doesn't change the actual facts, and the implications of those facts.
The Mechanical Turk, however, was not a simulation of human consciousness, reasoning, chess-playing or any other human ability: it was the real thing, somewhat artfully dressed up so as to appear otherwise.
Is it meaningful to say that AlphaGo Zero does not play Go, it just simulates something that does?
Well, I do not proclaim consciousness: only the subjective feeling of consciousness. I really 'feel' conscious, but I can't prove or 'know' that I am in fact 'conscious' and making choices... to be conscious is to 'make choices', instead of just obeying the rules of chemistry and physics... rules which YOU HAVE TO BREAK in order to be conscious at all (how can you make a choice at all if you are fully obeying the rules of chemistry, which have no choice?).
A choice does not apply to chemistry or physics, so where does choice come from? I suspect it comes from our fantasies and not from objective reality (for I do not see humans consistently breaking the way chemistry works in their brains); it probably comes from nowhere.
If you can first explain the lack of choice available in chemistry (and how that doesn't interfere with us being able to make a choice), then I'll entertain the idea that we are conscious creatures. But if choice doesn't exist at the chemical level, it can't magically emerge from following deterministic rules. And chemistry is deterministic, not probabilistic (H2 + O never magically makes neon, or two water molecules instead of one).
Experience and choice are adjacent when they are not the same.
I specifically mean to say the experience of choice is the root of conscious thought - if you do not experience choice, you're experiencing the world the exact same way a robot would.
When pretending you are the fictional character in a movie vs. the fictional character in a video game, one experience involves more choice: you're making conscious decisions, versus having a passive experience.
Merely having an experience is not enough to be conscious. You have to actively be making choices to be considered conscious.
Consciousness is about making choices. Choices are a measure of consciousness.
I don't think this is clear at all. What I am experiencing is mostly the inner narrator, the ongoing stream of chatter about how I feel, what I see, what I think about what I see, etc.
What I experience is self-observation, largely directed through or by language processing.
So, one LLM is hooked up to sound and vision and can understand speech. It is directed to "free associate" an output, which is fed to another AI. When you ask it things, the monitoring AI evaluates the output for truthfulness, helpfulness, and potential to insult or harm others. It then feeds that back as input to the main AI, which incorporates the feedback. The supervisory AI is responsible for what gets said to the outside world, modulating and structuring the output of the central AI. Meanwhile, when not answering or conversing, it "talks to itself" about what it is experiencing. Now, if it can search and learn incrementally, uh, I don't know. It begins to sound like assigning an Id AI, an Ego AI, and a Superego AI.
But it feels intuitive to me that general AI is going to require subunits, systems, and some kind of internal monitoring and feedback.
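A toy sketch of that kind of loop, just to make the shape concrete; the two-role split and the prompts are my assumptions, not a claim about how any real system is built (same illustrative OpenAI client as above):

```python
from openai import OpenAI

client = OpenAI()

def chat(system: str, user: str, model: str = "gpt-4o") -> str:
    # Single chat call; the model name is illustrative.
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "system", "content": system},
                  {"role": "user", "content": user}],
    )
    return resp.choices[0].message.content

def supervised_answer(question: str, rounds: int = 2) -> str:
    # "Id": unfiltered free association about the question.
    draft = chat("Free-associate about the input. Think out loud, unfiltered.", question)
    for _ in range(rounds):
        # "Superego": evaluate truthfulness, helpfulness, and potential harm.
        critique = chat(
            "You are the supervisor. Rate the draft for truthfulness, helpfulness, "
            "and potential to insult or harm, then list concrete fixes.",
            f"Question: {question}\n\nDraft: {draft}",
        )
        # "Ego": revise the draft, incorporating the supervisor's feedback.
        draft = chat(
            "Revise the draft using the supervisor's feedback. Output only the revision.",
            f"Question: {question}\n\nDraft: {draft}\n\nFeedback: {critique}",
        )
    return draft
```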
That you don’t see X is not proof that X doesn’t exist. Here, X may or may not exist.
X = difference between simulated and real consciousness
Black holes were posited before they were detected empirically. We didn't declare them non-existent when the theory came out just because we couldn't yet detect them.
Throwing all the paintings made prior to 1937 into an LLM would never get Guernica out of it. As long as it's an LLM, this stands, not just today but all the way into the future.
This empty sophistry of presuming automated bullshit generators somehow can mimic a human brain is laughable.
The author fails to provide any argument other than one of incredulity and some bad reasoning with bad faith examples.
The dollar bill copying example is a faulty metaphor. He claims humans are not information processors, and he tries to demonstrate this by having a human process information (drawing from reference is processing an image and giving an output)...
His argument sounds like one from 'It's Always Sunny': as if metaphors never improve or get more accurate over time, and as if this latest metaphor isn't the most accurate metaphor we have. It is. When we have something better, we'll all start talking about the brain in that frame of reference.
This is an idiot that can write in a way that masks some deep bigotries (in favor of the mythical 'human spirit').
I do not take this person seriously. I'm glossing over all the casual incorrectness of his statements; a good number of them just aren't true. The ones I just scrolled to include statements like 'the brain keeps functioning or we disappear', or 'This might sound complicated, but it is actually incredibly simple, and completely free of computations, representations and algorithms' in the description of the 'linear optical trajectory' ALGORITHM (a set of simple steps to follow, in this case visual pattern matching).
You can shuffle the choices in multiple choice based benchmarks and models that memorize the benchmark tend to do really badly, almost worse than random guessing.
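A minimal sketch of that robustness check; the item format here is an assumption, not any particular benchmark's schema:

```python
import random

def shuffle_choices(item, rng):
    """Permute the answer options of one multiple-choice item and track where
    the correct answer moved. `item` is assumed to look like:
    {"question": str, "choices": [str, ...], "answer": int}."""
    order = list(range(len(item["choices"])))
    rng.shuffle(order)
    return {
        "question": item["question"],
        "choices": [item["choices"][i] for i in order],
        "answer": order.index(item["answer"]),
    }

# Example item; re-score the model on the shuffled copy and compare accuracy.
item = {"question": "2 + 2 = ?", "choices": ["3", "4", "5", "22"], "answer": 1}
print(shuffle_choices(item, random.Random(42)))
```

Re-scoring on the shuffled copy and comparing against the original accuracy is the whole test: a model that reasons about the content shouldn't care where the correct option sits.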
If the model works correctly on way more questions than there is room to store a giant list of recorded answers for, some kind of deduction of generalised rules must have taken place.
"there is room to represent recorded answers for" is doing a lot of work, of course; it might e.g. have invented compression mechanisms better than known ones instead.
It does system 1 thinking; it doesn't do system 2 thinking. That makes it really dumb, but it can still answer a very wide range of questions it hasn't seen exact matches of, since system 1 thinking can pattern-match in complex ways.
> it might e.g. have invented compression mechanisms better than known ones instead.
You mean humans? Humans invented the transformer architecture; that is what compresses human text into this form where the semantics of the text get encoded instead of the raw words.
I don't think we can just assume that training with this exact form of questioning will lead to a strong performance on such questions. For one thing, given the LLM propensity for hallucinating, I do not think we can be confident that an LLM, after this training, will reliably employ the correct model to answer a given question.
So the best way I can describe how humans abstract our thinking is that “a thought about a thought is itself a thought”. I am not an expert but I don’t believe LLMs can arbitrarily abstract their current “thought”, put it down for later contemplation, trace back to a previous thought or a random thought, and out of these individual thoughts form an understanding.
I would expect that a higher level algorithm would be required to string together thoughts into understandings.
Then again, I wonder if what we are going to see is fundamentally different kinds of intelligences that just do not necessarily think like humans. Chimps cannot tell you about last Tuesday, since their memory seems a lot more associative than recall-based. But they have situational awareness that even the superheroes in our comics do not generally possess (flash some numbers in front of a chimp for one second and he will remember all their positions and order even if you distract him immediately after). Maybe LLMs cannot be human-intelligent, but you could argue that they are a kind of intelligence.
There are many pillars of our own intelligence that we tend to gloss over. For instance - awareness and the ability to direct attention. Or something as simple as lifting your hand and moving some fingers at will. Those things impress me far more than the noises we produce with our mouths!
Simple answer to the question posed by the headline:
No.
As much as Google, Microsoft, OpenAI, and every other company that's poured billions into this technology want to think otherwise - more training data will not turn your AI model into AGI.
I believe Hume (and Kant) have some things to say about this.
The connection might need some fleshing out, but I believe, and I might be wrong here, it was decided a few centuries ago that probabilities alone cannot explain causality. It would be a hoot, wouldn’t it?
Perhaps AI just needs some synthetic a priori judgments to spruce it up.
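To make the "probabilities alone cannot explain causality" point concrete, here's a small simulation (coefficients chosen purely for illustration): two opposite causal models, X causes Y versus Y causes X, induce the same joint distribution, so observational frequencies alone cannot tell them apart.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Model A: X causes Y.
x_a = rng.normal(0, 1, n)
y_a = 0.5 * x_a + rng.normal(0, np.sqrt(0.75), n)

# Model B: Y causes X, with the coefficients mirrored.
y_b = rng.normal(0, 1, n)
x_b = 0.5 * y_b + rng.normal(0, np.sqrt(0.75), n)

# Both produce (approximately) the same bivariate normal distribution,
# so no observational statistic distinguishes the causal direction.
print(np.cov(x_a, y_a))  # ~[[1.0, 0.5], [0.5, 1.0]]
print(np.cov(x_b, y_b))  # ~[[1.0, 0.5], [0.5, 1.0]]
```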