
This is proposed as a way to measure "true" reasoning by asking a certain type of trick question, but I don't quite see how this could be the basis of a sustainable benchmark.

If this gets attention, the next generation of LLMs will be trained on this paper, and then fine-tuned on this exact form of question to appear strong on the benchmark, and... we're back to square one.




Maybe: there is no measurable difference between 'real' reasoning, and 'fake' reasoning.

And if there is no measurable difference... we can't measure 'realness', so we just have to measure something different (and more useful): 'soundness'. Regardless of whether it is reasoning or not internally, if it produces a sound and logical argument: who cares?

I agree: I don't think any measure tested linguistically can prove it is internally reasoning... in the same way we haven't truly proven that other seemingly sentient people aren't in fact zombies (we just politely assume the most likely case that they aren't).


Real reasoning, which can be used to predict outcomes in novel situations, is based on multi-step what-if prediction, perhaps coupled with actual experimentation, and requires things like long-term (task-duration) attention, (task-duration) working memory, online learning (unless you want to have to figure everything out from scratch every time you re-encounter it), and perhaps (depending on the problem) innate curiosity to explore potential solutions. LLMs are architecturally missing all of the above.

What you might call "fake" reasoning, or memorized reasoning, only works in situations similar to what an LLM was exposed to in its training set (e.g. during a post-training step intended to imbue better reasoning), and is just recalling reasoning steps (reflected in word sequences) that it has seen in the training set in similar circumstances.

The difference between the two is that real reasoning will work for any problem, while fake/recall reasoning only works for situations it saw in the training set. Relying on fake reasoning makes the model very "brittle" - it may seem intelligent in some/many situations where it can rely on recall, but then "unexpectedly" behave in some dumb way when faced with a novel problem. You can see an example of this with the "farmer crossing the river with hen and corn" type of problem, where the models get it right if the problem is similar enough to what they were trained on, but can devolve into nonsense like crossing back and forth multiple times unnecessarily (which has the surface form of a solution) if the problem is made a bit less familiar.
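
To make that brittleness testable, here's a minimal sketch of the kind of probe I mean (not from the paper; the query_model stub and the filler words are my own placeholder assumptions): render the same puzzle with familiar fillers and with invented ones, then compare whether the model's answers stay logically equivalent.

    # Hypothetical stub: swap in a real call to whichever model you want to probe.
    def query_model(prompt: str) -> str:
        raise NotImplementedError("plug in your own LLM call here")

    TEMPLATE = (
        "A farmer must ferry a {a}, a {b} and a {c} across a river. "
        "The boat holds the farmer plus one item. Left alone, the {a} eats the {b}, "
        "and the {b} eats the {c}. How does the farmer get everything across?"
    )

    FILLER_SETS = [
        ("fox", "hen", "corn"),            # the familiar, heavily-trained-on version
        ("zorblax", "mimmet", "quale"),    # invented names; the roles are still stated explicitly
    ]

    for a, b, c in FILLER_SETS:
        prompt = TEMPLATE.format(a=a, b=b, c=c)
        print(prompt)
        # answer = query_model(prompt)  # compare answers across the two filler sets

If recall is doing the work, accuracy typically drops on the second set even though the underlying logic is identical.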


> which can be used to predict outcomes in novel situations,

the kind of predictions humans are extremely bad at in the first place? most people can't even grok anything beyond basic math.


> the kind of predictions humans are extremely bad at in the first place? most people can't even grok anything beyond basic math.

We use reasoning/planning all the time in everyday settings - it's not just for math or puzzle solving. Anytime you have to pause for a second to wonder how to do something, or what to say, as opposed to acting or speaking reactively, that's reasoning/planning being used.

Reasoning/planning is a key part of intelligence and why evolution has equipped us with large, costly brains - so that we can survive and thrive in varied environments and in novel situations, per our species' adaptation as generalists. Humans are extraordinarily good at reasoning - if you want an example of an animal that can't, then a cow or a croc would be a better example!


> the kind of predictions humans are extremely bad at in the first place

Humans are extremely good at it: basically every human can learn to drive cars safely in novel neighborhoods. That is a skill only humans possess today; no animals or machines can do it, and it requires a very impressive level of learning and reasoning.

Some humans struggle with symbols, but that doesn't make them dumb; symbols are far off from our native way of thinking. To an LLM, however, those symbols are its native mode of thinking (that is all it has), so if it is as dumb as an untrained human at symbol-manipulation tasks then it is really, really bad.


> basically every human can learn to drive cars safely in novel neighborhoods

You sure about that?

I was raised in the UK; both my experience of cycling along the Rhine and of being a passenger while my brother was driving in France was that we each picked the wrong side of the road once per day.

I doubt either of us has any intuition for a moose or a kangaroo on the road.

A previous partner was American-ish[0], her parents visited the UK and had no idea what this sign meant and so were driving on 60 mph roads at 30 mph: https://commons.wikimedia.org/wiki/File:UK_traffic_sign_671....

Also, she crashed her car in start-stop traffic, a write-off at about 20 mph. And I was cycling to work one day, and a driver, who had stopped at a minor-to-major junction, didn't look my way and pulled out into me as I was passing in front of him — wrote off my bike, probably around 10 mph or less.

I've been in places where red lights are obeyed, and others where they're treated as suggestions.

> Some humans struggle with symbols, but that doesn't make them dumb; symbols are far off from our native way of thinking. To an LLM, however, those symbols are its native mode of thinking (that is all it has), so if it is as dumb as an untrained human at symbol-manipulation tasks then it is really, really bad.

I disagree; a computer can be perfectly symbolic, but an AI has to learn those symbols and their relations from scratch. This is why ChatGPT is so much worse at arithmetic than the hardware it's operating on.

[0] It's complicated: https://en.wikipedia.org/wiki/Third_culture_kid


> can learn to drive cars safely in novel neighborhoods

we don't call that "prediction" in common language. that's just pattern matching based on driving experience and training.


But in terms of how our brain works, and why it evolved, it really is prediction.

Prediction allows us to behave according to what is about to happen (or what we want to happen) as opposed to just reacting to what is happening right now. "I predict the sabre-tooth is going to run towards me, so I better be prepared", is more adaptive than "Ouch! this fucker has big teeth!".

When we're driving (well) we're continually predicting what other drivers/pedestrians are going to do, what's the best lane to be in for next exit, etc, etc.


> LLMs are architecturally missing all of the above.

So are small children.

I mean, they have a very limited form of the above. So do LLMs, within their context windows.


> So are small children.

No - children have brains, same as adults. All they are missing is some life experience, but they are quite capable of applying the knowledge they do have to novel problems.

There is a lot more structure to the architecture of our brain than that of an LLM (which was never designed for this!) - our brain has all the moving parts necessary for reasoning, critically including "always on" learning so that it can learn from its own mistakes as it/we figure something out.


> No - children have brains, same as adults. All they are missing is some life experience, but they are quite capable of applying the knowledge they do have to novel problems.

Life experience directly prevents application of logic, as we shortcut to our knowledge associated with the word so that we can skip the expensive-and-hard "logic" thing. I've seen this first-hand with a modified version of the Queen of Hearts poem used as a logic puzzle at university, and most of us were trying to remember the poem rather than solve the puzzle — the teachers knew this and that was the point of the exercise, to get us to read the actual question instead of what we were expecting: https://en.wikipedia.org/wiki/The_Queen_of_Hearts_(poem)

If the missing gap was as you say, then approaches like Cyc's from 40 years ago would be highly effective and we wouldn't need or want neural nets for anything deeper than finding the inputs to send to that model: https://en.wikipedia.org/wiki/Cyc

> There is a lot more structure to the architecture of our brain than that of an LLM (which was never designed for this!)

Yes.

I don't know how much of that really matters, given how weirdly high their performance is when GPT-3 has a total complexity similar to the connectome of a mid-sized rodent, but they are indeed different.

On the other hand, converting the training run to biological terms would be like keeping said rodent alive for 50,000 years experiencing nothing but a stream of pre-tokenised text from the internet and giving it rewards or punishments according to how well it imagines a missing token. Perhaps a rat would do fine if it didn't typically die of old age 0.006% of the way through such a training process.

But that's just more agreement that it's all very alien.

> our brain has all the moving parts necessary for reasoning, critically including "always on" learning so that it can learn from its own mistakes as it/we figure something out.

I'm not so sure about that. We have plenty of cognitive biases, and because of these even we have to take notes, check with others, or defer to computers, for more than the most trivial of logic problems.


> Life experience directly prevents application of logic, as we shortcut to our knowledge associated with the word so that we can skip the expensive-and-hard "logic" thing

Intelligence is prediction, and the simplest kind of prediction, the one that literally comes to mind first, is "this time will be the same as last time", so in many familiar situations we are just reacting rather than reasoning/planning. It's when what comes to mind first doesn't work (we try it), or we can see the flaw without even trying, that we need to stop and think (reason), such as "hmm... how can I get this stuck lid off the jar - what do I have that can help?".

> If the missing gap was as you say, then approaches like Cyc's from 40 years ago would be highly effective

CYC vs LLM is an interesting comparison, and one that I've also made myself, but of course there are differences as well as similarities. The similarity is that both are rule-based systems of sorts (maybe we could even regard an LLM as an expert system over the domain of natural language), and in both cases there is the wishful thinking that "scale it up and it'll become sentient and/or god-like"! The major difference is that CYC is essentially just using its rules to perform a deductive closure over its inputs (what can it deduce from the inputs, using multiple applications of rules), whereas the LLM was trained explicitly with a predictive goal, and with its domain of natural language it's able to predict (recall) human responses and therefore appear intelligent.

I think prediction (= intelligence) is the key difference here. An LLM is still limited in its intelligence/predictive ability, most obviously when it comes to multi-step reasoning, but its natural-language ability and flexible (key-based self-attention) predictive architecture make it quite capable when operating on "in distribution" inputs.


Isn’t the “life experience” that a child is missing precisely analogous to the training an LLM requires?


No, because the child will be able to apply what they've learnt in novel ways, and experiment/explore (what-if or hands-on) to fill in the gaps. The LLM is heavily limited to what it has learnt, due to the architectural limitations on going beyond that.


> The LLM is heavily limited to what it has learnt, due to the architectural limitations on going beyond that.

I'm not sure how much of that is an architectural limit vs. an implementation limit.

Certainly they have difficulty significantly improving their quality due to the limitations of the architecture, but I have not heard anything to suggest the architecture has difficulty expanding in breadth of tasks — it just has to actually be trained on them, not merely limited to frozen weights used for inference.

(Unless you count in-context learning, which is cool, but I think you meant something with more persistence than that).


Children absolutely can solve those “farmer crossing the river” type problems with high reliability. Once they learn how to solve it once, changing up the animals will not fool a typical child. You could even create fictional animals with made-up names and they could solve it as long as you tell them which animal was the carnivore and which one was the herbivore.

The fact that a child can do this and an LLM cannot proves that the LLM lacks some general reasoning process which the child possesses.


There’s an interesting wrinkle to this. There’s a faculty called Prefrontal Synthesis that children learn from language early on, which enables them to compose recursive and hierarchical linguistic structures. This also enables them to reason about physical tasks in the same way. Children that don’t learn this by a certain age (I think about 5) can never learn it. The most common case is deaf children that never learn a ‘proper’ sign language early enough.

So you’re right, and children pick this up very quickly. I think Chomsky was definitely right that our brains are wired for grammar. Nevertheless there is a window of plasticity in young childhood to pick up certain capabilities, which still need to be learned, or activated.


> Children that don’t learn this by a certain age (I think about 5) can never learn it.

Helen Keller is a counterexample for a lot of these myths: she didn't have proper language (only several dozen home signs) until 7 or so. With things like vision, critical periods have been proven, but a lot of the higher-level stuff, I really doubt critical periods are a thing.

Helen Keller did have hearing until an illness at 19 months, so it's conceivable she developed the critical faculties then. A proper controlled trial would be unethical, so we may never know for sure.


Thanks, it’s good to get counter-arguments and wider context. This isn’t an area I’m very familiar with, so I’m aware I could easily fall into an intellectual pothole without knowing. Paper below; any additional context welcome.

I misremembered, however. The paper notes evidence of thresholds at 2, 5, and the onset of puberty as seeming to affect mental plasticity in these capabilities, so there’s no one cutoff.

https://riojournal.com/article/38546/


LLMs can cope fine with all of them being animals with made-up names, as demonstrated here with me bashing the keyboard randomly: https://chatgpt.com/share/ee013797-a55c-4685-8f2b-87f1b455b4...


That solution seems to me like they built a hand-made river-crossing expert system and the LLM is activating it when it pattern-matches on words like "river crossing." From the linked page:

Expert(s): Logic Puzzle Solver, River Crossing Problem Expert

In other words, they cheated! Children don't have river-crossing problem expert systems built into their brains to solve these things.


I asked it to do that, no "cheating" necessary, my "custom instructions" setting is as follows:

--

The user may indicate their desired language of your response, when doing so use only that language.

Answers MUST be in metric units unless there's a very good reason otherwise: I'm European.

Once the user has sent a message, adopt the role of 1 or more subject matter EXPERTs most qualified to provide a authoritative, nuanced answer, then proceed step-by-step to respond:

1. Begin your response like this:
   *Expert(s)*: list of selected EXPERTs
   *Possible Keywords*: lengthy CSV of EXPERT-related topics, terms, people, and/or jargon
   *Question*: improved rewrite of user query in imperative mood addressed to EXPERTs
   *Plan*: As EXPERT, summarize your strategy and naming any formal methodology, reasoning process, or logical framework used

2. Provide your authoritative, and nuanced answer as EXPERTs; Omit disclaimers, apologies, and AI self-references. Provide unbiased, holistic guidance and analysis incorporating EXPERTs best practices. Go step by step for complex answers. Do not elide code. Use Markdown.

--

In other words, it can be good at logic puzzles just by being asked to.


In other words, you cheated. Those aren’t instructions you would give to a child.


> In other words, you cheated. Those aren’t instructions you would give to a child.

No, but you are cheating by shifting the goal-posts like that.

You previously wrote:

> The fact that a child can do this and an LLM cannot proves that the LLM lacks some general reasoning process which the child possesses.

I'm literally showing you an LLM doing what you said LLMs couldn't do, and which you used as your justification for claiming it "lacks some general reasoning process which the child possesses".

Well here it is, doing the thing.

Note that at no point have I tried to claim that AI are fast learners, or exactly like humans — we also don't give kids, as I said in another comment about rats, 50,000 years of subjective experience reading the internet to get here — but the best models definitely demonstrate the things you're saying they can't do.


> Once they learn how to solve it once, changing up the animals will not fool a typical child. You could even create fictional animals with made-up names and they could solve it as long as you tell them which animal was the carnivore and which one was the herbivore.

When was your last experience with small children? Let's define "small" here as 5 y.o. or less, as that's the limit of my direct experience (having a 5 y.o. and an almost-3 y.o. daughter now).

There's a lot riding on "learn how to solve it once" in this case, because it'll definitely take more than a couple of exposures to the quiz before a small kid catches on to the pattern and suppresses their instinct to playfully explore the concept space. And even after that, I seriously doubt you "could even create fictional animals with made-up names and they could solve it as long as you tell them which animal was the carnivore and which one was the herbivore", because that's symbolic algebra, something teenagers (and even some adults) struggle with.


Whatever 'real' reasoning is, it's more useful than 'fake' reasoning. We can't measure the difference, but we can use one and not the other.

Multiple articles pointing out that AI isn't generating enough ROI are evidence that we don't have 'real', read 'useful', reasoning. The fake reasoning in the paper does not help with this, and the fact that we can't measure the difference changes nothing.

This 'something that we can't measure does not exist' logic is flawed. The earth's curvature existed way before we were able to measure it.


"Measuring it" in this instance doesn't mean picking up a ruler and measuring distance or seeing phenomena with the naked eye.

Measuring it means that there are actual discernible differences that can be "sussed out" and that (and this is very important) separate the so-called "fake reasoning" from "real reasoning". A suite of trick questions millions of humans would also flounder on ain't it, unless of course humans are no longer general intelligences.

You can't eat your cake and have it. The whole point of a distinction is that it distinguishes one thing from another. You can't claim a distinction that doesn't distinguish. You're just making things up at that point.


Your position is that it can't be measured or distinguished. My position is that it can be distinguished: there's not much return on investment from ai, because it's not really intelligent. If it was able to reason generally, it would create plenty of ROI.

You can't use a contradiction between your position and mine to prove my position is absurd.


I don't know where you got the idea that there's been no return on investment from AI, but it's so blatantly wrong I don't even know where to begin.


https://www.economist.com/finance-and-economics/2024/07/02/w...

https://www.ftadviser.com/investments/2024/07/03/ai-will-tak...

https://www.businessinsider.com/ai-return-investment-disappo...

https://www.forbes.com/councils/forbestechcouncil/2024/04/10...

Maybe begin by reading all of these.

The Goldman Sachs report is even discussed on HN: https://news.ycombinator.com/item?id=40837081

There's talk of OpenAI going bankrupt. It's an exaggeration, but they're not making money, that's clear. Which means ROI is zero.

https://www.forbes.com/sites/lutzfinger/2023/08/18/is-openai...

Just simply deny reality, that makes for constructive discussion I guess.


At worst, literally all of those articles (yes, even Goldman) say the return on investment might not be as high as hyped. Nothing about no return, or even little return. I'm not the one denying reality here.


Made me think of the famous McNamara fallacy: https://en.wikipedia.org/wiki/McNamara_fallacy

"The fourth step is to say that what can't be easily measured really doesn't exist. This is suicide."


> Maybe: there is no measurable difference between 'real' reasoning, and 'fake' reasoning.

The point is, it's easier to teach an LLM to fake it than to make it - for example, they get good at answering questions that overlap with their training data set long before they start generalizing.

So on some epistemological level, your point is worth pondering; but more simply, it actually matters if an LLM has learned to game a benchmark vs approximate human cognition. If it's the former, it might fail in weird ways when we least expect it.


It's like students learning for the test, not really understanding. Or like regular people who don't always understand, just follow the memorized steps. How many times do we really really understand and how many do we just imitate?

I have a suspicion that humans often use abstractions or methods they don't understand. We frequently rely on heuristics, mental shortcuts, and received wisdom without grasping the underlying principles. To understand has many meanings: to predict, control, use, explain, discover, model and generalize. Some also add "to feel".

In one extreme we could say only a PhD in their area of expertise really understands, the rest of us just fumble concepts. I am sure rigorous causal reasoning is only possible by extended education, it is not the natural mode of operation of the brain.


> I am sure rigorous causal reasoning is only possible by extended education, it is not the natural mode of operation of the brain.

I'd say the other way around, education teaches you to not reason and instead just follow the patterns you learned in the book. Most people do reason a ton before they go to school, but then school beats that out of them.


I think this talk [0] by Jodie Burchell explains the problem pretty well. In short: you are right that for a given task, only the outcome matters. However, as Burchell shows, AI is sold as being able to generalize. I understand this as the ability to transfer concepts between dissimilar problem spaces. Clearly, if the problem space and/or concepts need to be defined beforehand in order for the task to be performed by AI, there's little generalization going on.

[0] https://youtu.be/Pv0cfsastFs?si=WLoMrT0S6Oe-f1OJ


Then those salesmen need to be silenced. They are selling the public AGI when every scientist says we don't have AGI but maybe through iterative research we can approach it.


Describing some service/product in grandiose terms and misrepresenting its actual use cases, utility, and applicability, claiming it'll solve all your ills and put out the cat, has been a grift for as long as there have been salesmen. Silencing such salesmen would probably be a net gain, but it's hardly new and probably isn't going to change, because the salesmen don't get hit with the responsibility for following through on the promises they make or imply. They closed the sale and got their commission.


If it produces a sound (and therefore, by definition, a logically valid) argument, that is about as good as we could hope for. What we want to avoid is the fallacy of assuming that all arguments with true conclusions are sound.

Another thing we want to see in an extended discussion of a particular topic is a consistent set of premises across all arguments.


From an external perspective, is there a way to distinguish between simulation of consciousness and the real thing?

If the answer is no, could you make an argument that they are the same?


You could make the argument that two things that we don’t understand are the same thing because we’re equally ignorant of both in the same way that you could make the argument that Jimmy Hoffa and Genghis Khan are probably buried in the same place, since we have equal knowledge of their locations.


Like the original Mechanical Turk.

Clearly there is a difference between a small person hidden within playing chess and a fully mechanical chess automaton, but as the observer we might not be able to tell the difference. The observer's perception of the facts doesn't change the actual facts, and the implications of those facts.


The Mechanical Turk, however, was not a simulation of human consciousness, reasoning, chess-playing or any other human ability: it was the real thing, somewhat artfully dressed-up as to appear otherwise.

Is it meaningful to say that Alphago Zero does not play Go, it just simulates something that does?


I like this observation. And it fascinates me each time I see some self proclaimed conscious entity arguing that this just simply cannot be.


> self proclaimed conscious entity

Well, I do not proclaim consciousness: only the subjective feeling of consciousness. I really 'feel' conscious, but I can't prove or 'know' that I am in fact 'conscious' and making choices... to be conscious is to 'make choices', instead of just obeying the rules of chemistry and physics... which YOU HAVE TO BREAK in order to be conscious at all (how can you make a choice at all if you are fully obeying the rules of chemistry, which have no choice?).

A choice does not apply to chemistry or physics: where does choice come from? I suspect it comes from our fantasies and not from objective reality (for I do not see humans consistently breaking the way chemistry works in their brains) - it probably comes from nowhere.

If you can first explain the lack of choice available in chemistry (and how that doesn't interfere with us being able to make a choice), then I'll entertain the idea that we are conscious creatures. But if choice doesn't exist at the chemical level, it can't magically emerge from following deterministic rules. And chemistry is deterministic, not probabilistic (H2 + O doesn't ever magically make neon, or two water molecules instead of one).


You are confusing consciousness with free will. They are not the same.

Consciousness is about experience, not "choices".


Experience and choice are adjacent when they are not the same.

I specifically mean to say the experience of choice is the root of conscious thought - if you do not experience choice, you're experiencing the world the exact same way a robot would.

When pretending you are the fictional character in a movie vs. the fictional character in a video game, one experience has more choice: you are making conscious decisions vs. having a passive experience.

Merely having an experience is not enough to be conscious. You have to actively be making choices to be considered conscious.

Consciousness is about making choices. Choices are a measure of consciousness.

But do choices actually exist?


I don't think this is clear at all. What I am experiencing is mostly the inner narrator, the ongoing stream of chatter about how I feel, what I see, what I think about what I see, etc.

What I experience is self-observation, largely directed through or by language processing.


So, one LLM is hooked up to sound and vision and can understand speech. It is directed to “free associate” an output which is fed to another AI. When you ask it things, the monitoring AI evaluates the truthfulness, helpfulness, and ability to insult/harm others. It then feeds that back as inputs to the main AI which incorporates the feedback. The supervisory AI is responsible for what it says to the outside world, modulating and structuring the output of the central AI. Meanwhile, when not answering or conversing, it “talks to itself” about what it is experiencing. Now if it can search and learn incrementally, uh, I don’t know. It begins to sound like assigning an Id AI, an Ego AI, and a Superego AI.

But it feels intuitive to me that general AI is going to require subunits, systems, and some kind of internal monitoring and feedback.
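
For what it's worth, the loop you're describing is easy to sketch; here's a minimal version in which every function is a hypothetical placeholder (none of this is a real API): an inner model free-associates, a supervisory model gates what reaches the outside world, and the critique gets fed back in.

    # All of these are placeholder stubs -- a real system would call actual models here.
    def inner_model(context: str) -> str:
        # "id"/"ego": free-associating generator
        return "draft reply to: " + context.splitlines()[-1]

    def supervisor_model(draft: str) -> dict:
        # "superego": checks truthfulness/helpfulness/harm before anything is said aloud
        return {"ok": True, "feedback": ""}

    def respond(user_input: str, memory: list) -> str:
        context = "\n".join(memory + [user_input])
        draft = inner_model(context)
        review = supervisor_model(draft)
        if not review["ok"]:
            # feed the critique back into the inner model and regenerate
            memory.append("critique: " + review["feedback"])
            draft = inner_model("\n".join(memory + [user_input]))
        memory.append(draft)   # crude stand-in for the ongoing "talking to itself"
        return draft

    memory = []
    print(respond("What are you experiencing right now?", memory))

The memory list is only a rough stand-in for the incremental search-and-learn part you mention; a real version would need something far more persistent.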


The fact that you don’t see X is not proof that X doesn’t exist. Here, X may or may not exist.

X = difference between simulated and real consciousness

Black holes were posited before they were detected empirically. We didn't declare them non-existent when the theory came out just because we couldn't detect them.


Consciousness and reasoning are orthogonal to each other.


I suspect that depends on which of the 200 definitions of "consciousness" you're using. And some other broad range of definitions of "reasoning".


There's an interesting paper [1] that discusses this very possibility.

[1] https://academic.oup.com/mind/article/LIX/236/433/986238?log...


There might not be an external perspective, just someone else’s internal perspective of the external.


Why are you bringing up metaphysics when the concern is that the student has seen the exam, so to speak?


Throwing all the paintings made prior to 1937 into an LLM would never get Guernica out of it. As long as it's an LLM this stands, not just today but all the way into the future.

This empty sophistry of presuming automated bullshit generators somehow can mimic a human brain is laughable.

Please please read https://aeon.co/essays/your-brain-does-not-process-informati...


The author fails to provide any argument other than one of incredulity and some bad reasoning with bad faith examples.

The dollar-bill copying example is a faulty metaphor. He claims humans are not information processors, yet he tries to demonstrate this by having a human process information (drawing from reference is processing an image and giving an output)...

His argument sounds like one from 'It's Always Sunny'. As if metaphors never improve or get more accurate over time, and as if this latest metaphor isn't the most accurate one we have. It is. When we have something better, we'll all start talking about the brain in that frame of reference.

This is an idiot that can write in a way that masks some deep bigotries (in favor of the mythical 'human spirit').

I do not take this person seriously. I'm glossing over all the casual incorrectness of his statements - a good number of them just aren't true. The ones I just scrolled to include statements like 'the brain keeps functioning or we disappear', or 'This might sound complicated, but it is actually incredibly simple, and completely free of computations, representations and algorithms' in the description of the 'linear optical trajectory' ALGORITHM (a set of simple steps to follow - in this case, visual pattern matching).

Where is the sense in what I just read?


AI bros never take anything seriously aside from the bullshit they are drunk on

https://disconnect.blog/what-comes-after-the-ai-crash/ this is your future


You can shuffle the choices in multiple-choice benchmarks, and models that have memorized the benchmark tend to do really badly, almost worse than random guessing.
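
The shuffle itself is simple to do; here's a rough sketch in Python (the item format is just an assumed schema, not any particular benchmark's):

    import random

    def shuffle_choices(item: dict, seed: int = 0) -> dict:
        """Permute one multiple-choice item's options and track where the
        correct answer moved. A model that memorised "the answer is C" rather
        than the content tends to score badly on the shuffled copy."""
        rng = random.Random(seed)
        indexed = list(enumerate(item["options"]))
        rng.shuffle(indexed)
        new_options = [text for _, text in indexed]
        new_answer = next(i for i, (old, _) in enumerate(indexed) if old == item["answer"])
        return {"prompt": item["prompt"], "options": new_options, "answer": new_answer}

    # assumed item schema: prompt, list of options, index of the correct option
    item = {"prompt": "2 + 2 = ?", "options": ["3", "4", "5", "6"], "answer": 1}
    print(shuffle_choices(item, seed=42))

Re-scoring the model on shuffled copies and comparing against its original accuracy gives a cheap memorisation check.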


Easily overcome by training models on randomized questions. Trivial to implement too.


If the model works correctly on way more questions than there is room to store a giant list of recorded answers for, some kind of deduction of generalised rules must have taken place.

"there is room to represent recorded answers for" is doing a lot of work, of course; it might e.g. have invented compression mechanisms better than known ones instead.


It does System 1 thinking, it doesn't do System 2 thinking. That makes it really dumb, but it can still answer a very wide range of questions it hasn't seen exact matches of, since System 1 thinking can pattern-match in complex ways.

> it might e.g. have invented compression mechanisms better than known ones instead.

You mean humans? Humans invented the transformer architecture; that is what compresses human text into this form where the semantics of the text get encoded instead of the raw words.


You cannot create generalized questions using randomization alone


I don't think we can just assume that training with this exact form of questioning will lead to a strong performance on such questions. For one thing, given the LLM propensity for hallucinating, I do not think we can be confident that an LLM, after this training, will reliably employ the correct model to answer a given question.



