sn41 on May 18, 2023 | on: Ask HN: Can someone ELI5 transformers and the “Att...”
Complete newbie here: what is the intuition behind the conclusion that "cat" is highly related to "black" as opposed to, say, "mat"?

nborwankar on May 18, 2023:
Attention and the Transformer make it possible to recognize that the probability of “black” applying to the cat is much, much higher than to the mat, due to the phrasing “which is” in between “cat” and “black”.
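
To make that concrete, here is a minimal sketch of the attention computation in Python. The sentence, embeddings, and projection matrices are made-up stand-ins (a real model learns them from data), so the printed weights are arbitrary; in a trained model the row for “black” would concentrate on “cat”.

    # Toy sketch of scaled dot-product attention (random stand-in numbers,
    # not a trained model): how much does "black" attend to each other token?
    import numpy as np

    tokens = "the cat which is black sat on the mat".split()

    rng = np.random.default_rng(0)
    d = 8                                  # embedding width (arbitrary choice)
    E = rng.normal(size=(len(tokens), d))  # stand-in embeddings; real ones are learned

    # Learned projections in a real transformer; random stand-ins here.
    Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
    Q, K, V = E @ Wq, E @ Wk, E @ Wv

    scores = Q @ K.T / np.sqrt(d)          # how well each query matches each key
    weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # softmax rows

    i = tokens.index("black")
    for tok, w in zip(tokens, weights[i]):
        print(f"{tok:>6}: {w:.2f}")        # trained weights would rank "cat" high

The whole mechanism is just these matrix products plus a softmax; everything interesting lives in the learned Wq, Wk, Wv.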

sn41 on May 18, 2023:
Thank you. So this is based on the training data, I assume.

aGHz on May 18, 2023:
It is a lot harder to take the black out of the cat than it is to take the mat out from under it.

djbusby on May 18, 2023:
Humans know that; how does the transformer know that? Based on training data?

Accujack on May 18, 2023:
Sort of. Part of the training for a model includes telling it which parts of a sentence are important... a human points and clicks.

testrun on May 18, 2023:
This is extremely important to know: that the relationships between words in the sentence are actually trained by human evaluation.

breezeTrowel on May 18, 2023:
They are not.

mollerhoj on May 18, 2023:
No, that's incorrect. The connections are automatically deduced from the training data (which is just vast amounts of raw text).
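
In other words, the supervision signal is the text itself. Here is a minimal sketch of how training pairs are derived mechanically from raw text, with no human labeling (the sentence is a made-up example):

    # The "labels" are just the next token of the raw text itself, so nobody
    # annotates which words relate to which; that structure is a by-product.
    text = "the cat which is black sat on the mat"
    tokens = text.split()

    # Each training example is (context so far -> next token), produced
    # mechanically from the text with no human in the loop.
    examples = [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

    for context, target in examples:
        print(" ".join(context), "->", target)

Minimizing prediction error over vast numbers of such pairs is what ends up encoding relationships like “black” and “cat” in the attention weights.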