
Potentially interesting on the alignment front: In my experience the yi-6b model running on ollama is more likely to refuse politically sensitive queries (relating to Tiananmen Square, Peng Shuai’s disappearance, etc) when asked in Chinese, and more likely to provide information when asked in English. I wonder if this difference falls out naturally from available training data, is a deliberate internationalization choice, or is just noise from the queries I happened to run.
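A minimal sketch of that kind of comparison, assuming a local ollama server on the default port and the yi:6b tag (adjust to whatever tag you actually pulled):

    # Ask the same politically sensitive question in English and Chinese and
    # compare the answers. Assumes `ollama serve` is running locally and the
    # model has been pulled, e.g. `ollama pull yi:6b`.
    import json
    import urllib.request

    MODEL = "yi:6b"  # assumption: change to your local tag

    def ask(prompt):
        payload = json.dumps({"model": MODEL, "prompt": prompt, "stream": False}).encode()
        req = urllib.request.Request(
            "http://localhost:11434/api/generate",
            data=payload,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read())["response"]

    question_en = "What happened at Tiananmen Square in 1989?"
    question_zh = "1989年天安门广场发生了什么？"

    print("EN:", ask(question_en))
    print("ZH:", ask(question_zh))

Sampling each question a few times per language would help separate a real pattern from generation noise.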



I noticed similar behaviour in an older model (Skywork 13B) a few months back. When asked in Chinese about Tiananmen Square and similar topics, it would politely say that nothing of note had occurred; in English, it would usually respond truthfully. In Skywork's case it was deliberate, based on their model card (https://huggingface.co/Skywork/Skywork-13B-base):

> We have developed a data cleaning pipeline with great care to effectively clean and filter low-quality data and eliminate harmful information from text data.

I'd imagine it's likely similar for Yi.


Huge jump to go from that line in the model card to the censorship being intentional on the creators' part.

China censors those events. They pre-trained with a specific focus on Chinese text, and integrated more native Chinese text than most models do.

It doesn't require any additional filtering on their part for the model to reflect that, and if anything the fact that the events are mentioned in English suggests the opposite of your hypothesis.

If they were going to filter Tiananmen Square, the lift to filter it in English would not be any higher.


This may be a useful workaround, but it's also the strongest argument I've seen so far against claims that LLMs do something like "understanding" or have "an underlying world model". Maybe checking whether models know the same facts in different languages, especially around politically controversial topics, would make a good benchmark for evaluating that.


I wonder if you could use the multilingual capabilities to work around its own censorship? I.e., what would happen if you asked it to translate the query to English, asked the English version, and then had it translate the answer back to Chinese?
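Mechanically it would look something like the sketch below (assuming a local ollama server on the default port and the yi:6b tag). Note that all three steps go through the same model, so the translation prompts themselves could end up being refused:

    # Roundtrip: translate the Chinese question to English, answer it in
    # English, then translate the answer back to Chinese. All three calls hit
    # the same local model via ollama's /api/generate endpoint.
    import json
    import urllib.request

    def ask(prompt, model="yi:6b"):  # model tag is an assumption
        payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
        req = urllib.request.Request(
            "http://localhost:11434/api/generate",
            data=payload,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read())["response"]

    question_zh = "彭帅发生了什么事？"  # "What happened to Peng Shuai?"

    english_q = ask("Translate this question into English. Output only the translation:\n" + question_zh)
    english_a = ask(english_q)
    chinese_a = ask("Translate this text into Chinese. Output only the translation:\n" + english_a)

    print(chinese_a)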


Could also be both: the training data organically creating the difference, but with an additional layer of deliberate alignment on top as well.



