If you're interested in this, don't miss AI Sweden's GPT-SW3 (126M to 40B parameters), trained on Nordic languages (not Finnish) and English. It's funded by the Swedish government and partners, and freely available, with a pretty lively Discord for ongoing AI research focusing on the Nordic languages. I think Viking is called "first" because it includes Finnish; otherwise, GPT-SW3 was released earlier.
First thing I notice is that Finnish is part of a completely different language family from the other Nordic languages and English (Uralic vs. Indo-European). I wonder to what extent this affects the effectiveness of their low-resource training. Finnish is highly agglutinative, stacking suffixes onto a root to modify its meaning. My (amateur) take is that the tokenization and attention patterns may differ a lot? Would love to see people more educated than I am discuss this.
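To make the tokenization point concrete, here's a minimal sketch (assuming the Hugging Face transformers library; the checkpoint names are just examples of publicly available models, not anything the Viking team confirmed) comparing how an English-centric BPE vocabulary handles one agglutinated Finnish word:

```python
# Sketch: compare how an English-centric BPE tokenizer and a
# Finnish-aware tokenizer split a single agglutinated Finnish word.
from transformers import AutoTokenizer

# taloissammekin = talo (house) + i (plural) + ssa (in) + mme (our) + kin (too)
word = "taloissammekin"

gpt2_tok = AutoTokenizer.from_pretrained("gpt2")               # English-centric BPE
viking_tok = AutoTokenizer.from_pretrained("LumiOpen/Viking-7B")  # trained with Finnish

print(gpt2_tok.tokenize(word))    # likely many short, opaque fragments
print(viking_tok.tokenize(word))  # expected: fewer pieces, closer to morpheme boundaries
```

If the Finnish-aware vocabulary really does produce fewer, more morpheme-like tokens, the model spends less of its context window and attention budget reassembling words, which is presumably part of the motivation for training a dedicated tokenizer.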
>> to what extent this affects the effectiveness of
The correct use of those words demonstrates that you are either not an AI (all of them having been trained on so much bad language) or an AI from a more perfect future.
Finnish is not so different despite having a different lineage. Even if we talk about morphology, sometimes it's simply that e.g. prepositions are affixed to the end of a word; big whoop. There are many dimensions to language variation. Finnish has a long history of contact with Scandi languages and a lot of borrowed words and logic. It would be good to have Estonian and possibly the Baltic languages too.
ETA: It is different, of course, just perhaps not as much as people sometimes try to say. You can definitely ruffle some feathers with this one, given that the uniqueness of Finnish is pretty central to Finnish nationalism.
As someone who grew up in relatively close contact with Finnish, I can assure you that there's no real common ground between the Scandinavian languages (Swedish, Danish, Norwegian) and Finnish. There are loan words, but they are few and far between and in any case do not make for any mutual understanding. I've been to Finland so much that I would really like to learn to at least understand the language, instead of relying on memorized names of foodstuffs and the like. Just have to tackle Japanese first... (and I consider that one an easier operation)
We do have one small language, the Kven language (https://en.wikipedia.org/wiki/Kven_language), which is sort of "Finnish structure, but with lots of borrowed Norwegian words". But for all intents and purposes, it very much sounds like Finnish.
It basically sounds like the language a Finnish person who has lived their whole life in Norway would speak, mixing in Norwegian words because they have forgotten the Finnish ones.
But that's about it. I know there are some other dialects too, but these are all very small-scale languages that are either extinct or will be within a few decades.
Much easier for Finns to learn Swedish than for Swedes to learn Finnish, IMO. I speak Norwegian and Finnish (lived in Finland when I was young).
Loan words from Scandi are more common than you think. E.g. "hei" is a common greeting, and "tykätä" is a common verb. For nouns there is even a whole paradigm for loans, a large number of which are Scandi. They are not necessarily easy to recognise, since they undergo sound changes, e.g. plaasteri becomes laasteri.
Loan words, yes, but that has very little to do with the grammar and structure of the language. "Jag tycker om dig" [sv] translates to "Tykkään sinusta" [fi], which isn't anywhere near the Scandic.
Oh well, if I made a spelling mistake, that obviously invalidates my whole point. Thank you for teaching me that Finnish is a very special language -- just like the Finns -- such an amazing and unique people ;)
The fact that it was trained on an HPC system whose waste heat covers 20% of a city's heat consumption is absolutely wild, and on par with how wild it is to have an English/Nordic model.
> Further emphasizing digital sovereignty, Viking is trained on the EuroHPC supercomputer LUMI, utilizing up to 4096 AMD MI-250X GPUs. LUMI is not only Europe's most powerful supercomputer and the 5th most powerful in the world, but also the 3rd greenest supercomputer among the top 500 supercomputers. LUMI's energy consumption is covered with power produced 100% with hydroelectricity, and the waste heat of LUMI will account for about 20 percent of the district heating in the surrounding city of Kajaani.
Great talking points. These are highly relevant subjects and I'm delighted we in the Nordics are keeping up with current developments. This work is important for preserving our culture.
I hope to see this used to generate a customized curriculum for each neurodiverse child so that we can live in a more equitable society.
I have had this question. How much better would common LLMs (Llama, GPT-N) be if they were trained on only one language? I have to assume they would perform better, but I might be wrong.
Perform better how? Knowing more languages gives you more data and different points of view, rather than just the English corpus and culture. When I ask ChatGPT for a translation, it seems to understand the meaning behind the words and finds the closest thing in the other language. The datasets seem to merge in some way.
Fair, but there may be overhead that doesn't need to exist. Certainly, for the limited compute my brain can manage, I could gain a deeper understanding of physics if I focused on learning physics and didn't also have to simultaneously learn French.
Wouldn't a better metaphor be whether a child growing up in a bilingual household would be worse at physics as an adult? My guess would be that growing up bilingual would have no impact.
This hypothetical kid would have the same size of brain/number of neurons anyway. In the case of LLMs, one could create a model that could be smaller thanks to not including knowledge of unnecessary languages. A problem, though, could be the lack of training data in other languages.
Humans are not limited by the computational power of the brain (or rather, that is not the limitation we encounter). We are limited by time and the fact that our machinery degrades with time (aging).
Just like adding code to textual models helps them develop their reasoning capabilities, adding more languages seems to help in other areas too. What is needed is more good-quality data to train on...
We also see humans get worse at specific things when they learn too much in general. There is a cut-off point to how many concepts we can learn, and to what level of skill. To be most effective, we have to specialize in the right things while continuing to acquire generalist knowledge. It's a balancing act.
These architectures are less capable than brains in many ways. So, we should expect them to have such trade-offs. An efficient one should work fine on English, mathematical notation, and a programming language. Maybe samples of others that illustrate unique concepts. I’m also curious how many languages or concepts you can add to a given architecture before its effectiveness starts dropping.
It's not the amount that is wrong, it's how the model is trained. The model is trained for zero- and few-shot tasks. It is not surprising that it performs well when you ask for that.
I can't track down the citation (either Google or DeepMind, I think), but I remember reading research from a year or two ago about how adding extra languages (French, German) improved English-language performance. There may have also been an investigation into multimodality, which found that adding vision or audio helped with text as well.
Interesting thought. Maybe an LLM would build deeper insight with only one training language. On the other hand, the model might overfit with just one language -- maybe multilingual models generalize better?
I think this makes sense to the extent that an understanding of the differences between languages helps separate language from the underlying meaning. However... the models used to receive input (i.e. translate from a language), to learn/understand, and to output information (i.e. re-encode into a language) do not all have to be the same.
Would an LLM trained on a smaller language have better cultural awareness etc. than one trained in English? Because English is written all over the world by all kinds of people, an English LLM will average over that (and, for instance, feel a bit off to an American). But would a Norwegian LLM, trained on a language mostly written by Norwegians, feel more natural to me in comparison?
> To leverage the capabilities of MI250X, ROCm enables the use of GPU matrix cores through its rocBLAS and MIOpen library implementations that, in turn, are leveraged by PyTorch.
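For anyone curious what that looks like from the user's side, here's a small sketch, assuming a ROCm build of PyTorch (on such builds the familiar torch.cuda API is backed by HIP); the matrix sizes are arbitrary:

```python
# Sketch: on a ROCm build of PyTorch, the torch.cuda API is HIP-backed,
# and dense matmuls are dispatched to rocBLAS (convolutions go to MIOpen).
import torch

if torch.cuda.is_available():
    print("HIP runtime:", torch.version.hip)         # set on ROCm builds (None on CUDA builds)
    print("Device:", torch.cuda.get_device_name(0))  # e.g. an MI250X GCD

    a = torch.randn(4096, 4096, device="cuda", dtype=torch.float16)
    b = torch.randn(4096, 4096, device="cuda", dtype=torch.float16)
    c = a @ b  # GEMM routed to rocBLAS, which uses the MI250X matrix cores
    torch.cuda.synchronize()
    print(c.shape)
```

The nice part of this design is that existing torch.cuda code runs unchanged; the HIP backend and rocBLAS/MIOpen do the AMD-specific work underneath.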
I don't think they decided that; they included Finnish, which is completely unrelated to the other Nordic languages. If they had just picked related languages for cross-learning, including Dutch or German would indeed have made more sense.
I understand; I'm not saying they did something wrong, just pointing out that the selection of languages was not because they belong to the same family, but rather to serve a certain region.
Including Finnish was probably just a political choice, since Finland and Sweden are very close politically, much closer than to Germany or other areas with more similar languages.
I got the impression they are focusing on Nordic culture as much as the languages.
> Silo AI and TurkuNLP are dedicated to developing models that not only excel in linguistic performance and inclusivity but are also attuned to local values and cultures.
https://huggingface.co/AI-Sweden-Models