ITIF - The Information Technology and Innovation Foundation

09/24/2024 | News release | Distributed by Public on 09/24/2024 12:20

Europe Should Tap Switzerland To Unlock LLM Innovation on the Continent

As the United States commands the field with English-language large language models (LLMs) and China climbs the ranks to secure the lead in models for Chinese-language tasks, Europe is barely in the game when it comes to mastering models for its own languages. Of the top open-source multilingual models designed to handle non-English European languages, hardly any are from European companies. If Europe wants to be globally competitive in LLMs, the first thing it should do is become competitive in its own linguistic backyard. Switzerland, with its unparalleled linguistic diversity and world-class research institutions, is Europe's star player sitting on the bench. Europe should get back in the game by launching a bold, open-source LLM moonshot project with Switzerland at the helm.

Currently, Europe can claim only two of the top 10 open-source LLMs for multilingual tasks in non-English European languages, both thanks to French startup Mistral. That means most of the best open-source models for multilingual tasks in languages like French, German, Italian, and Spanish come from companies outside Europe, most of them American. To be clear, these rankings don't just look at how well a model translates an input. They assess its overall capability to respond to a query, including reasoning, problem-solving, commonsense, factual knowledge, and truthfulness.

Table 1: Top open-source LLMs to handle non-English European languages

Ranking  Model Name                   Company  Country  Average Score
1        Meta-Llama-3.1-70B-Instruct  Meta     USA      0.7
2        Gemma-2-27b-Instruct         Google   USA      0.69
3        Mixtral-8x7B-Instruct-v0.1   Mistral  France   0.58
4        Gemma-2-9b-Instruct          Google   USA      0.57
5        Meta-Llama-3.1-8B-Instruct   Meta     USA      0.55
6        c4ai-command-r-35B-v01       Cohere   Canada   0.55
7        Mixtral-8x7B-v0.1            Mistral  France   0.55
8        Meta-Llama-3-8B-Instruct     Meta     USA      0.54
10       Qwen2-7B                     Alibaba  China    0.53

While some European companies like German startup Aleph Alpha and European research collectives like Occiglot are making promising strides, their individual efforts are unlikely to transform Europe's overall global position. Europe should aim higher. As part of the Coordinated Plan on AI, an agreement between the EU member states, Norway, and Switzerland to foster a European approach to AI, the European Commission should establish and fund a moonshot project to make Europe the global leader in developing LLMs that excel at understanding and processing European languages. This project should be open-source, encouraging wide participation, and anchored in Switzerland, a country central to Europe's innovation landscape.

First, developing powerful multilingual LLMs presents technical challenges that Switzerland is uniquely positioned to help Europe solve. For example, many LLMs today, including Mistral's, seem to process non-English queries by first translating the input into English (as a bridge language) and then translating the result into the desired target language. This approach can make outputs less accurate. For instance, if a German phrase like Tomaten auf den Augen haben (literally "to have tomatoes on your eyes," but idiomatically "to not see what everyone else can see") is first translated into English for processing and then into, say, French, it can lose its idiomatic meaning, resulting in an inaccurate translation. If the model instead uses Italian as the internal bridge language, it might retain the correct idiomatic meaning, since the Italian equivalent avere le fette di salame sugli occhi ("to have slices of salami on your eyes") is a closer match to the German input. Processing queries through a more contextually relevant bridge language can capture meaning more accurately across multiple languages, which matters particularly for tasks like legal analysis or content moderation.
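The bridge-language effect described above can be illustrated with a deliberately simplified Python sketch. It does not reflect how any real LLM works internally. The names Rendering, to_bridge, from_bridge, and pivot_translate are hypothetical, and the toy rests on two stated assumptions: the idiomatic meaning survives the bridge step only if the bridge language has an equivalent idiom, and the final step can only carry forward whatever meaning the bridge retained.

```python
from dataclasses import dataclass
from typing import Optional

# Toy data taken from the example in the text. English has no entry because,
# in that example, it lacks an equivalent idiom.
EQUIVALENT_IDIOM = {
    "de": "Tomaten auf den Augen haben",
    "it": "avere le fette di salame sugli occhi",
}
INTENDED_MEANING = "not seeing what everyone else can see"


@dataclass(frozen=True)
class Rendering:
    lang: str
    text: str
    meaning: Optional[str]  # None once the idiomatic meaning has been lost


def to_bridge(source: Rendering, bridge: str) -> Rendering:
    """Assumption: the meaning survives only if the bridge has an equivalent idiom."""
    idiom = EQUIVALENT_IDIOM.get(bridge)
    if idiom is not None and source.meaning is not None:
        return Rendering(bridge, idiom, source.meaning)
    literal = f"[word-for-word {bridge} rendering of {source.text!r}]"
    return Rendering(bridge, literal, None)


def from_bridge(bridged: Rendering, target: str) -> Rendering:
    """Assumption: the final step cannot restore a meaning the bridge dropped."""
    if bridged.meaning is not None:
        phrasing = f"[{target} phrasing of: {bridged.meaning}]"
        return Rendering(target, phrasing, bridged.meaning)
    literal = f"[word-for-word {target} rendering of {bridged.text!r}]"
    return Rendering(target, literal, None)


def pivot_translate(source: Rendering, bridge: str, target: str) -> Rendering:
    """Translate source -> bridge -> target."""
    return from_bridge(to_bridge(source, bridge), target)


source = Rendering("de", EQUIVALENT_IDIOM["de"], INTENDED_MEANING)
via_english = pivot_translate(source, "en", "fr")
via_italian = pivot_translate(source, "it", "fr")

print("via English:", via_english.meaning)
print("via Italian:", via_italian.meaning)
```

In this toy, the German-to-French route through English ends with no idiomatic meaning attached, while the route through Italian keeps it. Real models are far less clear-cut, so the sketch only shows why the choice of bridge language can matter.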

Switzerland is better placed than most to help Europe solve the technical challenges of developing multilingual AI, particularly the need for greater accuracy in moving from one language to another. This challenge is second nature in Switzerland, where navigating linguistic boundaries with precision is part of daily life. With four national languages (French, German, Italian, and Romansh), Switzerland excels at ensuring meaning and nuance are preserved across languages. The country's parliament is a good illustration of this strength. Politicians from different language regions regularly debate in their native languages, in many cases without relying on translation. Speakers have to ensure their arguments make sense across linguistic divides, forcing them to be clear, precise, and sensitive to cultural nuances. But this isn't limited to the corridors of parliament; it's happening every day across Switzerland, in schools, hospitals, and businesses alike. The constant negotiation between European languages, woven into the fabric of Swiss life, offers a unique, living resource for novel approaches to multilingual models, and even real datasets for training them, since parliamentary sessions are recorded and public. On top of that, Switzerland consistently ranks as the top innovator on the EU's innovation scoreboard, performing around 140 percent above the EU average. Hosting a transnational, open-source LLM moonshot project in Switzerland, anchored by world-class institutions like ETH Zurich and EPFL, would give Europe the best chance to find innovative ways to improve LLMs for European languages and to lead in their development.

Second, Switzerland's rich linguistic landscape offers an invaluable perspective on how language shapes society, a crucial lens for building LLMs that respect cultural diversity. In recent years, English in Switzerland has increasingly been treated as an unofficial "fifth language," often prioritized over French, German, and Italian in education and public life. While some see English as a practical bridge language for communication, others worry about the cultural consequences of sidelining languages that carry unique histories and identities. Romansh, for instance, is more than just a tool for communication; it embodies the traditions and perspectives of a minority community. Switzerland is grappling with how marginalizing some languages can diminish unique ways of interpreting the world. When one language dominates, the subtleties tied to other languages can fade, with important implications for cultural diversity.

Similarly, when AI systems rely too heavily on a dominant language like English as their processing framework, they may implicitly reflect the assumptions and worldviews tied to that language, even when translating back into other languages. This can result in systems that fail to fully represent diverse perspectives. Switzerland's experience in balancing multiple languages and safeguarding minority linguistic identities equips it to help Europe develop AI systems that respect and preserve cultural diversity while maintaining technical accuracy, something European policymakers have claimed they want.

Finally, Switzerland's regulatory flexibility is a competitive advantage. Switzerland closely aligns with EU laws and regulatory standards, not just for market access but also because of shared values like privacy, transparency, and safety. That said, Switzerland has a bottom-up approach to AI development that fosters creativity and innovation but can lack the strategic direction needed to focus on projects like multilingual LLMs. A directed moonshot could provide that focus. By leading such a project in Switzerland, Europe would build on its earlier collaborative efforts outlined in the Coordinated Plan on AI, strengthening ties while advancing its position in global AI development.

Europe's ambitions may sound humble (mastering innovation in its own languages should be the bare minimum), but achieving that will require a bold approach. A moonshot LLM project, led by Switzerland, is what's needed to pull Europe off the sidelines and back into the game.