Apertus is here: What can the Swiss LLM really do?
Update (September 5, 2025):
This article builds on our July report – this time with solid benchmarks and practical experience.
In July we asked: «Can the Swiss LLM keep up?» Now the first generation, Apertus, is here. Time for a sober reality check: where does Swiss AI stand – and who is it actually useful for today?
The Bottom Line
- Complete Openness: Apertus is not just «open weights» but transparent through and through – weights, training data, and code are all public. This level of openness is rare.
- Language Diversity: 15 trillion tokens from over 1,800 languages, 40% non-English – including Swiss German and Romansh. In practice, though, this doesn't (yet) hold up – see the tests below.
- Regulatory Soundness: Data protection and EU requirements were considered from the start rather than retrofitted.
What's New with Apertus?
Apertus comes in two sizes: 8 billion and 70 billion parameters. It was trained on the Swiss supercomputer «Alps» (CSCS, Lugano) – 10,752 NVIDIA GH200 Grace Hopper superchips on an HPE Cray platform.
The training philosophy is unusual: pretraining already drew on 15 trillion tokens from more than 1,800 languages, and post-training covers 149 languages. A special «Goldfish objective» is designed to prevent the model from memorizing content verbatim – measurements show memorization at practically baseline levels.
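To make the idea tangible (a simplified sketch, not the Apertus training code): a goldfish-style loss excludes a pseudorandom subset of token positions from the cross-entropy, so the model never receives a gradient for the complete verbatim sequence. The drop rate `k` and the simple modulo mask below are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def goldfish_loss(logits: torch.Tensor, labels: torch.Tensor, k: int = 4) -> torch.Tensor:
    """Cross-entropy that drops roughly every k-th token position.

    Positions excluded from the loss never receive a gradient toward
    reproducing the training text verbatim. The modulo mask is a
    simplified stand-in for the pseudorandom (hash-based) mask used in
    the published goldfish-loss technique.
    """
    # logits: (batch, seq_len, vocab_size); labels: (batch, seq_len)
    positions = torch.arange(labels.size(1), device=labels.device)
    drop = positions % k == 0                    # mask ~1/k of all positions
    masked = labels.masked_fill(drop, -100)      # -100 is ignored by cross_entropy
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        masked.reshape(-1),
        ignore_index=-100,
    )
```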
Crucial for companies: the data pipelines strictly respect licenses and honor retroactive opt-outs (robots.txt). This creates a foundation compliant with the EU AI Act.
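Mechanically, honoring a robots.txt opt-out boils down to a check like the following – a minimal Python sketch using only the standard library; the user-agent string `ExampleCrawler` is a placeholder, not the crawler actually used for Apertus:

```python
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

def may_use(url: str, user_agent: str = "ExampleCrawler") -> bool:
    """Check whether a site's current robots.txt permits fetching this URL.

    A retroactive opt-out works the same way: if robots.txt now
    disallows the path, the page is excluded from the training corpus.
    """
    base = urlparse(url)
    rp = RobotFileParser(f"{base.scheme}://{base.netloc}/robots.txt")
    rp.read()  # downloads and parses robots.txt
    return rp.can_fetch(user_agent, url)

print(may_use("https://example.com/article/42"))
```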
The Numbers: Where Does Apertus Stand in Comparison?
| Model | MMLU (Knowledge) | Global-MMLU (Multilingual) | GSM8K (Math) | HumanEval (Code) | RULER @32k (Long Context) |
| --- | --- | --- | --- | --- | --- |
| Claude 3.5 Sonnet | 88.7% | — | 96.4% | 92.0% | — |
| Llama 3.1 70B | 83.6% | — | 95.1% | 80.5% | — |
| Apertus-70B | 69.6% | 62.7% | 77.6% | 73.0% | 80.6% |
| Apertus-8B | 60.9% | 55.7% | 62.9% | 67.0% | 69.5% |
Notes on Comparability: The prompt setups differ between models (shot counts and chain-of-thought configurations). Global-MMLU and RULER scores are not reported in the official documentation of the comparison models.
The 70B variant holds its own in general knowledge and multilingual tasks but lags behind the top models in mathematics and programming.
Who Is Apertus Useful for Today?
Suitable for:
- Compliance-critical environments (public sector, healthcare, law, finance in EU/CH)
- Settings with high transparency requirements – the model's behavior is fully traceable
- Summarization, classification, and categorization tasks
Not yet optimal for:
- Texts in Swiss German or Romansh
- Mathematics-intensive automation (code refactoring, formal proofs) – it lacks RL fine-tuning and specialized toolchains
- Agentic workflows and multimodality – not the focus of this first generation
Conclusion: Solid Start – Not Yet a Swiss Army Knife of LLMs
Apertus is an important signal for open AI development in Europe – but not (yet) a breakthrough. The much-touted multilingual capabilities don't yet hold up in practice.
On paper it looks respectable: Apertus-70B translates German→Romansh with a BLEU score of 27.8 – clearly ahead of Llama-3.3-70B at 21.6. In actual use, however, the output is often unreadable. ChatGPT currently delivers significantly better results here.
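For context: BLEU is a corpus-level overlap score on a 0–100 scale – useful for ranking systems, but it says little about readability, which is exactly the gap observed here. A minimal sketch of how such a score is computed with the sacrebleu library (the sentences are placeholders, not from the actual test set):

```python
# pip install sacrebleu
import sacrebleu

# Placeholder sentences -- not from the actual German->Romansh test set.
hypotheses = ["Bun di, co vai?"]        # model translations
references = [["Bun di, co vai?"]]      # one inner list per reference set

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU: {bleu.score:.1f}")  # scores like 27.8 live on this 0-100 scale
```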
Swiss German also proved weak in initial tests: the outputs sound neither like the requested dialect (Bernese German) nor like Swiss German in general – practically unusable.
Nevertheless, the development is exciting. For specific, clearly defined use cases, Apertus can already be a good fit today – though this calls for further, targeted testing. The next versions will be decisive, both for Switzerland's AI ambitions and for the question of whether small languages stand a chance in the AI world.
Fundamentally, it remains an open question how much radical openness is worth in everyday business. A sobering thought: if an ethically curated dataset ultimately means a weaker LLM, the Swiss LLM will have a hard time.
Availability and Access
Apertus is now available through:
- Swisscom (Sovereign AI)
- Hugging Face (Open Source)
- Public AI (API access)
ETH and EPFL provide complete documentation and code.
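For a quick local test via Hugging Face, a minimal sketch with the transformers library – the repository id below is an assumption, so verify it on the swiss-ai organization page before use:

```python
# pip install transformers torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repository id is an assumption -- check the swiss-ai organization on
# Hugging Face for the exact model names.
model_id = "swiss-ai/Apertus-8B-Instruct-2509"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Summarize the Apertus project in two sentences."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

The 8B variant runs on a single sufficiently large GPU; for the 70B model, plan for multi-GPU setups or quantized variants.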