About

Apertus Claritas is a platform for collecting and sharing interpretability research on Apertus, Switzerland's open multilingual language model. It brings together researchers, students and independent contributors to document what we are learning: what works, what fails and where understanding remains incomplete.

What makes Apertus Claritas unique is that it offers both an inside view and an open view. Rather than only showcasing polished results, it creates space for exploratory findings, intermediate insights, negative results and technically grounded reflections that help others understand this model more deeply.

Topics

Topics we cover include but are not limited to the following:

Features, circuits, latent representations and geometry
Sparse autoencoders and transcoders
Probing, activation steering and interventions
Training dynamics across checkpoints and parameter scales
Safety monitoring, including hallucinations, anthropomorphic concepts and behavioural drift
Agentic interpretability and automated monitoring
Tools, datasets and interactive interpretability interfaces