About

Apertus Claritas is a platform for collecting and sharing interpretability research on Apertus, Switzerland's open multilingual language model. It brings together researchers, students and independent contributors to document what we are learning: what works, what fails and where understanding remains incomplete.

What makes Apertus Claritas unique is that it offers both an inside view and an open view. Rather than only showcasing polished results, it creates space for exploratory findings, intermediate insights, negative results and technically grounded reflections that help others understand this model more deeply.

Topics

Topics we cover include, but are not limited to:

  • Features, circuits, latent representations and geometry
  • Sparse autoencoders and transcoders
  • Probing, activation steering and interventions
  • Training dynamics across checkpoints and parameter scales
  • Safety monitoring, including hallucinations, anthropomorphic concepts and behavioural drift
  • Agentic interpretability and automated monitoring
  • Tools, datasets and interactive interpretability interfaces

Get to know us

Team · 4

  • Anna Hedström, Core Team
  • fz, Developer
  • Lasse Strand, Developer
  • Thilo Spinner, Developer

Advisors · 0

No advisors yet.

Reviewers · 2

  • Anna Hedström, Reviewer
  • Julian Konstantin Minder, Reviewer

Contributors · 8

  • Aleks Stepančič, Contributor
  • alexander sternfeld, Contributor
  • Arundhati Balasubramaniam, Contributor
  • Aydin Javadov, Contributor
  • Matteo Pelossi, Contributor

Supporting labs
ivia-lab, LAS
Hosted within the Swiss AI ecosystem
EPFL AI Center, ETH AI Center, CSCS