Artifacts

Search and filter across all Apertus Claritas artifacts.

Sparse Autoencoder Features in Apertus Middle Layers

A study of SAE-derived features in mid-layer MLP activations with a focus on locality and stability across checkpoints.

Circuit-style analysis of how early layers route language identity signals using path patching and synthetic probes.

Interactive tooling for running lightweight interpretability workflows in-browser with reusable presets.

Negative result showing that many SAE features drift under moderate fine-tuning and steering interventions.