Artifacts
Search and filter across all Apertus Claritas artifacts.
Sparse Autoencoder Features in Apertus Middle Layers
A study of SAE-derived features in mid-layer MLP activations with a focus on locality and stability across checkpoints.
Technical, SAEs
Early-layer Circuits for Language Identification in Apertus
Circuit-style analysis of how early layers route language identity signals using path patching and synthetic probes.
Paper, Circuits
An Active Interpretability Dashboard for Apertus
Interactive tooling for running lightweight interpretability workflows in-browser with reusable presets.
Software, Active-interpretability
Robustness of SAE Features Under Steering
Negative result showing that many SAE features drift under moderate fine-tuning and steering interventions.
Negative-results, SAEs