Migrated from: goodfire-ai/spd-gf#41
Original author: @leesharkey
One of the toy models that we think SPD should be able to decompose easily is the Toy Model of Superposition with correlated input features.
In particular, we care most about the case where the correlation between some features is 1, so that when one feature activates, the other always co-activates. SPD should learn to group the components for these features into a single component (after the clustering step).
A second case, where 0 < correlation < 1, will be useful for sanity-testing SPD: it should still learn distinct components for the partially correlated features. If it does not, then there is a problem somewhere.
This should be integrated with the evals suite.
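
A minimal sketch of how such data could be generated, to make the two cases concrete. This is not the existing SPD data pipeline: the function name `sample_correlated_batch` and the parameters `pairs`, `rho`, and `p` are hypothetical, and "correlation" is assumed here to mean the correlation between the on/off activation masks of two features (magnitudes are still drawn independently from U[0, 1), as in the usual TMS setup). For each pair `(i, j)`, feature `j` copies feature `i`'s mask with probability `rho` and otherwise keeps its own independent draw, which keeps the marginal activation probability at `p` and gives a mask correlation of `rho`.

```python
import numpy as np


def sample_correlated_batch(batch_size, n_features, pairs, rho, p=0.05, rng=None):
    """Sample sparse TMS-style inputs with correlated activation masks.

    Hypothetical helper (not part of the SPD codebase).
    pairs: iterable of (i, j) feature indices; j is tied to i.
    rho:   target correlation between the masks of i and j, in [0, 1].
    p:     marginal probability that any single feature is active.
    """
    rng = np.random.default_rng() if rng is None else rng

    # Independent Bernoulli(p) activation mask for every feature.
    mask = rng.random((batch_size, n_features)) < p

    for i, j in pairs:
        # With probability rho, feature j copies feature i's mask;
        # otherwise it keeps its own independent draw.
        copy = rng.random(batch_size) < rho
        mask[:, j] = np.where(copy, mask[:, i], mask[:, j])

    # Independent magnitudes in [0, 1) for the active features.
    magnitudes = rng.random((batch_size, n_features))
    return mask * magnitudes
```

Under these assumptions, `rho=1.0` gives the perfect co-activation case (the paired features are always active together), while any `0 < rho < 1` gives the partially correlated sanity-test case. Features not listed in `pairs` stay independent.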