Toy model: TMS with correlations between input dimensions #22

@danbraunai

Migrated from: goodfire-ai/spd-gf#41
Original author: @leesharkey


One of the toy models that we think SPD should be able to decompose easily is the Toy Model of Superposition (TMS) with correlated input features.

In particular, we care most about the case where the correlation between some features is 1, so that whenever one feature activates, the other always co-activates. SPD should learn to group the components for these features into a single component (after the clustering step).

We also think that the intermediate case, where 0 < correlation < 1, will be useful for sanity-checking SPD, since it should still learn distinct components for the correlated features in this case. If it does not, then there is a problem somewhere.
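As a rough illustration of the two cases above, here is a minimal sketch of how correlated TMS-style inputs could be sampled. It is not the SPD codebase's data generator: the function names, the `pairs` format, and the `rho` parameter are all hypothetical, and it uses only the Python standard library rather than PyTorch to stay self-contained. Each follower feature copies its leader's on/off state with probability `rho` and is otherwise sampled independently, which (assuming both features share the same activation probability) gives the activation indicators a correlation of `rho`. Setting `rho=1.0` gives the always-co-activating case.

```python
import random


def sample_correlated_batch(batch_size, n_features, feature_probability,
                            pairs, rho, seed=0):
    """Sample sparse inputs whose activation masks are correlated.

    pairs: list of (leader, follower) feature indices (hypothetical format).
    rho:   probability that a follower copies its leader's on/off state;
           otherwise the follower keeps its own independent draw.
    """
    rng = random.Random(seed)
    batch = []
    for _ in range(batch_size):
        # Independent Bernoulli activation for every feature.
        active = [rng.random() < feature_probability for _ in range(n_features)]
        # Tie each follower to its leader with probability rho.
        for leader, follower in pairs:
            if rng.random() < rho:
                active[follower] = active[leader]
        # Active features get an independent magnitude in (0, 1]; inactive are 0.
        batch.append([1.0 - rng.random() if a else 0.0 for a in active])
    return batch


def mask_correlation(batch, i, j):
    """Pearson correlation between the on/off indicators of features i and j."""
    a = [1.0 if row[i] > 0 else 0.0 for row in batch]
    b = [1.0 if row[j] > 0 else 0.0 for row in batch]
    n = len(batch)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / n
    var_a = sum((x - mean_a) ** 2 for x in a) / n
    var_b = sum((y - mean_b) ** 2 for y in b) / n
    return cov / (var_a * var_b) ** 0.5


# Correlation = 1: features 0 and 1 always co-activate.
tied = sample_correlated_batch(20000, 5, 0.1, pairs=[(0, 1)], rho=1.0)
# 0 < correlation < 1: features 0 and 1 co-activate only some of the time.
partial = sample_correlated_batch(20000, 5, 0.1, pairs=[(0, 1)], rho=0.5)
```

Note that `rho` controls the correlation of the activation indicators only; the magnitudes are still drawn independently, so the correlation between the raw feature values is lower. Whether an eval should pin down the mask correlation or the value correlation is a design choice this sketch leaves open.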

This should be integrated with the evals suite.

Metadata

    Labels

    Priority: nInU (not Important & not Urgent)
    feature (New feature or request)
    toy-model (Adding a new toy model)
