
micro_acc_steps flag functionality #111

@avgalichin

micro_acc_steps: the documentation says the flag implements microbatching, but there seems to be no such functionality in the code.

Expected behaviour (from README --distribute_modules example)

“It accumulates gradients over 8 minibatches, and splits each minibatch into 2 microbatches before feeding them into the SAE encoder, thus saving a lot of memory.”

torchrun … --grad_acc_steps 8 … **--micro_acc_steps 2**

Actual behaviour in the code

  • sparsify/config.py:
micro_acc_steps: int = 1  # "Chunk the activations into this number of microbatches for training"
  • sparsify/trainer.py (only place the value is used):
acc_steps = self.cfg.grad_acc_steps * self.cfg.micro_acc_steps  

I don't see any actual split of the minibatch into micro_acc_steps microbatches; the activations are fed to the SAE whole, regardless of the micro_acc_steps value.


From what I can see, setting micro_acc_steps > 1 only multiplies the gradient-accumulation denominator (acc_steps). That means the effective learning rate goes down, but the memory footprint stays the same.
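To make the difference concrete, here is a minimal pure-Python sketch. It is not sparsify's code: the toy one-parameter model and the function names (`grad_full`, `grad_microbatched`, `grad_denominator_only`) are hypothetical and only illustrate the arithmetic. It compares real microbatching (split the minibatch, accumulate scaled gradients chunk by chunk) with what I believe the current code does (no split, just a larger denominator).

```python
def grad_full(w, xs, ys):
    """Gradient of mean((w*x - y)**2) w.r.t. w over the whole minibatch."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def grad_microbatched(w, xs, ys, micro_acc_steps):
    """Split the minibatch into equal chunks and accumulate scaled gradients.

    Only one chunk is processed at a time, which is where the memory
    saving would come from in a real model.
    """
    n = len(xs)
    assert n % micro_acc_steps == 0, "sketch assumes equal-sized chunks"
    chunk = n // micro_acc_steps
    total = 0.0
    for i in range(0, n, chunk):
        g = grad_full(w, xs[i:i + chunk], ys[i:i + chunk])
        total += g / micro_acc_steps  # scale so the sum matches the full mean
    return total


def grad_denominator_only(w, xs, ys, micro_acc_steps):
    """Behaviour described in this issue: no split, only a larger denominator."""
    return grad_full(w, xs, ys) / micro_acc_steps


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w = 0.5

full = grad_full(w, xs, ys)
micro = grad_microbatched(w, xs, ys, micro_acc_steps=2)
scaled = grad_denominator_only(w, xs, ys, micro_acc_steps=2)

print(full, micro, scaled)
```

In this toy case the microbatched gradient matches the full-minibatch gradient, while the denominator-only variant is smaller by a factor of micro_acc_steps and still processes the whole minibatch at once.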

If that’s correct, it might be worth updating the README (and the flag’s doc-string in config.py) to avoid confusion for new users.

Labels: bug, good first issue
