Description of the issue
From my understanding, DistributedDataParallel (DDP) also supports distributing training across multiple CPU processes (via the `gloo` backend). Maybe we could add a test to the test suite that checks this, so that distributed training would be verified automatically for all architectures?
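As a starting point, such a test could look something like the sketch below. It spawns two CPU processes, wraps a toy `nn.Linear` model (a stand-in for the real architectures under test) in DDP with the `gloo` backend, and runs one optimizer step. The port number and world size are arbitrary choices here, not values from this repo.

```python
import os

import torch
import torch.distributed as dist
import torch.multiprocessing as mp
import torch.nn as nn


def run_ddp_step(rank: int, world_size: int) -> None:
    # gloo is the backend that supports CPU-only distributed training.
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29500"  # arbitrary free port for the test
    dist.init_process_group("gloo", rank=rank, world_size=world_size)

    # Toy model; a real test would iterate over the suite's architectures.
    model = nn.Linear(4, 2)
    ddp_model = nn.parallel.DistributedDataParallel(model)

    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.1)
    loss = ddp_model(torch.randn(8, 4)).sum()
    loss.backward()  # gradients are all-reduced across the two processes
    optimizer.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    world_size = 2
    mp.spawn(run_ddp_step, args=(world_size,), nprocs=world_size)
    print("CPU DDP step completed")
```

A pytest version would just call `mp.spawn` inside a parametrized test and assert that no process raised.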