This repository was archived by the owner on Jun 16, 2025. It is now read-only.

rank parameter missing #40

@tuyunbin

Hi, I have run into a problem. I have a single server with 8 GPUs, running Ubuntu 16.04, PyTorch 1.4, and CUDA 10.0.
I get the error shown below when I run the following commands:
CUDA_VISIBLE_DEVICES=0 python3 scripts/train.py --dist_url 'file:///data/luwantong/nonexistent_file' --cfgs_file cfgs/yc2.yml \
    --checkpoint_path ./checkpoint/$id --batch_size 14 --world_size 4 \
    --cuda --sent_weight 0.25 | tee log/$id-0 &
CUDA_VISIBLE_DEVICES=1 python3 scripts/train.py --dist_url 'file:///data/luwantong/nonexistent_file' --cfgs_file cfgs/yc2.yml \
    --checkpoint_path ./checkpoint/$id --batch_size 14 --world_size 4 \
    --cuda --sent_weight 0.25 | tee log/$id-1 &
CUDA_VISIBLE_DEVICES=2 python3 scripts/train.py --dist_url 'file:///data/luwantong/nonexistent_file' --cfgs_file cfgs/yc2.yml \
    --checkpoint_path ./checkpoint/$id --batch_size 14 --world_size 4 \
    --cuda --sent_weight 0.25 | tee log/$id-2 &
CUDA_VISIBLE_DEVICES=3 python3 scripts/train.py --dist_url 'file:///data/luwantong/nonexistent_file' --cfgs_file cfgs/yc2.yml \
    --checkpoint_path ./checkpoint/$id --batch_size 14 --world_size 4 \
    --cuda --sent_weight 0.25 | tee log/$id-3
ValueError: Error initializing torch.distributed using file:// rendezvous: rank parameter missing
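
For context, this message is raised by the `file://` rendezvous in `torch.distributed` when no rank is supplied. None of the commands above passes a rank, so each process most likely reaches `init_process_group` with the default `rank=-1`. The following is a minimal sketch of passing the rank explicitly, not the repository's actual code. The helper names `build_init_kwargs` and `init_distributed` are hypothetical, and how `scripts/train.py` would receive the rank (for example through a new `--rank` flag) is an assumption.

```python
from typing import Any, Dict


def build_init_kwargs(dist_url: str, world_size: int, rank: int,
                      backend: str = "nccl") -> Dict[str, Any]:
    """Collect the keyword arguments for torch.distributed.init_process_group.

    Hypothetical helper: it only validates the values and makes sure
    `rank` is always passed along with `world_size`.
    """
    if world_size < 1:
        raise ValueError("world_size must be >= 1")
    if not 0 <= rank < world_size:
        raise ValueError(f"rank must be in [0, {world_size}), got {rank}")
    return {
        "backend": backend,
        "init_method": dist_url,  # e.g. the file:// URL shared by all processes
        "world_size": world_size,
        "rank": rank,             # must differ per process: 0 .. world_size - 1
    }


def init_distributed(dist_url: str, world_size: int, rank: int,
                     backend: str = "nccl") -> None:
    """Initialize the default process group with an explicit rank."""
    # Imported lazily so the helper above can be used without PyTorch installed.
    import torch.distributed as dist

    dist.init_process_group(**build_init_kwargs(dist_url, world_size, rank, backend))
```

Under this sketch, each of the four processes would call `init_distributed` with the same `dist_url` and `world_size=4` but a different rank from 0 to 3.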
