all_gather_with_gradients¶
Gathers a tensor from each distributed rank into a list in which every tensor retains its gradients. This is the same as all_gather, except that the gathered tensors keep their gradient connections; it is used to compute the contrastive loss with local queries only, lowering memory usage, see https://github.com/mlfoundations/open_clip/issues/616
- If torch.distributed is available and initialized, gathers all the tensors (with gradients) from each rank into a list.
- If torch.distributed is unavailable, uninitialized, or world_size == 1, returns a list containing only the original tensor and emits a warning to notify the user (helpful in a single-GPU setup).
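A minimal sketch of how such a gather could be implemented is shown below, assuming PyTorch's autograd-aware torch.distributed.nn.all_gather; the fallback behavior mirrors the description above, but the library's actual implementation may differ.

```python
import warnings
from typing import List

import torch
import torch.distributed as dist
import torch.distributed.nn  # autograd-aware collectives


def all_gather_with_gradients(tensor: torch.Tensor) -> List[torch.Tensor]:
    # Fallback: torch.distributed unavailable, uninitialized, or single process.
    if not dist.is_available() or not dist.is_initialized() or dist.get_world_size() == 1:
        warnings.warn(
            "torch.distributed is unavailable or world_size == 1; "
            "returning a list with only the local tensor."
        )
        return [tensor]

    # torch.distributed.nn.all_gather is differentiable, so every gathered
    # tensor keeps its gradient connection to the rank it came from.
    return list(torch.distributed.nn.all_gather(tensor))
```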
Parameters¶
- tensor (torch.Tensor) – the local tensor to gather from every rank.
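For illustration, a hedged usage sketch of the memory-saving pattern referenced above: queries stay local while keys are gathered from all ranks, so the similarity matrix is local_batch × global_batch instead of global × global. The feature tensors, shapes, and logit_scale are placeholders, not part of this API.

```python
import torch
import torch.distributed as dist
import torch.nn.functional as F

# Placeholder local features (hypothetical shapes): B local queries of dim D.
B, D = 8, 512
image_features = torch.randn(B, D, requires_grad=True)
text_features = torch.randn(B, D, requires_grad=True)
logit_scale = 100.0

# Keys are gathered from every rank and still carry gradients.
all_keys = torch.cat(all_gather_with_gradients(text_features), dim=0)
logits = logit_scale * image_features @ all_keys.t()

# The positive key for local query i sits at global index rank * B + i.
rank = dist.get_rank() if dist.is_available() and dist.is_initialized() else 0
labels = torch.arange(B, device=logits.device) + rank * B
loss = F.cross_entropy(logits, labels)
```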