PyTorch: What is the difference between using tensor.cuda() and tensor.to(torch.device("cuda:0")) in terms of what they both do

Using PyTorch, what is the difference between the following two methods of sending a tensor to the GPU? (I don't really need a detailed explanation of what is happening in the backend; I just want to know whether they are both essentially doing the same thing.)

Method 1:

import numpy as np
import torch

X = np.array([[1, 3, 2, 3], [2, 3, 5, 6], [1, 2, 3, 4]])
X = torch.DoubleTensor(X).cuda()

Method 2:

X = np.array([[1, 3, 2, 3], [2, 3, 5, 6], [1, 2, 3, 4]])
X = torch.DoubleTensor(X)

device = torch.device("cuda:0")
X = X.to(device)

Similarly, is there any difference between the same two methods when applied to sending a model to the GPU? (Again, I don't really need a detailed explanation of what is happening in the backend; I just want to know whether they are both essentially doing the same thing.)

Method A:

gpumodel = model.cuda()

Method B:

device = torch.device("cuda:0")
gpumodel = model.to(device)

Many thanks in advance!

Answer

There is no difference between the two.
Early versions of PyTorch had .cuda() and .cpu() methods to move tensors and models from CPU to GPU and back. However, this made writing code a bit cumbersome:

if cuda_available:
  x = x.cuda()
  model.cuda()
else:
  x = x.cpu()
  model.cpu()

Later versions introduced .to(), which takes care of device placement in a more elegant way:

device = torch.device('cuda') if cuda_available else torch.device('cpu')
x = x.to(device)
model = model.to(device)
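For completeness, here is a minimal runnable sketch of that idiom (it assumes PyTorch is installed and falls back to the CPU when no GPU is present). It also shows one small convenience that .to() has over .cuda(): it can change the dtype in the same call.

```python
import torch

# Pick the GPU if one is available, otherwise fall back to the CPU.
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

x = torch.ones(3, 4, dtype=torch.float64)
x = x.to(device)  # same effect as x.cuda() when device is 'cuda:0'

if torch.cuda.is_available():
    # On a GPU machine, .cuda() and .to(device) land on the same device.
    assert x.device == torch.ones(1).cuda().device

# .to() can also cast dtype and move device in one call,
# which .cuda() alone does not do.
y = x.to(device, torch.float32)

print(x.device, y.dtype)
```

The same .to(device) call works for models, so a single `device` variable at the top of a script keeps the rest of the code device-agnostic.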
