Why was the tensor size not changed?

I made a toy CNN model.

import torch.nn as nn

class Test(nn.Module):
    def __init__(self):
        super(Test, self).__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3,300,3),
            nn.Conv2d(300,500,3),
            nn.Conv2d(500,1000,3),
        )
        self.fc = nn.Linear(3364000,1)
    
    def forward(self, x):
        out = self.conv(x)
        out = out.view(out.size(0), -1)
        out = self.fc(out)
        return out

Then I checked the model summary with this code:

model = Test()
model.to('cuda')
for param in model.parameters():
    print(param.dtype)
    break
summary_(model, (3,64,64))

I got the following results:

torch.float32
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1          [-1, 300, 62, 62]           8,400
            Conv2d-2          [-1, 500, 60, 60]       1,350,500
            Conv2d-3         [-1, 1000, 58, 58]       4,501,000
            Linear-4                    [-1, 1]       3,364,001
================================================================
Total params: 9,223,901
Trainable params: 9,223,901
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.05
Forward/backward pass size (MB): 48.20
Params size (MB): 35.19
Estimated Total Size (MB): 83.43
----------------------------------------------------------------

I want to reduce the model size because I want to increase the batch size.
So I changed torch.float32 to torch.float16 via NVIDIA/apex:

import torch.optim as optim
from apex import amp

model = Test()
model.to('cuda')
opt_level = 'O3'
optimizer = optim.Adam(model.parameters(), lr=0.001)
model, optimizer = amp.initialize(model, optimizer, opt_level=opt_level)
for param in model.parameters():
    print(param.dtype)
    break
summary_(model, (3,64,64))

Selected optimization level O3:  Pure FP16 training.
Defaults for this optimization level are:
enabled                : True
opt_level              : O3
cast_model_type        : torch.float16
patch_torch_functions  : False
keep_batchnorm_fp32    : False
master_weights         : False
loss_scale             : 1.0
Processing user overrides (additional kwargs that are not None)...
After processing overrides, optimization options are:
enabled                : True
opt_level              : O3
cast_model_type        : torch.float16
patch_torch_functions  : False
keep_batchnorm_fp32    : False
master_weights         : False
loss_scale             : 1.0
torch.float16
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1          [-1, 300, 62, 62]           8,400
            Conv2d-2          [-1, 500, 60, 60]       1,350,500
            Conv2d-3         [-1, 1000, 58, 58]       4,501,000
            Linear-4                    [-1, 1]       3,364,001
================================================================
Total params: 9,223,901
Trainable params: 9,223,901
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.05
Forward/backward pass size (MB): 48.20
Params size (MB): 35.19
Estimated Total Size (MB): 83.43
----------------------------------------------------------------

As a result, the parameter dtype changed from torch.float32 to torch.float16.
But the reported Params size (MB): 35.19 did not change.

Why does this happen? Please explain.
Thanks.

Answer

Mixed precision does not mean that your model shrinks to half its original size. By default the parameters remain in float32, and they are cast to float16 automatically during certain operations of neural network training. The same applies to the input data.
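To see what dtype the parameters actually have and how many bytes they occupy, you can sum the storage of each parameter directly. This is only a minimal sketch: param_megabytes is a hypothetical helper written for this illustration, and the small two-layer model is a stand-in, not the model from the question. It uses an explicit .half() cast rather than mixed precision, simply to show what a real change in parameter size looks like.

```python
import torch
import torch.nn as nn

def param_megabytes(model: nn.Module) -> float:
    """Bytes actually occupied by the parameters, in MB (1 MB = 1024**2 bytes)."""
    total_bytes = sum(p.numel() * p.element_size() for p in model.parameters())
    return total_bytes / 1024 ** 2

# Hypothetical stand-in model, much smaller than the one in the question.
model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.Conv2d(8, 16, 3))

fp32_dtype = next(model.parameters()).dtype
fp32_mb = param_megabytes(model)

# .half() casts every parameter to float16 in place and returns the module.
model = model.half()
fp16_dtype = next(model.parameters()).dtype
fp16_mb = param_megabytes(model)

print(fp32_dtype, fp32_mb)
print(fp16_dtype, fp16_mb)
```

Because element_size() reflects the real dtype, this number changes when the parameters are truly stored in float16, independent of what a summary tool prints.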

torch.cuda.amp provides this automatic conversion from float32 to float16 during certain operations of training, such as convolutions. Your model size will remain the same. Reducing model size is called quantization, and it is different from mixed-precision training.
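As a quick sanity check on the numbers in the question, the arithmetic below reproduces the reported "Params size (MB)" from the parameter count. It suggests the figure corresponds to 4 bytes per parameter in both runs. That is an inference from the printed values, not something confirmed from the summary tool's source, and size_mb is a hypothetical helper for this illustration.

```python
TOTAL_PARAMS = 9_223_901  # "Total params" from the summary output in the question

def size_mb(num_params: int, bytes_per_param: int) -> float:
    """Size in MB (1 MB = 1024**2 bytes) for a given per-parameter width."""
    return num_params * bytes_per_param / 1024 ** 2

as_float32 = size_mb(TOTAL_PARAMS, 4)  # 4 bytes per float32 value
as_float16 = size_mb(TOTAL_PARAMS, 2)  # 2 bytes per float16 value

print(f"{as_float32:.2f}")  # 35.19, the value reported in both summaries
print(f"{as_float16:.2f}")  # 17.59, what 2 bytes per parameter would give
```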

You can read more about mixed-precision training on NVIDIA's blog and PyTorch's blog.