I tried to implement VGGNet, but it does not train well

I am new to CNNs. I am trying to train VGGNet.

import torch
import torch.nn as nn

class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()

        self.conv = nn.Sequential(
            # block 1
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2, 2),
            # block 2
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(128, 128, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2, 2),
            # block 3
            nn.Conv2d(128, 256, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2, 2),
            # block 4
            nn.Conv2d(256, 512, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2, 2),
            # block 5
            nn.Conv2d(512, 512, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2, 2),
        )
        self.fc = nn.Sequential(
            nn.Linear(512 * 7 * 7, 4096), nn.ReLU(inplace=True), nn.Dropout(0.5),
            nn.Linear(4096, 4096), nn.ReLU(inplace=True), nn.Dropout(0.5),
            nn.Linear(4096, 1000),
        )

    def forward(self, x):
        # feature extractor; this part differs depending on the VGG variant
        x = self.conv(x)
        # flatten to (batch, 512 * 7 * 7) for the classifier
        x = torch.flatten(x, 1)
        # the classifier is the same across variants
        x = self.fc(x)
        return x

But the loss stays near 6.9077.

After epoch 0, it barely changes.

Even if I set the weight decay to 0 (i.e., no L2 regularization), the loss changes only slightly.

My optimizer and scheduler are:

optimizer = torch.optim.SGD(net.parameters(), lr=0.1, weight_decay=5e-4)

scheduler = lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.1, patience=2, mode='min')

What is the problem? Sometimes it prints a warning, `bytes but only got 0. warnings.warn(str(msg))`. Is that related to my problem?

Answer

Your loss value of 6.9077 equals -log(1/1000), which means your network is producing essentially random (uniform) outputs over all 1000 possible classes.
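A quick sanity check: the cross-entropy loss for a prediction that assigns equal probability to each of 1000 classes is exactly -log(1/1000):

```python
import math

# Cross-entropy of a uniform prediction over 1000 classes:
# -log(1/1000) = log(1000)
uniform_loss = -math.log(1 / 1000)
print(round(uniform_loss, 4))  # 6.9078
```

Whenever you see your loss stuck at log(num_classes), suspect that the network's outputs carry no class information.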

It is a bit tricky to train VGG nets from scratch, especially if you do not include batch-norm layers.
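As a sketch (not your exact model), a single VGG-style block with `nn.BatchNorm2d` inserted after each convolution looks like this; batch norm usually makes training from scratch much more stable:

```python
import torch
import torch.nn as nn

# One VGG-style block with batch norm after each conv layer.
block = nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(inplace=True),
    nn.Conv2d(64, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(inplace=True),
    nn.MaxPool2d(2, 2),
)

# Padding 1 preserves spatial size; the 2x2 max-pool halves it.
x = torch.randn(2, 3, 32, 32)
print(block(x).shape)  # torch.Size([2, 64, 16, 16])
```

The same change can be applied to every conv layer in your `self.conv` stack (this is what `torchvision`'s `vgg16_bn` does compared to `vgg16`).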

Try reducing the learning rate to 0.01 and adding momentum to your SGD.
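Concretely, that change to your optimizer could look like the following (0.9 is a common momentum default; the stand-in parameter list is just for illustration — pass your own `net.parameters()`):

```python
import torch
import torch.nn as nn

# Stand-in parameter list; replace with net.parameters() for your model.
params = [nn.Parameter(torch.zeros(3, 3))]

# Suggested change: lr 0.1 -> 0.01, plus momentum 0.9.
optimizer = torch.optim.SGD(params, lr=0.01, momentum=0.9, weight_decay=5e-4)
print(optimizer.param_groups[0]["lr"], optimizer.param_groups[0]["momentum"])
# 0.01 0.9
```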

Add more input augmentations (e.g., flips, color jittering, etc.).